Fast & Easy Wordnet Java

July 21st, 2008

One of the techniques that Surf Canyon uses to determine if a search result is relevant to your query is to examine synonyms. Princeton University provides Wordnet, a structured lexicon that acts as a dictionary and thesaurus. There are a few open source Java libraries that provide a Java API to Wordnet. We tried using both JAWS and JWNL, but neither of them provided the response time that the Surf Canyon algorithm requires. JWNL supposedly provides an in-memory map version of the lexicon, but it seems that no one, including us, has been able to get it to work.

These open source libraries implement much more functionality than we required. All we want to do is pass in a word and get back a Set of Sets of synonyms for that word, one Set per meaning. For example, if you pass in the word “fair,” the returned Sets should include the Set of synonyms that mean “evenhanded,” another Set of synonyms that mean “carnival,” and another Set of synonyms that mean “attractive.”
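To make the Set of Sets concrete, here is a small self-contained sketch of the same inversion the class below performs: each synset is added to the entry of every word it contains. The three synsets in main() are toy data written to mirror the senses described above, not actual Wordnet entries, and SynsetIndexDemo and index() are hypothetical names.

```java
import java.util.*;

// Illustration only: the synsets in main() are toy data, not real Wordnet entries.
class SynsetIndexDemo {
    /** Inverts synsets into a map from each word to the set of synsets that contain it. */
    static Map<String, Set<Set<String>>> index(Collection<? extends Set<String>> synsets) {
        Map<String, Set<Set<String>>> byWord = new HashMap<>();
        for (Set<String> synset : synsets) {
            // One shared, read-only copy per synset, referenced by every word in it
            Set<String> shared = Collections.unmodifiableSet(new HashSet<>(synset));
            for (String word : shared) {
                byWord.computeIfAbsent(word, k -> new HashSet<>()).add(shared);
            }
        }
        return byWord;
    }

    public static void main(String[] args) {
        List<Set<String>> synsets = Arrays.asList(
                new HashSet<String>(Arrays.asList("fair", "evenhanded", "just")),
                new HashSet<String>(Arrays.asList("fair", "carnival", "funfair")),
                new HashSet<String>(Arrays.asList("fair", "attractive", "comely")));
        Set<Set<String>> senses = index(synsets).getOrDefault("fair", Collections.emptySet());
        System.out.println(senses.size() + " synonym sets for \"fair\": " + senses);
    }
}
```

Every word in a synset points at the same Set object, so the index avoids copying a synset once per member word; the class below shares its synonym Sets the same way.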

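Following is a Java class of roughly 100 lines that implements this functionality and keeps the whole lexicon in memory, so each lookup is a single HashMap access. Before using it, make sure your CLASSPATH includes the directory that contains the Wordnet database files, because the class loads them with ClassLoader.getResourceAsStream(). The class reads four of the Wordnet data files (data.adj, data.adv, data.verb, and data.noun) once, the first time it is used. (The file names it uses are those of the UNIX distribution; the Windows distribution uses different names. For example, the UNIX data.verb file is called verb.dat on Windows.)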
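Because the class hard-codes the UNIX file names, Windows users must either rename the files or change the four load() calls. The sketch below assumes, extrapolating from the verb.dat example, that every Windows name follows the same pattern (data.adj becomes adj.dat, and so on); WordnetFiles, windowsName() and open() are hypothetical helpers, not part of the original code.

```java
import java.io.InputStream;

// Hypothetical helper. ASSUMPTION: every Windows data file is named like verb.dat.
class WordnetFiles {
    /** Maps a UNIX data file name such as "data.verb" to the assumed Windows name "verb.dat". */
    static String windowsName(String unixName) {
        if (!unixName.startsWith("data.")) {
            throw new IllegalArgumentException("not a Wordnet data file name: " + unixName);
        }
        return unixName.substring("data.".length()) + ".dat";
    }

    /** Opens the UNIX-named resource from the CLASSPATH, falling back to the Windows name. */
    static InputStream open(String unixName) {
        ClassLoader loader = WordnetFiles.class.getClassLoader();
        InputStream in = loader.getResourceAsStream(unixName);
        if (in == null) {
            in = loader.getResourceAsStream(windowsName(unixName));
        }
        if (in == null) {
            throw new IllegalStateException("neither " + unixName + " nor "
                    + windowsName(unixName) + " is on the CLASSPATH");
        }
        return in;
    }
}
```

With a helper like this, load() could call WordnetFiles.open(path) where it currently calls getResourceAsStream(path), and the same class would work with either distribution.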

package com.surfcanyon.common;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.StringTokenizer;

/**
 * This class gets synonym sets from the Wordnet dictionary files.
 * The data files are read once, when the class is first used, and kept in memory.
 */
public final class Synonyms {
    private static final Map<String, Set<Set<String>>> WORD_TO_SYNONYM_SETS = new HashMap<String, Set<Set<String>>>(15000);
    private static final Set<Set<String>> EMPTY_SET = Collections.unmodifiableSet(new HashSet<Set<String>>());

    static {
        // Class initialization runs exactly once and safely publishes the fully built map
        // to every thread, so neither loading nor lookups need a lock.
        load("data.adj");
        load("data.adv");
        load("data.verb");
        load("data.noun");
    }

    private Synonyms() {
        // static utility class; not meant to be instantiated
    }

    private static void load(String path) {
        InputStream inputStream = Synonyms.class.getClassLoader().getResourceAsStream(path);
        if (inputStream == null) {
            throw new IllegalStateException("Wordnet file not found on the CLASSPATH: " + path);
        }

        BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                processLine(line);
            }
        } catch (IOException ioe) {
            throw new RuntimeException(ioe);
        } finally {
            try {
                reader.close(); // also closes the underlying InputStream
            } catch (IOException ignored) {
            }
        }
    }

    private static void processLine(String line) {
        // Synset lines begin with an 8-digit byte offset; license lines begin with spaces.
        // Offsets in data.noun can exceed 10000000, so accept any leading digit, not just '0'.
        if ((line.length() > 17) && Character.isDigit(line.charAt(0))) {
            // Skip "offset lex_filenum ss_type w_cnt "; the first word starts at index 17
            line = line.substring(17);

            Set<String> synonymSet = new HashSet<String>();
            StringTokenizer st = new StringTokenizer(line, " ");
            while (st.hasMoreTokens()) {
                String word = st.nextToken();

                // Heuristic end of the word list: a token such as the pointer count "003"
                if (word.startsWith("00")) {
                    break;
                }

                // Keep words; skip lex_id digits and other short tokens
                // (this also drops one- and two-letter words)
                if (word.length() > 2 && Character.isLetter(word.charAt(0))) {
                    synonymSet.add(word);
                }
            }

            // Only synsets with at least two words contain synonyms
            if (synonymSet.size() > 1) {
                synonymSet = Collections.unmodifiableSet(synonymSet);
                for (String word : synonymSet) {
                    Set<Set<String>> synonymSetsForThisWord = WORD_TO_SYNONYM_SETS.get(word);
                    if (synonymSetsForThisWord == null) {
                        synonymSetsForThisWord = new HashSet<Set<String>>();
                        WORD_TO_SYNONYM_SETS.put(word, synonymSetsForThisWord);
                    }
                    synonymSetsForThisWord.add(synonymSet);
                }
            }
        }
    }

    /**
     * Returns every synonym set that contains the given word, or an empty set if there are none.
     * The lookup is case-sensitive and uses Wordnet's spelling (underscores in place of spaces).
     */
    public static Set<Set<String>> getSynonymSets(String word) {
        Set<Set<String>> synonymSets = WORD_TO_SYNONYM_SETS.get(word);
        return ((synonymSets != null) ? Collections.unmodifiableSet(synonymSets) : EMPTY_SET);
    }
}
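The processLine() method above finds words with heuristics: it skips the first 17 characters, stops at a token starting with "00", and drops tokens of two characters or fewer. An alternative, sketched below under the assumption that each synset line follows the layout documented in Wordnet's wndb(5WN) manual page (synset_offset lex_filenum ss_type w_cnt, then w_cnt word/lex_id pairs), reads the hexadecimal w_cnt field and takes exactly that many words. SynsetLineParser and words() are hypothetical names.

```java
import java.util.*;

// Sketch of a format-aware alternative to processLine(); not part of the class above.
class SynsetLineParser {
    /** Returns the words of one Wordnet data-file line, or an empty list for non-synset lines. */
    static List<String> words(String line) {
        List<String> words = new ArrayList<String>();
        // License/header lines at the top of each data file start with spaces, not an offset
        if (line.isEmpty() || !Character.isDigit(line.charAt(0))) {
            return words;
        }
        String[] fields = line.split(" ");
        if (fields.length < 4) {
            return words;
        }
        int wordCount = Integer.parseInt(fields[3], 16); // w_cnt is a two-digit hex number
        for (int i = 0; i < wordCount && 4 + 2 * i < fields.length; i++) {
            // Words sit at fields 4, 6, 8, ...; each is followed by its lex_id
            String word = fields[4 + 2 * i];
            // Adjectives in data.adj may carry a syntactic marker: (a), (p) or (ip)
            words.add(word.replaceFirst("\\((a|p|ip)\\)$", ""));
        }
        return words;
    }
}
```

For example, words("00001234 03 n 02 example 0 illustration 0 000 | an item") returns [example, illustration]; that sample line is made up in the documented layout, not copied from a Wordnet file. Unlike the heuristic, this keeps short words such as "go" and words that begin with a digit, so processLine() could build its synonymSet from words(line) instead.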

Tags: Code

  • Sumved Shami

    Hi,

    I tried using your class. I passed the word “poor” to the getSynonymSets(String word) method, but it's not returning any similar words for it. If I use JWNL, it gives me a list of similar words. Am I doing something wrong?

    Can you please help me in this regard?

    Regards,
    Sumved Shami

    Editor’s Comment: Thank you, Sumved, for the feedback! We have identified the problem and the code above has been modified. We changed the line “if (line.startsWith("00") && (line.length() > 17)) {” to “if ((line.length() > 17) && (line.charAt(0) == '0')) {”.

  • Louis

    Hi,

    I tried this class with the word “panadol” using a command like “System.out.println(synonym.getSynonymSets("panadol"));” and it gives me nothing. What am I doing wrong?

    [Editor's reply] Lookups are case-sensitive, and Wordnet lists this word with a capital letter, so please try “Panadol” with an upper-case “P”.

  • Juuno

    Hi

    When I try this with the word “Panadol”, it gives me the result. But when I try other words, it doesn’t give me anything. What’s wrong with it? Or am I doing something wrong?
    Thanks!!

    [Editor's response: Please contact us at http://www.surfcanyon.com/contact.jsp]

  • Ken

    JWI (http://projects.csail.mit.edu/jwi/) has a RAMDictionary class, which loads the entire dictionary into memory. The JWI user manual (PDF) demonstrates the speed improvement.

  • Sushil

    Hi,

    I was trying to use your code but wasn't able to. Can you help me? All I want to do is pass a keyword and get 5 synonyms from Wordnet in a console application.
    Sushil

  • Mark Cramer

    Hello Sushil – Thank you for contacting us and we’d be happy to help. It’ll be difficult, however, to provide assistance in the comments of this blog, so please send us a note using the email on http://www.surfcanyon.com/contact.jsp and we’ll be glad to reply.
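Two of the questions above ask about capitalization (“panadol” versus “Panadol”) and about printing a handful of synonyms from a console application. A small helper along these lines could do both. It is a sketch that assumes a lookup function such as Synonyms::getSynonymSets (Java 8 or later); FirstSynonyms and firstSynonyms() are hypothetical names. It tries the word as typed, then in lower case, then capitalized, and replaces Wordnet's underscores with spaces for display.

```java
import java.util.*;
import java.util.function.Function;

// Hypothetical console helper; pass Synonyms::getSynonymSets as the lookup.
class FirstSynonyms {
    /**
     * Collects up to max distinct synonyms of word, trying the word as typed,
     * then in lower case, then with an upper-case first letter.
     */
    static List<String> firstSynonyms(String word, int max,
                                      Function<String, Set<Set<String>>> lookup) {
        if (max <= 0 || word.isEmpty()) {
            return new ArrayList<String>();
        }
        String lower = word.toLowerCase(Locale.ROOT);
        String capitalized = Character.toUpperCase(lower.charAt(0)) + lower.substring(1);

        // LinkedHashSet keeps insertion order and drops duplicate spellings
        Set<String> spellings = new LinkedHashSet<String>(Arrays.asList(word, lower, capitalized));
        Set<String> result = new LinkedHashSet<String>();
        for (String spelling : spellings) {
            for (Set<String> synset : lookup.apply(spelling)) {
                for (String synonym : synset) {
                    if (synonym.equalsIgnoreCase(word)) {
                        continue; // don't report the query word itself
                    }
                    result.add(synonym.replace('_', ' ')); // Wordnet writes spaces as underscores
                    if (result.size() == max) {
                        return new ArrayList<String>(result);
                    }
                }
            }
        }
        return new ArrayList<String>(result);
    }
}
```

With the Synonyms class and the Wordnet data files on the CLASSPATH, a console program could then call System.out.println(FirstSynonyms.firstSynonyms(args[0], 5, Synonyms::getSynonymSets)). Which five synonyms appear first depends on HashSet iteration order, so treat the selection as arbitrary.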