Fast & Easy Wordnet Java

July 21st, 2008 · 6 Comments

One of the techniques that Surf Canyon uses to determine if a search result is relevant to your query is to examine synonyms. Princeton University provides Wordnet, a structured lexicon that acts as a dictionary and thesaurus. There are a few open source Java libraries that provide a Java API to Wordnet. We tried using both JAWS and JWNL, but neither of them provided the response time that the Surf Canyon algorithm requires. JWNL supposedly provides an in-memory map version of the lexicon, but it seems that no one, including us, has been able to get it to work.

These open source libraries implement much more functionality than what we required. All we want to do is pass in a word and get back a Set of Sets of synonyms for that word. For example, if you pass in the word “fair,” the returned Sets should include the Set of synonyms that mean “evenhanded,” another Set of synonyms that mean “carnival,” and another set of synonyms that mean “attractive.”

Following is a 100-line Java class that implements this functionality very quickly. First, make sure your CLASSPATH includes the directory that contains the Wordnet database files. The class reads in 4 of the Wordnet database files. (The file names it uses are the file names from the UNIX distribution. The Windows distribution uses different file names. For example, the UNIX data.verb file is called verb.dat on Windows.)

package com.surfcanyon.common;

import java.io.*;
import java.util.*;

/**
 * This class gets synonym sets from the Wordnet dictionary files.
 */
public abstract class Synonyms {
    private static final Map<String, Set<Set<String>>> WORD_TO_SYNOYMYM_SETS = new HashMap<String, Set<Set<String>>>(15000);
    private static final Set<Set<String>> EMPTY_SET = Collections.unmodifiableSet(new HashSet<Set<String>>());

    static {
        synchronized (WORD_TO_SYNOYMYM_SETS) {
            load("data.adj");
            load("data.adv");
            load("data.verb");
            load("data.noun");
        }
    }

    private static void load(String path) {
        InputStream inputStream = null;
        BufferedReader reader = null;
        try {
            ClassLoader classLoader = Synonyms.class.getClassLoader();
            inputStream = classLoader.getResourceAsStream(path);
            reader = new BufferedReader(new InputStreamReader(inputStream));

            String line = null;
            while ((line = reader.readLine()) != null) {
                processLine(line);
            }
        } catch (IOException ioe) {
            throw new RuntimeException(ioe);
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException ioe) {
                }
            }

            if (inputStream != null) {
                try {
                    inputStream.close();
                } catch (IOException ioe) {
                }
            }
        }
    }

    private static void processLine(String line) {
        if ((line.length() > 17) && (line.charAt(0) == '0')) {
            // the data we want starts at the 17th character
            line = line.substring(17);

            Set<String> synonymSet = new HashSet<String>();
            StringTokenizer st = new StringTokenizer(line, " ");
            while (st.hasMoreElements()) {
                String word = st.nextToken();

                if (word.startsWith("00")) {
                    break;
                }

                if (word.length() > 2 && Character.isLetter(word.charAt(0))) {
                    synonymSet.add(word);
                }
            }

            if (synonymSet.size() > 1) {
                synonymSet = Collections.unmodifiableSet(synonymSet);
                for (String word : synonymSet) {
                    Set<Set<String>> synonymSetsForThisWord = WORD_TO_SYNOYMYM_SETS.get(word);
                    if (synonymSetsForThisWord == null) {
                        synonymSetsForThisWord = new HashSet<Set<String>>();
                        WORD_TO_SYNOYMYM_SETS.put(word, synonymSetsForThisWord);
                    }
                    synonymSetsForThisWord.add(synonymSet);
                }
            }
        }
    }

    public static Set<Set<String>> getSynonymSets(String word) {
        synchronized (WORD_TO_SYNOYMYM_SETS) {
            Set<Set<String>> synonymSets = WORD_TO_SYNOYMYM_SETS.get(word);
            return ((synonymSets != null) ? Collections.unmodifiableSet(synonymSets) : EMPTY_SET);
        }
    }
}

Tags: Code

6 responses so far ↓

  • 1 Sumved Shami // Aug 7, 2008 at 5:23 am

    Hi,

    I tried using your class. I provided one word “poor” to getSynonymSets(String word) method. But, its not providing any similiar word for that particular word. But, If I use JWNL, it gives me a list of similiar words. Am i doing something wrong?

    Can you please help me in this regard?

    Regards,
    Sumved Shami

    Editor’s Comment: Thank you, Sumved, for the feedback! We have identified the problem and the code above has been modified. We changed the line “if (line.startsWith(“00″) && (line.length() > 17)) {” to “if ((line.length() > 17) && (line.charAt(0) == ’0′)) {“.

  • 2 Louis // Dec 13, 2008 at 5:11 am

    Hi,

    I tried this class with word “panadol” with the command like this “System.out.println(synonym.getSynonymSets(“panadol”));” and it gives me nothing .What i am doing wrong?

    [Editor's reply] The word seems to be case-sensitive, so please try “Panadol” with an upper-case “P”.

  • 3 Juuno // Jun 21, 2009 at 6:45 am

    Hi

    When I try this with the word “Pandol”, it gives me the result. But when I try with other words, it doesn’t give me anything. What’s wrong with it? Or am I doing something wrong?
    Thanks!!

    [Editor's response - Please contact us at http://www.surfcanyon.com/contact.jsp ]

  • 4 Ken // May 29, 2011 at 2:48 am

    JWI has RamDictionary (http://projects.csail.mit.edu/jwi/) which loads the entire dictionary in memory. They demonstrated the speed improvement in their PDF user manual.

  • 5 Sushil // Mar 16, 2012 at 12:50 pm

    Hi,

    I was trying to use your code but not able to… can you help me.. All i want to do is to pass a keyword and get 5 synonym from wordnet in a console application.
    Sushil

  • 6 Mark Cramer // Mar 16, 2012 at 1:26 pm

    Hello Sushil – Thank you for contacting us and we’d be happy to help. It’ll be difficult, however, to provide assistance in the comments of this blog, so please send us a note using the email on http://www.surfcanyon.com/contact.jsp and we’ll be glad to reply.

Leave a Comment

Protected with IP Blacklist CloudIP Blacklist Cloud