Disambiguation

May 30th, 2010, By talk

Which is Which?

As you probably know by now, Headup specializes in understanding the meaning of text in web pages, extracting the important objects, and bringing complementary content about them from around the web.  One of the most important requirements in carrying out this process is correctly identifying the meaning of words on the page, especially those that have more than one meaning.  If you fail to do this, you can fetch a lot of high-quality, real-time and personalized information – about the wrong topic…

In technical jargon, identifying the correct meaning of words is called “Word Sense Disambiguation”.  Or in the words of Led Zeppelin (from “Stairway to Heaven”):

There’s a sign on the wall

But she wants to be sure

‘Cause you know sometimes words

Have two meanings

Some words have different meanings depending on their context.  For example, the word “Apple” can mean the fruit, the technology company, the record label or the band.  “John Mack” can refer to the Chairman of Morgan Stanley, the musician, or the psychiatrist who specialized in alien abduction experiences.  The term “Enterprise” can mean a company, a city, a ship, a starship, a space shuttle, and much more.

Conversely, the same entity can appear in the text under many different names.  For example, if you write a blog post that mentions Barack Obama, you might refer to him as Barack Obama, President Obama, the President, the U.S. President, President of the U.S.A., Barack, Obama, Mr. Obama, etc.  All of these terms refer to the same person, so an automated system seeking to understand the text should resolve all of them to the same entity.
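To make the idea of resolving many surface forms to one entity concrete, here is a minimal sketch in Python.  It is not Headup's implementation: the alias table, the entity ID "Barack_Obama", and the helper names normalize and resolve are all hypothetical, chosen only for illustration.

```python
from typing import Optional


def normalize(mention: str) -> str:
    """Lowercase, drop periods, and collapse whitespace so variants compare equal."""
    return " ".join(mention.lower().replace(".", "").split())


# Hypothetical surface forms for one entity, taken from the example above.
_SURFACE_FORMS = {
    "Barack_Obama": [
        "Barack Obama",
        "President Obama",
        "the President",
        "the U.S. President",
        "President of the U.S.A.",
        "Barack",
        "Obama",
        "Mr. Obama",
    ],
}

# Inverted lookup table: normalized surface form -> entity ID.
ALIASES = {
    normalize(form): entity
    for entity, forms in _SURFACE_FORMS.items()
    for form in forms
}


def resolve(mention: str) -> Optional[str]:
    """Return the entity ID for a mention, or None if the mention is unknown."""
    return ALIASES.get(normalize(mention))


print(resolve("Mr. Obama"))
print(resolve("President of the U.S.A."))
print(resolve("John Mack"))
```

A static table like this is only a starting point.  A form such as “the President” is itself ambiguous, so a real system would still need context to decide which president is meant.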

There are various approaches to word sense disambiguation.  Some rely on the statistics of surrounding words; others require a training stage utilizing large pieces of text in which the meaning of words has been marked manually.  Headup’s approach to disambiguation is based on its knowledge graph, the ever-expanding collection of topics, attributes, and semantic relationships between them.  Combining information derived from the knowledge graph with analysis of syntax (the way words are combined into sentences) enables Headup to reach a very high rate of precision (the percentage of identified terms that are identified correctly) in its disambiguation process.
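The post does not describe Headup's algorithm in detail, so the following Python sketch only illustrates the general idea of graph-based disambiguation: each candidate meaning is scored by how many of its related topics also appear in the surrounding text.  The toy graph, the sense IDs, and the function name disambiguate are assumptions made for this example, and syntax analysis is left out entirely.

```python
from typing import Iterable, Optional

# Toy stand-in for a knowledge graph: for each ambiguous term, the candidate
# senses and a few topics related to each sense. All entries are illustrative.
KNOWLEDGE_GRAPH = {
    "apple": {
        "Apple_(fruit)": {"orchard", "pie", "cider", "tree"},
        "Apple_Inc.": {"iphone", "mac", "steve jobs", "software"},
        "Apple_Records": {"beatles", "record label", "album"},
    },
}


def disambiguate(term: str, context_terms: Iterable[str]) -> Optional[str]:
    """Pick the sense whose related topics overlap most with the context.

    Returns None when the term is unknown or no sense shares a topic
    with the context.
    """
    candidates = KNOWLEDGE_GRAPH.get(term.lower(), {})
    context = {t.lower() for t in context_terms}
    best_sense, best_score = None, 0
    for sense, related in candidates.items():
        score = len(related & context)  # number of shared topics
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


print(disambiguate("Apple", ["iPhone", "Steve Jobs", "keynote"]))
print(disambiguate("Apple", ["Beatles", "album"]))
```

Returning None when nothing overlaps is a deliberate choice in this sketch: leaving a term unresolved favors precision over guessing a meaning.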

The examples below show how Headup can correctly identify terms that appear in plain text, even when the term has more than one meaning, or appears only partially in the text.  All of the examples are based on actual posts in blogs that are using Headup.

First, let’s take a look at an example from the film blog “HeyUGuys”.  Here you can see how Headup correctly identifies the word “Abrams” as referring to the writer and producer J. J. Abrams.

[Image: Headup identifying “Abrams” as J. J. Abrams in a post on HeyUGuys]

And here’s another example: It is typical in gossip media to refer to celebrities using their first name only, to induce a sense of familiarity.  In the post below, taken from the blog “HitDanBack”, singer Mariah Carey is referred to only as “Mariah”.  This doesn’t stop Headup from correctly identifying her, based on understanding the topic of the blog post and the context in which the name appears.

[Image: Headup identifying “Mariah” as Mariah Carey in a post on HitDanBack]

And in the final example below, you can see how Headup interprets the term “European Championship” in an article from the blog Jewlicious.  This term can mean a championship in any sport, but based on the context of the article and related terms that are identified in the text, Headup correctly interprets “European Championship” as referring to the European Figure Skating Championship.

[Image: Headup interpreting “European Championship” in an article on Jewlicious]

If you want to see more examples of Headup in action, visit www.headup.com and explore the various blogs that are already using Headup.  You can also test drive the engine for yourself in our Entity Extraction Playground.  Enjoy!


1 Comment »

  1. How accurate is the disambiguation? Any empirical studies on it?

     
    Comment by WSD_FREAK — November 29, 2010 @ 8:03 pm
