Disambiguation

May 30th, 2010, By talk

Which is Which?

As you probably know by now, Headup specializes in understanding the meaning of text in web pages, extracting the important objects, and bringing complementary content about them from around the web.  One of the most important requirements in carrying out this process is correctly identifying the meaning of words on the page, especially those that have more than one meaning.  If you fail to do this, you can fetch a lot of high-quality, real-time and personalized information – about the wrong topic…

In technical jargon, identifying the correct meaning of words is called “Word Sense Disambiguation”.  Or in the words of Led Zeppelin (from “Stairway to Heaven”):

There’s a sign on the wall

But she wants to be sure

‘Cause you know sometimes words

Have two meanings

Some words have different meanings depending on their context.  For example, the word “Apple” can mean the fruit, the technology company, the record label or the band.  “John Mack” can refer to the Chairman of Morgan Stanley, the musician, or the psychiatrist who specialized in alien abduction experiences.  The term “Enterprise” can mean a company, a city, a ship, a starship, a space shuttle, and much more.

On the other hand, the same term can appear in the text in many different formats.  For example, if you write a blog post that mentions Barack Obama, you might refer to him as Barack Obama, President Obama, the President, the U.S. President, President of the U.S.A., Barack, Obama, Mr. Obama, etc.  All of these terms refer to the same person, so an automated system seeking to understand the text should resolve all of them to the same entity.
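The idea of resolving many surface forms to one entity can be sketched with a simple lookup table. This is only an illustrative sketch, not Headup's implementation: the `ALIASES` table, the entity ID format and the `resolve` function are all assumptions made up for this example.

```python
from typing import Optional

# Hypothetical alias table: each surface form maps to one canonical entity ID.
ALIASES = {
    "barack obama": "person/barack_obama",
    "president obama": "person/barack_obama",
    "mr. obama": "person/barack_obama",
    "obama": "person/barack_obama",
    "barack": "person/barack_obama",
}


def normalize(mention: str) -> str:
    """Lower-case and collapse whitespace so lookups ignore formatting."""
    return " ".join(mention.lower().split())


def resolve(mention: str) -> Optional[str]:
    """Return the canonical entity ID for a mention, or None if it is unknown."""
    return ALIASES.get(normalize(mention))


print(resolve("President  Obama"))  # same entity as "Barack Obama"
print(resolve("Mr. Obama") == resolve("Barack"))
```

Forms such as “the President” are deliberately left out of the table: they depend on context (which president, at what time), so a static lookup alone cannot resolve them.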

There are various approaches to word sense disambiguation.  Some rely on the statistics of surrounding words; others require a training stage utilizing large pieces of text in which the meaning of words has been marked manually.  Headup’s approach to disambiguation is based on its knowledge graph, the ever-expanding collection of topics, attributes, and semantic relationships between them.  Combining information derived from the knowledge graph with analysis of syntax (the way words are combined into sentences) enables Headup to reach a very high rate of precision (the percentage of terms that are correctly identified) in its disambiguation process.
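To make the knowledge-graph idea concrete, here is a toy sketch that picks the sense whose related topics overlap most with the surrounding text. It is not Headup's algorithm: the `KNOWLEDGE_GRAPH` contents and the overlap scoring are assumptions chosen for illustration, and the syntax analysis mentioned above is omitted entirely.

```python
import re

# Toy knowledge graph (invented for illustration): each candidate sense of a
# term is linked to a handful of related topics.
KNOWLEDGE_GRAPH = {
    "apple": {
        "Apple Inc.": {"iphone", "mac", "software", "company"},
        "apple (fruit)": {"tree", "pie", "orchard", "juice"},
        "Apple Records": {"beatles", "label", "album", "records"},
    }
}


def disambiguate(term, text):
    """Return the sense sharing the most related topics with the text, or None."""
    candidates = KNOWLEDGE_GRAPH.get(term.lower(), {})
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    # Score each sense by how many of its related topics appear in the text.
    scores = {sense: len(related & words) for sense, related in candidates.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # With no supporting evidence at all, decline to guess.
    return best if scores[best] > 0 else None


print(disambiguate("Apple", "The Beatles released the album on their own label"))
```

Returning `None` when nothing in the context supports any sense reflects the emphasis on precision: in this sketch, an unresolved term is treated as preferable to a confidently wrong one.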

The examples below show how Headup can correctly identify terms that appear in plain text, even when the term has more than one meaning, or appears only partially in the text.  All of the examples are based on actual posts in blogs that are using Headup.

First, let’s take a look at an example from the film blog “HeyUGuys”.  Here you can see how Headup correctly identifies the word “Abrams” as referring to the writer and producer J. J. Abrams.

Disambiguation Image

And here’s another example: It is typical in gossip media to refer to celebrities using their first name only, to induce a sense of familiarity.  In the post below, taken from the blog “HitDanBack”, singer Mariah Carey is referred to only as “Mariah”.  This doesn’t stop Headup from correctly identifying her, based on understanding the topic of the blog post and the context in which the name appears.

Disambiguation Image

And in the final example below, you can see how Headup interprets the term “European Championship” in an article from the blog Jewlicious.  This term can mean a championship in any sport, but based on the context of the article and related terms that are identified in the text, Headup correctly interprets “European Championship” as referring to the European Figure Skating Championship.

Disambiguation Image

If you want to see more examples of Headup in action, visit www.headup.com and explore the various blogs that are already using Headup.  You can also test drive the engine for yourself in our Entity Extraction Playground.  Enjoy!

Smart Search: Finding Things in Groups

May 16th, 2010, By talk

Searching for stuff is sometimes tough.  If you know what you’re looking for, and you phrase your search term just right, then you usually get good results.  But if not, you’re in big trouble, doomed to endless sifting through the results, page by page, until you find the thing that you were really looking for.

Search engines are good at finding terms, expressions, and pieces of text.  But that’s where their world ends: They don’t understand the meaning of the text they are searching for, and they know nothing about objects, entities or relationships.  In addition, they are not designed to find stuff in groups, but search for a single object each time.

For example, let’s say you are interested in seeing video clips of songs from the Dire Straits album “Brothers in Arms”.  If you search for “Dire Straits Brothers in Arms Album” on YouTube, you will get many links to video clips of the song “Brothers in Arms”, and some links to other songs in that album (if the album name appears in the clip description).  If you are lucky, you’ll get a link to a playlist called “Dire Straits Brother in Arms Album” prepared by some user who manually searched for these tracks by name.

YouTube search results for "Dire Straits Brothers in Arms Album"

But now look what happens if you execute the same query through Headup: Headup automatically digs into its database to find the tracks in the album, and searches for specific video clips of these tracks.  Then, it returns a nice “video wall” where each thumbnail links to a different track in the “Brothers in Arms” album.  The key here is that Headup “knows” what an album is, associates it with its tracks, and is smart enough to understand that YouTube hosts mainly videos of tracks, not full albums.  This type of reasoning and “smart search” implementation is way beyond the power of other “topic search” engines that do nothing more than search forwarding.

Headup video results for "Dire Straits Brothers in Arms Album"
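The album-to-tracks expansion described above can be sketched as follows. This is a hedged illustration only: the `ALBUM_TRACKS` table stands in for Headup's database, it holds just a partial track listing for brevity, and `track_queries` is a hypothetical helper that builds one search string per track rather than calling any real video API.

```python
# Stand-in for the album/track relationships in a knowledge base.
# Partial track listing, kept short for the example.
ALBUM_TRACKS = {
    ("Dire Straits", "Brothers in Arms"): [
        "So Far Away",
        "Money for Nothing",
        "Walk of Life",
        "Brothers in Arms",
    ],
}


def track_queries(artist, album):
    """Expand an album into one video search query per known track."""
    tracks = ALBUM_TRACKS.get((artist, album), [])
    return [f"{artist} {track}" for track in tracks]


for query in track_queries("Dire Straits", "Brothers in Arms"):
    print(query)
```

Each resulting query targets a single track, which matches what a video site actually hosts, so the combined results can be assembled into one “video wall” for the album.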

Let’s take another example.  What if you are searching for a certain type of product by a certain brand, such as Samsung LED-backlit LCD TVs or Sony Flash-based HD camcorders?  If you try these search terms in a regular search engine, you will get scattered results of news announcements, product reviews, and maybe a link to a specific product page.  But you’ll never get a list of actual TVs or camcorders that match these criteria, since search engines can only search for the text you supplied, but don’t understand it.

When such a search is conducted through Headup, it queries its knowledge graph for items that match the requested criteria.  Since objects in Headup have meaning, properties and relations to other objects, it is quite easy to go through all the “Products” by the “Company” Sony, find the “Camcorder” type products, and keep only those items that have “Memory Type” equal to “Flash” and “Resolution” equal to “HD”.  So executing such a query through Headup may result, for example, in a neat list of links to specific product pages, which may include media reviews, user reviews and price comparison with purchasing links.
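A structured query of this kind can be sketched as a filter over typed objects. The `Product` class, the `CATALOG` entries (placeholder product names, not real models) and `find_products` are all assumptions made for this example; Headup's actual knowledge graph and query interface are not described in this post.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    company: str
    product_type: str
    attributes: dict = field(default_factory=dict)


# Placeholder catalog entries, invented for illustration.
CATALOG = [
    Product("Example Camcorder A", "Sony", "Camcorder",
            {"Memory Type": "Flash", "Resolution": "HD"}),
    Product("Example Camcorder B", "Sony", "Camcorder",
            {"Memory Type": "Tape", "Resolution": "HD"}),
    Product("Example TV C", "Sony", "TV", {"Resolution": "HD"}),
    Product("Example Camcorder D", "Samsung", "Camcorder",
            {"Memory Type": "Flash", "Resolution": "HD"}),
]


def find_products(catalog, company, product_type, required):
    """Keep products of the given company and type whose attributes all match."""
    return [
        p for p in catalog
        if p.company == company
        and p.product_type == product_type
        and all(p.attributes.get(key) == value for key, value in required.items())
    ]


matches = find_products(CATALOG, "Sony", "Camcorder",
                        {"Memory Type": "Flash", "Resolution": "HD"})
print([p.name for p in matches])
```

The point of the sketch is that the query operates on properties of objects rather than on the words in a page, which is what lets it return a list of matching items instead of documents that merely mention the search terms.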

Note that even though Headup currently does not support direct search, the “smart search” method is already implemented in the current pop-up widget and topic pages.  When you look at images, news or videos of a certain object or topic, Headup’s “smart search” works behind the scenes to bring you the most relevant content for that object, by understanding and utilizing its relationship to other objects.

Semantic Web Marketing – Part 2

April 19th, 2009, By talk

Continued from Part 1

Why Now?

Understanding the basic difference between the web-that-is and the web-to-be supplies a few clues as to how this change is happening and why it’s happening now:

  1. The democratization of online publishing in the past few years has done a lot to contribute to the Totality of the web and has without doubt been the key to its unprecedented growth. The Web still has a long way to go before it encompasses everything, but it already contains enough data to allow generating limited Semantic Web experiences, especially in “UGC-rich” fields.
    UGC is a major contributor to the ascent of the Semantic Web (Image by James Cridland)

  2. The ascent of APIs as the de-facto method for structuring inter-service communications is creating an ever increasing degree of Accessibility. Every day now greater swaths of the web are made accessible and “understandable” to automated services.
  3. Tagging, Natural Language Processing and other forms of hi-tech voodoo are all coming of age around now. Their evolution is having an increasingly positive impact on computers’ ability to “understand” the Web.

How will this affect me?

By now (April 2009) it’s already clear that the next evolution of the Web is right around the corner.
The first generation of companies pioneering this field, including Evri, Apture and of course ourselves, has already been active for 2-3 years. This in itself should be enough to convince you that, whether you call it “Semantic Web”, “Web 3.0” or “Super Duper Web with Sprinkles”, you should get your act together and start preparing for it NOW!

Rapid evolution creates opportunity

It’s worth remembering that the transformation we’re experiencing from Web to Semantic Web is a gradual one. Changes of this magnitude always are. Even so, I strongly advise against complacency – “gradual” is a relative term. I don’t remember how long it took for all of us to start using Google, but I remember it wasn’t long, and I know that Yahoo and Microsoft are still trying to figure out where they lost us.

The revolution will probably begin in UGC rich segments (Image by Franco Folini)

Where will the Semantic Web revolution begin?

Although prophecy is a dangerous business, I think it’s safe to wager that the fields where the most has been done to improve the availability of data and its accessibility to computers are those that will enjoy the boons of the Semantic Web first.
User-generated-content heavy segments like social networking, music and photo sharing sites are some of the first places where it’s already possible to enjoy genuine Semantic Web experiences. In fact our own Headup has already been complimented by blogging heavyweights Robert Scoble and Jeff Pulver for its Twitter boosting capabilities.
Product sites like Amazon are another good place to experience Semantic Web. Their ability to offer products based on their relevance to users’ needs, intentions and social circles, is another Semantic Web early bird, albeit a rather primitive and limited one.

Looking out for your business (Image by Kevin Dooley)

How can I best prepare my business?

The best easy-to-adopt-today tips I can suggest to marketers who want to prepare for the Semantic Web are all based on the points I’ve mentioned earlier:

  1. Be aware of the coming change, keep your ears and eyes open for developments and deepen your understanding by reading blog posts like this one. I personally recommend checking out the excellent repository of Semantic Web articles that’s been published on the ReadWriteWeb blog.
  2. Tag the widgets you’re marketing comprehensively so that they are readily identifiable by computers. For example: If you’re selling football jerseys, make sure to tag your inventory not only with the relevant team names but also with tags defining your merchandise as “clothing”, “shirt”, “jersey” and/or “fan merchandise”. As far as the technical details of “how-to-tag” are concerned, I suggest using Microformats if at all possible, but linking as described below is shaping up to be a viable option as well.
  3. Link widgets meaningfully to assist in their identification. This is especially true for ambiguous terms. For example, by linking this instance of the word “Pink” to the last.fm page dedicated to the singer of the same name, I’ve effectively removed all possible ambiguity as to which “Pink” I meant.
  4. Connect your site to relevant APIs wherever possible. If you deal in real-estate try integrating a map service like Google Maps. Music your thing? Integrate Last.fm or Deezer, etc. A great source for following available APIs and the innovative mashups created with them is ProgrammableWeb.com.
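As a rough illustration of the tagging tip, the sketch below generates links in the style of the rel-tag microformat, where a link marked `rel="tag"` declares a tag whose value is the last segment of the link's URL. The `rel_tag_links` helper and the `https://example.com/tags/` base URL are assumptions invented for this example, not part of any existing site or library.

```python
from html import escape
from urllib.parse import quote


def rel_tag_links(tags, base_url="https://example.com/tags/"):
    """Build one rel="tag" link per tag; base_url is a placeholder tag space."""
    links = []
    for tag in tags:
        # The tag value is carried by the last path segment of the URL.
        href = base_url + quote(tag)
        links.append(
            f'<a href="{escape(href, quote=True)}" rel="tag">{escape(tag)}</a>'
        )
    return links


for link in rel_tag_links(["clothing", "shirt", "jersey", "fan merchandise"]):
    print(link)
```

The generated links could sit next to each product listing, so that both visitors and automated services see the same set of tags.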

Epilogue

In many cases, timing one’s adoption of a new technology can make all the difference. The Goddess of Economics tends to bestow her blessing upon those few nimble early adopters savvy enough to identify and take advantage of the changing marketplace in order to create a unique advantage for themselves and/or their businesses. Being prepared for the Semantic Web will require you and your business to embrace the coming change. The good news is that if you do it right, then this time round it’ll be the machines doing the heavy lifting…

I hope you’ve found this useful. Your comments would be much appreciated…
: )

The Gentleman’s guide to Facebook, Friend requests & Netiquette

March 31st, 2009, By talk

The wildcard friend request conundrum

Facebook friend requests come in all shapes and sizes, anything from a flirtatious “Hey Gorgeous” to a blast from your kindergarten past. As online friendship becomes more socially acceptable, so do friend requests from people you’ve never met or even heard of. Wildcard Facebook friend requests represent a social conundrum and raise a prickly issue: Is there a polite way to ask someone:
“Excuse me but WHO THE !@#$ ARE YOU?”

Who the !@#$ are you Takeru Kobayashi?!?

I’m personally facing this very issue with the aforementioned Mr. Takeru Kobayashi, who has requested my Facebook friendship and whom, to the best of my knowledge, I’ve never met or heard of before. Fortunately for me, Headup can help me avoid this potential netiquette disaster.

Headup – more omniscient* than Deep Thought, cooler than HAL

Headup’s unique ability to identify people and collect their profiles from a range of social services makes it an ideal tool for snooping out friendship candidates, flirtatious paramours, and self-proclaimed potty pals, prior to approving them as your Facebook friends.

Stalking, snooping and spying – the Headup way

Headup will often be able to show you some photos of the flirt along with some of your common friends, tell you a bit about what your former kindergarten confederate is up to, and reveal that Takeru Kobayashi, AKA “The Tsunami”, is an illustrious member of the most prestigious of clubs: former winners of Nathan’s Famous International Hot Dog Eating Competition.

Armed with this smorgasbord of information I can now make an informed choice on whether to welcome Takeru into my circle of friends, or leave him out in the cold and watch the videos Headup provides of his hotdog eating antics instead.

Takeru "The Tsunami" Kobayashi doing his thing

This ability to check out potential friends beforehand makes Headup a powerful boost to your Facebook activities, enabling you to filter out identity thieves, serial ‘befrienders’ and other social hazards.

How you can get this:

If you don’t have it yet, download the Headup plug-in for Firefox and follow the instructions. Make sure you connect the add-on to your Facebook account (at least).

If you already have Headup, make sure you’re logged into your Facebook account and click the little Headup icon on your browser’s status bar (bottom right hand corner of the window).

Headup settings - bottom right of your browser

This will bring you to the personalization screen.

Connect Headup to your Facebook account

Make sure that when you’re popped over to the Facebook authorization window you agree to give access to Headup.

Click the Finish button.

Finish connecting Headup to your services to save your settings

That’s it, you’re done!

Next time you visit Facebook, people’s names will be underlined with Headup’s signature orange dashed line, and hovering over them will prompt Headup to provide you with whatever details it is able to retrieve for them.

Let me know if you come up with something juicier than a hotdog…

Enjoyed this post?
You might like “Yo Tweeps! Check Headup on Twitter…” too.
It explains how to use Headup to boost your Tweeting…

*  Thought about this after writing the post:
By definition it’s impossible to be “more omniscient”

