Introducing SemantiNet’s API

January 16th, 2011, By eitanb

We’re excited to release an alpha version of our new API. This is the first in a series of blog posts describing ways you can rapidly build impressive data-intensive applications.
SemantiNet’s new API is designed for easy querying of a selected collection of useful Web Services, Wikipedia, Linked-Data, and the unstructured web. The API also provides a flexible templating language for easy creation of semantic mashups and data-intensive applications, directly from your browser.

To see what we mean, let’s start with a very simple DBPedia query (click the link to see this query in the playground):
The API returns a node that represents Bar Refaeli in DBPedia, along with some of the data connected to this node. Quite simple so far. So let’s devise a template that takes this node and presents its information as HTML:
<html>
  <body>
    <!-- label provides a nice representation of the node's name -->
    <h1><%= label%/></h1>
    <!-- personage calculates the age, based on information we have from the birth-date and the current time -->
    Age: <%= personage/round(1)%/><br/>
    <!-- We want a nice representation of the birthPlace, so we take dbpedia-owl:birthPlace/label -->
    Born in: <%= dbpedia-owl:birthPlace/label%/><br/>
    <!-- Take the first image returned from YahooBoss's images search -->
    <img src="<%= yahooboss:images/first%/>" width="100px">
  </body>
</html>
Click here to see it live in the playground


What’s returned from a couple of examples:

Nice. We’ve seen how to query both LinkedData (from DBPedia in this case) and the web (through Yahoo Boss) – to get the picture.
Just to get a feel for what’s possible, let’s play with these models’ places of birth.
This query will return the three birthplaces:
/multy('dbpedia:Carolyn_Murphy','dbpedia:Marisa_Miller','dbpedia:Bar_Refaeli')/dbpedia-owl:birthPlace
Using this template – we will take the places, and put them on a map:
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <script src="http://maps.google.com/maps?file=api&amp;v=2&amp;key=ABQIAAAAkMzYJpqzT4X0Hj0W-xMFIhTkBdPb1_Y7shJWGA4g7zFU4DbUwRSRxPPUjb7uuS8U3pAZlGMUGn5Vww" type="text/javascript"></script>
  </head>
  <body>
    <div id="mapdivid" style="float: left; height: 100%; width: 100%"></div>
    <script type="text/javascript">
    if (GBrowserIsCompatible()) {
     var map = new GMap2(document.getElementById("mapdivid"));
     map.setCenter(new GLatLng(0, 0), 1);
     <%foreach .[location]%>
       map.addOverlay(new GMarker(new GLatLng(<%= /location/geo:lat%/>, <%= /location/geo:long%/>),{title:"<%= /label%/>"}));
     </%foreach%>
    }
    </script>
  </body>
</html>
Check it out “live” here. In this example, we iterate over the birthplaces using a ‘foreach’ directive, and for each place we take the location’s latitude, longitude and label.

The following query will return a list of female models, keeping only those for whom we have height data, ordered from tallest to shortest:
category:Female_models/deepinstances(3)[dbpedia-owl:height]/orderdesc(dbpedia-owl:height)
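If you think more easily in a general-purpose language, here is a rough Python sketch of the same idea: keep only the nodes that have a height, sort them tallest first, and cut the list to a fixed size. It is an analogy rather than how the API is implemented. The records, the heights, and the `tallest` function are all invented for illustration.

```python
# Placeholder records standing in for graph nodes; names and heights are invented.
models = [
    {"label": "Model A", "height": 1.78},
    {"label": "Model B"},                  # no height on record, so it gets filtered out
    {"label": "Model C", "height": 1.83},
    {"label": "Model D", "height": 1.75},
]


def tallest(nodes, limit):
    """Keep nodes that have a height, sort tallest first, return at most `limit`."""
    # Roughly what the [dbpedia-owl:height] filter does: drop nodes without a height.
    with_height = [n for n in nodes if "height" in n]
    # Roughly what orderdesc(dbpedia-owl:height) does: largest value first.
    ordered = sorted(with_height, key=lambda n: n["height"], reverse=True)
    # Roughly what take(...) does: keep only the first `limit` items.
    return ordered[:limit]


for node in tallest(models, 2):
    print(node["label"], node["height"])
```

The point of the comparison is that each step feeds the next, which is what the slashes in the query express.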
To show a table of the tallest 10 models, we’ll use this template:
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <table style="float: left; clear: both">
      <%foreach ./take(10)%>
        <tr>
          <td><img width='100px' src='<%= /yahooboss:images/first%/>'
          onerror='this.onerror = null; this.src="http://static1.headup.com/images/placeholder.jpg"' />
          </td>
          <td><%= /label%/><br/>
          <%= /dbpedia-owl:height%/>
          </td>
        </tr>
      </%foreach%>
    </table>
  </body>
</html>
Check it out in the playground here, or – for a more generic version of this table, check this out:

Scraping pages from across the web is quite sweet as well. Let’s take this page:
http://sportsillustrated.cnn.com/2010_swimsuit/models/
And now, let’s scrape the images from this page, using SemantiNet’s API:
/fetch('http://sportsillustrated.cnn.com/2010_swimsuit/models/')/htmlxpath('//div[@class="cnnIndexList"]//img/@src')/*
This gets us a list of photos. And if we put them in this template:
<%foreach .%>
  <img src='<%= .%/>'/>
</%foreach%>
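For comparison, here is a rough Python sketch of what the XPath step in the query above selects: the `src` of every `img` found anywhere inside a `div` whose class is `cnnIndexList`. It uses only the standard library and runs on a small made-up HTML snippet instead of fetching the real page. The class name `ImageSrcCollector`, the function `scrape_image_sources`, and the sample markup are assumptions made for this example.

```python
from html.parser import HTMLParser


class ImageSrcCollector(HTMLParser):
    """Collect <img src> values found inside <div class="..."> containers."""

    def __init__(self, container_class):
        super().__init__()
        self.container_class = container_class
        self.div_depth = 0  # 0 means we are outside any matching container
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1  # nested div inside the container
            elif attrs.get("class") == self.container_class:
                self.div_depth = 1   # entering a matching container
        elif tag == "img" and self.div_depth > 0 and attrs.get("src"):
            self.sources.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1


# Made-up markup for illustration; the real page is not fetched here.
SAMPLE_HTML = """
<div class="header"><img src="logo.png"></div>
<div class="cnnIndexList">
  <div><a href="#"><img src="model1.jpg"></a></div>
  <img src="model2.jpg"/>
</div>
<img src="footer.png">
"""


def scrape_image_sources(html, container_class="cnnIndexList"):
    collector = ImageSrcCollector(container_class)
    collector.feed(html)
    return collector.sources


print(scrape_image_sources(SAMPLE_HTML))
```

Compared with the one-line query, the hand-written version needs its own bookkeeping for nested tags, which is the work the `htmlxpath` step saves you.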
Together, the query and template produce a scrape of the site’s images:


In upcoming blog posts we’ll write about more characteristics of SemantiNet’s API and language:

  • Extensible – anyone can easily extend this language (we internally call it CSlang), building on top of existing predicates
  • A rich NLP library for Entity extraction, contextual disambiguation, access to WordNet
  • Powerful first-class-citizens of the language – graph querying primitives, lambda calculus
  • Very short development feedback loop
  • Piping
  • Fuzzy ontology support – freeform queries allow quick trials and discovery
  • Inference rules one-liners
  • Enrich the ontology based on web-links
  • Data mining primitives – map-reduce, grouping, histograms, order-by, max on lists

Want to get started? Take a spin in the playground, and check out our wiki: http://wiki.headup.com/index.php?title=Knowledge_Graph_API

Disambiguation

May 30th, 2010, By talk

Which is Which?

As you probably know by now, Headup specializes in understanding the meaning of text in web pages, extracting the important objects, and bringing complementary content about them from around the web.  One of the most important requirements in carrying out this process is correctly identifying the meaning of words on the page, especially those that have more than one meaning.  If you fail to do this, you can fetch a lot of high-quality, real-time and personalized information – about the wrong topic…

In technical jargon, identifying the correct meaning of words is called “Word Sense Disambiguation”.  Or in the words of Led Zeppelin (from “Stairway to Heaven”):

There’s a sign on the wall

But she wants to be sure

‘Cause you know sometimes words

Have two meanings

Some words have different meanings depending on their context.  For example, the word “Apple” can mean the fruit, the technology company, the record label or the band.  “John Mack” can refer to the Chairman of Morgan Stanley, the musician, or the psychiatrist who specialized in alien abduction experiences. The term “Enterprise” can mean a company, a city, a ship, a starship, a space shuttle, and much more.

On the other hand, the same term can appear in the text in many different formats.  For example, if you write a blog post that mentions Barack Obama, you might refer to him as Barack Obama, President Obama, the President, the U.S. President, President of the U.S.A., Barack, Obama, Mr. Obama, etc.  All of these terms refer to the same person, so an automated system seeking to understand the text should resolve all of them to the same entity.
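To make the idea concrete, here is a minimal Python sketch of resolving several surface forms to one entity. It is a toy lookup table, not a description of how Headup works: the `ALIASES` table, the entity ID `Barack_Obama`, and the `resolve` function are invented for illustration. A real system would also need context to handle forms like “the President”, which is why they are left out of the table.

```python
# Hypothetical alias table: each surface form maps to one canonical entity ID.
ALIASES = {
    "barack obama": "Barack_Obama",
    "president obama": "Barack_Obama",
    "mr. obama": "Barack_Obama",
    "obama": "Barack_Obama",
    "barack": "Barack_Obama",
}


def resolve(mention):
    """Return the canonical entity for a mention, or None if it is unknown."""
    return ALIASES.get(mention.strip().lower())


mentions = ["Barack Obama", "President Obama", "Mr. Obama", "Obama"]
entities = {resolve(m) for m in mentions}
print(entities)  # every mention collapses to the same entity
```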

There are various approaches to word sense disambiguation.  Some rely on the statistics of surrounding words; others require a training stage utilizing large pieces of text in which the meaning of words has been marked manually.  Headup’s approach to disambiguation is based on its knowledge graph, the ever-expanding collection of topics, attributes, and semantic relationships between them.  Combining information derived from the knowledge graph with analysis of syntax (the way words are combined into sentences) enables Headup to reach a very high rate of precision (the percentage of terms that are correctly identified) in its disambiguation process.

The examples below show how Headup can correctly identify terms that appear in plain text, even when the term has more than one meaning, or appears only partially in the text.  All of the examples are based on actual posts in blogs that are using Headup.

First, let’s take a look at an example from the film blog “HeyUGuys”.  Here you can see how Headup correctly identifies the word “Abrams” as referring to the writer and producer J. J. Abrams.

Disambiguation Image

And here’s another example: It is typical in gossip media to refer to celebrities using their first name only, to induce a sense of familiarity.  In the post below, taken from the blog “HitDanBack”, singer Mariah Carey is referred to only as “Mariah”.  This doesn’t stop Headup from correctly identifying her, based on understanding the topic of the blog post and the context in which the name appears.

Disambiguation Image

And in the final example below, you can see how Headup interprets the term “European Championship” in an article from the blog Jewlicious.  This term can mean a championship in any sport, but based on the context of the article and related terms that are identified in the text, Headup correctly interprets “European Championship” as referring to the European Figure Skating Championship.

Disambiguation Image

If you want to see more examples of Headup in action, visit www.headup.com and explore the various blogs that are already using Headup.  You can also test drive the engine for yourself in our Entity Extraction Playground.  Enjoy!

Why we understand you

January 18th, 2010, By talk

Apples, Yoda and NLP – introducing how we understand Language

One of our missions as a company has always been providing our users with the most useful, accurate and interesting content the web has to offer.
Early on in our existence we realized that in order to do this we’d need the ability to understand the web in a way that closely mimics the way humans do. In techno-babble this is often referred to as NLP, which stands for “Natural Language Processing”, and even more specifically as NLU, which stands for “Natural Language Understanding”.
As trivial as this may sound, it’s far from being a simple task.

Here’s why:

Language is deceptive, especially if you’re an Apple

Ambiguous Apple

Language, as we encounter it on the web and use it in our daily exchanges, is often ambiguous, and therefore deceptive, tricky and difficult to understand. So much so that often enough understanding it is challenging even for a human.
Consider for example the following sentence:

“Apple, answered Steve, is my favorite”

The possibilities for interpreting it vary widely depending on which “Apple” and which “Steve” we’re talking about.

If you happen to follow technology news, you’re likely to infer that the “Apple” in question is in fact the company, and that “Steve” is therefore Steve Jobs, its founder.

If, however, you lack this knowledge, your interpretation of the sentence must be limited to the understanding that someone named Steve is commenting on his preferences in fruit.

Understanding language – How do we humans do it?

Humans, even very young ones, understand ambiguous usage of language almost effortlessly thanks to an innate ability to infer an ambiguous term’s context by using a wide range of cues and clues:

  • We refer to the identity of the person speaking – And therefore don’t necessarily treat every person who calls us “Bro” like family.
  • We take cues from the intonation used - It helps us differentiate between “what?” and “WHAT?!”
  • We depend on sentences’ grammatical structure - Now you know why Master Yoda is so hard to understand (and annoying – or is that just me?)
  • We refer to our knowledge of the world – As in the example above – knowing a certain “Steve” is closely related to a company called “Apple” allows us to infer that this is a possible meaning for the sentence.
  • We refer to the context in which an ambiguous term is used - In the example above it’s precisely the lack of context that leaves the ambiguity unresolved.

“Ok”, you might say, “People can understand the languages they use. So what?”.

The answer is that understanding how people understand language provides the clues necessary to teach computers to do the same.

Understanding how people do stuff is the key to teaching machines to do it too

Once we’d analyzed in depth the various cues humans utilize to understand language we examined which of them could be accomplished in real-time by using an affordably scalable system. The cues we chose to use are the bottom three from the list above: grammatical structure, knowledge of the world and context.

How do you teach a computer about the world?

In order to grant our platform the ability to understand language we set out to teach it about the world. We accomplished this by equipping it with a graph mapping out the billions of connections that exist between over 30 million nouns, topics, terms and things.

One might well ask: “What do you define as a connection?”

The answer is we defined any possible relationship between two things as a “connection”. For example:

The person “Steve” is connected to the company “Apple” because he “founded” it.

As I showed above, merely knowing that a possible connection may exist between two things goes a long way to assisting one in understanding the context in which they’re mentioned.
The more one’s aware of possible connections between the things mentioned in any given text the greater one’s confidence in selecting the correct context for each.

Returning to the example above, had the sentence been:

“Apple’s iPhone, answered Steve, is my favorite accomplishment”

We’d have little doubt as to which “Steve” and which “Apple” the sentence refers to. The mention of the iPhone grants the context we were missing before.
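Here is a toy Python sketch of that reasoning: for each ambiguous term we list the candidate meanings, then pick the combination whose members share the most connections in a small graph. This is only an illustration of the principle under our own simplifying assumptions, not our production algorithm. The graph, the candidate lists, and the names `score` and `disambiguate` are invented for the example.

```python
from itertools import product

# Toy knowledge graph: undirected connections between entities (invented for illustration).
CONNECTIONS = {
    frozenset({"Steve_Jobs", "Apple_Inc"}),
    frozenset({"Apple_Inc", "iPhone"}),
    frozenset({"Steve_Jobs", "iPhone"}),
}

# Candidate meanings for each term found in the text (also invented).
CANDIDATES = {
    "Apple": ["Apple_Inc", "Apple_(fruit)"],
    "Steve": ["Steve_Jobs", "Steve_(someone_else)"],
    "iPhone": ["iPhone"],
}


def score(entities):
    """Count how many pairs of the chosen entities are connected in the graph."""
    return sum(
        1
        for i, a in enumerate(entities)
        for b in entities[i + 1:]
        if frozenset({a, b}) in CONNECTIONS
    )


def disambiguate(terms):
    """Pick the combination of meanings with the most connections between them."""
    options = [CANDIDATES[t] for t in terms]
    best = max(product(*options), key=score)
    return dict(zip(terms, best))


print(disambiguate(["Apple", "Steve", "iPhone"]))
```

With the iPhone in the sentence, the winning reading is supported by three connections instead of one, which is the toy version of gaining confidence from extra context.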

One of our platform’s greatest assets is that it approaches text holistically.

Rather than examine every word or sentence separately, we look at much larger segments of text in order to search for the clues that will enable us to find the correct context for all the ambiguous terms encountered.

Grammar is a powerful tool for understanding meaning

“Grammar, Yoda teaches, a powerful tool for obscuring meaning is.”

Conversely, when used correctly (i.e. not in Star Wars), it’s a very helpful tool in helping us understand context and meaning.

By incorporating a model of the English language’s grammar into our platform we were able to improve its understanding of context even further.

Content matching

We use our ability to accurately understand what a “thing” is not only to choose its appropriate meaning in the context of the text in which it’s mentioned, but also to help us retrieve matching and useful content accurately. To date we’ve already mapped hundreds of sources for:

  • News
  • Articles
  • Images
  • Videos
  • Realtime data
  • Facts
  • Events
  • Geo-data
  • Product specs

We use all these sources to retrieve the content we match for every entity. Our platform makes it easy to extend this content repository, and we continue to add more data to it on a regular basis. We’re even capable of utilizing content provided by our publishers as a source (see the screenshot below for an example of how this is implemented on film review blog HeyUGuys).

What’s next?

It’s our intention to develop our content syndication abilities further in future so as to assist our bloggers and publishers in achieving distribution and getting more traffic to their sites.

As more sites and blogs install our widget we believe that the content they hold will gradually become one of the most interesting content assets we have access to, but that’s already material for another post…

The editors of the HeyUGuys blog use Headup as a "related post" widget to drive traffic internally to other posts on their blog


Image Credits: Spencer E Holtaway, johannes pape

Semantic Web Shopping – a "how to" for the immediate future – Part 2

April 26th, 2009, By talk

Continued from Part 1

How to prepare for Shopping 3.0

A key element of preparing for the Semantic Web is to remember that the best Semantic Web technologies are only as good as the data they can access. If you want to enjoy the best that Semantic Web technologies have to offer, be prepared to make A LOT of information about yourself available online. A great place to start is your Facebook profile. If you want to get the most from the future of Semantic Web shopping, I suggest you begin by fleshing out your profile as much as you possibly can. The reason I suggest you begin with Facebook in particular is that you’ve probably got a profile there already and, whether you like it or not, Facebook is already making your information available to other services via its API. The Facebook API grants access to the following details about any and every Facebook member (this is a very partial list):

  • Location
  • Gender
  • Sexual preference
  • Marital status
  • Employment history
  • Likes – books, films, music, etc.
  • Fan pages the member belongs to

Despite current criticism of Facebook’s failings with regard to monetization, the information they have is without doubt a veritable treasure trove of personal information just waiting to be commercialized. Whatever the future has in store for us in terms of the Semantic Web, there can be little doubt that the Facebook API will have an important part to play in it.

Hunting for bargains - a thing of the past? (image by avlxyz)


What about privacy?

I’m fully aware that those of you who are touchy about privacy are probably scandalized by my last suggestion. Right about now you’re probably thinking: “What? Make stuff about me publicly available online? What are you, nuts?!?” Luckily, while writing this post I ran into an excellent article titled “How much is your privacy worth?”. The article, written by Eric Harber, does an excellent job of presenting the Semantic Web consumers’ paradigm, and moreover shows that there’s little that’s new about it. The article’s main premise is that we’ve been trading our privacy in for perks and benefits for years, and therefore there can be little doubt that we’ll continue to do so in the future. Mr. Harber argues that every loyalty club we’ve ever subscribed to, every coupon we’ve ever cashed and every marketing survey we’ve ever participated in all stand as examples of cases where we’ve voluntarily surrendered some of our privacy for a perk offered by a marketer.

Our wish to safeguard our privacy is understandable but the simple truth is that in this data driven day and age privacy is increasingly an illusion. More and more of our daily activities are monitored, individually or in aggregate, whether we’re aware of it or not. The data collected is already being put to use in advertising whether obviously or less so. This trend will increase as the quality of data and the ability to analyze it continue to improve.

Semantic Web shopping will be Opt-in

To me there’s something very comforting about the knowledge that this process of cashing in my privacy for perks isn’t new. It means that the practices and policies that need to be developed in order to enable and regulate marketing on the Semantic Web have a solid base for reference, one that not only takes consumers’ privacy into account, but also gives it paramount importance. There can be no doubt that the Semantic Web will usher in a new age that will change not only our understanding of what constitutes “private information” but also what may be done with it. As was the case with this same dilemma in the past, ultimately legal frameworks will be created that will ensure that a consumer’s right to privacy is protected and that receiving marketing offers remains an opt-in experience (anti-spam legislation being a prime example).

If you’re skeptical that the legal aspect alone will be enough to enforce the sanctity of consumer privacy, I submit to you the following argument: companies that abuse privacy will suffer such a backlash from consumers and create such splitting PR headaches for themselves that the practice will quickly become unprofitable. At worst we’ll have to deal with the Semantic Web version of Viagra spam…

Epilogue

Progress is inevitable; it therefore becomes the collective responsibility of both marketers and consumers to define to what extent the trade-off between privacy and purchasing perks creates value for all the stakeholders involved. The laws of economics will eventually guarantee that imbalanced models will slowly die out, leaving us with those that we not only can, but also want to, live with. After experiencing first hand how inefficient online shopping really is, I personally would be happy to divulge information about myself if it would save me all the time I spent searching for that perfect stroller… ; )
