Keith Richards’ guitar gallery in 4 lines of code

January 24th, 2011, By eitanb

The first line of “code” is this query, which returns a list of Keith Richards’ guitars (click to check it out):
http://api.headup.com/v1?raw=true&q=Keith Richards/popularmeaning/`instrument`/render("videolist.html")
Let’s break down the query to its parts:

  • Keith Richards/popularmeaning – gives us the URI (a unique ID) for Keith Richards, dbpedia:Keith_Richards.
  • `instrument` – a fuzzy match of the free-form text “instrument” against the predicates of dbpedia:Keith_Richards. We get a list of the instruments that Keith played.
  • Then we render this list of instruments using a template (that we’ve prepared in advance) called videolist.html.
This is actually a more “fuzzy” way of querying the graph. The strict way of querying it would have been:
http://api.headup.com/v1?raw=true&q=dbpedia:Keith_Richards/dbpedia-owl:instrument/render("videolist.html")
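
Both queries contain spaces, slashes, and quotes, so they need URL-encoding when you send them from code rather than clicking them in a browser. Here is a minimal Python sketch. It assumes the endpoint takes the same `raw` and `q` parameters shown above and that the quotes around the template name are plain double quotes; the `build_query_url` helper is hypothetical and not part of the API.

```python
from urllib.parse import urlencode, urlparse, parse_qs

API_BASE = "http://api.headup.com/v1"  # endpoint taken from the queries above


def build_query_url(*steps: str) -> str:
    """Join query steps with '/' and URL-encode them into the q parameter."""
    params = {"raw": "true", "q": "/".join(steps)}
    return f"{API_BASE}?{urlencode(params)}"


# The fuzzy form and the strict form of the same query.
fuzzy_url = build_query_url(
    "Keith Richards", "popularmeaning", "`instrument`", 'render("videolist.html")'
)
strict_url = build_query_url(
    "dbpedia:Keith_Richards", "dbpedia-owl:instrument", 'render("videolist.html")'
)

# Decoding the q parameter gives back the original query text.
decoded = parse_qs(urlparse(strict_url).query)["q"][0]
print(fuzzy_url)
print(decoded)
```

Keeping the steps as separate arguments makes it easy to swap a fuzzy step for its strict counterpart without touching the rest of the query.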

The rest of the code resides inside the videolist.html template. Let’s have a look inside:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>SemantiNet's Video Portal</title>
  <link rel="stylesheet" href="http://www.blueprintcss.org/blueprint/screen.css">
  <style type="text/css" media="screen">
  body {background-color: #FAFAFF}
  </style>
</head>
<body>
 <div class="container">
 <div class="span-24">
  <table style="font-family: Consolas; font-size: x-small; background-color: white" class="span-24">
   <%foreach ./take(20)%>
  <tr width="100%">
   <td>
    <%select keytermsforquery/youtube:getplayers/first%>
   <iframe title="YouTube video player" class="youtube-player" type="text/html" width="300" height="230" src="http://www.youtube.com/embed/<%= f:v1/split('/')/*/at(4)%/>" frameborder="0">
   </iframe>
    </%select%>
   </td>
   <td>
   <h2><%= label%/></h2>
    <%= abstract/str:unescapeunicode%/>
   </td>
  </tr>
   </%foreach%>
  </table>
 </div>
 </div>
</body>
</html>

Most of the template is made of simple HTML markup. The query lines are:

  • keytermsforquery/youtube:getplayers/first
    • keytermsforquery – refines the query before calling a search API (YouTube in this case) by adding semantic “cues”, so the results are more accurate.
    • youtube:getplayers – gets YouTube players for the entities.
    • first – takes only the first video, since we want one for each instrument.
  • label – brings a nice readable name.
  • abstract/str:unescapeunicode – brings the Wikipedia abstract – and removes Unicode escaping.
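
One expression the list does not cover is the one that builds the iframe's embed URL, f:v1/split('/')/*/at(4). Our reading is that it splits a player URL on '/' and keeps one segment as the video ID. A rough Python equivalent follows. It assumes that f:v1 holds a URL of the form http://www.youtube.com/v/<id> and that at() counts from zero; neither is confirmed here.

```python
def video_id_from_url(url: str, index: int = 4) -> str:
    """Split a URL on '/' and return the segment at the given zero-based index."""
    return url.split("/")[index]


# Hypothetical player URL; VIDEO_ID stands in for a real YouTube ID.
player_url = "http://www.youtube.com/v/VIDEO_ID"
# Segments: ['http:', '', 'www.youtube.com', 'v', 'VIDEO_ID']
embed_src = "http://www.youtube.com/embed/" + video_id_from_url(player_url)
print(embed_src)
```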
In the same way, we can display a nice YouTube gallery of any list we’d like:

Introducing SemantiNet’s API

January 16th, 2011, By eitanb

We’re excited to release an alpha version of our new API. This is the first post in a series describing ways you can rapidly build impressive data-intensive applications.
SemantiNet’s new API is designed for easy querying of a selected collection of useful Web Services, Wikipedia, Linked-Data, and the unstructured web. The API also provides a flexible templating language, for easy creation of semantic mashups and data-intensive applications, directly from your browser.

To see what we mean – let’s start with a very simple query of DBPedia (click the link to see this query in the playground):
The API returns a node that represents Bar Refaeli in DBPedia, with some of the data that is connected to this node. Quite simple so far. So, let’s devise a template that takes this node and presents information from it as HTML:
<html>
  <body>
    <!-- label provides a nice representation of the node's name -->
    <h1><%= label%/></h1>
    <!-- personage calculates the age, based on information we have from the birth-date and the current time -->
    Age: <%= personage/round(1)%/><br/>
    <!-- We want a nice representation of the birthPlace, so we take dbpedia-owl:birthPlace/label -->
    Born in: <%= dbpedia-owl:birthPlace/label%/><br/>
    <!-- Take the first image returned from YahooBoss's images search -->
    <img src="<%= yahooboss:images/first%/>" width="100px">
  </body>
</html>
Click here to see it live in the playground
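
To make the personage/round(1) step in the template concrete, here is a small Python sketch of the same idea: take a birth date and the current date, and return the age rounded to one decimal. The `person_age` function and the 365.25-day year are our assumptions for illustration; the API's actual calculation may differ.

```python
from datetime import date


def person_age(birth_date: date, today: date, digits: int = 1) -> float:
    """Age in years, using an average year length of 365.25 days."""
    days_alive = (today - birth_date).days
    return round(days_alive / 365.25, digits)


# Hypothetical dates, chosen only to illustrate the rounding.
print(person_age(date(2000, 1, 1), date(2010, 7, 1)))
```

Passing `today` in explicitly, rather than reading the clock inside the function, keeps the result reproducible.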


What’s returned from a couple of examples:

Nice. We’ve seen how to query both LinkedData (from DBPedia in this case) and the web (through Yahoo Boss) – to get the picture.
Just to get a feeling of what’s possible, let’s play with these models’ place of birth.
This query will return the three birthplaces:
/multy('dbpedia:Carolyn_Murphy','dbpedia:Marisa_Miller','dbpedia:Bar_Refaeli')/dbpedia-owl:birthPlace
Using this template – we will take the places, and put them on a map:
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <script src="http://maps.google.com/maps?file=api&amp;v=2&amp;key=ABQIAAAAkMzYJpqzT4X0Hj0W-xMFIhTkBdPb1_Y7shJWGA4g7zFU4DbUwRSRxPPUjb7uuS8U3pAZlGMUGn5Vww" type="text/javascript"></script>
  </head>
  <body>
    <div id="mapdivid" style="float: left; height: 100%; width: 100%"></div>
    <script type="text/javascript">
    if (GBrowserIsCompatible()) {
     var map = new GMap2(document.getElementById("mapdivid"));
     map.setCenter(new GLatLng(0, 0), 1);
     <%foreach .[location]%>
       map.addOverlay(new GMarker(new GLatLng(<%= /location/geo:lat%/>, <%= /location/geo:long%/>),{title:"<%= /label%/>"}));
     </%foreach%>
    }
    </script>
  </body>
</html>
Check it out “live” here. In this example, we iterate over the birthplaces using a ‘foreach’ directive, and for each place we take the location’s latitude, longitude and label.
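
If the template syntax is unfamiliar, the foreach above behaves roughly like the Python loop below. It skips places without a location, which is how we read the [location] filter, and emits one marker line per remaining place. The `marker_lines` function and the sample records are made up for illustration.

```python
import json


def marker_lines(places: list[dict]) -> list[str]:
    """Return one map.addOverlay(...) line per place that has a location."""
    lines = []
    for place in places:
        location = place.get("location")
        if location is None:  # mirrors the [location] filter in the foreach
            continue
        lines.append(
            "map.addOverlay(new GMarker(new GLatLng({lat}, {lng}),{{title:{title}}}));".format(
                lat=location["lat"],
                lng=location["long"],
                title=json.dumps(place["label"]),  # quotes and escapes the label
            )
        )
    return lines


# Made-up places and coordinates, for illustration only.
sample_places = [
    {"label": "Example Town", "location": {"lat": 10.0, "long": 20.0}},
    {"label": "Place Without Coordinates"},
]
print("\n".join(marker_lines(sample_places)))
```

Running the label through `json.dumps` means a quote character inside a place name cannot break the generated JavaScript.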

The following query will return a list of female models, include only those whose height we have data about, and order them by height, tallest first:
category:Female_models/deepinstances(3)[dbpedia-owl:height]/orderdesc(dbpedia-owl:height)
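
In plain Python terms, the [dbpedia-owl:height] filter and the orderdesc step do something like the following. This is only an analogy over an in-memory list; the `tallest_first` helper and the sample records are invented, and the real query runs against the graph.

```python
def tallest_first(models: list[dict]) -> list[dict]:
    """Keep only models with a known height and sort them tallest first."""
    with_height = [m for m in models if m.get("height") is not None]
    return sorted(with_height, key=lambda m: m["height"], reverse=True)


# Placeholder records; names and heights are invented for the example.
sample_models = [
    {"label": "Model A", "height": 1.75},
    {"label": "Model B"},  # no height, so the filter drops it
    {"label": "Model C", "height": 1.80},
]
top = tallest_first(sample_models)[:10]  # keep at most the first ten results
print([m["label"] for m in top])
```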
To show a table of the tallest 10 models, we’ll use this template:
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <table style="float: left; clear: both">
      <%foreach ./take(10)%>
        <tr>
          <td><img width='100px' src='<%= /yahooboss:images/first%/>'
          onerror='this.onerror = null; this.src="http://static1.headup.com/images/placeholder.jpg"' />
          </td>
          <td><%= /label%/><br/>
          <%= /dbpedia-owl:height%/>
          </td>
        </tr>
      </%foreach%>
    </table>
  </body>
</html>
Check it out in the playground here, or – for a more generic version of this table, check this out:

Scraping pages from across the web is quite sweet as well. Let’s take this page:
http://sportsillustrated.cnn.com/2010_swimsuit/models/
And now, let’s scrape the images from this page, using SemantiNet’s API:
/fetch('http://sportsillustrated.cnn.com/2010_swimsuit/models/')/htmlxpath('//div[@class="cnnIndexList"]//img/@src')/*
This gets us a list of photos. And if we put them in this template:
<%foreach .%>
  <img src='<%= .%/>'/>
</%foreach%>
The result is a scrape of the site’s images.
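
For comparison, here is roughly what the fetch plus htmlxpath steps do, sketched with Python's standard library only. It is not how the API is implemented. It parses an inline HTML snippet that we made up, rather than the live page, and collects the src of every img nested inside a div whose class is exactly cnnIndexList, like the XPath in the query.

```python
from html.parser import HTMLParser


class IndexListImages(HTMLParser):
    """Collect src attributes of <img> tags nested inside div.cnnIndexList."""

    def __init__(self) -> None:
        super().__init__()
        self.div_depth = 0  # how many <div>s deep we are inside the target div
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "div":
            # Start counting at the target div, then track nested divs too.
            if self.div_depth > 0 or attributes.get("class") == "cnnIndexList":
                self.div_depth += 1
        elif tag == "img" and self.div_depth > 0 and "src" in attributes:
            self.sources.append(attributes["src"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1


# Made-up markup standing in for the fetched page.
sample_html = """
<div class="other"><img src="/skip.jpg"></div>
<div class="cnnIndexList">
  <div><a href="#"><img src="/models/one.jpg"></a></div>
  <img src="/models/two.jpg"/>
</div>
<img src="/outside.jpg">
"""
parser = IndexListImages()
parser.feed(sample_html)
print(parser.sources)
```

The one-line query replaces all of this bookkeeping, which is the point of having fetch and htmlxpath as primitives.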


In the following blog posts we’ll write about more characteristics of SemantiNet’s API and language:

  • Extensible – anyone can easily extend this language (we internally call it CSlang), building on top of existing predicates
  • A rich NLP library for Entity extraction, contextual disambiguation, access to WordNet
  • Powerful first-class-citizens of the language – graph querying primitives, lambda calculus
  • Very short development feedback loop
  • Piping
  • Fuzzy ontology support – freeform queries allow quick trials and discovery
  • Inference rules one-liners
  • Enrich the ontology based on web-links
  • Data mining primitives – map-reduce, grouping, histograms, order-by, max on lists

Want to get started? Take a spin in the playground, and check out our wiki: http://wiki.headup.com/index.php?title=Knowledge_Graph_API

Disambiguation

May 30th, 2010, By talk

Which is Which?

As you probably know by now, Headup specializes in understanding the meaning of text in web pages, extracting the important objects, and bringing complementary content about them from around the web.  One of the most important requirements in carrying out this process is correctly identifying the meaning of words on the page, especially those that have more than one meaning.  If you fail to do this, you can fetch a lot of high-quality, real-time and personalized information – about the wrong topic…

In technical jargon, identifying the correct meaning of words is called “Word Sense Disambiguation”.  Or in the words of Led Zeppelin (from “Stairway to Heaven”):

There’s a sign on the wall

But she wants to be sure

‘Cause you know sometimes words

Have two meanings

Some words have different meanings depending on their context.  For example, the word “Apple” can mean the fruit, the technology company, the record label or the band.  “John Mack” can refer to the Chairman of Morgan Stanley, the musician, or the psychiatrist who specialized in alien abduction experiences. The term “Enterprise” can mean a company, a city, a ship, a starship, a space shuttle, and much more.

On the other hand, the same term can appear in the text in many different formats.  For example, if you write a blog post that mentions Barack Obama, you might refer to him as Barack Obama, President Obama, the President, the U.S. President, President of the U.S.A., Barack, Obama, Mr. Obama, etc.  All of these terms refer to the same person, so an automated system seeking to understand the text should resolve all of them to the same entity.

There are various approaches to word sense disambiguation.  Some rely on the statistics of surrounding words; others require a training stage utilizing large pieces of text in which the meaning of words has been marked manually.  Headup’s approach to disambiguation is based on its knowledge graph, the ever-expanding collection of topics, attributes, and semantic relationships between them.  Combining information derived from the knowledge graph with analysis of syntax (the way words are combined into sentences) enables Headup to reach a very high rate of precision (the percentage of terms that are correctly identified) in its disambiguation process.
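
To give a feel for the graph-based idea, here is a deliberately tiny Python sketch. It is not Headup's algorithm: it just picks the candidate meaning whose related terms overlap most with the words around the ambiguous term. The `KNOWLEDGE_GRAPH` dictionary, its entries, and the `disambiguate` function are all invented for illustration.

```python
# Toy knowledge graph: each candidate meaning lists a few related terms.
# The entries are illustrative, not taken from Headup's data.
KNOWLEDGE_GRAPH = {
    "Apple": {
        "Apple (fruit)": {"tree", "orchard", "juice", "pie"},
        "Apple Inc.": {"iphone", "mac", "ipod", "computer"},
        "Apple Records": {"beatles", "label", "album", "records"},
    }
}


def disambiguate(term: str, context: str) -> str:
    """Pick the candidate whose related terms overlap most with the context."""
    context_words = {w.strip(".,!?\"'").lower() for w in context.split()}
    candidates = KNOWLEDGE_GRAPH[term]
    return max(candidates, key=lambda name: len(candidates[name] & context_words))


print(disambiguate("Apple", "The Beatles released the album on their own label, Apple."))
print(disambiguate("Apple", "We baked a pie with an apple from the orchard."))
```

A real system would weight relationships and use syntax as described above; simple word overlap is only the most basic version of using context.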

The examples below show how Headup can correctly identify terms that appear in plain text, even when the term has more than one meaning, or appears only partially in the text.  All of the examples are based on actual posts in blogs that are using Headup.

First, let’s take a look at an example from the film blog “HeyUGuys”.  Here you can see how Headup correctly identifies the word “Abrams” as referring to the writer and producer J. J. Abrams.

Disambiguation Image

And here’s another example: It is typical in gossip media to refer to celebrities using their first name only, to induce a sense of familiarity.  In the post below, taken from the blog “HitDanBack”, singer Mariah Carey is referred to only as “Mariah”.  This doesn’t stop Headup from correctly identifying her, based on understanding the topic of the blog post and the context in which the name appears.

Disambiguation Image

And in the final example below, you can see how Headup interprets the term “European Championship” in an article from the blog Jewlicious.  This term can mean a championship in any sport, but based on the context of the article and related terms that are identified in the text, Headup correctly interprets “European Championship” as referring to the European Figure Skating Championship.

Disambiguation Image

If you want to see more examples of Headup in action, visit www.headup.com and explore the various blogs that are already using Headup.  You can also test drive the engine for yourself in our Entity Extraction Playground.  Enjoy!

Smart Search: Finding Things in Groups

May 16th, 2010, By talk

Searching for stuff is sometimes tough.  If you know what you’re looking for, and you phrase your search term just right, then you usually get good results.  But if not, you’re in big trouble, doomed to endless sifting through the results, page by page, until you find the thing that you were really looking for.

Search engines are good at finding terms, expressions, and pieces of text.  But that’s where their world ends: They don’t understand the meaning of the text they are searching for, and they know nothing about objects, entities or relationships.  In addition, they are not designed to find stuff in groups, but to search for a single object each time.

For example, let’s say you are interested in seeing video clips of songs from the Dire Straits album “Brothers in Arms”.  If you search for “Dire Straits Brothers in Arms Album” on YouTube, you will get many links to video clips of the song “Brothers in Arms”, and some links to other songs in that album (if the album name appears in the clip description).  If you are lucky, you’ll get a link to a playlist called “Dire Straits Brother in Arms Album” prepared by some user who manually searched for these tracks by name.

YouTube search results for "Dire Straits Brothers in Arms Album"

But now look what happens if you execute the same query through Headup: Headup automatically digs into its database to find the tracks in the album, and searches for specific video clips of these tracks.  Then, it returns a nice “video wall” where each thumbnail links to a different track in the “Brothers in Arms” album.  The key here is that Headup “knows” what an album is, associates it with its tracks, and is smart enough to understand that YouTube hosts mainly videos of tracks, not full albums.  This type of reasoning and “smart search” implementation is way beyond the power of other “topic search” engines that do nothing more than search forwarding.

Headup video results for "Dire Straits Brothers in Arms Album"
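
The album-to-tracks expansion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Headup's implementation: the `ALBUMS` dictionary stands in for the knowledge graph, lists only part of the album, and `track_queries` is a hypothetical helper.

```python
# A tiny stand-in for the knowledge graph: an album node linked to its tracks.
ALBUMS = {
    ("Dire Straits", "Brothers in Arms"): [
        "So Far Away",
        "Money for Nothing",
        "Walk of Life",
        "Brothers in Arms",
    ],  # partial track list, enough to show the idea
}


def track_queries(artist: str, album: str) -> list[str]:
    """Expand an album search into one video search string per track."""
    return [f"{artist} {track}" for track in ALBUMS[(artist, album)]]


for query in track_queries("Dire Straits", "Brothers in Arms"):
    print(query)
```

Each resulting string can then be sent to a video search on its own, which is what turns a single album query into a wall of per-track results.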

Let’s take another example.  What if you are searching for a certain type of product by a certain brand, such as Samsung LED-backlit LCD TVs or Sony Flash-based HD camcorders?  If you try these search terms in a regular search engine, you will get scattered results of news announcements, product reviews, and maybe a link to a specific product page.  But you’ll never get a list of actual TVs or camcorders that match these criteria, since the search engines can only search for the text you supplied, but don’t understand it.

When such a search is conducted through Headup, it queries its knowledge graph for items that match the requested criteria.  Since in Headup objects have meaning, properties and relations to other objects, it is quite easy to go through all the “Products” by the “Company” Sony, find the “Camcorder” type products, and keep only those items that have “Memory Type” equal to “Flash” and “Resolution” equal to “HD”.  So executing such a query through Headup may result, for example, in a neat list of links to specific product pages, which may include media reviews, user reviews and price comparison with purchasing links.
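
As a rough illustration of that kind of property-based filtering, here is a Python sketch over a handful of invented product records. The record layout, the product names, and the `find_products` function are assumptions for the example; only the property names come from the paragraph above.

```python
# Invented product records; the property names follow the ones quoted above.
PRODUCTS = [
    {"name": "Camcorder X", "Company": "Sony", "Type": "Camcorder",
     "Memory Type": "Flash", "Resolution": "HD"},
    {"name": "Camcorder Y", "Company": "Sony", "Type": "Camcorder",
     "Memory Type": "Tape", "Resolution": "SD"},
    {"name": "TV Z", "Company": "Samsung", "Type": "TV",
     "Memory Type": None, "Resolution": "HD"},
]


def find_products(products: list[dict], criteria: dict) -> list[str]:
    """Return names of products whose properties match every criterion."""
    return [
        p["name"]
        for p in products
        if all(p.get(key) == value for key, value in criteria.items())
    ]


matches = find_products(
    PRODUCTS,
    {"Company": "Sony", "Type": "Camcorder", "Memory Type": "Flash", "Resolution": "HD"},
)
print(matches)
```

A text search for "Sony Flash-based HD camcorders" has no access to these properties, which is why it returns pages that mention the words rather than the matching products.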

Note that even though Headup currently does not support direct search, the “smart search” method is already implemented in the current pop-up widget and topic pages.  When you look at images, news or videos of a certain object or topic, Headup’s “smart search” works behind the scenes to bring you the most relevant content for that object, by understanding and utilizing its relationship to other objects.
