Keith Richards’ guitar gallery in 4 lines of code

January 24th, 2011, By eitanb

The first line of “code” would be this query, which returns a list of Keith Richards’ guitars (click to check it out):
http://api.headup.com/v1?raw=true&q=Keith Richards/popularmeaning/`instrument`/render("videolist.html")
Let’s break down the query to its parts:

  • Keith Richards/popularmeaning – gives us the URI (a unique ID) for Keith Richards, dbpedia:Keith_Richards.
  • `instrument` – that’s a fuzzy matching of the free-form text “instrument” with a predicate of dbpedia:Keith_Richards. We get a list of the instruments that Keith played.
  • Then we render this list of instruments using a template (that we’ve prepared in advance) called videolist.html.
This is actually a more “fuzzy” way of querying the graph. The strict way of querying it would have been:
http://api.headup.com/v1?raw=true&q=dbpedia:Keith_Richards/dbpedia-owl:instrument/render("videolist.html")
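
As a rough illustration, the two query URLs above could be assembled programmatically like this. It is only a sketch: the helper name `build_query_url` is hypothetical, it assumes the API accepts a standard percent-encoded `q` parameter (the post does not confirm this), and it uses straight quotes inside `render(...)` on the assumption that the curly quotes are a typographic artifact.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

API_ROOT = "http://api.headup.com/v1"  # endpoint taken from the post


def build_query_url(steps):
    """Join query steps with '/' and attach them as the q parameter.

    Hypothetical helper: assumes the API accepts a percent-encoded q value.
    """
    query = "/".join(steps)
    return API_ROOT + "?" + urlencode({"raw": "true", "q": query})


# Fuzzy form: free text plus a backtick-quoted predicate match.
fuzzy_url = build_query_url(
    ["Keith Richards", "popularmeaning", "`instrument`", 'render("videolist.html")']
)

# Strict form: explicit URI and predicate.
strict_url = build_query_url(
    ["dbpedia:Keith_Richards", "dbpedia-owl:instrument", 'render("videolist.html")']
)

print(fuzzy_url)
print(strict_url)
```

Keeping the steps in a list makes it easy to swap the first step for another artist while leaving the rest of the pipeline untouched.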

The rest of the code resides inside the videolist.html template. Let’s have a look inside:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title>SemantiNet's Video Portal</title>
  <link rel="stylesheet" href="http://www.blueprintcss.org/blueprint/screen.css">
  <style type="text/css" media="screen">
  body {background-color: #FAFAFF}
  </style>
</head>
<body>
 <div class="container">
 <div class="span-24">
  <table style="font-family: Consolas; font-size: x-small; background-color: white" class="span-24">
   <%foreach ./take(20)%>
  <tr width="100%">
   <td>
    <%select keytermsforquery/youtube:getplayers/first%>
   <iframe title="YouTube video player" class="youtube-player" type="text/html" width="300" height="230" src="http://www.youtube.com/embed/<%= f:v1/split('/')/*/at(4)%/>" frameborder="0">
   </iframe>
    </%select%>
   </td>
   <td>
   <h2><%= label%/></h2>
    <%= abstract/str:unescapeunicode%/>
   </td>
  </tr>
   </%foreach%>
  </table>
 </div>
 </div>
</body>
</html>

Most of the template is made of simple HTML markup. The query lines are:

  • keytermsforquery/youtube:getplayers/first
    • keytermsforquery – query refinements for calls to search APIs (YouTube in this case); adds semantic “cues” to the query to get more accurate results.
    • youtube:getplayers – gets YouTube players for the entities.
    • first – takes only the first video; we want one for each instrument.
  • label – brings a nice readable name.
  • abstract/str:unescapeunicode – brings the Wikipedia abstract – and removes Unicode escaping.
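
To make the template expressions a little more concrete, here is a rough Python equivalent of two of them: `split('/')/*/at(4)`, which pulls the video ID out of a player URL, and `str:unescapeunicode`. The sample URL layout and both function names are assumptions for illustration; the post does not show what `f:v1` actually contains.

```python
import re


def video_id_from_player_url(player_url):
    """Mimic split('/')/*/at(4): split on '/' and take the part at index 4.

    Assumes a URL shaped like http://host/v/VIDEO_ID, so that index 4 is the ID.
    """
    return player_url.split("/")[4]


def unescape_unicode(text):
    """Replace literal \\uXXXX sequences with the characters they encode.

    Only handles 4-digit escapes and does not combine surrogate pairs.
    """
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )


# Hypothetical inputs, not real API output.
sample_url = "http://www.youtube.com/v/VIDEO_ID"
embed_src = "http://www.youtube.com/embed/" + video_id_from_player_url(sample_url)
print(embed_src)
print(unescape_unicode("caf\\u00e9"))
```
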
In the same way, we can display a nice YouTube gallery of any list we’d like:

How to Fit the Whole Web in a Small Box

April 4th, 2010, By admin

I remember that day clearly.  As I entered SemantiNet’s offices, Sagie (our director of R&D) approached me holding a thin black box in his hand.

“We did it!” he said.

“Did what?” I asked, looking at the black box which resembled a hard disk drive on a diet.

“We managed to fit all of our Knowledge Graph on a 20GB Solid State Drive.  This small box holds all of the information in Wikipedia and dozens of other web sources”.

“Wow”, I said, “Unbelievable!  They say the world is getting smaller, but I never imagined that the web is getting smaller, too…”

To understand the significance of this achievement, you need to realize that Headup stores a lot of information.  And I mean a lot.  Headup knows about more than 100 million topics, spanning diverse fields from movies to microchips, and religion to rollerblades.  Each topic has several attributes, and the topics are connected to each other through semantic relationships.  For example, a band is connected to its albums, each album is connected to its tracks, a company is connected to its products, etc.

To get a better grasp on this, let’s look at a very small piece of the Headup knowledge graph described in the diagram below.  This piece of the graph is connected to the actress Angelina Jolie, one of the entities or objects that Headup knows about.  It includes the different pieces of information that Headup knows about her, gathered from various web sources.  General information about Angelina Jolie, such as her birth date, husband, and city of birth, is taken from Wikipedia.  The movies she appeared in, such as “Wanted” and “Beowulf”, are taken from IMDb.  Ratings for those movies are taken from Rotten Tomatoes.  Information about people who like each of these movies is taken from personal preferences that are exposed on social networks such as Facebook and MySpace.
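
One simple way to picture such a fragment is as a list of (subject, relationship, object) edges, each tagged with the source it came from. The sketch below is illustrative only: the relationship names and the `Edge` type are made up here, and it says nothing about how Headup actually stores its graph. Attribute values the post does not give (birth date, ratings) are left out.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    subject: str
    relation: str  # hypothetical relationship names
    obj: str
    source: str


edges = [
    Edge("Angelina Jolie", "appeared_in", "Wanted", "IMDb"),
    Edge("Angelina Jolie", "appeared_in", "Beowulf", "IMDb"),
]

# Index edges by (subject, relation) so following a relationship is a dict lookup.
index = defaultdict(list)
for edge in edges:
    index[(edge.subject, edge.relation)].append(edge.obj)


def follow(subject, relation):
    """Return every object reachable from subject via relation."""
    return index.get((subject, relation), [])


print(follow("Angelina Jolie", "appeared_in"))
```
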


Now imagine that Headup needs to store such detailed information about each one of millions of topics that appear in Wikipedia and in numerous other data sources, and constantly manage and update all the different relationships between them.  Currently we have over 300 million nodes (objects and attributes) in our graph, with over 2 billion connections.

To store this data and enable scalable, reliable and efficient access to it, we needed a highly optimized data store, with a small footprint and super-fast performance.  Traditional off-the-shelf databases such as MySQL and Oracle are optimized for documents and transactions, but not for efficiently traversing relationships and properties of objects.  Recently, several dedicated graph databases (also called “triplestores”) have become available, which are more optimized for semantic web applications.  However, as we kept adding more and more knowledge to Headup, we came to the conclusion that none of these off-the-shelf solutions was suitable for Headup.  So we had no choice but to develop our own data store, optimized for our needs.

It turned out that by creating a unique data store and optimizing it for the structure of our knowledge graph, we were able to achieve an order-of-magnitude boost in performance over existing solutions, both for building the graph and for accessing it.  Our data store can currently support up to 1 billion nodes (topics and attributes), and tens of billions of edges (relationships between topics and attributes), so we still have plenty of room to grow.

Building such a huge graph is in itself far from trivial, since it requires processing more data than can be stored in the computer’s main memory (RAM).  Hard drives are also not a good choice due to their limited access speeds and data transfer rates.  Therefore, we approached this challenge by utilizing the Hadoop software framework (inspired by Google’s MapReduce), which supports huge-scale, data-intensive distributed applications.  Using Hadoop enables us to build the graph in a completely distributed manner, so we can easily deal with this vast amount of data.
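
The map/shuffle/reduce pattern behind this can be imitated in a few lines of plain Python. This is a toy, single-process sketch of the general MapReduce idea, not SemantiNet’s actual Hadoop job: the record format, relationship names and function names are all assumptions, and real Hadoop would run the phases across many machines.

```python
from collections import defaultdict

# Hypothetical raw records: (source topic, relationship, target topic).
records = [
    ("band", "has_album", "album"),
    ("album", "has_track", "track"),
    ("company", "has_product", "product"),
    ("band", "has_member", "musician"),
]


def map_phase(record):
    """Emit (key, value) pairs; the key decides which reducer sees the value."""
    subject, relation, target = record
    yield subject, (relation, target)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Build one node's adjacency list from all edges that share its key."""
    return key, sorted(values)


mapped = [pair for record in records for pair in map_phase(record)]
graph = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(graph["band"])
```

Because each reducer only ever sees the edges for its own keys, no single machine needs to hold the whole graph while it is being built.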

The raw data of our current graph spans about 500 GB.  Using numerous optimization techniques, we managed to compress it to just 15 GB, meaning that we can hold the whole graph in RAM. As we add more sources, the graph will grow to a point where it is no longer cost-effective to keep it in RAM. For that case, the graph can easily be stored on a commodity SSD, which costs around $100. Furthermore, the graph is designed for optimal utilization of the drive’s internal cache, block sizes and fast random access.
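
A quick back-of-the-envelope check puts these figures in perspective. It assumes 1 GB = 10^9 bytes and treats the “over 300 million” nodes and “over 2 billion” connections quoted earlier as exact, so the results are rough.

```python
RAW_GB = 500
COMPRESSED_GB = 15
NODES = 300_000_000
CONNECTIONS = 2_000_000_000
BYTES_PER_GB = 10**9  # assumption: decimal gigabytes

compression_ratio = RAW_GB / COMPRESSED_GB
# Upper bound: attributes every compressed byte to connections.
bytes_per_connection = COMPRESSED_GB * BYTES_PER_GB / CONNECTIONS
connections_per_node = CONNECTIONS / NODES

print(f"compression ratio: about {compression_ratio:.0f}x")
print(f"compressed bytes per connection: at most {bytes_per_connection:.1f}")
print(f"average connections per node: {connections_per_node:.1f}")
```
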

The good news is that compressing the graph is done without compromising performance.  In fact, using a compressed graph actually increases its performance, since much less data has to be accessed and processed. Using our graph data store, we can find any piece of information in about 10 milliseconds using a hard disk drive, 0.1 milliseconds using a solid state drive, and just 0.1 microseconds when the information is in RAM.
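
Turning these latencies into sequential lookups per second shows how large the gaps are. This is plain arithmetic on the figures above, and it assumes lookups run one after another with no caching or parallelism.

```python
NS_PER_SECOND = 1_000_000_000

# Per-lookup latencies from the text, in nanoseconds.
latency_ns = {
    "HDD": 10_000_000,  # 10 milliseconds
    "SSD": 100_000,     # 0.1 milliseconds
    "RAM": 100,         # 0.1 microseconds
}

lookups_per_second = {
    medium: NS_PER_SECOND // ns for medium, ns in latency_ns.items()
}

for medium, rate in lookups_per_second.items():
    print(f"{medium}: {rate:,} lookups per second")
```
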

To demonstrate the performance of Headup’s graph database, let’s look at a typical Headup-powered web page which contains 50 terms and 700 candidates for disambiguating them.  Using our unique data store, such a page can be processed in only 2 seconds using an HDD, and about 200 milliseconds using an SSD.  With such performance, we can easily support sites with millions of unique page views without overloading their computing resources.

“Can I have the graph for one night?” I asked Sagie.

“What do you need it for?  Do you have anything to add to this immense knowledge repository?” Sagie was totally surprised.

“Not really”, I said.  “I want to put it under my pillow when I sleep, and hopefully all the Angelina Jolie stuff you showed me will inspire my dreams…”

As the Jewish New Year Passed…

September 23rd, 2009, By admin

The Apple Drops…

You know it’s the holiday season in Israel when half of the country comes back from what seems to have been a perpetual vacation… Nothing like coming off of summer break and going straight into one full month of holiday festivities.

Rather than giant apples made of glass and lights being dropped in Times Square, real apples and honey are passed around and families have big holiday dinners.

New Kid on the Block

Companies customarily throw holiday parties around this time of year.  I was graciously invited to our company BBQ in the backyard of Tal’s (our CEO) home.

Honestly, I was apprehensive about the invite because I am the “new girl”.  It was only my third day at the job, not to mention that I am fresh off the boat from New York City with only 2 months of living in Israel under my belt.

My idea of New Year’s is, in fact, the big giant apple, freezing cold weather, stupid hats and Champagne toasts. A BBQ on a warm, pleasant night is a foreign concept to me.

Burgers or Kebabs?

All of us Again
The BBQ was… a BBQ – like any other BBQ you’ve ever been to. A bunch of good friends schmoozin’-n-boozin’ in someone’s backyard.

Meat was grilled and devoured (although it was Kebabs and “Naknikiot” rather than burgers and hot dogs). Beer and wine were poured, music was playing, and stories were told.

Harel Keinan
Adding to the ambiance were 3 new babies. SemantiNet’s team seems to have been a bit busy developing last year ; ) …
As the camera passed from hand to hand, so too did the babies, as almost all the girls wanted a chance to coo at them.

Without a doubt, the star of the evening was Che, the English Bulldog with his lovable smile.

Che

All in all, it was what one expects of the 4th of July, except it was September 20th.
I think, despite being “The New Girl”, I held my own, at least judging by the amount of Kebab juice, baby drool and Che drool stains I collected during the evening.

Not a bad way to ring in a new year ;)