Why we understand you

January 18th, 2010, By talk

Apples, Yoda and NLP – introducing how we understand Language

One of our missions as a company has always been providing our users with the most useful, accurate and interesting content the web has to offer.
Early on in our existence we realized that in order to do this we’d need the ability to understand the web in a way that closely mimics the way humans do. In techno-babble this is often referred to as NLP, which stands for “Natural Language Processing”, and even more specifically as NLU, which stands for “Natural Language Understanding”.
As trivial as this may sound, it’s far from being a simple task.

Here’s why:

Language is deceptive, especially if you’re an Apple

Ambiguous Apple

Language, as we encounter it on the web and use it in our daily exchanges, is often ambiguous, and therefore deceptive, tricky and difficult to understand. So much so that understanding it is often challenging even for a human.
Consider for example the following sentence:

“Apple, answered Steve, is my favorite”

The possibilities for interpreting it vary widely depending on which “Apple” and which “Steve” we’re talking about.

If you happen to follow technology news, you’re likely to infer that the “Apple” in question is the company, and therefore that “Steve” is Steve Jobs, its co-founder.

If, however, you lack this knowledge, your interpretation of the sentence is limited to the understanding that someone named Steve is commenting on his taste in fruit.

Understanding language – How do we humans do it?

Humans, even very young ones, understand ambiguous usage of language almost effortlessly thanks to an innate ability to infer an ambiguous term’s context by using a wide range of cues and clues:

  • We refer to the identity of the person speaking - and therefore don’t necessarily treat every person who calls us “Bro” like family.
  • We take cues from the intonation used - it helps us differentiate between “what?” and “WHAT?!”
  • We depend on sentences’ grammatical structure - now you know why Master Yoda is so hard to understand (and annoying, or is that just me?)
  • We refer to our knowledge of the world - as in the example above, knowing that a certain “Steve” is closely related to a company called “Apple” allows us to infer that this is a possible meaning for the sentence.
  • We refer to the context in which an ambiguous term is used - in the example above it’s precisely the lack of context that leaves the ambiguity unresolved.

“OK”, you might say, “people can understand the languages they use. So what?”

The answer is that understanding how people understand language provides the clues necessary to teach computers to do the same.

Understanding how people do stuff is the key to teaching machines to do it too

Once we’d analyzed in depth the various cues humans use to understand language, we examined which of them a computer could reproduce in real time on a system that scales affordably. The cues we chose to use are the bottom three from the list above: grammatical structure, knowledge of the world and context.

How do you teach a computer about the world?

In order to grant our platform the ability to understand language we set out to teach it about the world. We accomplished this by equipping it with a graph mapping out the billions of connections that exist between over 30 million nouns, topics, terms and things.

One might well ask: “What do you define as a connection?”

The answer is we defined any possible relationship between two things as a “connection”. For example:

The person “Steve” is connected to the company “Apple” because he “founded” it.

As I showed above, merely knowing that a connection may exist between two things goes a long way towards understanding the context in which they’re mentioned.
The more possible connections one is aware of between the things mentioned in a given text, the greater one’s confidence in selecting the correct context for each.
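
To make the idea of a “connection” a little more concrete, here is a minimal sketch in Python of how such a graph could be represented. It is an illustration only, not our actual implementation: the class name, the method names and the handful of entities in it are all made up for the example.

```python
from collections import defaultdict


class ConnectionGraph:
    """Toy graph of labelled connections between "things" (hypothetical, for illustration)."""

    def __init__(self):
        # Maps each thing to the things it is connected to, with the relation label.
        self._edges = defaultdict(dict)

    def connect(self, a, relation, b):
        # Store the connection in both directions so it can be found from either side.
        self._edges[a][b] = relation
        self._edges[b][a] = relation

    def relation(self, a, b):
        # Return the relation label between a and b, or None if they are not connected.
        return self._edges.get(a, {}).get(b)


graph = ConnectionGraph()
graph.connect("Steve Jobs", "founded", "Apple Inc.")
graph.connect("Apple Inc.", "makes", "iPhone")
graph.connect("apple (fruit)", "is a", "fruit")

print(graph.relation("Steve Jobs", "Apple Inc."))
print(graph.relation("Steve Jobs", "apple (fruit)"))
```

In this toy version a connection is just a labelled link that can be looked up from either end. The real graph is vastly larger, but the principle is the same: if a link exists between two candidate meanings, that reading of the text becomes more plausible.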

Returning to the example above, had the sentence been:

“Apple’s iPhone, answered Steve, is my favorite accomplishment”

We’d have little doubt as to which “Steve” and which “Apple” the sentence refers to. The mention of the iPhone grants the context we were missing before.

One of our platform’s greatest assets is that it approaches text holistically.

Rather than examine every word or sentence separately, we look at much larger segments of text in order to search for the clues that will enable us to find the correct context for all the ambiguous terms encountered.
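
As a rough sketch of what “holistic” means here, the Python snippet below picks a meaning for every ambiguous term at once, preferring the combination whose meanings are most connected to each other. The candidate lists, the connections and the scoring rule are assumptions invented for this example; it shows the general idea rather than how our platform actually works.

```python
from itertools import product

# Hypothetical candidate meanings for each term found in a text.
CANDIDATES = {
    "Apple": ["Apple Inc.", "apple (fruit)"],
    "Steve": ["Steve Jobs", "Steve (unknown person)"],
    "iPhone": ["iPhone"],
}

# Hypothetical known connections between meanings (unordered pairs).
CONNECTIONS = {
    frozenset(("Steve Jobs", "Apple Inc.")),
    frozenset(("Apple Inc.", "iPhone")),
}


def disambiguate(terms):
    """Choose one meaning per term so that the chosen meanings share the most connections."""
    best, best_score = None, -1
    for combo in product(*(CANDIDATES[t] for t in terms)):
        # Count how many pairs of chosen meanings are connected.
        score = sum(
            frozenset((a, b)) in CONNECTIONS
            for i, a in enumerate(combo)
            for b in combo[i + 1:]
        )
        if score > best_score:
            best, best_score = combo, score
    return dict(zip(terms, best)), best_score


print(disambiguate(["Apple", "Steve"]))
print(disambiguate(["Apple", "iPhone", "Steve"]))
```

With only “Apple” and “Steve” to go on, the company reading is supported by a single connection. Once “iPhone” appears in the same text the supporting connections double, which is the toy equivalent of the extra confidence the additional context gives us. Trying every combination is fine for three terms but would not scale to long texts; a real system needs something smarter.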

Grammar is a powerful tool for understanding meaning

“Grammar, Yoda teaches, a powerful tool for obscuring meaning is.”

Conversely, when used correctly (i.e. not in Star Wars), grammar is a great help in understanding context and meaning.

By incorporating a model of the English language’s grammar into our platform we were able to improve its understanding of context even further.

Content matching

We use our ability to accurately understand what a “thing” is not only to choose its appropriate meaning in the context of the text in which it’s mentioned, but also to help us retrieve matching, useful content accurately. To date we’ve already mapped hundreds of sources for:

  • News
  • Articles
  • Images
  • Videos
  • Realtime data
  • Facts
  • Events
  • Geo-data
  • Product specs

We use all these sources to retrieve the content we match for every entity. Our platform makes it easy to extend this content repository, and we continue to add more data to it on a regular basis. We’re even capable of utilizing content provided by our publishers as a source (see the screenshot below for an example of how this is implemented on the film review blog HeyUGuys).

What’s next?

It’s our intention to develop our content syndication abilities further in future so as to assist our bloggers and publishers in achieving distribution and getting more traffic to their sites.

As more sites and blogs install our widget we believe that the content they hold will gradually become one of the most interesting content assets we have access to, but that’s already material for another post…

The editors of the HeyUGuys blog use Headup as a "related post" widget to drive traffic internally to other posts on their blog


Image Credits: Spencer E Holtaway, johannes pape