Natural Language in Search Engines

New search engines come out every day, each using different algorithms to serve web surfers. Among them, a new breed is emerging that uses so-called “natural language inputs” to activate searches.

The latest one I have noticed is Powerset (not an exciting name), a search engine built on natural language technology developed at the Palo Alto Research Center (PARC). Its purpose: “to build a search engine that could someday rival Google.”

When I typed in the URL of this search engine (http://www.powerset.com), my Firefox browser brought me to its entry page. A first glance at my Google Toolbar showed that its PageRank is six. Not bad, eh?!

I decided to put the natural language search to the test, and typed in “I don’t know Albert Einstein.” It returned a list of Wikipedia articles related to Albert Einstein. It seems that the search engine ignored the words “I don’t know” in my query and directly returned the “answers” to my “question.”

I tried the same search sentence in Google, and the search results were quite different. Google returned many results containing the words “Albert Einstein” as well as “don’t know”, trying to match what I was looking for. Essentially, it returned “contextual” search results.

So which one is better?

Before we answer that question, let’s understand the working mechanism behind natural language technology. Natural language technology comes from a branch of computer science called artificial intelligence (AI).

I first became acquainted with AI twenty years ago, while studying information engineering in college. At that time, we were very optimistic about AI: we believed that, with better information-processing capability built into programs, computers would sooner or later take over every activity done by humans. We studied neural networks, voice recognition, word recognition, and robotics, hoping that one day we could build a computer that talks, thinks, and acts like a human being.

Twenty years have passed, and computer scientists have built nothing that fulfils this hope, and do not even seem close to the goal. Why? In my opinion, it’s because the basic computer architecture has not changed, and this is a great hindrance to making computers think like a human being.

Computers, from the time of their invention to now, inherit a basic architecture of memory, input, output, and a central processing unit. The central processing unit is designed to receive programming instructions and execute them linearly, one by one. It is not “thinking”, it is just “executing” a set of instructions.
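
To make the idea of linear execution concrete, here is a minimal sketch of a fetch-and-execute loop in Python. The three-instruction set (LOAD, ADD, PRINT) and the `run` function are invented purely for illustration and do not correspond to any real processor.

```python
def run(program):
    """Execute a list of (opcode, operand) pairs strictly in order."""
    accumulator = 0
    output = []
    # The loop takes one instruction at a time, from first to last.
    for opcode, operand in program:
        if opcode == "LOAD":
            accumulator = operand
        elif opcode == "ADD":
            accumulator += operand
        elif opcode == "PRINT":
            output.append(accumulator)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return output


# Hypothetical three-step program: load 2, add 3, record the result.
program = [("LOAD", 2), ("ADD", 3), ("PRINT", None)]
print(run(program))  # [5]
```

Nothing in this loop associates or imagines; it only does what each instruction says, in the order given.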

A normal human brain contains on the order of 100 billion brain cells (neurons), which connect to one another through their dendrites to form complex networks. It is much more powerful than even the most advanced computers in the world.

The brain’s complex network can function non-linearly in thinking. It allows us to have the intellectual processing power of imagination, association, feeling, emotion, and every kind of complex cognitive behaviour that is difficult for computers to imitate.

Among these capabilities, the power to associate things in order to build new knowledge is unmatched. We humans don’t process information linearly, as a computer does. We process information non-linearly, by considering many inputs at the same time using our five senses and all the possible associations of knowledge and memory. If you have ever daydreamed (and I’m quite sure you have), you will understand what I mean here. A tiny trigger such as a special smell or a particular sight could propel your thoughts far from where you are or what you are doing.

Without this complex processing network, we could hardly recognize a human face instantly, understand the subtle meanings of language, or express a high level of emotional response.

Unless the basic architecture of the computer is totally revolutionized to be more like a human brain, it can never approach the high-level cognitive activities that we humans perform every day.

We do have research in this area, such as biological computing, the study of building a computer that can “learn” by using RNA duplication in microbiology. Some newer studies also mention the use of nanotechnology to aid in this process. But we hardly have anything close to that reality now.

So I believe that, owing to the limitations of present computer architecture, we can hardly build a machine that can “talk” and “listen” like a three-year-old boy does. That means the aim of incorporating natural language capability in search queries is likely to fail.

If you try more search questions in Powerset, you will find that the results are not impressive at all. The fundamental drawback is that the search results are rather unpredictable. We do not know what will be returned, and therefore cannot trust it to be consistent.

Contextual search is different: when you type in “I don’t know Albert Einstein”, you can be quite sure the search engine will return results containing these keywords. After all, contextual search is based on database technology, which has improved a great deal over the past twenty years. Database technology is also much more scalable when handling large volumes of data, whereas I am not sure whether the natural language technology of Powerset is scalable. What will happen when it needs to process more and more data on the front end (natural language queries from web surfers) and on the back end (more and more data that it needs natural language technology to “understand”)? This is still unknown.
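
The predictability of keyword matching can be illustrated with a small inverted index, a common lookup structure for this kind of search. This is only a sketch under stated assumptions: the three sample documents and the `tokenize`, `build_index`, and `search` functions are hypothetical, and a real search engine adds ranking, stemming, and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Made-up documents, for illustration only.
documents = {
    1: "Albert Einstein developed the theory of relativity",
    2: "I don't know much about Albert Einstein",
    3: "I don't know what to cook tonight",
}
index = build_index(documents)

print(sorted(search(index, "I don't know Albert Einstein")))  # [2]
print(sorted(search(index, "Albert Einstein")))               # [1, 2]
```

The same query always yields the same set of documents, and every returned document contains every word typed, which is the consistency argued for above.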

Google acquired Applied Semantics in 2003 to gain a better “interpretation” of what a web page is about. I think this is a more straightforward approach: it brings natural language processing capability to the back end, where it is applied to the indexed web pages in a search engine’s database. As I said in one of my 2006 posts about Applied Semantics (http://jdcnet.com/seo-and-applied-semantics-the-future-trend-of-calculating-the-keyword-density-of-a-webpage.html), this acquisition brings a revolutionary enhancement to a search engine’s web page ranking capability.

As for the front end, I think search engines should let users decide the search words, and return what users are looking for. Perhaps this is a more guaranteed and predictable way of serving web searchers.

Tags: AI, Natural Language Search Engine, Contextual Search Engine, human brain structure, neural network

JdcNet.com
