Sathya Srinivasan's Blog: Searching for a way to Web 3.0

For the last few years, I have been involved professionally in the "Information Management" space. Much has been written, blogged, and tweeted about managing information, ironically just resulting in more information to manage (job security!). Having been involved with computers since the late 80s, I have witnessed the explosive growth of computing power, the Internet, and the information that goes with the Internet. It makes sense since as social beings, we seem to have the fundamental tendency to share, regardless of whether someone is willing to receive or not.

One of the most significant changes of the last couple of decades is the widespread adoption of the Internet. While we haven't found a parallel physical universe, it looks like we have been able to create a parallel Earth, in a sense, with the Internet. I had blogged earlier about how the one key feature that seems to distinguish human behavior from God-like behavior is our limits of perception. To me, the Internet is our attempt to break the sensory limitations that we have and go beyond. Because of the Internet, our overall memory has increased and our inherent limitations on what we can understand and comprehend have started diminishing. Ironically, the same break in the perception barrier has also imposed new challenges. We now have too much information to deal with and it's becoming harder to distinguish meaningful information (such as this blog) from meaningless drivel (such as this blog). Maybe it's God's way of punishing us for trying to break the barrier!

So, for the sake of better understanding this universe, it may be worth understanding how it came into being in the first place.

Web 1.0
Around the beginning of Internet time, most content processed through the computers was structured data, such as tables of numbers. We were primarily using the computers as number crunching machines and it served our purpose just fine. Then someone had the bright idea to also share the documentation that goes with it (something that never happens in the real world - have you ever written good documentation for your code so that others can understand your mess?!) and thus started the great Internet revolution. The intent was mainly to push the information one-way, more like shouting from a soap box or giving a lecture, probably because the initial intent was to share scientific work.

This suited many people just fine and hence we had a proliferation of websites or soap boxes and everyone was voicing their opinion happily. This is probably the highest form of democracy - you can voice your opinion as much as you want without fear of censorship (except in some places) or getting reprimanded. However, as we have seen in many democracies (especially in India), democratic behavior can also lead to chaos if not controlled properly. You might end up having too many people voicing too many opinions resulting in more noise than music. For a country, this manifests in lack of decisive action (such as Obama not being able to pass a health care bill or India not being able to focus on long-term goals). For the Internet, this manifests in lack of focus from the user's perspective, such as not being able to find the right piece of information on time.

Web 2.0
As the information being shared grew larger and larger, so did the difficulty in finding the right information quickly.

Search engines came along to help alleviate this problem. They can probably be equated to a coordinating committee that tries to pass a bill. The purpose was to get the information and present the findings in some sort of an order so as to make the job of finding information easier. Most search engines were focused primarily on getting to the information based on the search being performed. It is kind of like finding a book in the library if you know the title (or at best, the category).

Because finding information on the Internet became easier than making a personal visit or a phone call, we came to rely on it more, and we seem to have started spending more and more time in this parallel universe than in the real one. Maybe this increased presence led to the need for personalization, kind of like how your doctor or grocer starts greeting you personally if you go to him more often. This increased presence also potentially led the "soap box" shouters to realize that there actually is an audience to hear their rant. This new audience also indicated potential opportunities, which, in turn, led to increased advertisements and an increased need to personalize the message to attract more of the audience towards one soap box as opposed to another.

Then Google came along to create a shift in search. Instead of providing results based on just what the users entered, the search engine giant pulled in parameters related to the information that could help contextualize the result. This shift in thinking converted the Internet from being an information source to being a knowledge source, as knowledge is primarily information that is put in context. We are currently here.

Knowledge can be obtained in one of three ways - analyzing and understanding the content itself and putting it in perspective, prioritizing the information based on the users who read it, and comparing the information with other related information in order to create a context. Google has succeeded primarily in the second and third areas - it prioritizes the result not just by what the user is looking for but also by how others perceive that information (by way of inbound and outbound links and supposedly 200 other parameters). However, I think we are still a bit off on the first aspect - understanding the information itself.
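To make the "how others perceive that information" part concrete, here is a minimal sketch of ranking pages by their inbound links. It is a simplified PageRank-style iteration, not Google's actual ranking (which, as noted, supposedly uses many more parameters). The function name `link_rank`, the damping value, and the tiny three-page web are all assumptions made up for illustration.

```python
def link_rank(links, damping=0.85, rounds=50):
    """Score pages by inbound links (simplified PageRank-style iteration).

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(rounds):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page in pages:
            # A page with no outbound links spreads its score over all pages.
            targets = links.get(page) or pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: "a" and "b" both link to "c"; "c" links to "a".
example_links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = link_rank(example_links)
best_page = max(scores, key=scores.get)
```

In this toy web, the page that two others point to ends up with the highest score, even though nothing about its content was examined. That is exactly the gap the next section is about.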

The parallel progress of increased communication in the Internet space as opposed to the physical space seems to have helped Google as well. Thanks to Social Networking such as Facebook, Twitter, and the like, we are actively providing more feedback to Google by means of linking content. In essence, we are the minions of Google!

Web 3.0
The next evolution potentially lies in the first aspect of getting knowledge - understanding and analyzing the information to find meaning. This issue has been complicated primarily due to the inherent unstructured nature of the primary language of the Internet itself - English. English by nature is more unstructured than structured, potentially due to its organic evolution. In a sense, it is kind of like the Internet in that it has grown in all directions with a simple core grammar. The grammar itself has a lot of rules and exceptions. To compound the problem, the words are heavily dependent on context, with a single word having multiple meanings depending on the context. Such ambiguity is bad for a structured lookup, which is why finding results by just keywords in a search engine is difficult.

Restructuring the web
Since this is a hard nut to crack, we have to look for alternatives. There are two. First is to convert the way information is stored into something more structured. The most concerted effort being made in this space is that of Semantic web, or web of meaning. In principle, semantic web attempts to redo how information is stored and shared. Current document formats, be it HTML that powers the Internet or Word, PDF, etc. use a convention that is more focused on presentation rather than information.

The Semantic web attempts to change this by means of new standards such as RDF (Resource Description Framework) and OWL (Web Ontology Language), which aim to get users to provide structure around otherwise unstructured information. This is great, except that it is difficult to get existing content providers to convert to this format quickly. The push can potentially happen if major content providers conform to these standards and eventually others are forced to follow suit for fear of being left out.
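The core idea behind RDF is that every fact is a (subject, predicate, object) triple. The sketch below mimics that with plain Python tuples rather than a real RDF library, so it only illustrates the shape of the data. The `ex:` identifiers, the property names, and the helper functions `match` and `author_name` are all hypothetical.

```python
# Hypothetical facts: the "ex:" identifiers and property names are made up
# for illustration and do not come from any real vocabulary.
TRIPLES = [
    ("ex:post42", "ex:title", "Searching for a way to Web 3.0"),
    ("ex:post42", "ex:author", "ex:sathya"),
    ("ex:sathya", "ex:name", "Sathya Srinivasan"),
    ("ex:post42", "ex:topic", "ex:SemanticWeb"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return every triple that agrees with the given fields; None is a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def author_name(triples, post):
    """Follow post -> author -> name, a two-step lookup over structured facts."""
    for _, _, author in match(triples, subject=post, predicate="ex:author"):
        for _, _, name in match(triples, subject=author, predicate="ex:name"):
            return name
    return None
```

With the facts stored this way, "who wrote this post?" becomes a lookup (`author_name(TRIPLES, "ex:post42")`) instead of a guess based on which words appear near each other on the page.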

The second option is to obtain this contextual information from the attempts made to get to the information itself, by understanding the intent of the user's request. I feel this area is probably more immediately solvable than the previous one and hence can potentially provide a more short-term solution and augment the long-term solution later. For this to work, we need to go one step beyond looking at converting information into knowledge and start focusing on converting knowledge into wisdom. Wisdom, to me, is knowledge distilled over time. Thus, instead of the current search behavior of returning results based on a single request, what is required is to understand the user's behavior over time.

Knowledge by filter
This can happen in one of two ways. The first is to store all the searches the user performs and essentially glean information off that history to provide relevant results. This sort of brute force method is probably underway in Google already, since it pretty much keeps track of all the searches performed. This method has decent benefits. For example, if I search for the word "ball" and my last 100 website visits or searches were around football, the system can make an assumption that "ball" means "football" (or "soccer" - this is where semantic nets come in as well). Given the current scenario, this is probably the next step in search.
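The "ball" example can be sketched in a few lines. This is only an illustration of the idea, not how any real search engine does it. The `SENSES` table, its related terms, and the function `guess_sense` are assumptions invented for the example; a real system would rely on a much larger semantic net.

```python
from collections import Counter

# Hypothetical sense inventory: each sense of an ambiguous word is described
# by a handful of related terms.
SENSES = {
    "ball": {
        "football": {"football", "soccer", "goal", "league", "striker"},
        "formal dance": {"dance", "gown", "waltz", "ballroom"},
        "ball bearing": {"bearing", "steel", "machine", "axle"},
    }
}


def guess_sense(word, history):
    """Pick the sense whose related terms appear most often in past queries."""
    senses = SENSES.get(word)
    if not senses:
        return None
    seen = Counter(term for query in history for term in query.lower().split())
    scores = {
        sense: sum(seen[term] for term in related)
        for sense, related in senses.items()
    }
    best = max(scores, key=scores.get)
    # With no supporting evidence in the history, make no guess at all.
    return best if scores[best] > 0 else None


# Hypothetical search history for a football fan.
past_queries = ["football league table", "soccer goal highlights", "best striker 2010"]
sense = guess_sense("ball", past_queries)
```

Note that with an empty history the function declines to guess, which is the situation every search engine is in when it sees only a single keyword.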

This concept was even attempted by a few systems. Yahoo!, for example, introduced a beta version of its website for a short period where one could provide some clues to the search engine on whether they were shopping for an item or doing research. Unfortunately, it looks like it never made it to the main search engine and the site is currently inactive (as far as I know).

Google seems to have incorporated the same concept in its website as well, although it is not readily apparent (you have to click on the "More Options" link in the search results window to see the option).

Wisdom by conversation
Another way to understand the user's intent is to have a conversation with the user. In real life, understanding occurs in this fashion. We have a conversation with the other person and by asking questions in different ways, we eventually figure out the underlying intention. It's kind of like the "blind men and the elephant" story but with a twist that there is a 7th person who pulls in the input from the six blind men to come to the realization that they are talking about an elephant. Similarly, it would be interesting if a search engine had a conversation with the user, rather than asking for a single keyword, and then analyzed the answers to get to the right result.
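One way to picture such a conversation is a loop that keeps asking about whichever attribute still divides the remaining results. The sketch below is only that, a picture. The result set, its tags, and the functions `next_question` and `converse` are hypothetical, and the "user" is simulated by a callback.

```python
# Hypothetical result set: titles and tags are invented for illustration.
RESULTS = [
    {"title": "Premier League fixtures", "sport": "football", "intent": "research"},
    {"title": "Buy match footballs", "sport": "football", "intent": "shopping"},
    {"title": "Cricket ball swing explained", "sport": "cricket", "intent": "research"},
    {"title": "Tennis balls, pack of 12", "sport": "tennis", "intent": "shopping"},
]


def next_question(results, asked):
    """Choose the unasked attribute with the most distinct values left."""
    facets = {key for r in results for key in r if key != "title" and key not in asked}
    spread = {f: len({r.get(f) for r in results}) for f in facets}
    useful = {f: n for f, n in spread.items() if n > 1}
    if not useful:
        return None
    # Sorting first keeps the choice stable when two attributes tie.
    return max(sorted(useful), key=useful.get)


def converse(results, answer):
    """Narrow the results by asking one question at a time.

    `answer(facet, options)` stands in for the user and returns one option.
    """
    asked = set()
    while len(results) > 1:
        facet = next_question(results, asked)
        if facet is None:
            break
        asked.add(facet)
        options = sorted({r.get(facet) for r in results})
        reply = answer(facet, options)
        results = [r for r in results if r.get(facet) == reply]
    return results


# Simulated user who wants to buy a football.
replies = {"sport": "football", "intent": "shopping"}
final = converse(RESULTS, lambda facet, options: replies[facet])
```

Here two questions are enough to get from four candidates to one, without the user ever having to know the right keyword up front.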

A few years back, there used to be a website called Active Buyer's Guide. Using the Wayback Machine, I found that it became inactive around 2006 or so. Before Google and Amazon became mainstream, I used to use this site (probably around 2000 - 2002) to drill down to a specific product.

Source: http://faculty.ksu.edu.sa/2921/Lectures%202/Uses%20of%20Digital%20Camera.ppt

The site was different from other typical product sites (like Newegg, which by the way, has a much better filtering mechanism than Amazon). Instead of just giving a bunch of options that I may or may not know anything about, it used to strike a conversation with me in plain English.

On top of that, it also used to ask me how important one feature was compared to another, which was cool because not all features are given the same priority.

Thus, the results would be a combination of what I want in a product and how important I consider different features to be with respect to one another. This resulted in an amazingly close approximation to what I wanted.
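I don't know how Active Buyer's Guide actually computed its results, but the general idea of combining what a product offers with how much each feature matters to me can be sketched as a weighted average. Everything below is an assumption for illustration: the camera names, the feature scores (0 to 1), and the function `rank`.

```python
# Hypothetical catalogue: product names and feature scores (0 to 1) are invented.
CAMERAS = {
    "Camera A": {"zoom": 0.9, "battery": 0.4, "price": 0.3},
    "Camera B": {"zoom": 0.5, "battery": 0.8, "price": 0.7},
    "Camera C": {"zoom": 0.3, "battery": 0.6, "price": 0.9},
}


def rank(products, importance):
    """Order products by the importance-weighted average of their feature scores."""
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("at least one feature needs a positive importance")

    def score(features):
        # A feature missing from a product counts as zero.
        weighted = sum(w * features.get(name, 0.0) for name, w in importance.items())
        return weighted / total

    return sorted(products, key=lambda name: score(products[name]), reverse=True)


# Someone who cares mostly about zoom, and someone who cares mostly about price.
zoom_lover = rank(CAMERAS, {"zoom": 5, "battery": 1, "price": 1})
bargain_hunter = rank(CAMERAS, {"zoom": 1, "battery": 1, "price": 5})
```

The same three cameras come back in opposite orders for the two shoppers, which is the whole point: the ranking depends on the person asking, not just on the products.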

The way we perform our searches is similar to this behavior. We first do a search, look at the results, and if we are not happy with them, perform another search, potentially with some keywords found in the first search, till we get to what we desire. It should not be that difficult for a search engine to emulate this behavior. I believe this is where the next revolution in search engines lies and it will be interesting to see if this becomes a reality soon.

Thursday, March 04, 2010
