The Semantic Web
Today we all use the web, but Tim Berners-Lee drives it. It is his vision, it runs on his protocols, and he presides over it at the World Wide Web Consortium. What was a childhood obsession with connections has become a movement that has transformed the planet.
Surely, Berners-Lee ranks with Gutenberg, Marconi and Alexander Graham Bell as one of the most influential people in the history of communication. However, his predecessors stopped at one major innovation. Berners-Lee is going for an encore.
It’s called the Semantic Web and he intends it to be no less revolutionary than the World Wide Web. If successful, it will unlock the power of information stored in the world’s computers.
To understand how, we first need to understand how and why the web was created.
A Brief History of the Internet
The internet started as a US military project called ARPANET, part of the wave of scientific funding that followed the Soviet launch of Sputnik. It was a revolutionary communication network, mostly because of its open architecture, which allowed separate parts of the network to develop independently.
It also incorporated packet switching technology that was much more efficient than earlier systems. Messages could be broken up into small “packets,” sent along different routes, then put back together again. Like having multiple registers at the grocery store, this enabled much more traffic to be sent through existing wires.
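The idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how ARPANET was actually built: the function names, the packet size and the random shuffle standing in for "different routes" are all invented for the example.

```python
import random

def split_into_packets(message, size=8):
    """Break a message into numbered packets of at most `size` characters."""
    return [(seq, message[start:start + size])
            for seq, start in enumerate(range(0, len(message), size))]

def reassemble(packets):
    """Sort packets by their sequence number and join the pieces back together."""
    return "".join(chunk for _, chunk in sorted(packets))

message = "Messages can travel in pieces and still arrive whole."
packets = split_into_packets(message)

# Hypothetical stand-in for packets taking different routes and arriving out of order.
random.shuffle(packets)

restored = reassemble(packets)
```

Because every packet carries its own sequence number, the receiver does not care which route a packet took or when it arrived; it only needs all of them to rebuild the message.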
The design and the efficiency of the network made it ideal for connecting academic institutions and allowing scientists to collaborate. From the early 1970s until the late 1980s, the internet was primarily used by academics attached to large research institutions. Later, commercial dial-up services became available and consumers could connect to the internet as well.
“Walled Gardens” Before the Web
While it was theoretically possible for any two people connected to the internet to communicate with each other, in practice it was difficult. One person’s system would have to talk to another person’s system and they would both have to speak the same language.
Furthermore, you would have to know what you were looking for and where it was. Finally, you would have to have access to the information. The internet was far from universal.
While the internet of the late ’80s and early ’90s was exciting, it was also somewhat constricting. Applications like e-mail were becoming popular and millions subscribed to online informational services, yet it wasn’t anything like the internet we know today. It was an internet without the World Wide Web.
Tim Berners-Lee’s Vision
As a child in an academic family, Tim Berners-Lee was obsessed with the way knowledge was connected. He believed that information out of context loses its meaning.
Just as words describe other words, documents describe other documents; discoveries reference other discoveries and so on. We all stand on the shoulders of giants, so access to information demonstrably increases the efficiency of thought.
For him it was maddening that computers could hold so much information, yet much of it was useless. People who needed it couldn’t get to it. They usually didn’t know where it was and even if they did, it was cumbersome getting computers to talk to each other. He felt that information should not only be available, but easily accessible.
Vision Meets Opportunity
The problem was especially acute at CERN, one of the world’s premier physics laboratories, where Berners-Lee worked. Thousands of scientists would come each year to use its enormous particle accelerator and then go back to their home institutions.
There was an enormous need for the scientists to collaborate and share information. Many documentation systems had been proposed and implemented, but none had been effectively adopted. In effect, there was lots of knowledge with little connection. It was a perfect opportunity for Berners-Lee to work out his childhood dream of connecting intelligence.
Berners-Lee saw that the problem with the previous systems was that they were based on hierarchies. Nobody could agree on the proper way to classify and organize information. Moreover, they didn’t want to use a documentation system based on what someone else thought was important. What was central to one person was peripheral to another.
In 1989, a revolutionary year around the world for many reasons, Berners-Lee proposed a “web” of information that had no hierarchy, only links. His proposal proved to be as transformative as any event that year.
Universality of Meaning
Imagine you are in a foreign country where you don’t speak the language. You will have a hard time communicating with others in the way that you’re used to. However, within a very short time you will learn to recognize universal forms of communicating.
You’ll notice that traffic lights use the same colors to mean the same things. Red means stop, green means go. Signs for bathrooms also tend to be international, or at least super-regional. With a few very basic standards and some finger pointing, you’ll find that you’ll be able to get by.
Tim Berners-Lee wanted to do the same with the internet. He realized that it was foolish to try to get everybody to use the same languages and protocols to run their computer networks. Like language and culture, different local networks have to address different local needs and preferences.
He therefore sought to have as few rules as possible so that everybody would be more likely to use them. One of the ways he did this was by creating what programmers call a markup language. He named the particular language he invented HTML and it has become the basic language of the web.
To understand how a markup language works, think of a screenplay. The writer can add unspoken directions to actors if he wants them to act (wryly) or (angry) or (cheerful). In Berners-Lee’s web language, there were similar directions to <make this bold> or <go to www.digitaltonto.com>.
By using universal markups, information could be universally displayed even if different programs were used to create the document.
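To make the screenplay analogy concrete, here is a minimal sketch using Python’s standard html.parser module. It separates the “directions” (the tags) from the words a reader actually sees. The sample page, the class name and the helper function are made up for illustration, and the http:// prefix on the address is an assumption.

```python
from html.parser import HTMLParser

class DirectionLister(HTMLParser):
    """Collect the markup 'directions' (tags) separately from the visible text."""

    def __init__(self):
        super().__init__()
        self.directions = []  # one (tag, attributes) pair per opening tag
        self.text = []        # the words a reader would actually see

    def handle_starttag(self, tag, attrs):
        self.directions.append((tag, dict(attrs)))

    def handle_data(self, data):
        self.text.append(data)

def read_markup(document):
    """Return (directions, visible_text) for a snippet of HTML."""
    parser = DirectionLister()
    parser.feed(document)
    parser.close()
    return parser.directions, "".join(parser.text)

# Hypothetical page: <b> means "make this bold", <a href=...> means "go to this address".
page = 'Read <b>this</b> at <a href="http://www.digitaltonto.com">Digital Tonto</a>.'
directions, text = read_markup(page)
```

Any program that understands the same small set of tags can display this page, no matter which program wrote it.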
The World Wide Web
Although he came across an enormous amount of resistance, it all worked splendidly. All you had to do was use his standard to identify yourself (URL), announce that you were using his protocol (HTTP), and use HTML to “mark up” your documents. Without changing your internal systems you could broadcast to the world!
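As a rough sketch of how an address and the protocol fit together, the Python below takes a URL apart and writes out the text of a simple request for it. It uses the later, widely deployed HTTP/1.1 request format rather than whatever the earliest version of the protocol looked like, and the function name and example address are assumptions made for illustration.

```python
from urllib.parse import urlsplit

def build_get_request(url):
    """Return the text of a minimal HTTP/1.1 GET request for `url`."""
    parts = urlsplit(url)
    path = parts.path or "/"          # an empty path means the site's front page
    if parts.query:
        path += "?" + parts.query
    return (
        f"GET {path} HTTP/1.1\r\n"    # what we want
        f"Host: {parts.netloc}\r\n"   # which site we want it from
        "Connection: close\r\n"
        "\r\n"                        # a blank line ends the request
    )

# Hypothetical address; nothing is sent over the network here.
request = build_get_request("http://www.digitaltonto.com/")
```

The URL says where the document lives, the HTTP request asks for it, and the reply would be an HTML document that any browser can display.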
Previously, if you wanted to broadcast or publish, there were enormous obstacles and costs. For TV and radio, you needed a license. Even printing a small newsletter involved production and distribution costs.
If you wanted to announce something to the world, you either had to be a business or a large institution, or you had to convince someone in power that what you had to say was important.
Now, with the web, all you needed was a personal computer and a phone line. Anybody could share anything they wanted without getting permission. The benefits have been enormous and we all have access to information that previous generations couldn’t dream of.
That was Tim Berners-Lee’s vision: that everybody could share documents with everybody else. However, he believes it doesn’t go far enough. Since the late ’90s he’s been working on a second stage that will unlock even more information.
The Semantic Web
Today’s web is centered on documents. They are different from traditional documents because they are dynamic (we push buttons and they change), yet they are documents nonetheless. The Semantic Web seeks to free the data that underlies web documents.
Imagine you want to sell a car. You can upload the specifications to different web sites and the users of those web sites can see what kind of car you have, what price you are asking, etc.
However, what if the person who wants to buy your car doesn’t go to the sites that you’ve posted on? You could sell your car much more easily if you could just upload it once and all car web sites could access it.
The key is to get computers that don’t speak the same language to understand that they are talking about the same thing.
Different systems use different terms. For instance, in our car selling example, the company that made the car can be a “manufacturer,” a “producer” or a “make,” and that’s assuming that the site is in English.
A person would recognize that these terms are equivalent, but a database wouldn’t. We have come a long way in teaching machines to talk to people; now we have to get machines to understand other machines.
This will be done with a set of rules called RDF, which will allow computers to know that two things are the same in some way. Additional “metadata” can be added to definitions and function like a cross-language dictionary. Just as tourists do in a foreign country, computers can translate one set of terms into another set that they understand.
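Here is a toy version of that dictionary in plain Python. It is a sketch of the idea only and does not use a real RDF library or real RDF syntax. The one thing borrowed from RDF is the shape of a statement: a subject, a predicate and an object. The site records, the car identifier and the choice of “make” as the shared term are all invented for the example.

```python
# Each site's local field name, mapped to one shared term (a toy "dictionary").
SAME_AS = {
    "manufacturer": "make",
    "producer": "make",
    "make": "make",
}

def to_statements(subject, record):
    """Turn a site's record into (subject, predicate, object) statements in shared terms."""
    return {
        (subject, SAME_AS.get(field, field), value)  # unknown fields are kept as-is
        for field, value in record.items()
    }

# Hypothetical listings of the same car on two sites that use different field names.
site_a = {"manufacturer": "Volvo", "price": 5000}
site_b = {"producer": "Volvo", "price": 5000}

statements_a = to_statements("car:123", site_a)
statements_b = to_statements("car:123", site_b)
same_car_data = statements_a == statements_b
```

Once both listings are expressed in the shared terms, a program can see that the two sites are describing the same thing without either site changing its own database.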
Ontologies, the shared vocabularies that define such terms, can be local or global. For instance, if an industry wants its computers to speak only a specific dialect, it can exclude global ontologies. Data can be freed from proprietary system structures just like documents were freed in the early days of the web.
The possibilities are exciting and applications are already being rolled out. Advertisers’ data about brands can be matched with media data about consumers. Data about poverty and hunger stored in computers around the world can be combined and analyzed. Through combining databases, we will be more likely to identify problems and find solutions.
The Future of the Semantic Web
Just like the early Web, the Semantic Web has its critics. Much of the criticism comes from the technical community, who fear that the extra data will prove cumbersome. They fear that applications will have to spend too much time describing what they are doing and not enough time doing it. Others fear that the project just isn’t feasible.
There are also privacy concerns. We are often uploading data to databases without even knowing it. Every time we go to a web site, make a purchase or link to a friend on a social network we are creating data. The prospect of all our activity being connected together is a bit scary.
However, it’s not the technology we should fear, but how people use it. Use can be regulated. Moreover, useful data is collected every day, and connecting all of that data together could help us solve big problems such as disease, global warming and poverty.
The journey seems worth the effort.