Inside the "Deep" Net
by Don Rittner


It's been almost a decade since I penned the first Net book, EcoLinking - Everyone's Guide to Online Environmental Information (Peachpit Press). I was highly optimistic about the potential of the Net even then. That optimism has been reinforced a thousand times. The Net has developed into a wonderful tool for communication and research.

The Net is constantly evolving into, well, no one knows. It's like a digital version of Brownian Movement: the random movement of surfers suspended in an electronic fluid. It's too dynamic to pin down. It has fooled the best pundits and none of us want it to stop evolving anyway.

In the "old" Net days, sending an email was an exercise in faith. There was no guarantee it would get delivered. Often no one could figure out where it went - digital purgatory, perhaps. If you wanted to search for info, you had to know where it was. There were no search engines like Yahoo or Google. No, using a search engine meant getting into your car and driving to the library. There was no Web, no watching CNN live, and no listening to your favorite classical radio station as you surfed.

With the introduction of the MESH, aka the Web, by Tim Berners-Lee, pieces of information became linked to one another, making them easier to find in this "hyperlinked" environment.

You currently use this version of cyberspace when you use your Web browser to surf the Net. When you use a search engine to find information, each "hit" is linked. You simply click on it and go to that Web site. This is called the "Surface" Web.

The Surface Web is analogous to a fishing trawler crossing the sound. It casts out its net and drags for fish perhaps a few feet underneath. The problem with this scenario is there are a whole lot more fish further down in the water.

The same holds true on the Net. Search engines only find "static" pages on a Web site: those that are linked from other Web sites or pages. Search engines send out little "spiders" that "crawl" through Web sites, indexing information. If no other page links to a page, it won't get indexed. The results of this crawling are what you find when you do keyword searches on Yahoo or other sites.
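To make the spider idea concrete, here is a minimal sketch in Python. It is only an illustration, not how any real search engine works: the tiny "site" is a made-up dictionary of page names and the links each page contains, and the names SITE, crawl and "orphan" are hypothetical. The spider starts at one page and follows links, so a page that nothing links to never gets visited.

```python
from collections import deque

# Hypothetical toy Web site: page name -> pages it links to.
SITE = {
    "home": ["about", "catalog"],
    "about": ["home"],
    "catalog": ["home"],
    "orphan": ["home"],  # exists, but no other page links TO it
}

def crawl(site, start):
    """Follow links breadth-first from `start`; return the set of pages reached."""
    indexed = set()
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page in indexed or page not in site:
            continue  # already visited, or a link to a page that doesn't exist
        indexed.add(page)
        queue.extend(site[page])  # queue up every link found on this page
    return indexed

indexed = crawl(SITE, "home")
print("Indexed:", sorted(indexed))
print("Missed: ", sorted(set(SITE) - indexed))
```

Run from "home", the spider reaches "about" and "catalog" but never finds "orphan", even though that page is sitting right there on the site.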

Here lies the problem. Net evolution has given us access to searchable databases. Databases on the Net are a tremendous resource. However, search engines do NOT index those databases. They don't have access to the information buried "deep" in the database. It's estimated that the "Deep" Web is a vast reservoir of content that is roughly 500 times larger than the known "Surface" Web.

There are new technologies that can penetrate the Deep Web. Look at these statistics:

Public information on the "Deep Web" is currently 400 to 550 times larger than the commonly defined Web.

The "Deep Web" contains 7,500 terabytes of information, compared to 19 terabytes in the Surface Web. A terabyte is 1024 gigabytes.

The Deep Web contains nearly 550 billion individual documents compared to the 1 billion of the Surface Web.

More than an estimated 100,000 Deep Web sites presently exist.

60 of the largest Deep Web sites collectively contain about 750 terabytes of information - exceeding the size of the Surface Web by 40 times.

On average, Deep Web sites receive about 50% greater monthly traffic than surface sites and are more highly linked to than surface sites. Yet, the typical Deep Web site isn't well known to the Internet search public.

The Deep Web is the fastest growing category of new information on the Internet.

Deep Web sites tend to be narrower with deeper content than conventional surface sites.

Total quality content of the Deep Web is at least 1,000 to 2,000 times greater than that of the Surface Web.

Deep Web content is highly relevant to every information need, market and domain.

More than half of the Deep Web content resides in topic-specific databases.

A full 95% of the Deep Web is publicly accessible information - not subject to fees or subscriptions.

These are fantastic statistics. A recent study revealed that the largest search engines today individually index at most 16% of the Surface Web. By missing the content on the Deep Web, Web searchers are searching only 0.03% - or one in 3,000 - of the content available to them.
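For readers who like to check the arithmetic, here is a rough back-of-the-envelope sketch in Python using only the figures quoted above. How the study actually arrived at "0.03%" isn't spelled out here, so the last step is an assumption: it takes the 16% coverage figure and divides it by the "500 times larger" estimate. The variable names are just for illustration.

```python
# Figures quoted in the statistics above.
surface_tb = 19          # terabytes in the Surface Web
deep_tb = 7500           # terabytes in the Deep Web
top60_tb = 750           # terabytes in the 60 largest Deep Web sites
engine_coverage = 0.16   # a large search engine indexes at most 16% of the Surface Web

# Assumption: treat the Deep Web as 500 times the size of the Surface Web.
deep_to_surface = 500

size_ratio = deep_tb / surface_tb      # how many times larger the Deep Web is by volume
top60_ratio = top60_tb / surface_tb    # the 60 largest sites vs. the whole Surface Web
searched_fraction = engine_coverage / deep_to_surface
one_in = 1 / searched_fraction

print(f"Deep vs. Surface by volume: about {size_ratio:.0f} times")
print(f"Top 60 sites vs. Surface:   about {top60_ratio:.1f} times")
print(f"Share actually searched:    about {searched_fraction:.3%}, or one in {one_in:.0f}")
```

Under those assumptions the numbers come out close to the quoted ones: just under 400 times by volume, about 40 times for the top 60 sites, and roughly 0.03%, or about one in 3,000, for the share a single search engine covers.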

Gaining access to the Deep Web is mind-boggling when you realize what's there. The next metamorphosis will be to the Integrated Web, in my opinion. You'll have instant access to live video, audio, and data at will. You'll verbally command your computer to get the day's news and video, bring up stock results, or search a library or catalog, all while you surf or compose your next column.

I'm looking forward to that day.