Search Engines and Directories in the Deep Web

Librarians Navigating Findability and Searching in an Online World

© Allan Cho

Jan 9, 2009
Invisible Web versus Surface Web, ICRISAT Library and Information Services
Many of us navigate the web comfortably believing that we are searching all of it, but search engines are in fact quite limited.

Often referred to synonymously as the deep or hidden web, the invisible web contains content that is not part of the surface Web. Untouched, un-indexed and unreachable by search engines, the invisible web is estimated to be hundreds of times larger than the surface Web.

Surface Web Versus Deep Web

In contrast, the surface Web is that portion of the World Wide Web that is indexed by conventional search engines. A famous study by Michael Bergman estimates that the deep Web contains 7,500 terabytes of information compared to 19 terabytes of information in the surface Web.
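To put those two figures side by side, the short sketch below works out the ratio between them. It uses only the numbers quoted above and makes no claim about how the estimates themselves were produced; the variable names are illustrative.

```python
import math

# Figures quoted in the article (terabytes).
deep_web_tb = 7500
surface_web_tb = 19

# How many times larger the deep Web is, by these estimates.
ratio = deep_web_tb / surface_web_tb

# The same gap expressed as a power of ten.
orders_of_magnitude = math.log10(ratio)

print(f"Deep Web is roughly {ratio:.0f} times the size of the surface Web")
print(f"That is about {orders_of_magnitude:.1f} orders of magnitude")
```

By these figures the deep Web is a few hundred times the size of the surface Web, which is between two and three orders of magnitude.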

Some examples of material on the web that is not easily discoverable on the surface web include:

  • Pages that no other page links to
  • Websites that require registration
  • Pages generated dynamically by JavaScript or Flash
  • Non-textual files (e.g. multimedia videos)

Surface Web

The part of the web that we can surf is regularly scanned by automated programs called spiders or web crawlers. Search engines such as Google or Yahoo! use these crawlers to create a copy of every visited page, which the search engine then indexes so that it can provide fast searches.
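The indexing step can be illustrated with a very small inverted index, which maps each word to the pages that contain it. This is only a sketch under simplifying assumptions: the page names and texts in `PAGES` are made up, and real search engines use far more elaborate structures and ranking.

```python
from collections import defaultdict

# Hypothetical copies of pages that a crawler has already downloaded.
PAGES = {
    "page1": "history of the paper clip",
    "page2": "paper making history",
    "page3": "clip art gallery",
}

def build_index(pages):
    """Map every word to the set of pages in which it appears."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, query):
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
print(sorted(search(index, "paper clip")))
```

Because the index is built once, ahead of time, a search only has to look up a few words instead of rereading every page. A page that was never downloaded never enters the index at all.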

However, the areas of the web that crawlers cannot reach are left unindexed, and hence are called the hidden or invisible web.
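Why some pages stay out of reach can be sketched with a toy crawler that follows links outward from a starting page. The link graph `LINKS` and its page names are hypothetical, and a real crawler would fetch pages over the network instead of reading a dictionary, but the principle is the same: a page that nothing links to is never visited.

```python
from collections import deque

# Hypothetical toy web: each page maps to the pages it links to.
LINKS = {
    "home": ["news", "about"],
    "news": ["home", "archive"],
    "about": [],
    "archive": [],
    "orphan-report": ["home"],  # links out, but no page links to it
    "members-only": [],         # sits behind a registration form, never linked
}

def crawl(seed, links):
    """Breadth-first crawl: return every page reachable from the seed."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

surface = crawl("home", LINKS)
hidden = set(LINKS) - surface

print("Reached by the crawler:", sorted(surface))
print("Never reached:", sorted(hidden))
```

In this toy example the crawler reaches four pages, while the unlinked report and the registration-only page remain invisible to it.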

Librarians and Searching

Think about the Web as a vast library. Just as we wouldn't expect to walk in the front door and immediately find information on the history of paper clips lying on the front desk, web searching might require some digging around before the answer can be found.

Unfortunately, due to the limitations of the surface web, search engines are not necessarily the best tools for searching. Each search engine divides the World Wide Web into what it indexes and the “everything else” that it does not cover.

Subject Directories versus Search Engines

Like search engines, subject directories are databases of websites. Unlike search engines, however, the information in subject directories is collected and organized by people rather than machines, which makes them one search strategy that can reach the hidden web.

Computers can collect and organize millions of websites in a database, but subject professionals are much better than computers at finding and evaluating the quality of information, and that is why subject directories offer a real advantage over automated search engines. Although one may find fewer results with a subject directory than with a search engine, the results will generally be of much higher quality.

Librarians and Directories

Directories are nothing new and were a staple for library professionals long before the advent of the internet. Librarians have always used directories for reference transactions, although mostly in paper form. With the web, librarians have created digital directories in the form of websites that provide a large collection of links, arranged according to a classification scheme that enables browsing by subject area. The features of a directory include:

  • A listing of websites
  • Organized in a hierarchy of categories
  • Indexed by human beings
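The three features above can be pictured as a small tree of categories with site listings at the leaves. The sketch below is only an illustration: the category names and site titles in `DIRECTORY` are invented, and it does not represent how any particular library directory is actually built.

```python
# Hypothetical mini-directory: nested dicts are categories,
# lists are the site listings that a person filed under them.
DIRECTORY = {
    "Science": {
        "Astronomy": ["Example Observatory Guide"],
        "Biology": ["Example Field Notes"],
    },
    "History": {
        "Inventions": ["Example History of the Paper Clip"],
    },
}

def browse(directory, path):
    """Walk down the hierarchy one category at a time."""
    node = directory
    for category in path:
        node = node[category]
    return node

def all_sites(node):
    """Collect every listing found beneath a category."""
    if isinstance(node, list):
        return list(node)
    sites = []
    for child in node.values():
        sites.extend(all_sites(child))
    return sites

print(browse(DIRECTORY, ["History", "Inventions"]))
print(len(all_sites(DIRECTORY)), "sites listed in total")
```

Browsing by subject amounts to following a path such as History, then Inventions, and each listing is there because a person chose to file it under that category.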

Librarians' Internet Index (LII)

LII is a directory created by librarians that offers a searchable and browsable collection of over 20,000 websites, organized into 14 main topics and nearly 300 related topics.

In conclusion, librarians are subject professionals who help navigate the jungle that is the World Wide Web. While Google might do an adequate job of finding answers to straightforward inquiries, it is limited in vision and range, because the invisible or deep web is expansive and nebulous in scope.


The copyright of the article Search Engines and Directories in the Deep Web in Internet is owned by Allan Cho. Permission to republish Search Engines and Directories in the Deep Web in print or online must be granted by the author in writing.

