What is the Googlebot Spider?

A Look at Cached Pages, Google’s Spider and Page Crawling

© Mia Carter

Aug 9, 2008
Google's spider crawls a different type of web. (Photo: Mary K. Baird)
Wondering how Google keeps up with the creation of new websites, web pages and all of the updates to those sites and pages? The key is Googlebot, Google's spider.


Google is among the world’s most popular search engines. Millions of internet users visit the site every day to perform research or to look up and discover new websites.

But how does Google “know” about all those websites and web pages? Thousands of new internet sites and pages are developed every day, and even more are altered, updated and redesigned. So web page creators and internet surfers alike wonder just what keeps the Google search engine in the know.

Well, simply stated, Google’s spider is the key to the site’s efficacy as a search engine.

What is a “Spider” or “Crawler?”

Search engines like Google develop software programs that are designed to “crawl” the millions of websites and web pages that comprise the internet. These programs are known as “spiders” or “crawlers.” Googlebot is the name of Google’s spider program.

According to software developer and internet enthusiast Alan Sparks, the term “spider” arose like this: “The internet is also called ‘the web’ and software programs like Googlebot navigate the web pages and websites that comprise the web. What navigates and walks around on a web? A spider – that’s how that term arose. And spiders don’t really walk per se – they crawl, hence the term ‘crawlers.’”

How Does the Googlebot Spider Work?

Googlebot never sleeps. Googlebot is constantly crawling new sites and re-visiting existing sites, updating Google’s “memory” or index as it goes.

Sparks explained how it all works: When a new website is created, the webmaster submits the website address to Google and other search engines like Yahoo! and MSN. The website is added to a list of new sites that Googlebot will visit. It typically takes several weeks for the spider to pay its initial visit to a newly created website.

When the spider reaches the website, it automatically navigates through the site, scanning for keywords and tagging such as meta tags, and following the inbound and outbound links and the various components of the site. As Googlebot visits and “crawls” through the website, the software is essentially forming a snapshot of the website and all its individual web pages at that particular point in time. That snapshot or “memory” of the website and its individual web pages is cached or “filed.”
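To make the idea of crawling concrete, here is a minimal sketch in Python. It is not Googlebot's actual code, which Google does not publish. The tiny in-memory "web" (`TOY_WEB`), its example.com addresses and the `crawl` function are all made up for illustration, and no real network requests are made.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outbound links).
TOY_WEB = {
    "http://example.com/": (
        "welcome to our garden shop",
        ["http://example.com/roses", "http://example.com/tools"],
    ),
    "http://example.com/roses": (
        "red roses and white roses",
        ["http://example.com/"],
    ),
    "http://example.com/tools": (
        "spades rakes and garden gloves",
        ["http://example.com/roses"],
    ),
}


def crawl(start_url, web):
    """Follow links breadth-first from start_url; return a snapshot (URL -> text)."""
    snapshot = {}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in snapshot or url not in web:
            continue  # already visited, or a dead link
        text, links = web[url]
        snapshot[url] = text  # keep a copy of the page as it looks right now
        queue.extend(links)   # schedule the linked pages for a visit
    return snapshot


snapshot = crawl("http://example.com/", TOY_WEB)
print(sorted(snapshot))
```

Starting from the home page, the sketch reaches every page it can find by following links, which is the basic reason links matter so much to a spider.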

The cached information is then added to Google’s memory banks, also known as the index. The index is Google’s “memory,” and when a visitor types in a search term, Google searches its memory for websites and web pages that fit the bill.
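One simple way to picture an index is as a lookup table from words to the pages that contain them. The Python sketch below assumes that simplified model; the names `build_index` and `search` and the sample pages are invented for illustration, and Google's real index is far more elaborate.

```python
from collections import defaultdict


def build_index(snapshot):
    """Map each word to the set of URLs whose cached text contains it."""
    index = defaultdict(set)
    for url, text in snapshot.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical cached pages (URL -> text), standing in for a crawl snapshot.
snapshot = {
    "http://example.com/roses": "red roses and white roses",
    "http://example.com/tools": "spades rakes and garden gloves",
}
index = build_index(snapshot)
print(search(index, "garden gloves"))
```

Note that the search never touches the live pages. It only consults the stored copy, which is why results can lag behind the real website.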

At various intervals, Googlebot will revisit the websites in its index. The spider software will “crawl” the various components of the website again, forming a new snapshot. This new snapshot is then added to the index, thereby keeping Google’s memory very close to current.
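A repeat visit can be pictured as comparing the fresh snapshot with the stored one and keeping whatever is new or different. The sketch below is only an illustration of that idea; the `refresh` function and the sample pages are hypothetical, not a description of how Google actually updates its index.

```python
def refresh(stored, fresh):
    """Merge a fresh crawl into the stored snapshot.

    Returns the updated snapshot and a sorted list of new or changed URLs.
    """
    changed = sorted(
        url for url, text in fresh.items() if stored.get(url) != text
    )
    updated = dict(stored)  # copy, so the old snapshot is left untouched
    updated.update(fresh)   # newer copies replace older ones
    return updated, changed


# Hypothetical snapshots from two visits to the same site.
old = {"/": "welcome", "/news": "old headline"}
new = {"/": "welcome", "/news": "new headline", "/contact": "email us"}

updated, changed = refresh(old, new)
print(changed)
```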

How Does Googlebot Affect Website Developers and Website Visitors?

The best websites on the internet are dynamic and ever-changing. But there is a delay between the time when the webmaster changes the website and when the new content appears in search results. Simply stated, it takes time for Google to “learn” about the changes on a web page, and it’s the Google spider that goes out and checks pages, searching for new content and updates.

When a visitor conducts a search on Google, the search results reflect the information that was available during Googlebot’s last crawl of each site.

The webmaster must keep in mind that he can change the content, layout, links or other components of the website, but Google will not “know” about these changes until Googlebot’s next visit. To take a hypothetical example, if the site is crawled every hour on the hour, and it’s visited by Googlebot at 1:00 p.m., any changes made after that time will not be evident to Google until the spider’s next crawl of the site at 2:00 p.m.
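The timing in that example can be worked out with a few lines of Python. This sketch assumes a fixed crawl interval, which is a simplification for illustration; the function name `first_crawl_seeing` and the sample date are made up.

```python
from datetime import datetime, timedelta


def first_crawl_seeing(change_time, last_crawl, interval):
    """Return the time of the first scheduled crawl at or after change_time."""
    crawl_time = last_crawl
    while crawl_time < change_time:
        crawl_time += interval
    return crawl_time


last_crawl = datetime(2008, 8, 9, 13, 0)  # Googlebot visited at 1:00 p.m.
change = datetime(2008, 8, 9, 13, 20)     # webmaster edits the page at 1:20 p.m.

seen_at = first_crawl_seeing(change, last_crawl, timedelta(hours=1))
print(seen_at.strftime("%I:%M %p"))
```

With an hourly schedule, the 1:20 p.m. edit is picked up at the 2:00 p.m. crawl, so the search results are out of date for 40 minutes.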

Notably, Google visitors have the option to view the cached page when looking at the websites and web pages in the search results. This option allows for faster loading, but this cached version of the page reflects what the page looked like when Googlebot last crawled the site, so any changes that have been made since then will not appear unless visitors click on the website’s link to actually visit the site.

Google is among the most sophisticated search engines in existence, and the Google empire has grown to include an advertising program, e-mail, map programs and more. Unfortunately, curious minds will have to settle for educated guesses rather than hard facts about exactly how Googlebot and the index work, as Google continues to keep the secrets of its success under tight wraps.



The copyright of the article What is the Googlebot Spider? in Internet is owned by Mia Carter. Permission to republish What is the Googlebot Spider? in print or online must be granted by the author in writing.



Comments
Feb 28, 2009 2:30 AM
Nina Saville :
What a great article! So simply explained and erudite at the same time. It is lovely to read tech articles that don't blind you with science. I look forward to reading more of this excellent journalist's work.
Nina S