Technical guidelines from Google's Webmaster Guidelines
Google webmaster guidelines > Technical (there are five guidelines in this category)
The guidelines in this category are, for lack of a better word, technical. They are quite specific and introduce several terms you may not be familiar with, such as session IDs, the If-Modified-Since header, and robots.txt. The good news is that each guideline can be broken down into pieces small enough to understand. Even if they seem confusing at first, I recommend reading through them carefully, because they (like all of the guidelines) play an important role in how Google indexes your site.
- Use a text browser such as Lynx to examine your site, because most search engine spiders see your site much as Lynx would. If fancy features such as JavaScript, cookies, session IDs, frames, DHTML, or Flash keep you from seeing all of your site in a text browser, then search engine spiders may have trouble crawling your site.
- Allow search bots to crawl your sites without session IDs or arguments that track their path through the site. These techniques are useful for tracking individual user behavior, but the access pattern of bots is entirely different. Using these techniques may result in incomplete indexing of your site, as bots may not be able to eliminate URLs that look different but actually point to the same page.
- Make sure your web server supports the If-Modified-Since HTTP header. This feature allows your web server to tell Google whether your content has changed since we last crawled your site. Supporting this feature saves you bandwidth and overhead.
- Make use of the robots.txt file on your web server. This file tells crawlers which directories can or cannot be crawled. Make sure it's current for your site so that you don't accidentally block the Googlebot crawler.
- If your company buys a content management system, make sure that the system can export your content so that search engine spiders can crawl your site.
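One rough way to approximate the text-browser check from the first guideline is to strip a page down to its visible text and plain `<a href>` links, ignoring scripts and styles. This is only a sketch: `TextView`, `text_view`, and the sample page are hypothetical, it does not render pages the way Lynx does, and it is not how Googlebot works. Running `lynx -dump` against your real pages remains the more faithful test.

```python
from html.parser import HTMLParser


class TextView(HTMLParser):
    """Collect visible text and plain <a href> links, roughly like a text browser.

    Script and style contents are skipped, and javascript: links are ignored,
    because a browser without JavaScript cannot follow them.
    """

    def __init__(self):
        super().__init__()
        self.text = []   # visible text fragments, in document order
        self.links = []  # href values a crawler could follow
        self._skip = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href and not href.strip().lower().startswith("javascript:"):
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def text_view(html):
    """Return (visible_text, links) for an HTML string."""
    parser = TextView()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text), parser.links


# Hypothetical page: the "Products" link exists only as a JavaScript handler.
SAMPLE = """
<html><head><title>Acme</title>
<script>console.log("menu loaded");</script></head>
<body>
<h1>Welcome</h1>
<a href="/about">About us</a>
<a href="javascript:void(0)" onclick="openMenu()">Products</a>
<a href="/contact.html">Contact</a>
</body></html>
"""

if __name__ == "__main__":
    text, links = text_view(SAMPLE)
    print(text)
    print(links)
```

In the sample, "Products" shows up as text but yields no followable link, because its destination lives only in JavaScript. That is the kind of gap the guideline asks you to look for.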
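To see why the session ID guideline matters, consider how the same page can show up under many URLs when each visit carries its own session ID. The sketch below assumes session parameters with hypothetical names (`sessionid`, `sid`, `phpsessid`, `jsessionid`) and strips them so those URLs collapse into one. It only illustrates the duplicate-URL problem; the guideline's actual advice is to avoid serving such URLs to bots in the first place.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names; a real site would list its own.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid", "jsessionid"}


def canonical_url(url):
    """Strip session parameters so equivalent URLs compare equal."""
    parts = urlsplit(url)
    # Drop ";jsessionid=..." style path parameters.
    path = parts.path.split(";", 1)[0] or "/"
    # Keep only non-session query parameters; sorting assumes order is irrelevant.
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in SESSION_PARAMS
    )
    # Lowercase scheme and host (case-insensitive) and drop the fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), "")
    )


# Three URLs a bot might discover for one page, each with a different session.
SEEN = [
    "http://www.example.com/shoes?color=red&sessionid=a1b2",
    "http://www.example.com/shoes?sessionid=c3d4&color=red",
    "http://WWW.example.com/shoes;jsessionid=e5f6?color=red",
]

if __name__ == "__main__":
    unique = {canonical_url(u) for u in SEEN}
    print(len(SEEN), "URLs ->", len(unique), "page")
```

Without a step like this, a crawler sees three distinct URLs for one page, which is how session IDs can lead to incomplete or wasteful indexing.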
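Supporting If-Modified-Since means that when a crawler sends the date it last fetched a page, the server can answer `304 Not Modified` with no body if nothing has changed, instead of resending the whole page. Below is a minimal sketch using Python's standard `http.server`, assuming a single hypothetical page (`PAGE`) whose last-modified time (`PAGE_MODIFIED`) is known; `conditional_status` and `Handler` are illustrative names. In practice your web server or framework usually handles this for you.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from http.server import BaseHTTPRequestHandler
from typing import Optional

# Hypothetical single page and the time it last changed (must be UTC).
PAGE = b"<html><body><h1>Hello</h1></body></html>"
PAGE_MODIFIED = datetime(2024, 1, 10, 12, 0, 0, tzinfo=timezone.utc)


def conditional_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the page has not changed since the client's date, else 200."""
    if not if_modified_since:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: send the full page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so ignore microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = conditional_status(PAGE_MODIFIED, self.headers.get("If-Modified-Since"))
        self.send_response(status)
        self.send_header("Last-Modified", format_datetime(PAGE_MODIFIED, usegmt=True))
        if status == 304:
            self.end_headers()  # a 304 response carries no body
            return
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)
```

To try it locally, you could serve `Handler` with `HTTPServer` from `http.server`, fetch the page once, then send the returned `Last-Modified` value back as `If-Modified-Since` and confirm the second response is a bodiless 304.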
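The robots.txt guideline lends itself to a mechanical check: list the pages you care about and confirm that Googlebot is allowed to fetch each one. This sketch uses Python's `urllib.robotparser`; the robots.txt content, the URLs, and the `blocked_urls` helper are hypothetical. Note that this parser uses the first matching rule and does not support `*` or `$` wildcards inside paths, so for more complex files its answers can differ from Google's own interpretation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""

# Hypothetical pages that should stay crawlable (or deliberately not).
IMPORTANT_PAGES = [
    "https://www.example.com/",
    "https://www.example.com/products/widget.html",
    "https://www.example.com/private/report.html",
]


def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs that robots_txt forbids for the given user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]


if __name__ == "__main__":
    # prints ['https://www.example.com/private/report.html']
    print(blocked_urls(ROBOTS_TXT, IMPORTANT_PAGES))
```

Running a check like this whenever robots.txt changes makes it harder to block Googlebot from pages you meant to keep crawlable.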