feedthebot.com How to follow the Google webmaster guidelines





This Google guideline states...


"Make sure your web server supports the If-Modified-Since HTTP header. This feature allows your web server to tell Google whether your content has changed since we last crawled your site. Supporting this feature saves you bandwidth and overhead."


Definitions


  • web server - the computer (and the software on that computer) that hosts your Web site.
  • HTTP header - a piece of information about a webpage, normally not seen by the user, that one computer sends to another computer.
  • bandwidth - the amount of traffic that is allowed to happen between your web site and the rest of the internet. It is determined by your hosting service, and using more bandwidth sometimes results in paying more for your hosting.

Examples and Explanations



When a search engine spider sends the If-Modified-Since HTTP header with its request, your web server's answer tells it one of two things about a webpage...


- This webpage has not changed, so there is no need to download it again.

- This webpage has changed, so download it again because there is new information.


Update: If you are wondering whether your server supports this, use the If Modified Since tool. It will tell you.


One way to describe If Modified Since is to think of that little flag on your mailbox outside your house. If you put the flag up, then the postman knows you have mail in there and he will come and get it. If the flag on your mailbox is down then the postman knows he doesn't have to pick up any mail.

If Googlebot were the mailperson and your webpage were the mailbox, Googlebot would look to see if that "flag" is up or down before it accesses your page. (I explain this guideline in a video below.)


Why is that important?

Since Google spiders billions of pages, there is no real need to use their resources or yours to look at a webpage that has not changed.

For people who have very large websites, crawling by search engine spiders can consume a lot of bandwidth and result in extra cost.


How does it work?

Let's say you have a website about pets that has pages about dogs, cats, and turtles.

You have just updated the turtle page with new photos.

Every once in a while Googlebot (the search engine crawler of Google) will visit your pages. It will check each page in your website, and if none of the pages have changed, it will not download any of those pages.

But in our case, the turtle page has changed (and has its "flag" up). So when Googlebot comes, it will see the "flag" for the turtle page and it will download the turtle page.

It knows that the turtle page has been updated because of how your web server answers the If-Modified-Since header: the turtle page comes back in full, while the other pages come back with a short "not modified" answer. Since none of the other web pages have been updated, there is no reason to get a new copy of them; Google already has a current copy in its index.
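
To make the pet website example concrete, here is a minimal sketch in Python of the decision a web server makes when a crawler sends the If-Modified-Since header. The function name `status_for_request`, the page names, and all of the dates are made up for illustration. A real web server does this for you, so treat this as a picture of the idea and not as code you need to install.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def status_for_request(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the page is unchanged since the crawler's copy, else 200.

    last_modified must be a timezone-aware datetime.
    """
    if if_modified_since is None:
        return 200  # no date sent (for example a first visit): send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unreadable date: ignore the header and send the full page
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates only have one-second resolution, so drop the microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # "flag" is down: nothing new to download
    return 200  # "flag" is up: the page changed, send it again


# Hypothetical date of the crawler's last visit, formatted as an HTTP date.
last_crawl = format_datetime(datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc), usegmt=True)

# Hypothetical "last changed" dates for the three pet pages.
pages = {
    "dogs": datetime(2024, 1, 10, tzinfo=timezone.utc),
    "cats": datetime(2024, 2, 5, tzinfo=timezone.utc),
    "turtles": datetime(2024, 3, 4, tzinfo=timezone.utc),  # updated after the last crawl
}

results = {name: status_for_request(changed, last_crawl) for name, changed in pages.items()}
print(results)  # {'dogs': 304, 'cats': 304, 'turtles': 200}
```

Only the turtle page gets a 200 (send the full page). The dog and cat pages get a 304, which costs almost no bandwidth.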


304 status code


Status codes are listed in your log files, and you will typically see them in your statistics report.


If a search engine crawler receives a status code of 304 ("Not Modified") for a web page, it knows that the page has not been updated and does not need to be downloaded again.


How do I know if my web server supports the If-Modified-Since header?


Update: This tool will tell you if a web page supports it... If Modified Since Tool

There are two other ways of doing this as well. If you have access to your statistics and/or log files, you can just look for the status code 304. If you see 304 as a status code in any of your logs, then your web server supports this header.

If you can't do that, then you should call your web host and ask them. When deciding on a web host, this is one of the questions you should ask: "Does your web server support the If-Modified-Since HTTP header?"

Virtually every modern hosting company supports it, but it is still worth checking.
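
If you are comfortable running a small script, the log file check can be sketched in Python as shown here. This is only a rough sketch. It assumes your log uses the common or combined log format that Apache and many other servers write, and the three sample lines (including the address 192.0.2.1 and the "ExampleBot" name) are invented for illustration. Keep in mind that a 304 can also be the answer to a different kind of check (an ETag check), so treat 304 entries as a strong hint and not as proof.

```python
import re
from collections import Counter

# Finds the quoted request ("GET /page HTTP/1.1") followed by the
# three-digit status code in a common/combined log format line.
LOG_LINE = re.compile(r'"[A-Z]+ \S+ [^"]*" (?P<status>\d{3}) ')


def count_statuses(lines):
    """Count how often each HTTP status code appears in the given log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            counts[match.group("status")] += 1
    return counts


# Invented sample lines, not taken from a real log.
sample_log = [
    '192.0.2.1 - - [04/Mar/2024:10:00:01 +0000] "GET /dogs.html HTTP/1.1" 304 - "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [04/Mar/2024:10:00:02 +0000] "GET /cats.html HTTP/1.1" 304 - "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [04/Mar/2024:10:00:03 +0000] "GET /turtles.html HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
]

counts = count_statuses(sample_log)
print(counts["304"] > 0)  # True
```

To try it on a real log, you could pass an open file instead of `sample_log`, since the function only needs something that yields one line at a time.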



Key concepts


Using a web server that supports the If Modified Since header is recommended, and will result in less bandwidth being used by search engine crawlers.


The overwhelming majority of web hosts support this, so you likely need not worry about this guideline unless you have a massive website with thousands or tens of thousands of pages.



How to determine if you are following this webmaster guideline


To see if a web page supports the If-Modified-Since HTTP header, you can use this tool:

If Modified Since Tool
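
If you would rather check by hand, the same kind of test can be sketched with Python's standard library. This is my own rough sketch and not necessarily how the tool above works. The function name `supports_if_modified_since` is made up, and the URL in the usage comment is a placeholder. The idea is to ask for the page once, note the Last-Modified date the server reports, then ask again with that date in the If-Modified-Since header and see whether the answer is 304.

```python
import urllib.error
import urllib.request


def supports_if_modified_since(url: str, timeout: float = 10.0) -> bool:
    """Return True if the server answers a conditional request with 304."""
    # First request: a normal download, just to learn the Last-Modified date.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        last_modified = response.headers.get("Last-Modified")
    if last_modified is None:
        return False  # the server gave no date, so there is nothing to compare against

    # Second request: send that same date back in the If-Modified-Since header.
    request = urllib.request.Request(url, headers={"If-Modified-Since": last_modified})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 304
    except urllib.error.HTTPError as error:
        # urllib reports non-2xx answers, including 304, as HTTPError.
        return error.code == 304


# Usage (placeholder URL, replace it with a page on your own site):
# print(supports_if_modified_since("https://www.example.com/turtles.html"))
```

A result of False does not always mean something is broken. Pages that are built fresh on every request often send no Last-Modified date at all.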


Resources


From Google:

Google help pages -

What do HTTP status codes mean?


From other sources:

This tool tells you if a web page supports the If Modified Since header

If Modified Since Tool

This is a video of me describing the "If-modified-since" header

If-Modified-Since header

The W3 protocol defining the "If-modified-since" header

If-Modified-Since

