scraping content

Avoid using scraped content

Scraping: Taking content from other places on the web and publishing it on your own site.

There are many websites that only contain pieces of other websites or stolen content. Many sites will take articles from other websites and publish them as if they were their own, or will copy entire websites.

This behavior violates Google's guidelines, and it violates copyright law in the United States and other countries.

Anyone scraping content knows they are doing it. If you do not think you are scraping content, you probably are not. However...

Use caution when you display information from other websites.

Sometimes webmasters have things like "latest news" feeds or Twitter feeds in their sidebar.
Is that bad?
In most cases it is fine to display such things in your sidebar, but sometimes people start displaying too much information from too many sources, and they are in danger of breaking this guideline or the auto-generated content guideline.

How much is too much?

Google doesn't state the exact answer to this question anywhere, but as a rule of thumb I would say that content from other sources should not exceed 10 percent of your webpage. An example would be a typical blog that has a news feed in its sidebar...

If you write a very short blog post, your news feeds could easily contain more content than the post itself...

It is worth considering how a search engine would see that page...

It would see that most of the page is unoriginal content, or repeated content such as your logo and footer. This means that, overall, the webpage may not be considered a great resource for the subject of the post.
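
To make the 10 percent rule of thumb concrete, here is a minimal Python sketch that estimates how much of a page's text comes from other sources. It rests on assumptions: the 10 percent threshold is only the rule of thumb suggested above, not a number published by Google; counting whitespace-separated words is just one rough way to measure "content"; and the function names (word_count, external_share, exceeds_threshold) are hypothetical, made up for this illustration.

```python
def word_count(text):
    """Count whitespace-separated words in a piece of text."""
    return len(text.split())


def external_share(original_text, external_blocks):
    """Return the fraction of the page's words that come from other sources.

    original_text: the text you wrote yourself (for example, the blog post).
    external_blocks: a list of texts pulled in from elsewhere (feeds, widgets).
    """
    original = word_count(original_text)
    external = sum(word_count(block) for block in external_blocks)
    total = original + external
    if total == 0:
        return 0.0  # an empty page has no external content to worry about
    return external / total


def exceeds_threshold(original_text, external_blocks, threshold=0.10):
    """True if external content is more than `threshold` of the page.

    The 0.10 default is the rule of thumb from this article (an assumption).
    """
    return external_share(original_text, external_blocks) > threshold


if __name__ == "__main__":
    short_post = "word " * 150   # stand-in for a 150-word blog post
    news_feed = "word " * 300    # stand-in for a 300-word sidebar feed
    share = external_share(short_post, [news_feed])
    print(f"External share: {share:.0%}")
    print("Over the rule of thumb:", exceeds_threshold(short_post, [news_feed]))
```

With a 150-word post and a 300-word sidebar feed, two thirds of the words on the page come from somewhere else, which is far over the rule of thumb. A longer post or a shorter feed brings the share back down.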

Examples of scraping as defined by Google

  • Sites that copy and republish content from other sites without adding any original content or value
  • Sites that copy content from other sites, modify it slightly (for example, by substituting synonyms or using automated techniques), and republish it
  • Sites that reproduce content feeds from other sites without providing some type of unique organization or benefit to the user

by Patrick Sexton