The words “duplicate content penalty” strike fear in the hearts of marketers. People with no SEO experience use this phrase all the time. Most have never read Google’s guidelines on duplicate content. They just somehow assume that if something appears twice online, asteroids and locusts must be close behind.
This article is long overdue. Let’s bust some duplicate content myths.
Note: This article is about content and publishing, not technical SEO issues such as URL structure.
Myth #1: Non-Original Content on Your Site Will Hurt Your Rankings across Your Domain
I have never seen any evidence that non-original content hurts a site’s ranking, except for one truly extreme case. Here’s what happened:
The day a new website went live, a very lazy PR firm copied the home page text and pasted it into a press release. They put it out on the wire services, immediately creating hundreds of versions of the home page content all over the web. Alarms went off at Google and the domain was manually blacklisted by a cranky Googler.
It was ugly. Since we were the web development company, we got blamed. We filed a reconsideration request and eventually the domain was re-indexed.
So what was the problem?
- Volume: There were hundreds of instances of the same text
- Timing: All the content appeared at the same time
- Context: It was the homepage copy on a brand new domain
It’s easy to imagine how this got flagged as spam.
But this isn’t what people are talking about when they invoke the phrase “duplicate content.” They’re usually talking about 1,000 words on one page of a well-established site. It takes more than this to make red lights blink at Google.
Many sites, including some of the most popular blogs on the internet, frequently repost articles that first appeared somewhere else. They don’t expect this content to rank, but they also know it won’t hurt the credibility of their domain.
Myth #2: Scrapers Will Hurt Your Site
I know a blogger who carefully watches Google Webmaster Tools. When a scraper site copies one of his posts, he quickly disavows any links to his site. Clearly, he hasn’t read Google’s Duplicate Content Guidelines or the Guidelines for Disavows.
Ever seen the analytics for a big blog? Some sites get scraped ten times before breakfast. I’ve seen it in their trackback reports. Do you think they have a full-time team watching GWT and disavowing links all day? No. They don’t pay any attention to scrapers. They don’t fear duplicate content.
Scrapers don’t help or hurt you. Do you think that a little blog in Asia with no original writing and no visitors confuses Google? No. It just isn’t relevant.
Personally, I don’t mind scrapers one bit. They usually take the article verbatim, links and all. The fact that they take the links is a good reason to pay attention to internal linking. The links on the scraped version pass little or no authority, but you may get the occasional referral visit.
Tip: Report Scrapers that Outrank Your Site
On the (very) rare occasion that Google does get confused and the copied version of your content is outranking your original, Google wants to know about it. Here’s the fix. Tell them using the Scraper Report Tool.
Tip: Digitally Sign Your Content with Google Authorship
Getting your picture to appear in search results isn’t the only reason to use Google Authorship. It’s a way of signing your name to a piece of content, forever associating you as the author with the content.
With Authorship, each piece of content is connected to one and only one author and their corresponding “contributor to” blogs, no matter how many times it gets scraped.
Tip: Take Harsh Action against Actual Plagiarists
There is a big difference between scraped content and copyright infringement. Sometimes, a company will copy your content (or even your entire site) and claim the credit of creation.
Plagiarism is the practice of someone else taking your work and passing it off as their own. Scrapers aren’t doing this. But others will, signing their name to your work. It’s illegal, and it’s why you have a copyright symbol in your footer.
If it happens to you, you’ll be thinking about lawyers, not search engines.
There are several levels of appropriate response. Here’s a true story of a complete website ripoff and step-by-step instructions on what actions to take.
Myth #3: Republishing Your Guest Posts on Your Own Site Will Hurt Your Site
I do a lot of guest blogging. It’s unlikely that my usual audience sees all these guest posts, so it’s tempting to republish these guest posts on my own blog.
As a general rule, I prefer that the content on my own site be strictly original. But this comes from a desire to add value, not from the fear of a penalty.
Ever written for a big blog? I’ve guest posted on some big sites. Some actually encourage you to republish the post on your own site after a few weeks go by. They know that Google isn’t confused. In some cases, they may ask you to add a little HTML tag to the post…
Tip: Use rel=“canonical” Tag
Canonical is really just a fancy (almost biblical) word that means “official version.” If you ever republish an article that first appeared elsewhere, you can use the canonical tag to tell search engines where the original version appeared. It looks like this:
That’s it! Just add the tag and republish fearlessly.
Tip: Write the “Evil Twin”
If the original was a “how to” post, hold it up to a mirror and write the “how not to” post. Base it on the same concept and research, but use different examples and add more value. This “evil twin” post will be similar, but still original.
- Example Guest Post: Common Website Navigation Mistakes
- Example Post on My Site: Website Navigation Best Practices
Not only will you avoid a penalty, but you may get an SEO benefit. Both of these posts rank on page one for “website navigation.”
Calm down, People.
In my view, we’re living through a massive overreaction. For some, it’s a near panic. So, let’s take a deep breath and consider the following…
Googlebot visits most sites every day. If it finds a copied version of something a week later on another site, it knows where the original appeared. Googlebot doesn’t get angry and penalize. It moves on. That’s pretty much all you need to know.
Remember, Google has 2,000 math PhDs on staff. They build self-driving cars and computerized glasses. They are really, really good. Do you think they’ll ding a domain because they found a page of unoriginal text?
A huge percentage of the internet is duplicate content. Google knows this. They’ve been separating originals from copies since 1997, long before the phrase “duplicate content” became a buzzword in 2005.
Disagree? Got Any Conflicting Evidence?
When I talk to SEOs about duplicate content, I often ask if they have first-hand experience. Eventually, I met someone who did. As an experiment, he built a site and republished posts from everywhere, verbatim, and gradually some of them began to rank. Then along came Panda and his rank dropped.
Was this a penalty? Or did the site just drop into oblivion where it belongs? There’s a difference between a penalty (like the blacklisting mentioned above) and a correction that restores the proper order of things.
If anyone out there has actual examples or real evidence of penalties related to duplicate content, I’d love to hear ’em.
About the Author: Andy Crestodina is the Strategic Director of Orbit Media, a web design company in Chicago. You can find Andy on Google+ and Twitter.