Duplicate Content: A Comprehensive Guide
SEO experts disagree about duplicate content. Some say it hurts SEO and can bring disaster upon your website. Others agree that duplicate content is bad for a website but believe it won't, by itself, earn you a penalty from Google.
Who is wrong and who is right is a big question. In this guide, we will try to figure out what pitfalls duplicate content hides.
A Dark Side of Duplicate Content
First and foremost, let's define "duplicate content". It is content that appears, identically or nearly so, at more than one URL: on different pages of your own website or on other websites.
Even if it is not your fault, duplicate content affects your SEO in a few ways:
It can outrank the original content
Sometimes other websites ask permission to publish your content on their web pages. This practice is known as content syndication. Also, these third-party websites can "steal" your content and republish it on their websites. This shady practice is called "content scraping".
The bad news is that both practices create duplicate content that can outrank the original content on your website.
It wastes crawlers' time
Google uses crawlers to discover new content on your website. Crawlers find new pages by following the internal links that you add across the website. When these bots stumble upon pages with the same content, they spend time on the copies, which reduces the speed and frequency at which your other pages get recrawled.
Backlink dilution
Let's say you have created a page with some piece of content, and a certain number of backlinks point to it. Then you notice a few other pages on your site with the same content, and each of these pages has backlinks of its own.
Fortunately, Google can usually identify duplicate pages and group them into one cluster, and it groups their referring domains along with them. For instance, suppose there are two duplicate URLs on the website. The first URL has 20 referring domains, the other one has 100. Google chooses one URL from the cluster and consolidates the referring domains of both (20 + 100) on that URL only. When Google fails to recognize the pages as duplicates, however, the backlinks stay split between them.
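The consolidation arithmetic above can be sketched in a few lines of Python. This is only a toy model, not a description of how Google works internally: the `consolidate` function, the `page-copy` URL, and the `siteN.example` domains are made up for illustration.

```python
# Toy model of consolidating referring domains within a duplicate cluster.
# All URLs and domain names below are hypothetical.

def consolidate(cluster: dict[str, set[str]], canonical: str) -> set[str]:
    """Merge the referring domains of every URL in the cluster onto one URL."""
    if canonical not in cluster:
        raise KeyError(f"{canonical} is not part of the cluster")
    merged: set[str] = set()
    for domains in cluster.values():
        merged |= domains  # a domain linking to several duplicates counts once
    return merged


cluster = {
    "https://www.domain.com/page": {f"site{i}.example" for i in range(20)},
    "https://www.domain.com/page-copy": {f"site{i}.example" for i in range(20, 120)},
}
merged = consolidate(cluster, "https://www.domain.com/page")
print(len(merged))  # 20 + 100 distinct domains
```

Because the merge is a set union, a domain that links to both duplicates is counted once, so the consolidated total equals 20 + 100 only when the two groups do not overlap.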
Can duplicate content lead to a Google penalty? Yes, if you deliberately use shady SEO techniques such as creating duplicate pages or publishing scraped content. Fortunately, if you stick to white-hat SEO, you are safe.
The Reasons Behind Duplicate Content
There are many reasons why duplicate content appears. Here are a few examples to take into account:
Paginated comments
Many content management systems, WordPress in particular, offer an option to paginate comments. With it turned on, the same post becomes available at multiple URLs, one per page of comments. To avoid this, turn off comment pagination or use the Yoast plugin to noindex the paginated pages.
Category pages and tags
When you use tags, many content management systems create a dedicated page for each tag. For instance, say you have written a post on graphic design trends and tagged it "graphic design" and "trends". You now have two tag pages:
https://www.domain.com/tag/graphic-design/
https://www.domain.com/tag/trends/
If few other posts share these tags, the two pages may show nearly the same content (not always, though). The simplest way to avoid this is not to use tags at all.
Using print/mobile-friendly URLs
Print and mobile-friendly versions of a page carry the same content as the original. Only the URL differs:
https://www.domain.com/page
https://www.domain.com/print/page
domain.com/page
m.domain.com/page
To fix this, canonicalize the print and mobile versions of the page to the original URL.
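As a rough illustration of canonicalization, the sketch below maps the print and mobile URLs from the examples above to one main URL and builds the `<link rel="canonical">` tag that would go into the `<head>` of each variant. The function names, the `/print/` prefix rule, and the choice of `www.domain.com` as the main host are assumptions made for this example. A real site would follow its own URL scheme.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

MAIN_HOST = "www.domain.com"  # assumed main host; adjust to your own site


def canonical_url(url: str, main_host: str = MAIN_HOST) -> str:
    """Map a print or mobile variant of a page to its main URL."""
    parts = urlsplit(url)
    path = parts.path
    if path.startswith("/print/"):
        path = path[len("/print"):]  # "/print/page" -> "/page"
    # Every variant (m.domain.com, domain.com) points at the main host.
    return urlunsplit((parts.scheme, main_host, path, parts.query, ""))


def canonical_tag(url: str) -> str:
    """Build the canonical link element for a page variant."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'


print(canonical_tag("https://www.domain.com/print/page"))
print(canonical_tag("https://m.domain.com/page"))
```

Both calls print the same tag pointing at https://www.domain.com/page, which is the point of canonicalization: every variant names one preferred URL.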
Staging environments
When you need to test something on your website or change some code, you create a staging environment first. It is a duplicate version of your website that lets you try out changes before they reach the live site, so that any negative consequences stay off it.
However, the staging environment becomes an SEO issue if Google starts indexing it. To protect it, restrict access with a VPN, HTTP authentication, or IP whitelisting.
Search result pages
Many websites have a search box. Using it takes you to a parameterized search URL:
domain.com?q=search-term
Google does not want such pages in its index. As Google's Matt Cutts once put it:
"Typically, web search results don’t add value to users, and since our core goal is to provide the best search results possible, we generally exclude search results from our web search index. (Not all URLs that contain things like “/results” or “/search” are search results, of course.)"
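To solve this issue, add a robots meta tag with noindex to your search pages so that Google removes them from its index. Alternatively, block access to the search result pages in robots.txt. Pick one approach: if robots.txt blocks a page, Google cannot crawl it and will never see its noindex tag.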
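Before relying on a robots.txt rule, it helps to check that it blocks what you expect. The sketch below uses Python's standard `urllib.robotparser` module for that. It assumes, purely for illustration, that the site serves its search results under a `/search` path. The rule, the helper name `is_crawlable`, and the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks internal search result pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


search_page = "https://www.domain.com/search?q=search-term"
regular_page = "https://www.domain.com/page"

print(is_crawlable(ROBOTS_TXT, search_page))   # search results are blocked
print(is_crawlable(ROBOTS_TXT, regular_page))  # regular pages stay crawlable
```

The standard-library parser matches rules by simple path prefix, so it is a sanity check for plain rules like this one rather than a full model of how Google interprets robots.txt.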
These are only five of the causes of duplicate content. There are more, such as AMP URLs and trailing versus non-trailing slashes. These issues don't go away on their own, so you need to stay aware of them.
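Trailing-slash variants, for example, are easy to spot once you have a list of crawled URLs. The following sketch groups URLs that differ only by a trailing slash, by scheme, or by the letter case of the host. The helper names and the sample URL list are assumptions for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to a key that ignores scheme, host case and trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.netloc.lower()}{path}{query}"


def find_variants(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs by normalized key and keep only groups with several variants."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: group for key, group in groups.items() if len(group) > 1}


# Hypothetical crawl output.
crawled = [
    "https://www.domain.com/page",
    "https://www.domain.com/page/",
    "https://www.domain.com/tag/trends/",
]
variants = find_variants(crawled)
print(variants)
```

Each group in the result is a candidate for a redirect or a canonical link to one preferred form of the URL.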
Now, let's find out how to detect duplicate content on your website.
Is There Duplicate Content on Your Website?
As you have already guessed, duplicate content negatively affects your website. Therefore, you must know how to detect and fix it.
You can start with the good old Google Search Console and check whether it reports any of these statuses:
- Duplicate, Google chose different canonical than user
- Duplicate without user-selected canonical
- Duplicate, submitted URL not selected as canonical
You can also test how Google treats a specific URL from your site with the URL Inspection tool.
However, if you want a more detailed report on duplicate content on your website, you can use the Site Audit tool from Ahrefs.
The report to explore is called "Duplicate content".
It has two tabs, "near duplicates" and "exact duplicates". Click the orange clusters in each tab to check whether it contains duplicated pages.
In this example, 3 near-duplicates were detected on the website. Your goal is to find every case of duplicate content and fix it.
You might wonder whether there is a way to find duplicate content across the web. Yes, you can find websites that scraped your content and republished it.
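Whether a copy sits on your own site or on someone else's, the difference between an exact duplicate and a near duplicate can be sketched in a few lines of Python. This is not how Ahrefs or Google detect duplicates. It is a minimal illustration that assumes a hash of the normalized text for exact matches and Jaccard similarity over three-word shingles for near matches. The 0.5 threshold and the sample texts are arbitrary choices for the example.

```python
import hashlib
import re


def words(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def fingerprint(text: str) -> str:
    """Hash of the normalized text; equal hashes mean an exact duplicate."""
    return hashlib.sha256(" ".join(words(text)).encode("utf-8")).hexdigest()


def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """All runs of `size` consecutive words in the text."""
    tokens = words(text)
    if len(tokens) < size:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def classify(a: str, b: str, threshold: float = 0.5) -> str:
    """Label a pair of texts; the threshold is an assumed cut-off."""
    if fingerprint(a) == fingerprint(b):
        return "exact duplicate"
    if similarity(a, b) >= threshold:
        return "near duplicate"
    return "distinct"


original = "Graphic design trends change every year and shape how brands look"
edited = "Graphic design trends change every year and shape how brands feel"
other = "Our pricing page lists three plans for small teams"

print(classify(original, original.upper() + "!"))
print(classify(original, edited))
print(classify(original, other))
```

Changing letter case or punctuation still yields an exact duplicate here because the text is normalized before hashing, while swapping a single word lowers the similarity but keeps the pair above the threshold.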
The easiest way to detect scraped copies is with Google search. Put a snippet of the original text from your page, in quotes, into the search bar. If any page has scraped your content, you will see it in Google's results.
Most of the time, content gets scraped by low-authority websites. In that case, you are unlikely to lose traffic.
However, if your content has been "stolen" by an authoritative website, you might see a decrease in traffic. To check, take the scraper's URL and analyze it with Ahrefs' Site Explorer, paying attention to the "organic traffic" report.
If the scraped copy brings the other website more traffic than your original gets, reach out to the website's owner and ask them to remove the content or to add a canonical link pointing to your original page. In the worst-case scenario, you can submit a DMCA takedown request to Google.
To Sum Up
This guide aimed to set things straight about duplicate content. You have learned:
- what duplicate content is
- how serious the issue is for your website
- how to detect it and protect your website from it
Make sure that your website is free of duplicate content, and keep an eye on other, more technical SEO issues along the way.
Sergey Aliokhin, Marketing Manager at andcards.