Duplicate content and WordPress – a case study

SEO, Techy Stuff!, Website auditing | Date Published: 11 January 2013

A client recently came to me saying the search traffic to her WordPress blog had suddenly stopped and she didn’t know why. A first look suggested that Google had possibly penalised her, as her blog posts were no longer indexed.

Ouch – not a good situation.

A quick review showed up a whole heap of duplicate content issues which could be causing the problem. This blog post explains why they arise and suggests how to fix them.

The situation

My client’s site is based on the wordpress.org platform. WordPress is a fantastic and versatile platform which can be great for the search engines. But it can also be notorious for creating duplicate content if you are not aware of the pitfalls.

Google has been busy these last couple of years amending its algorithm to weed out spammy sites. Part of this process has been to identify sites that ‘lack quality’. The indicators that Google says it looks at include too many adverts above the fold (i.e. in the immediately viewable part of the screen), links pointing in from other low quality sites, too many ‘thin’ pages without enough content, and too many duplicated or near-duplicated pages.

In my experience this has so far primarily affected the bigger, content-heavy sites, and smaller sites have not been so badly hit. However, it seems that smaller sites are now getting caught up in the tangle and can get lost due to duplication issues, even if there is great content on the site.

What we found – where is the duplicate content?

There were 5 key areas where we discovered potential problems. Covering all 5 would make for an extremely long blog post, and your eyes will probably glaze over way before you get to the end!

So, I’ll cover the first and most prolific issue here, and the rest in subsequent posts. That will give you a chance to go see whether this issue is affecting you, and fix it before looking at other potential issues!

[ED update: Here is a link to the second post in the series]

1. Duplicates due to tag, category, date archive and author directories

First we ran a Google search using the search term site:www.clientsURL.com (for your site just replace www.clientsURL.com with the web address for your site).

This returned 304 pages and is an approximate indicator for how many pages Google has indexed for the site. The site only has 54 posts so this was the first red flag!

Looking at the pages that Google returned, a pattern seemed to emerge – a lot were in one of the following 3 formats:

  • www.clientsURL.com/blog/tag/blog-post-name
  • www.clientsURL.com/blog/category/blog-post-name
  • www.clientsURL.com/blog/2012/03/blog-post-name

This is a very typical situation with WordPress – although in reality there is only one version of each post, they can be found under every category and tag that they are associated with, as well as the archive date of when the post was published. As they can be found under multiple website addresses (URLs), Google will view them as duplicate pages.

Another common occurrence is when there are multiple authors on a site – a post can then also appear under an author subdirectory.
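
If you have a long list of indexed URLs to sort through, a short script can do the grouping for you. Below is a minimal Python sketch, not the method we used on the client’s site. The /blog/ prefix is taken from the example formats above, while the /author/ prefix and the function names are assumptions for illustration, so adjust the patterns to match your own permalink structure.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Patterns modelled on the example URL formats above.
# The /blog/author/ prefix is an assumption; check your own site.
ARCHIVE_PATTERNS = {
    "tag": re.compile(r"^/blog/tag/"),
    "category": re.compile(r"^/blog/category/"),
    "author": re.compile(r"^/blog/author/"),
    "date": re.compile(r"^/blog/\d{4}/\d{2}/"),
}


def classify_url(url):
    """Return the archive type a URL falls under, or 'other'."""
    # urlparse only separates host and path reliably when a scheme is present.
    if "://" not in url:
        url = "http://" + url
    path = urlparse(url).path
    for kind, pattern in ARCHIVE_PATTERNS.items():
        if pattern.match(path):
            return kind
    return "other"


def summarise(urls):
    """Count how many URLs fall under each archive type."""
    return Counter(classify_url(u) for u in urls)


if __name__ == "__main__":
    indexed = [
        "www.clientsURL.com/blog/tag/blog-post-name",
        "www.clientsURL.com/blog/category/blog-post-name",
        "www.clientsURL.com/blog/2012/03/blog-post-name",
        "www.clientsURL.com/blog/blog-post-name",
    ]
    for kind, count in summarise(indexed).items():
        print(kind, count)
```

Paste the URLs from your site: search into the list and any large count against tag, category, date or author is a sign that those archives are being indexed alongside the posts themselves.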

Fixing in WordPress moving forward

The easiest way to deal with this is via the Yoast or All in One SEO plugins. If you don’t have an SEO plugin I suggest you install Yoast’s. Both these plugins have checkbox options to noindex tag, category, and archive subdirectories.

If you have the Yoast plugin go to SEO > Titles & Metas.

  • On the General tab check the ‘Noindex subpages of archives’ box.
  • On the Taxonomies tab, check the Meta Robots ‘noindex, follow’ check boxes under Categories and Tags.
  • On the Other tab, check the Meta Robots ‘noindex, follow’ check boxes under Author archives and Date archives.

For the All in One plugin, go to Settings > All in One and check the checkboxes alongside ‘Use noindex for archives’, ‘categories’ and ‘tags’.
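
Once the boxes are ticked, you can confirm the setting has taken effect by looking for a robots meta tag in the source of a tag, category or archive page. The sketch below checks a page’s HTML for a noindex directive using only Python’s standard library. The class and function names are illustrative, and it assumes you have already copied or downloaded the page source yourself.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # A content value such as "noindex, follow" holds several directives.
            self.directives.update(
                part.strip().lower()
                for part in content.split(",")
                if part.strip()
            )


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


if __name__ == "__main__":
    sample = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print(has_noindex(sample))
```

Run it against the source of one of your archive pages and one of your normal posts. You would expect the archive page to carry noindex and the post not to.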

It may take a while for Google to catch up, but you should see these duplications drop out of the index over time. However …

‘Not selected’ in Google Webmaster Tools

Google’s Webmaster Tools for the site showed that there had been a sudden leap in the number of pages that Google indicated it was ignoring.

[Image: Webmaster Tools chart showing volume of URLs ignored for the index]

So as well as all the tag, category pages etc being indexed, something else was going on.

More investigation was clearly necessary. We shall cover this in subsequent web posts.

Be sure to go check whether your category, tag and date archive directories are being indexed. If they are, go through the steps outlined above and let me know in the comments how you get on.

If you enjoyed this article and want to be sure you know when the rest of the series is published, sign up to my RSS feed or follow me on Twitter @wendychamier.

If you would like help in solving a riddle for your site, check out the website audit services that we can offer. I love riddles and I’d love to help identify and sort out any problems you may have with your site.

By Wendy Chamier

4 Responses to “Duplicate content and WordPress – a case study”

  1. Clare Says:

    Great article Wendy. I think everyone running a blog should pay heed. I look forward to reading the next installment.

  2. Lindsey Collumbell Says:

    Another excellent post with your trademark great advice … that will be why I refer my clients to you when they need in-depth advice and assistance. Keep these posts coming Wendy – they’re really useful.

  3. Wendy Says:

    Thanks Clare – Part II will be coming this week!

  4. Wendy Says:

    Thanks Lindsey. Glad you found the post useful! And thanks for the referrals :-)
