Does using a CDN generate duplicate content?

Apr 21, 2016 2:02:21 PM / by Alex López

In this post, our CTO Alex López addresses the common SEO questions that come up when using a CDN to host content on web stores, especially those about duplicate content.


 

Duplicate content. Two words that webmasters fear. Nightmares of an evil Penguin dancing over their website buried 6 pages deep in Google are not uncommon. 

And they are right to be worried. Duplicate content on your site will most likely hurt your rank in search engines. However, SEO is not an exact science, and it’s very difficult to find absolutes. It’s about the whole picture, and there’s no way to be sure that you’re doing everything right, simply because you have no access to the search engines’ algorithms (keeping them private is key to their independence).

A bad move could mean penalties for your site, and it’s not easy to recover from them. Our clients’ growing SEO knowledge makes them aware of these risks, which raises some questions when it comes to our free product image Content Delivery Network (CDN). I hear the same question from retailers over and over again: “If other shops use the same product pictures I do from your CDN, wouldn’t that be duplicate content?”

It’s more than fair that our clients have these concerns, so the least I can do is stay as up to date as possible on the topic and give them documented answers. After doing some extensive research, I thought it would be helpful to write this post and share my conclusions with everyone.

 

What exactly is a CDN?

The first step here is to understand what a CDN is and what web pages use them for. A CDN is basically a way to deliver content (be it an HTML web page or another asset such as PDF files, pictures, etc.) to users anywhere in the world through a distributed network of servers, on a geographically optimised basis. That means that, depending on the user’s location, the closest server will deliver the content, so it reaches the user as fast as possible.
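
To make the “closest server” idea concrete, here is a minimal Python sketch that picks the nearest server by straight-line distance on the globe. The server names and coordinates are made up for illustration, and real CDNs usually route requests with DNS or network-level techniques that also take load and network conditions into account, so treat this as a simplification rather than how any particular CDN works.

```python
import math

# Hypothetical edge locations: name -> (latitude, longitude) in degrees.
EDGE_SERVERS = {
    "frankfurt": (50.11, 8.68),
    "new-york": (40.71, -74.01),
    "singapore": (1.35, 103.82),
}

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def nearest_edge(user_location, servers=EDGE_SERVERS):
    """Return the name of the edge server closest to the user."""
    return min(servers, key=lambda name: haversine_km(user_location, servers[name]))
```

With these sample locations, a visitor in Madrid (`nearest_edge((40.42, -3.70))`) would be served from "frankfurt", while the link they clicked is the same one everybody else uses.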

The fact that the CDN is composed of several servers is invisible to the user: the same link can lead to any of the servers. However, CDNs don’t strictly host the content, but rather act as a caching system. So, for any content served by a CDN, there’s an original copy somewhere on the Internet, which the CDN fetches the first time the content is requested and then periodically to get an updated version. This means there are at least two copies of anything served by a CDN.
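
The caching behaviour described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class name `EdgeCache`, the fixed time-to-live and the `fetch_from_origin` callable are all hypothetical, and production CDNs decide freshness from HTTP caching headers rather than a single timer.

```python
import time

class EdgeCache:
    """Toy pull-through cache: serve the stored copy, refetch from origin when stale."""

    def __init__(self, fetch_from_origin, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch_from_origin   # callable: path -> content (the original copy)
        self._ttl = ttl_seconds           # how long a cached copy counts as fresh
        self._clock = clock               # injectable so the behaviour can be tested
        self._store = {}                  # path -> (content, time it was fetched)
        self.origin_hits = 0              # number of trips made to the origin

    def get(self, path):
        entry = self._store.get(path)
        now = self._clock()
        if entry is None or now - entry[1] >= self._ttl:
            # First request, or the cached copy has expired: go back to the origin.
            content = self._fetch(path)
            self.origin_hits += 1
            self._store[path] = (content, now)
            return content
        return entry[0]
```

The point to notice is that the origin keeps the original and the cache only holds a copy of it, which is exactly why the question of duplicates comes up at all.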

 

CDN, Duplicate Content, and Page Speed

An unlimited number of pages can link to a given piece of CDN content (they would all use the same link), which is the case for Google-hosted webfonts or the jQuery and Bootstrap libraries. Is all of that duplicate content?

No. This is not duplicate content. Your next question might be “how do CDNs avoid search engine penalties if they’re serving a duplicate of some content?” They simply tell search engines that it is a duplicate, and to do that they use so-called “canonical links”. With canonical links, CDNs state not only that the content is a duplicate, but also where the original is located. The canonical link is an HTTP header that CDNs add to the delivered content, and it looks like this:

 

[Image: example of a canonical link HTTP header added by a CDN]
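
For readers who prefer text to a screenshot, a canonical link header generally takes the form `Link: <original-url>; rel="canonical"`. The Python sketch below builds such a header value and reads it back from a set of response headers. The URLs are placeholders on example domains, not the values from the original illustration, and the parser is deliberately simple (it does not handle commas inside quoted parameters).

```python
import re

def canonical_link_header(original_url):
    """Build the value of a Link header that marks original_url as the canonical copy."""
    return f'<{original_url}>; rel="canonical"'

def canonical_from_headers(headers):
    """Return the canonical URL found in a dict of response headers, or None."""
    for name, value in headers.items():
        if name.lower() != "link":
            continue
        # Each link looks like: <url>; param=value; ...  (links are comma-separated)
        for url, params in re.findall(r'<([^>]*)>([^,]*)', value):
            rel = re.search(r'rel\s*=\s*"?([^";]+)"?', params, re.IGNORECASE)
            if rel and "canonical" in rel.group(1).lower().split():
                return url
    return None
```

For example, `canonical_link_header("https://origin.example.com/images/shoe.jpg")` returns `<https://origin.example.com/images/shoe.jpg>; rel="canonical"`, which is what a server would send after `Link: ` alongside the image.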

 

And what about all those websites using the same CDN link? How could it be bad if they’re linking to content that is meant to be used by many, and the CDN is properly set up with canonical links? Consider the example of Google’s hosted webfonts -- can you even imagine if they penalized the use of these basic resources, just because they were sourced from an outside server? In these cases, using a CDN can only benefit your SEO rank, as your page will load faster.

Since CDNs cache the resources and serve them from the closest server available to the user, the content will usually get there faster than if you host it yourself. Page speed positively impacts your SEO and your users’ experience, which is not to be taken lightly if you rely on traffic for conversions. In e-commerce, a slow page means your potential buyers will get fed up, and Googlebot will index your site more slowly as well.

 

CDN and Static Content

There’s another general concern when using an external CDN to deliver static content that is a public resource or belongs to another company. The link you use will point to a domain that is not yours, so you might think this could negatively affect your SEO rank. Well, you just need to ask yourself these questions:

  • Did you generate that content, so you can claim it as yours?
  • Do you think you’d rank better just for hosting, for example, jQuery on your own server?

Most likely the answer is “No”. But remember, I’m talking about third-party content here; your own content is a whole different story. Sourcing third-party images from a CDN for your web store is not the same as serving the original content for your brand.

You should always deliver your own content from your own domain (regardless of the use of a CDN) in order to help your site get listed at the top of SERPs (Search Engine Results Pages) for search queries related to the original content you produce.
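
One way to check this on your own pages is to list the asset URLs they use and see which ones are served from your domain. The sketch below is a rough Python helper for that; it assumes that a CDN in front of your own content is reached through a subdomain of your site (for example `cdn.myshop.example`), which is a common setup but not the only one, and both the function name and the domains are hypothetical.

```python
from urllib.parse import urlparse

def is_first_party(asset_url, site_domain):
    """True if asset_url is served from site_domain or one of its subdomains."""
    host = (urlparse(asset_url).hostname or "").lower()
    if not host:
        # Relative URLs such as "/img/logo.png" are served by the site itself.
        return True
    site_domain = site_domain.lower()
    return host == site_domain or host.endswith("." + site_domain)
```

Here `is_first_party("https://cdn.myshop.example/img/logo.png", "myshop.example")` is true, while a shared library loaded from `https://code.jquery.com/` is not, which is fine for third-party content.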

 

CDN & SEO: The Takeaways

So let's recap: how does a CDN affect your SEO?

  • Serving content from a CDN is not duplicate content if the CDN is properly set up with canonical links
  • CDNs help your site load faster, which search engines love and reward with higher positions in results
  • Increased page speed can result in better user experience and more conversions
  • Consider which content you serve from a CDN: if it is content you own, it is better to deliver it from your own site domain

 

Topics: Ecommerce, SEO, Web development