Google, Yahoo and Live all agree to new standard on Canonicalization
Reading through Vanessa Fox's post on Search Engine Land about how all three search engines have come to an agreement on the new canonical tag left me with more questions than answers about an issue that is often ignored by the average everyday webmaster.
Canonicalization Study
Aaron Wall did a study on how canonicalization can affect the performance and organic traffic conversion of a website. In the example he tested, the website improved organic traffic conversion by 300% – a massive improvement just for eradicating duplicate pages.
Different Types of Canonicalization
Now, we know that canonicalization issues can take many forms – all of the following URLs could serve the same homepage:
- http://www.example.com
- http://www.example.com/index.cfm
- http://example.com
- http://www.example.com/home.html
Many webmasters or search engineers faced with this problem will have to look through their internal link structure to find the links that point to these duplicate URLs. The next stage would be to update the .htaccess file (provided you're running the website on an Apache server) to eliminate the 'www' vs 'non-www' issue – and don't get me started on https! In reality it can become a nightmare for webmasters, but could this new tag change all that?
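For reference, here's a minimal sketch of the kind of .htaccess rule I mean, assuming an Apache server with mod_rewrite enabled and example.com standing in for your own domain:

```apache
# Send any non-www request to the www version with a permanent (301) redirect
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

The 301 tells the search engines the move is permanent, so they consolidate the two versions rather than indexing both.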
Below is the code that you will need to add to the <head> of your webpage – example.com here is just a placeholder for your own preferred URL:
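```html
<!-- Place inside <head>; the href is the one URL you want indexed -->
<link rel="canonical" href="http://www.example.com/" />
```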
Apparently, that is all you need to do and Google will – well, near enough – treat this as the chosen page when there are many available. I say "near enough" because Google has stated on its blog that the standard is a 'hint that we honor strongly', meaning it won't necessarily be followed in every case.
I updated my Twitter the other day with a thought I have had since the Big Daddy rollout in January 2006. Whose problem is it? Should we have to place a canonical tag within our <head> tag? Do we really have to update all our links that point to a specific file at the root?
Personally, I think good search engineers will already know the file structure they should use to help search engines crawl and index their content. A well-defined robots.txt file can restrict the crawling of duplicate URLs. So I really don't think this tag will change much at all, but at least the search engines are acknowledging that the problem exists.
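To illustrate, here's a minimal robots.txt sketch using the duplicate URLs from the list above – the blocked paths are examples, not rules for any specific site:

```text
# Keep crawlers away from common sources of duplicate URLs
User-agent: *
Disallow: /index.cfm      # duplicate of the root URL
Disallow: /home.html      # another duplicate of the homepage
```

Blocking the duplicates at the crawl stage means the engines never see the extra versions in the first place, which is why a well-planned robots.txt goes a long way here.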
Author – Andy