Understanding Google’s Penguin Updates & How They Affect You
By Fairuze Shahari. Filed in Web 2.0
One of the biggest talking points among SEO consultants has to be Google’s constant barrage of animal updates. Among these, the Penguin updates have arguably had the most impact on the industry. Since Penguin first waddled into view in April 2012, its updates have completely changed the SEO landscape and made link building, so important to SEO, more complex.
In this post, we are going to look at the various incarnations of Penguin since its introduction, paying particular attention to Penguin 2.1, Google’s latest update, which was released on October 4.
Before we go into it, we should briefly mention two other major Google updates from the last two months to give you a bit more context.
The first is Google’s move in making all searches encrypted and secure. The main implication here is that you will probably lose most, if not all, of your keyword data in Google Analytics. You will no longer be told which keywords are driving visitors to your site.
The second major update is the Hummingbird update. This is a very different animal from the Penguin update, in that it is a major change to how Google’s algorithm works whereas Penguin is a filter that sits on top of the regular algorithm to serve a specific function. The Hummingbird update is designed to be faster and to better understand the context of a web page.
Understanding the Importance of Links
Before Google came around, most search engines used keyword density as a major factor in determining the relevance of a website and how well it should rank in their search results. That meant that if you wanted your website to rank for the keyword ‘blue widgets’, you simply placed more instances of the words ‘blue widgets’ on your page.
Needless to say, these search results were not very hard to manipulate.
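To see why this was so easy to game, here is a minimal sketch of a keyword density calculation. The function name, the tokenizing rule and the sample page are assumptions made for illustration; no search engine's actual formula is being reproduced here.

```python
import re

def keyword_density(text: str, phrase: str) -> float:
    """Return the share of the page's words taken up by occurrences of `phrase`."""
    # Hypothetical tokenizer: lowercase runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = phrase.lower().split()
    if not words or not target:
        return 0.0
    n = len(target)
    # Count every position where the phrase appears as consecutive words.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)

page = "Blue widgets for sale. Our blue widgets beat other blue widgets."
print(f"{keyword_density(page, 'blue widgets'):.2f}")
```

Repeating the phrase a few more times pushes the figure up, which is exactly the kind of stuffing that a density-based ranking rewarded.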
Google completely changed the rules of the game with the introduction of its PageRank algorithm. PageRank (PR), named after one of Google’s cofounders, Larry Page, has its roots in academic citation: the more an academic paper is cited, the more authority it has. Google’s PageRank algorithm ambitiously attempts to rate the authoritativeness of every page on the Internet.
The Concept Of Link-Building
The more links you have pointing to your website, the more authoritative it appears to be. Of course, not all links are treated equally.
A link from an authoritative website carries more weight than one from a less authoritative site. Imagine the difference between being cited in one of Bertrand Russell’s works and in my college essay. As innovative as PageRank was at the time, it didn’t take long for enterprising SEOs to figure out how to manipulate it.
They found that if you received a link from a page with PR4, your own page would end up with a PR2 (or so). So they started buying links on authoritative websites.
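The idea that a link passes on part of the linking page's authority can be sketched with the simplified, textbook form of PageRank. This is an illustration only, not Google's production algorithm: the damping factor of 0.85, the iteration count and the page names are all assumptions, and the scores are raw probabilities rather than the 0 to 10 toolbar values mentioned above.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical graph: "authority" is cited by three fans, "essay" by nobody.
links = {
    "fan1": ["authority"],
    "fan2": ["authority"],
    "fan3": ["authority"],
    "authority": ["site_a"],
    "essay": ["site_b"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:10s} {score:.3f}")
```

In this toy graph, site_a and site_b each have exactly one inbound link, yet site_a ends up with the higher score because its link comes from the better-cited page. That difference is what made links on authoritative sites worth buying.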
As PageRank grew more sophisticated over the years, so did black hat SEO tactics (SEO activities that go against Google’s Guidelines). SEOs now have access to software that can virtually automate the link building process. These tools can create email addresses, build forum profiles and even solve CAPTCHAs.
In short, SEOs have gotten very advanced in creating spam.
Even though most of these tools create links that are not authoritative, they can create thousands or even millions of links within a short period of time and with little effort. Taken together, these links help a website rank better in the SERPs… or at least they used to. Google’s Penguin algorithm, along with many of its other updates, attempts to stop this.
An Overview of the Penguin Updates
The Penguin algorithm is a filter that sits on top of Google’s regular algorithm and attempts to catch link spam. Link spam refers to the manipulative ways that spammers and black hat SEOs create links to boost their rankings in the SERPs. This is usually done using software, so the links and content it creates are generally useless for human readers – the very definition of spam.
Pre-Penguin, such methods worked well especially for very competitive search queries. Black hat SEOs could propel their websites to the top of the search results, at the expense of those who spent the time creating quality content that searchers might actually find useful. (Do note that the Penguin update only affects links to a website, and does not examine the content of your website. An earlier Google update, the Panda update, penalizes thin and low quality content on websites.)
To continue serving the best results for a particular query, Google has to clean up the spam in its index, or users may start abandoning it in favour of other search engines (DuckDuckGo, anyone?).
The following is a list of Penguin updates over the last year or so.
- Penguin 1.0 (or Penguin 1): April 24, 2012 (affected 3.1% of queries)
- Penguin 1.2 (or Penguin 2): May 26, 2012 (affected around 0.1%)
- Penguin 1.3 (or Penguin 3): Oct. 5, 2012 (affected 0.3%)
- Penguin 2.0 (or Penguin 4): May 22, 2013 (affected 2.3%)
- Penguin 2.1 (or Penguin 5): Oct. 4, 2013 (affected 1%)
Penguin 1.0 examined only the quality of the links pointing to a website’s home page, not those pointing to its inner pages. It affected approximately 3.1% of queries. When it first came out in 2012, it completely shook the SEO industry and forever changed how SEO companies execute their link building strategies.
Penguin 1.0, and its subsequent iterations, aimed to combat the following spam techniques:
- Over-optimized anchor text
- Links to and from ‘bad neighbourhoods’
- Too many links from irrelevant sites
Over-Optimized Anchor Texts
The text used to link back to your website is the anchor text, which helps the search engine understand what your website is about; the more links that point to you with a given keyword as anchor text, the better you tend to rank for that keyword. Spammers have been building links with keyword-rich anchor text back to their sites to help them rank for competitive keywords.
The Penguin update penalizes websites that have unnatural or over-optimized anchor text profiles. If you have a thousand links back to your site, and 70% of those links use keyword-rich anchor text, then there is a good chance you will get penalized and be completely wiped off Google’s results.
There is no ‘safe’ percentage of keyword rich anchor text to aim for now. It varies from query to query so it is a bit harder for spammers to game the system.
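If you want to see what your own anchor text profile looks like, a rough sketch along the following lines may help. The function names and the sample anchors are hypothetical, and the code only reports shares; it does not (and cannot) tell you what Google considers acceptable for a given query.

```python
from collections import Counter

def anchor_profile(anchors):
    """Return each distinct anchor text with its share of all links, most common first."""
    counts = Counter(a.strip().lower() for a in anchors)
    total = sum(counts.values())
    return {text: count / total for text, count in counts.most_common()}

def keyword_anchor_share(anchors, keywords):
    """Return the fraction of anchors that contain any of the target keywords."""
    if not anchors:
        return 0.0
    lowered = [k.lower() for k in keywords]
    hits = sum(1 for a in anchors if any(k in a.lower() for k in lowered))
    return hits / len(anchors)

# Hypothetical backlink anchors, e.g. exported from a backlink report.
anchors = ["blue widgets"] * 7 + ["Example Co", "example.com", "click here"]

for text, share in anchor_profile(anchors).items():
    print(f"{text:15s} {share:.0%}")
print("keyword-rich share:", keyword_anchor_share(anchors, ["blue widgets"]))
```

A natural profile usually contains plenty of brand names, bare URLs and generic phrases, so a report dominated by a single commercial keyword is a sign that something looks manufactured.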
Links To and From Bad Neighbourhoods
A ‘bad neighbourhood’ refers to websites that are of low quality or have inappropriate content such as adult content, pharmaceuticals and gambling sites. Google might deem you to be part of a bad neighbourhood if such sites link back to your site. Also, if there is a link to your website on a page that also has links to such sites, then you might be in trouble.
Too Many Links from Irrelevant Sites
If you are running a catering website, then it makes sense for other catering or related websites to link back to you, but not for sites in unrelated niches such as aerospace engineering. When this happens, it is an indication that you are engaging in low-quality link building. This was one of the reasons JC Penney found itself penalized by Google.
The important thing to remember about the Penguin penalty is that it is looking to penalize websites with what it deems to be unnatural link profiles.
Penguin 2.0, which was released more than a year later, went much deeper and looked at links pointing to your inner pages as well. This Penguin update was actually the 4th release but was dubbed Penguin 2.0 because there were major changes to the way it worked. It still targeted the same kinds of link spam as Penguin 1.0, only going deeper. It affected around 2.3% of English language queries.
On October 4, Google released the latest update of the Penguin algorithm, Penguin 2.1. We believe that Penguin 2.1 made good on Google’s promise to start devaluing upstream links. Even though it is expected to affect less than 1% of queries, I believe it will have a larger impact on SEO than Google states.
In order to appreciate the full impact of Penguin 2.1, you will need to understand 3 things:
- The importance of contextual links in SEO
- Article spinning
- Tiered link building
The Importance Of Contextual Links
As you know by now, links are the lifeblood of SEO. Despite claims to the contrary, they are still among the most important factors in determining your site’s ranking.
Contextual links are links that appear within the copy of an article. These links are more valuable because Google believes that they are harder to manipulate. In order to get a contextual link from an authority website, your own website must contain content that is both relevant and useful. Such links are therefore sought after by SEOs.
Writing quality content is easier said than done. It takes a lot of time, effort and knowledge to create content that people want to link to. One way that SEOs get around this is by writing and spinning their own articles and placing them on article directories or Web 2.0 properties such as WordPress.com and Blogspot.com.
Article spinning refers to the practice of putting articles through software that uses "spintax". Spintax is a markup in which each replaceable word or phrase is written as a set of synonyms and related alternatives; the software picks one alternative at each spot, thereby creating articles of varying degrees of uniqueness. Publishing duplicate copies of articles from another site may get you penalized; this method serves as a bypass of sorts.
One properly written article can produce tens or even hundreds of unique articles. Just because an article is unique does not mean that it is of high quality or even (human-)readable but it does allow SEOs to place links within the content.
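To show how mechanical the process is, here is a minimal sketch of a spinner. It assumes the commonly used brace-and-pipe spintax form, {option one|option two}, with nesting allowed; the function name and the sample template are made up for this example.

```python
import random
import re

# Matches the innermost {a|b|c} group, i.e. one containing no further braces.
INNERMOST = re.compile(r"\{([^{}]*)\}")

def spin(template: str, rng: random.Random) -> str:
    """Resolve spintax by repeatedly replacing innermost groups with one random option."""
    while True:
        template, replaced = INNERMOST.subn(
            lambda match: rng.choice(match.group(1).split("|")), template
        )
        if replaced == 0:
            return template

template = "{Penguin|The update} {hit|affected} {many|{quite |}a few} sites."
rng = random.Random()
for _ in range(3):
    print(spin(template, rng))
```

This short template alone yields twelve different sentences, none of which required anyone to write anything new. It also shows why spun content so often reads awkwardly: the software swaps words without any regard for how the result sounds.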
Tiered Link Building
Tiered link building is an advanced SEO tactic that involves the creation of microsites around your main site. SEOs then place the previously spun articles on these sites, with contextual links pointing back to the main site.
These websites can either be hosted on their own domain names (example.com), or built on Web 2.0 properties such as WordPress.com or Blogspot.com. They are known as Tier 1 links or Tier 1 websites because links coming from within the content of a website are deemed more trustworthy in Google’s eyes (i.e. less risk of a Penguin penalty).
SEOs then use automated software to build riskier links to their Tier 1 websites to get them indexed and to provide them with more link juice. These types of links include links from comments and forum profiles, and may number in the hundreds of thousands or even millions.
Tiered link building can be considered a grey or black hat SEO tactic.
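Structurally, a tiered setup is just a link graph in which each page's tier is the number of hops it sits away from the main site. The sketch below labels pages that way with a breadth-first search over inbound links. It is meant as a way to picture the structure or audit a backlink export, not as a description of how Google detects tiers; all site names are hypothetical.

```python
from collections import deque

def link_tiers(links, main_site):
    """Label each page with its distance, in inbound-link hops, from `main_site`.

    `links` maps a source page to the list of pages it links to.
    """
    # Invert the graph so we can walk from a page to the pages linking to it.
    inbound = {}
    for source, targets in links.items():
        for target in targets:
            inbound.setdefault(target, set()).add(source)

    tiers = {main_site: 0}
    queue = deque([main_site])
    while queue:
        page = queue.popleft()
        for source in inbound.get(page, ()):
            if source not in tiers:
                tiers[source] = tiers[page] + 1
                queue.append(source)
    return tiers

# Hypothetical network: two microsites feed the main site, spam feeds the microsites.
links = {
    "microsite-a.example": ["main.example"],
    "microsite-b.example": ["main.example"],
    "forum-profile-1": ["microsite-a.example"],
    "blog-comment-1": ["microsite-b.example"],
}
for page, tier in sorted(link_tiers(links, "main.example").items(), key=lambda item: item[1]):
    print(tier, page)
```

The main site never links to, or is linked from, the riskiest pages directly. That separation is the whole point of the tactic, and it is also why devaluing upstream links undermines it.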
From the reports we are gathering online, Google is targeting popular link networks such as the SAPE network (the Russian link network referenced by Matt Cutts) as well as those using tiered link building to rank sites.
Penguin 2.1 – Effect On Link Networks
Penguin 2.1, along with the rest of Google’s updates, has made tiered link building much harder. You really need to be careful to stay under the radar and not be identified as a link network. For example, you need to use separate domain name registration information, different email addresses and different Google Analytics accounts. Even your web hosting is becoming a major SEO consideration.
We are not saying that SEOs will stop using tiered link building or that it no longer works. It’s just that it is much harder to do it right. I wouldn’t be surprised if some enterprising SEO has already figured out a way to build tiers that Penguin 2.1 can’t detect.
Other Types of Link Spam Targeted
Glenn Gabe at G-Squared Interactive has also done an analysis of 26 websites penalized by Penguin 2.1. He found several categories of links that were targeted.
Forum Spam – Forum spam refers to links that appear within forum posts and use exact match anchor text to link back to the main website.
Forum Bio Spam – Some SEOs will build fake profiles on forums and place exact match anchor texts to their websites.
Do-Follow Blogs – A dofollow blog is one that doesn’t use nofollow on its links, even in its blog comments section (note: nofollow links are those that carry the rel="nofollow" attribute, which prevents PageRank from being passed). Glenn identifies resources pages as a cause for concern, as they act like directories and, if they use exact match anchors, might indicate to Google that the website is only a resource for rich anchor text links.
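If you want to check whether a page's links are followed or nofollowed, a small script built on Python's standard html.parser module can sort them for you. This is a minimal sketch: the class name and the sample HTML are assumptions, and it only inspects the rel attribute on individual links.

```python
from html.parser import HTMLParser

class LinkAuditor(HTMLParser):
    """Collect link targets, split by whether the link carries rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)

# Hypothetical snippet of a blog comment section.
html = (
    '<p><a href="https://example.com/post">Great post</a> '
    '<a rel="nofollow noopener" href="https://example.org/widgets">blue widgets</a></p>'
)
auditor = LinkAuditor()
auditor.feed(html)
print("followed:  ", auditor.followed)
print("nofollowed:", auditor.nofollowed)
```

A blog whose comment links all land in the followed list is the kind of dofollow blog described here.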
Blogroll Spam – While John Mu from Google says that these links are not bad in and of themselves, they can cause a problem if they appear on questionable websites.
Spammy Directories – Directories have always been a favourite of SEOs. These seem to show up quite frequently among sites that have been targeted by Penguin 2.1. The reason they are so popular is that they are very easy to get but now, they also come at a heavy price.
Blog Comment Signature Spam – Even though most blog comments are nofollowed, Glenn’s research seems to indicate that Google is still penalizing websites that have exact match anchor text links from comment signatures. This means that if you are using commenting as an SEO tactic, leave your real name instead of your keyword.
Classified Ads Spam – These seem to be new. They refer to links with exact match anchor text placed on questionable classified ads sites.
Don’t Blame Everything on the Penguin
It might be tempting to blame all your problems on the Penguin update. However, this might simply not be the case. For example, Fingerfoods, an Australian catering company based in Sydney, was affected by Penguin 2.0. However, upon closer examination, they also discovered duplicate content issues as well as malicious links being inserted on random pages of their website.
Even if your website has been hit by the latest Penguin 2.1 update, there are still steps you can take to recover. However, this can be a long and arduous process. Some companies might want to consider starting afresh on a new domain, if this is possible. How did your website fare under Penguin 2.1? Let us know in the comments below.
Editor’s note: This post is written by Fairuze Shahari for Hongkiat.com. When he’s not furiously downing G&T’s, Fairuze Shahari writes for CloudRock.asia, a web design and SEO company with a presence in Singapore and Malaysia. You can find him on G+.