Informations of Web Creative Agency, Digital Solutions, Display Advertising - Ammit Thapa

Thursday, 22 November 2012

How Can Google Stop the Black Hats?

Google, the black hats are winning. Despite a persistent and partly effective effort to clean up SERPs and allow brand-building tactics to win, the most coveted SERPs are still owned by the smartest algorithm gamers and black hats in the business: the very guys Panda and Penguin were out to destroy.
And the pressure, it seems, is growing for Google to do something about it.

Just last week, The Independent newspaper published a column on the non-compliance of many well-ranked websites. In it, the Office of Fair Trading was asked to take a view on why so many of those sites had no credit license.

And the issue doesn’t end there. Analysis of other highly competitive verticals, including gambling, reveals similar problems. One site owner even wrote in detail about how advanced much of their “work” is, and how hard it is to escape.

The question is: how are they managing to own such hardcore niches, and what can Google do about it?


How are the Black Hats Winning?

In the case of the payday loan niche, the tactic has been well documented by CognitiveSEO, but it appears that the same “link bombing” technique is being used to game other highly profitable sectors, such as key gambling SERPs.

The basic premise is simple. Look at almost any of the sites positioned in the top 10 search results on google.co.uk for “payday loan” and you’ll see that they are “powered” by a blog or link network using owned or hacked sites. In the vast majority of cases the ranking domains are less than a month old, yet they hold some of the most valuable positions available in organic search.

The way they do it is actually quite straightforward. After hacking or creating a huge network of sites (all of which are indexed and have equity to pass), they simply create a script that allows them to inject hundreds of exact match anchor links to the domain and site of choice simultaneously.

Often they will test the network first on an unsuspecting domain to gauge its effectiveness before switching it to their preferred domain, explaining the occasional random rankings seen recently.

Those links won’t all be dropped in one go, though. They are usually added over a 7-14 day period. If we look at the acquisition profile for pollyspaydayloans.org.uk (currently position 1 as I write) we can see the domain was registered on October 19.
[Image: backlinks-payday-loans]
On that same day it started gaining links, the vast majority of which used aggressive anchor text and pointed at the homepage:
[Image: anchors-payday-loans]
It’s a tactic that will undoubtedly fall prey to Penguin, EMD, or even Sandbox quickly, but one other trick is used to prolong the site’s life: gaming Google’s crawlers.
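
The acquisition pattern described above (a very young domain whose links are mostly exact-match anchors) can be written down as a rough heuristic. The following is only a sketch: the `Backlink` record, the function name, and both thresholds are assumptions made for illustration, not anything Google is known to use, and the sample links are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Backlink:
    first_seen: date   # day the link was first observed
    anchor_text: str   # visible anchor text of the link


def looks_link_bombed(registered, backlinks, money_terms,
                      max_age_days=30, min_exact_share=0.6, today=None):
    """Flag a young domain whose backlinks are mostly exact-match anchors.

    Both thresholds are illustrative assumptions, not known Google values.
    """
    today = today or date.today()
    if not backlinks:
        return False
    age_days = (today - registered).days
    terms = {t.lower() for t in money_terms}
    # Count links whose anchor text is exactly one of the money terms.
    exact = sum(1 for b in backlinks
                if b.anchor_text.strip().lower() in terms)
    exact_share = exact / len(backlinks)
    return age_days <= max_age_days and exact_share >= min_exact_share


# Invented sample data: a domain registered on October 19, checked 17 days later.
links = [
    Backlink(date(2012, 10, 19), "payday loans"),
    Backlink(date(2012, 10, 20), "payday loans"),
    Backlink(date(2012, 10, 22), "Payday Loan"),
    Backlink(date(2012, 10, 25), "click here"),
]
flagged = looks_link_bombed(date(2012, 10, 19), links,
                            {"payday loan", "payday loans"},
                            today=date(2012, 11, 5))
print(flagged)  # True: 17 days old, 3 of 4 anchors are exact match
```

A check like this would only ever be a first filter. A real profile would also need link velocity and the quality of the linking sites, neither of which is modelled here.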

How Google Crawls the Web

To understand why this works you need a brief understanding of how Google currently crawls the web.
Googlebot, as it is widely known, has always been a text-based web crawler, capturing the web by recording and organizing sites and pages based on the code that makes them up.

In recent years, the appearance of visual snapshots, a better understanding of headless browsers, and the theory that Google uses its Chrome browser as part of that crawl have pushed us toward the belief that Google actually “sees” the web page too.

The problem is that the trick being used here suggests those two crawls don’t run in parallel, or at least don’t talk to each other to match what the text crawler sees against what the visual crawler sees.

The Trick

I say this because several successful black hat sites appear to be using a clever CSS trick: hiding links in powerful places that pass huge chunks of link equity while partly fooling Googlebot, which buys them precious time at the top.

A lot of key links are “placed” in a position so high up on the page that they are “invisible” to the normal user, often sitting in the header at a pixel position of -9999px or similar. That way neither the user nor the visual crawler sees the link, so it takes Google much longer to find out how that site is actually ranking.
Here’s what the offending script usually looks like:

[Image: hidden-link-payday-loans]
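
Seen from the detection side, the pattern is not hard to spot in the raw markup. The sketch below assumes the link is hidden with an inline style that pushes its container far off-screen. The sample HTML is invented for illustration and is not the markup from any real site, and the class name, regular expression, and 1000px threshold are all assumptions.

```python
import re
from html.parser import HTMLParser

# Matches e.g. "top: -9999px" or "text-indent:-5000px" in an inline style.
OFFSCREEN = re.compile(
    r"(?:top|left|text-indent|margin-left|margin-top)\s*:\s*-(\d+)px", re.I)

# Elements that never have a closing tag, so they must not go on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def is_offscreen(style, threshold_px=1000):
    """True if an inline style pushes the element far outside the viewport."""
    return any(int(m.group(1)) >= threshold_px
               for m in OFFSCREEN.finditer(style or ""))


class HiddenLinkFinder(HTMLParser):
    """Collects hrefs of links that sit inside an off-screen element."""

    def __init__(self):
        super().__init__()
        self.stack = []         # (tag, hidden) for each open element
        self.hidden_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent_hidden = self.stack[-1][1] if self.stack else False
        hidden = parent_hidden or is_offscreen(attrs.get("style"))
        if tag == "a" and hidden and attrs.get("href"):
            self.hidden_links.append(attrs["href"])
        if tag not in VOID:
            self.stack.append((tag, hidden))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy markup.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


sample = """
<div style="position:absolute; top:-9999px">
  <a href="http://example.com/loans">payday loans</a>
</div>
<p><a href="http://example.com/about">About us</a></p>
"""
finder = HiddenLinkFinder()
finder.feed(sample)
print(finder.hidden_links)  # ['http://example.com/loans']
```

This only reads inline styles. Links hidden through an external stylesheet or a script would need a rendered view of the page to catch. Off-screen positioning also has legitimate uses, such as text intended only for screen readers, so a hit is a reason to look closer rather than proof of spam.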
As an added bonus, as well as buying time for the site, Google may also be seeing this link as a header link, passing even more link juice across because of it. A 2004 patent application by Google suggested they planned on assigning greater relevance to links in such positions, and I wrote a little more about personalized PageRank in “Is Google Afraid of the Big Bad Wolfram?”

Those making money out of the sites know this, and they also know that by the time Google’s crawlers piece together the picture from their main “base” crawl, not just their regular visual and “fresh” crawls, they will already have made a chunk of money.

The time comes, of course, when the site will be taken out, either by sandbox or by a Panda or Penguin crawl, but by then the money has been made and enough time bought to line up another site. The process is then repeated.

How Does Google Fix it?

There is little doubt that Google’s engineers are very aware of this problem. The fix will no doubt become a higher priority once they have figured out Penguin and as pressure increases from financial regulators to prevent non-compliant sites from ranking.

In my opinion, they have three basic options available to them, each requiring a different level of resource and investment to make them work.
  1. Manual Policing: This is the most obvious route to take and would be the most straightforward. The problem is that Google may then be charged with editing results (something they have been very careful to avoid for obvious reasons). In practice it simply requires a manual search quality rater to monitor key verticals daily, analyzing backlink profiles, domain age, and other telltale giveaways to prevent those sites from surfacing.
  2. 301 Redirect Check: In some cases black hats are able to remove any penalty and return quickly by redirecting a bad domain to a new one. For a period of time this bypasses the filter. Google could fix this by algorithmically searching for redirects when it crawls a top-level domain and matching them back to historical crawls, or to any index of penalized domains that may exist.
  3. Look Again at How They Crawl the Web: There is a gap, seemingly, between what each specific crawl “sees”. Much of this would be prevented if Google could find a way to bring the data from each crawl together so that it is held centrally and can be compared. That way they could spot hidden links quickly and immediately penalize a site for the tactic. Even if that penalty were simply a trigger to prompt a manual review, or to pull the site “to one side” so it could be crawled in more depth sooner, the problem of hidden links would go away overnight.
If theories that Chrome is Googlebot are true, and I believe they are, then the solution cannot be too far out of reach.
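
The redirect check in option 2 is easy to picture in code. This is a minimal sketch under stated assumptions: the redirect map and the set of penalized domains are hypothetical stand-ins for whatever crawl history Google actually keeps, and all domain names are invented.

```python
def penalized_sources(domain, redirects, penalized):
    """Return penalized domains whose 301 chain ends up at `domain`.

    redirects: dict mapping a source domain to its 301 target domain.
    penalized: set of domains already penalized (a hypothetical index).
    """
    found = set()
    for source in redirects:
        seen = set()
        current = source
        # Follow the chain from `source`, guarding against redirect loops.
        while current in redirects and current not in seen:
            seen.add(current)
            current = redirects[current]
            if current == domain:
                break
        if current == domain and source in penalized:
            found.add(source)
    return found


# Invented example: a penalized domain hops through a middleman to a new one.
redirects = {
    "old-loans.example": "mid.example",
    "mid.example": "new-loans.example",
    "blog.example": "news.example",
}
penalized = {"old-loans.example"}
print(penalized_sources("new-loans.example", redirects, penalized))
# {'old-loans.example'}
```

Following the whole chain matters because a single hop through a clean intermediate domain would otherwise be enough to hide the link back to the penalized one.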

The next problem to solve of course is hacked sites, and that one is very, very difficult to solve. Perhaps manual backlink checks are the only solution to this issue?

As one prominent SEO practitioner said to me as we discussed the issue, “Penguin has removed the crap black hat brigade, leaving the very best to get very rich.” For me, that pretty much sums up where we are right now. Only Google has the tools to change it.
