
Thursday, 22 November 2012

Google May Discount Infographic Links in Future! Ridiculous

Last week, in an interview with Eric Enge, Matt Cutts mentioned that Google might discount infographic links in the future. To quote Matt, he would not be surprised “if at some point in the future we did not start to discount these infographic-type links to a degree. The link is often embedded in the infographic in a way that people don’t realize, vs. a true endorsement of your site.”

As justification for this possible move, he cited a few reasons:

“What concerns me is the types of things that people are doing with them. They get far off topic, or the fact checking is really poor. The infographic may be neat, but if the information it’s based on is simply wrong, then it’s misleading people.”

He also mentioned, “people don’t always realize what they are linking to when they reprint these infographics. Often the link goes to a completely unrelated site, and one that they don’t mean to endorse.”

So to summarize, the three reasons why Google might discount infographic links in the future are:
  1. The infographic could be far off topic in relation to what the business deals with
  2. The facts represented in the infographic could be poorly checked, resulting in misleading information
  3. People don’t realize what they are linking to when they republish an infographic

And for these reasons Google might discount all infographic links. Really? Are you kidding me? It is completely ridiculous, and it seems Google is increasingly developing a God complex.
Google has always talked about creating extraordinary content that people would love to link to, and now that people have identified a definitive form of such content, they want to discount those links.

Let’s take a more detailed look at the points mentioned above.
  1. Off-topic infographics: Yes, this could definitely be a valid reason to discount the links. If we deal with SEO and publish an infographic on the most influential political leaders of the world, there is every reason and justification for Google to devalue any link the site gets through it, and Google also has the capability to judge the contextual relevance of the graphic to the overall theme of the website.
  2. Poor research data: How is Google going to determine the quality of the research data? In an infographic, all research data is graphically represented, and while Google may have advanced its capability to read and understand images, I don’t believe it is anywhere close to interpreting graphically represented research data. The only option is manual verification, which is neither scalable nor feasible given the volume of infographics published. Also, two different reputable sources could report two different values for the same data point; what if Google looks at a source other than the one you used for your infographic? Does that depreciate the data quality of your infographic?

  3. People don’t realize what they are linking to when republishing infographics: Really? Are webmasters and content editors that foolish? Someone who maintains a good-quality website (which is already a prerequisite for the link to be valuable) would surely be wise enough to know and check what they are linking to. For a second, let’s accept that webmasters are foolish enough to link to a website without checking it. In that case, whose responsibility is it? When I link to a website from my site, in whatever form, it is my responsibility to check what I am linking to; if I link to something wrong, irrelevant, or unethical, that should count against me and not against the site I am linking to. So in this case, if Google has to take any action at all, it should be against the republishing website and not the site that created the infographic.
I have worked on several infographics for different projects and websites, and I know for sure that an infographic with poor data or poor graphics would never succeed (yes, we tried that too and learned from the mistake).

How Do Infographics Get Links?

Let’s look at how infographics get their links. Once you create an infographic, the first thing you do is publish it on social media channels, and as it starts getting shared, it catches the attention of bloggers, who begin republishing it. The prerequisite here is that the infographic gets “shared”, and that only happens when it is of a certain quality and actually provides some interesting or useful information for readers. So if the content isn’t good, it won’t get shared, nor will it earn a substantial number of links. And when people have endorsed the infographic through social sharing (and consequently by linking), why does Google have a problem with it?

Of course, there are other ways to get links for infographics, such as emailing bloggers directly or issuing press releases, but even then, anyone who republishes an infographic will spend at least a couple of moments evaluating its quality, and Google’s discounting of these links seems like sheer disrespect towards people’s judgement. This is unbelievable arrogance resulting from Google’s monopoly in the search space.

Is Google Socially Blind?

Search engines today increasingly rely on social data, and in this case social data could be one of the key indicators of an infographic’s quality. Are we really to believe that Google doesn’t have the access or the capability to judge the social response to a page? And when they see a major positive reaction, isn’t that enough to tell them about the quality of the content?

The Embed Code Issue 

Google can definitely have a problem with the embed codes provided with infographics, as they proactively suggest the link and give every publishing site the same anchor text. However, with Penguin in place, it should not be a tough job for Google to work out the anchor text bit. But if no embed code is provided, there will be a ton of people copying and republishing the infographic without crediting the original source; what happens then? We have seen Google crediting authority websites when they republish great content that was originally created by lesser-known sites, and while most reputable bloggers do cite their source, I have encountered two cases where extremely reputable authority sites published our infographics without any credit (they added a link to us only after we requested that they mention us as the source). For one of those infographics, Google still ranks the authority site above ours, even though the original site has received plenty of links and social mentions. In this situation, can a business investing in a good infographic really afford not to use an embed code?

I look at providing an embed code as a way to make the content more linkable. If you are creating good content that you know people are going to love and link to, what is wrong with making it a little easier for them?
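For illustration, here is a minimal sketch of the kind of embed code I have in mind: a small helper that wraps the infographic image in copy-paste HTML with a plain credit link back to the source. The function name, CSS class, and URLs are hypothetical, not taken from any real site.

```python
# A minimal sketch of an infographic embed code generator. The function name,
# CSS class and example URLs are illustrative assumptions only.
from html import escape


def build_embed_code(image_url, source_url, source_name, alt_text):
    """Return an HTML snippet a republisher can paste into their post."""
    return (
        '<div class="infographic-embed">'
        f'<img src="{escape(image_url, quote=True)}" alt="{escape(alt_text, quote=True)}" />'
        f'<p>Infographic by <a href="{escape(source_url, quote=True)}">{escape(source_name)}</a></p>'
        "</div>"
    )


if __name__ == "__main__":
    print(
        build_embed_code(
            image_url="https://example.com/infographics/seo-history.png",
            source_url="https://example.com/",
            source_name="Example.com",
            alt_text="A brief history of SEO",
        )
    )
```

The point of the snippet is simply that the credit link travels with the graphic, so the original source is never lost when the infographic is republished.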

I can understand if they decide to discount links coming from infographic directories, as anyone can get a link from those, but saying that they might discount the links an infographic receives sounds ridiculous. This is as good as saying that they may devalue the organic links you have earned by creating awesome content that loads of people loved, linked to, and shared.

This is one of those frustrating moments when I really wish Google had a strong competitor that would make them think twice before contemplating such ridiculous steps.

How Can Google Stop the Black Hats?

Google, the black hats are winning. Despite a persistent and partly effective effort to clean up SERPs and let brand-building tactics win, the most coveted SERPs are still owned by the smartest algorithm gamers and black hats in the business – the very guys Panda and Penguin were built to destroy.
And the pressure on Google to do something about it seems to be growing.

Just last week, The Independent published this column on the non-compliance of many well-ranked websites. In it, the Office of Fair Trading was asked to take a view on why so many of these sites had no credit license.

And the issue doesn’t end there. Analysis of other highly competitive verticals, including gambling, reveals similar problems. One site owner even wrote in detail about how advanced much of their “work” is, and how hard it is to escape.

The question is: how are they managing to own such hardcore niches, and what can Google do about it?


How are the Black Hats Winning?

In the case of the payday loan niche, the tactic has been well documented by CognitiveSEO, but it appears that the same “link bombing” technique is being used to game other highly profitable sectors, such as key gambling SERPs.

The basic premise is simple. Look at almost any of the sites in the top 10 search results on google.co.uk for “payday loan” and you’ll see that they are “powered” by a blog or link network built on owned or hacked sites. In the vast majority of cases those domains are less than a month old – yet they hold some of the most valuable positions available in organic search.

The way they do it is actually quite straightforward. After hacking into or creating a huge network of sites (all of which are indexed and have equity to pass), they simply create a script that lets them inject hundreds of exact-match anchor links to the domain and page of their choice simultaneously.

Often they will test the network first on an unsuspecting domain to gauge its effectiveness before switching it to their preferred domain, explaining the occasional random rankings seen recently.

Those links won’t all be dropped in one go, though. They are usually added over a 7-14 day period. If we look at the acquisition profile for pollyspaydayloans.org.uk (in position 1 as I write this), we can see the domain was registered on October 19.
[Image: backlink acquisition profile for pollyspaydayloans.org.uk]
On that same day it started gaining links, the vast majority of which used aggressive anchor text, pointed at the homepage:
[Image: anchor text distribution for pollyspaydayloans.org.uk]
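To make that kind of check concrete, here is a rough sketch of how you might profile a backlink export yourself, assuming you have a CSV with first_seen and anchor_text columns from whatever backlink tool you use; the column names, file name, and anchor list are my own assumptions, not any tool’s actual format.

```python
# A rough sketch of the profile check described above. It assumes a CSV export
# of backlinks with "first_seen" and "anchor_text" columns (the column names,
# file name and anchor list below are assumptions, not a real tool's format)
# and reports link velocity per day plus the share of exact-match anchors.
import csv
from collections import Counter

EXACT_MATCH_ANCHORS = {"payday loan", "payday loans"}  # illustrative only


def profile_backlinks(csv_path):
    per_day = Counter()
    anchors = Counter()
    total = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            per_day[row["first_seen"]] += 1
            anchors[row["anchor_text"].strip().lower()] += 1
            total += 1
    print(f"{total} links gained across {len(per_day)} days")
    for day, count in sorted(per_day.items()):
        print(f"  {day}: {count} new links")
    if total:
        exact = sum(anchors[a] for a in EXACT_MATCH_ANCHORS)
        print(f"Exact-match commercial anchors: {exact / total:.0%}")


if __name__ == "__main__":
    profile_backlinks("backlinks_export.csv")
```

A brand-new domain gaining hundreds of exact-match links within a week or two of registration is exactly the pattern shown above.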
It’s a tactic that will undoubtedly fall prey to Penguin, the EMD update, or even the sandbox before long, but one other trick is used to prolong the site’s life by playing Google’s crawlers.

How Google Crawls the Web

To understand why this works, you need a brief understanding of how Google currently crawls the web.
Googlebot, as it is widely known, has always been a text-based web crawler, capturing the web by recording and organizing sites and pages based on the code that makes up each page.

In recent years, the appearance of visual snapshots, a growing understanding of headless browsers, and the theory that Google uses its Chrome browser as part of that crawl have pushed us toward the belief that Google actually “sees” the web page too.

The problem is that the trick being used here suggests those two crawls aren’t run in parallel, or at least don’t talk to each other, to match what the text crawler sees with what the visual crawler sees.

The Trick

I say this because several successful black hat sites appear to be using a clever CSS trick: hiding links in powerful places that pass huge chunks of link equity while partly fooling Googlebot, buying them precious time at the top.

A lot of key links are “placed” so high up on the page that they are “invisible” to the normal user, often sitting in the header at a pixel position of -9999px or similar. That way neither the user nor the visual crawler sees the link, and it takes Google much longer to find out how that site is actually ranking.
Here’s what the offending script usually looks like:

[Image: example of the hidden-link markup used on these sites]
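Since I can only describe the screenshot here, the sketch below is my own illustration of what such markup might look like, plus a naive way to flag it: it scans inline styles for large negative pixel offsets and reports any links sitting inside those blocks. The sample HTML, domain, and regular expressions are assumptions for demonstration, not the actual script from any of the sites discussed.

```python
# A crude illustration of the hidden-link trick described above, and a naive
# way to flag it. The sample markup, domain and regexes are illustrative
# assumptions, not the actual script used on the sites discussed.
import re

SAMPLE_HEADER = """
<div id="header">
  <div style="position:absolute; top:-9999px; left:-9999px;">
    <a href="http://example-loans-site.co.uk/">payday loans</a>
  </div>
</div>
"""

# Flag any element whose inline style pushes it far off-screen and which
# contains at least one link.
HIDDEN_BLOCK = re.compile(
    r'<[^>]+style="[^"]*(?:top|left|text-indent)\s*:\s*-\d{3,}px[^"]*"[^>]*>(.*?)</\w+>',
    re.IGNORECASE | re.DOTALL,
)
LINK = re.compile(r'<a\s[^>]*href="([^"]+)"', re.IGNORECASE)


def find_hidden_links(html):
    hits = []
    for block in HIDDEN_BLOCK.finditer(html):
        hits.extend(LINK.findall(block.group(1)))
    return hits


if __name__ == "__main__":
    print(find_hidden_links(SAMPLE_HEADER))  # ['http://example-loans-site.co.uk/']
```

Anything this naive would obviously throw false positives, but it shows how little it takes to surface the pattern once the text view of a page is compared against what a user would actually see.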
As an added bonus, as well as buying time for the site, Google may also be treating this as a header link, passing even more link juice across because of it. A 2004 patent application by Google suggested they planned to assign greater relevance to links in such positions, and I wrote a little more about personalized PageRank in “Is Google Afraid of the Big Bad Wolfram?”

Those making money from these sites know this, and they also know that by the time Google’s crawlers piece the full picture together from their main “base” crawl, not just the regular visual and “fresh” crawls, they will already have made a chunk of money.

The time comes, of course, when the site will be taken out, either by the sandbox or by a Panda or Penguin refresh, but by then the money has been made and enough time has been bought to simply line up another site. And the process is repeated.

How Does Google Fix it?

There is little doubt that Google’s engineers are well aware of this problem. The fix will no doubt become a higher priority once they have figured out Penguin, and as pressure increases from financial regulators to prevent non-compliant sites from ranking.

In my opinion, they have three basic options available to them, each requiring a different level of resource and investment to make them work.
  1. Manual Policing: This is the most obvious route to take and would be the most straightforward. The problem is that Google could then be accused of editing results (something they have been very careful to avoid, for obvious reasons). In practice it simply requires a manual search quality rater to monitor key verticals daily, analyzing backlink profiles, domain age, and other telltale giveaways to prevent those sites from surfacing.
  2. 301 Redirect Check: In some cases black hats are able to shed a penalty and return quickly by redirecting a bad domain to a new one, which bypasses the filter for a period of time. Google could fix this by algorithmically checking for redirects when it crawls a top-level domain and matching them back to historical crawls, or to any index of penalized domains that may exist (a minimal version of such a check is sketched after this list).
  3. Look Again at How They Crawl the Web: There is, seemingly, a gap between what each specific crawl “sees”. Much of this would be prevented if Google could bring the two crawls together so the data is held centrally. That way they could spot hidden links quickly and immediately penalize a site for the tactic. Even if that penalty were simply a trigger to prompt a manual review, or to pull the site “to one side” so it could be crawled in more depth sooner, the problem of hidden links would go away overnight.
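As a rough illustration of the second option, the check itself could be very simple: request a previously penalized domain without following redirects and record where it now points. The sketch below uses only the Python standard library; the domain list is a hypothetical placeholder for whatever index of penalized domains might be kept.

```python
# A minimal sketch of the redirect check in option 2: request a previously
# penalized domain without following redirects and record where it now points.
# The domain names are hypothetical placeholders.
import http.client

PENALIZED_DOMAINS = ["penalized-example.co.uk"]  # hypothetical index of bad domains


def redirect_target(host):
    """Return the Location header of the first response, if it is a redirect."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", "/")
        resp = conn.getresponse()
        if resp.status in (301, 302, 307, 308):
            return resp.getheader("Location")
        return None
    finally:
        conn.close()


if __name__ == "__main__":
    for domain in PENALIZED_DOMAINS:
        try:
            target = redirect_target(domain)
        except OSError as exc:
            print(f"{domain}: could not connect ({exc})")
            continue
        if target:
            print(f"{domain} now redirects to {target} -- carry the penalty over?")
        else:
            print(f"{domain} does not redirect")
```

Matching the redirect target against the penalized index would then be a straightforward lookup.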
If the theory that Chrome is Googlebot is true, and I believe it is, then the solution cannot be too far out of reach.

The next problem to solve, of course, is hacked sites, and that one is very, very difficult. Perhaps manual backlink checks are the only solution there?

As one prominent SEO practitioner said to me as we discussed the issue, “Penguin has removed the crap black hat brigade, leaving the very best to get very rich.” For me, that pretty much sums up where we are right now. Only Google has the tools to change it.