Rand Fishkin is the CEO and co-founder of the web’s most popular SEO software provider, SEOmoz. Together with Eric Enge, he co-authored The Art of SEO from O’Reilly Media and was named to the 40 Under 40 List and the 30 Best Young Tech Entrepreneurs Under 30. Rand has been written about in The Seattle Times, Newsweek and PC World, among others, and has keynoted search conferences around the world. He’s particularly passionate about the SEOmoz blog, read by tens of thousands of search professionals each day. In his minuscule spare time, Rand enjoys the company of his amazing wife, whose serendipitous travel blog chronicles their journeys.
Interview Transcript
Eric Enge: The discussion topic for the day is Panda signals. What was Google trying to do with Panda?
Rand Fishkin: My opinion is that Panda is the start of something relatively new for Google. They are using the aggregated opinions of their quality raters, in combination with machine learning algorithms, to filter and reorder the results for a better user experience.
That’s a mouthful, but essentially it means that Google has this huge cadre of human workers who search all the time and rate what they find. Google wants to find ways to show things those raters like and suppress things they don’t like. Google has previously been reluctant to use this data across the board as a primary signal, and has historically used it only as a quality-control check on the algorithms they write.
Now, I think they are being more aggressive and trying this out on a certain type of site. Panda impacts more than 11% of queries, which is a substantial change, although I don’t think it is the largest change we have seen in Google’s algorithm over the last few years. I like the direction they are going in, but I get the sense they don’t know what is in the algorithm at this point.
Machine Learning Algorithms
Eric Enge: Because they have implemented a machine learning based algorithm.
Rand Fishkin: Yes, that’s a problem anytime you implement a machine learning technique. Machine learning takes a bunch of predictive metrics and uses a neural network, or some other machine learning model, to try to come up with the best fit to the desired result. I think one reason machine learning has been slow to make its way into Google’s algorithmic updates is that they are uncomfortable with not knowing what is in the algorithm.
It’s not that they target specific sites like EzineArticles and eHow by name; they target sites that the quality raters identified as fitting the eHow profile. The challenge is to find metrics that will push those sites down but keep deserving sites high.
The machine learning algorithm will search across all the data points it can, but it may use weird derivatives. For example, the number of times a page uses the letter x might have a very high correlation with people disliking its quality, so the algorithm pushes down pages that use the letter x. That’s not an actual example, but you get my point.
You can no longer dig into the code and figure out which engineer coded the rule that the letter x in pages means lower rankings. An engineer did not do it; the machine learning system did. So they have to be careful with how they implement it.
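(Note: here is a minimal, purely illustrative sketch of the kind of pipeline Rand is describing: train a model on human quality-rater labels and let it find its own separating features. The feature names, data and model choice are hypothetical; Google’s actual system is not public.)

```python
# Train an opaque model on quality-rater verdicts; all numbers are invented.
from sklearn.ensemble import GradientBoostingClassifier

# Each row is one page: [content_ratio, inbound_links, ad_blocks, letter_x_freq]
X = [
    [0.82, 120, 3, 0.01],
    [0.15,   4, 9, 0.04],
    [0.67,  45, 2, 0.02],
    [0.20,   2, 8, 0.05],
]
y = [1, 0, 1, 0]  # quality rater verdict: 1 = good page, 0 = bad page

model = GradientBoostingClassifier().fit(X, y)

# The model may latch onto whatever feature happens to separate the classes --
# even something absurd like letter-x frequency. No engineer wrote that rule,
# so no engineer can point to the line of code that demotes those pages.
print(model.predict_proba([[0.50, 30, 4, 0.03]]))
```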
I got the sense from the Wired interview and other writings that even Amit Singhal and Matt Cutts were a little nervous about how this works. I think they recognized that they hit some sites unintentionally. The most frustrating part for them is that they don’t know why the algorithm hit sites they didn’t intend it to.
Eric Enge: You have to go back and try tuning some of the ratings of various parameters and see how it comes out.
Rand Fishkin: Yes, but it is so much harder to tune a parameter when you don’t know what the parameters are. This means you have to retrain the model and test it rather than just change a particular parameter.
Eric Enge: Yes, the overhead of going through the whole process is much greater.
Rand Fishkin: You can’t simply say “we would like to boost back up these five sites.” The reality is unless you rewrite the whole system you can’t individually boost up one site’s ranking. What do you do? Turn up a “this site is good” knob? I don’t think they have one of those.
Eric Enge: No, I bet they don’t. You talked about their quality raters, but they also have other signals. For example, the Personal Blocklist extension for Chrome.
Rand Fishkin: Yes, although originally they did not use the Blocklist in the Panda update. It didn’t make the original cut in late February, but they later announced they were using the data from the Blocklist, so it got into the later releases. And it’s not just the Blocklist; Google now has a ton of user and usage data, and a much more representative sample than they ever had before.
Eric Enge: We also now have in the search results themselves a direct way to block a result, so it looks like Google is expanding upon this initiative.
Social Data and Google
Eric Enge: Let’s dig a bit deeper into the social initiatives at Google.
Rand Fishkin: The newest initiative is the +1 buttons.
Eric Enge: The +1 is the beginning of their counter-attack on Facebook.
Rand Fishkin: I disagree with that characterization because I don’t think it is necessarily a direct attack on Facebook. I think +1 is a way Google hopes to learn more about what people enjoy, support and want to share and, in addition to something like the Blocklist, learn what people want to see more of in their search results.
Eric Enge: Didn’t Larry Page make the bonus of every Google employee dependent on the success of that feature?
Rand Fishkin: Dependent on Google’s social success not specifically on only that feature, but yes, he did.
I think Google is clearly saying we need lots of social data, but I am hesitant to say +1 is about competing with Facebook. If you want to compete with Facebook you need a social network, you need something where people share photos of their kids and connect with each other and that kind of thing. Google is going a different route which is “tell us what you like in your search results and we will show you more of that and less of what you don’t like.”
Eric Enge: Yes, I agree that the +1 by itself isn’t enough to counter Facebook, but it seems to be a piece of a larger puzzle that is emerging.
Rand Fishkin: The +1 button is definitely an alternative to relying on the Like and Share data, which Google almost certainly is using. They publicly said they use data from Facebook, but they weren’t specific about what data. Then our correlation data came out showing the Share as massively well-correlated as an individual metric, even controlling for links. I think that strongly suggests Google is getting value out of leveraging that Facebook data, so they want to protect that source and make sure Facebook doesn’t cut them off from it.
Eric Enge: As a minor aside, I saw speculation that since the Like button does largely what the Share button does, they may be planning to retire Share and go with the Like.
Rand Fishkin: I heard rumors of that as well. I would be surprised if they did that because Like and Share do very different things. Even though Facebook’s message is that Like is very similar to Share, it is not.
I am sure you noticed that inside Facebook they offer both Top News and Most Recent. By default, they show you Top News rather than Most Recent, so you rarely see things that your friends Like on their Wall unless many people Liked them, but you will always see everything that your friends Share. (Note: if you enter a description when you Like something, it behaves exactly the same as a Share, but Rand is assuming here that many users will not do that.)
Share is a much more robust action. As a Facebook administrator, you can look at Facebook Insights and see the percentage of Share impressions you get from your network, and a Share always carries more value. Another big difference, from Google’s point of view, is that many people click the Like button all over the place, but the Share button is more intentional: not just “I like this,” but “I want everybody else to see this.” It is much more like a link behavior than the Like button is.
Eric Enge: Yes, so it will be interesting to see what happens with that.
Rand Fishkin: You have the new Send button too.
Eric Enge: Soon you will have no room for your content because your page will be covered with Facebook buttons.
Rand Fishkin: Please no.
Eric Enge: Let us talk about other signals Google could be using at this point.
Rand Fishkin: Google and Bing both have data deals with Facebook. The Facebook growth team, which is their marketing team, was at SEOmoz a couple of weeks ago and we talked in depth. There was NDA stuff I can’t go into, but one thing they noted, which is public, is that Google gets considerably less data about the social graph from Facebook than Bing does. However, they get more than what is in the Open Graph API alone.
When you talk about signals, I think Google is able to see deeper into the social graph via Facebook data than any of us can test on our own. Many people have a concern about abuse; for example, what if I get ten thousand random people to go Like my page? There are probably very good signals about the authenticity of social sharing that Google is able to get through Facebook, and Bing maybe even more so.
User Interaction With Your Site
Eric Enge: What about other kinds of signals, like user interaction with the search results themselves?
Rand Fishkin: I thought Bill Slawski from SEOByTheSea had a great post about a Bing patent application. It looked at all sorts of user and usage data that I think Google is thinking about leveraging in some way.
Things like time on site, and whether people print the page; pages that get printed tend to indicate the site is high quality, or at least that people interact heavily with that page. Likewise, if you scroll down, back up and around a lot, that indicates some level of positive interaction with the page.
In terms of search activity: do you come back to the search results and perform different searches? Do you come back to the search results and click on other results? Do you perform the same search at a different engine, which of course Bing and Google will both know through their clickstream data? These types of aggregate tracking metrics help the search engines determine whether they are doing a good job, whether users are satisfied with what they are providing, and whether individual results and individual sites are delivering the goods.
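(Note: here is a small illustrative sketch of one such aggregate signal, often called pogo-sticking: how often searchers click a result and quickly come back to the results page. The log format and the 30-second threshold are invented assumptions, not any engine’s real pipeline.)

```python
# Estimate per-site "quick bounce back to the SERP" rates from click logs.
from collections import defaultdict

# (query_id, clicked_site, seconds_until_return_to_results; None = never returned)
clicks = [
    ("q1", "siteA.com", 8),     # bounced straight back: likely unsatisfied
    ("q1", "siteB.com", None),  # never returned: likely satisfied
    ("q2", "siteA.com", 5),
    ("q2", "siteC.com", None),
]

stats = defaultdict(lambda: [0, 0])  # site -> [quick bounces, total clicks]
for _, site, seconds_back in clicks:
    stats[site][1] += 1
    if seconds_back is not None and seconds_back < 30:
        stats[site][0] += 1

for site, (bounces, total) in sorted(stats.items()):
    print(f"{site}: {bounces / total:.0%} of searchers bounced back quickly")
```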
Eric Enge: The more mundane things, such as bounce rates, time on site, page views per visitor and number of repeat visitors, are good basic metrics as well. All these can be fed into the kind of inputs that go into a machine learning engine.
Rand Fishkin: Yes, absolutely. You can feed all these metrics into a neural network, tell it what you would like it to produce, and get a nice composite metric back: an amalgamation of all the data pieces that you then use as a single input in your algorithm.
Our data scientist at SEOmoz, Matt Peters, thinks, based on our correlation data, the main Google algorithm doesn’t have that many metrics. Matt thinks two hundred signals might actually be on the high end of what they are using.
I think they find a few signals in each sector they like and then concentrate on making those better from the services side. Rather than taking fifty million metrics about a link, build one great metric about links that takes into account many things.
I thought that was fascinating to think about. Maybe that is good from a computation and processing standpoint, as well as for the speed of the results and data usage, in terms of what the search engine has to do when it calculates rankings.
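(Note: a minimal sketch of the amalgamation idea Rand describes: collapse many raw link measurements into one composite score before it ever reaches the ranking algorithm. The feature names and weights are invented for illustration.)

```python
# Combine several raw link measurements into a single 0..1 "link quality" metric.
import math

WEIGHTS = {
    "root_domains_linking": 0.5,        # hypothetical, hand-picked weights
    "avg_linking_page_authority": 0.3,
    "anchor_text_diversity": 0.15,
    "links_from_spam_neighborhoods": -0.35,
}

def link_quality(features: dict) -> float:
    raw = sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1 / (1 + math.exp(-raw))  # squash the weighted sum to a 0..1 score

page = {
    "root_domains_linking": 3.2,  # say, log-scaled counts
    "avg_linking_page_authority": 1.1,
    "anchor_text_diversity": 0.8,
    "links_from_spam_neighborhoods": 0.2,
}
print(f"composite link metric: {link_quality(page):.3f}")
```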
Eric Enge: Maybe they use other sets of metrics for quality verification rather than direct signals.
Rand Fishkin: That could be. That way you don’t have to calculate it across every search performed.
Eric Enge: Right. You apply the greater set of metrics on your test set of some number of thousands, or tens of thousands, of sites or whatever you want it to be. Then, you do your curve fitting with the machine learning algorithm against that. It is fascinating to think about that whole process, about how they do these things.
Advertising and Panda
Eric Enge: Can you talk about advertising as a signal in Panda?
Rand Fishkin: The advertising thing is looming large. In our study of Google results recently, we looked at ad placement and ad size, number of ads on a page and total pixel coverage, and these had a prominent and obviously non-zero negative correlation with ranking. So, Google is essentially saying, “If you have big blocks of Google AdSense on your page, on average your rankings in the SERPs will be lower”. I think it is good that Google is not reinforcing their own feature. I was presenting this in Australia and a Google search quality engineer in the room was quite happy to see that data reported.
It is a positive thing from a perception standpoint for Google, and it is also an indication that people who are extremely aggressive with advertising likely took a bit of a bath in this update.
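(Note: a sketch of the kind of correlation study being described: rank correlation between an ad metric and ranking position across results. The numbers are invented; SEOmoz’s actual dataset and methodology are not reproduced here.)

```python
# Spearman rank correlation between ad pixel coverage and SERP position.
from scipy.stats import spearmanr

ad_pixel_coverage = [0.05, 0.30, 0.12, 0.45, 0.08, 0.25]  # share of page that is ads
ranking_position  = [1,    8,    3,    10,   2,    6]     # lower number = better rank

rho, p = spearmanr(ad_pixel_coverage, ranking_position)
# Positive rho means heavier ad coverage goes with worse (higher-numbered)
# positions -- i.e., a negative correlation with ranking well.
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```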
Eric Enge: Yes, I tweeted an article earlier today that described an interaction between a marketer and Google about getting his AdSense account reinstated, as he had ad placements on his sites that did not meet Google’s criteria.
The detail that Google gave him on how they do their rating was astonishing. Basically, they did specific comparisons of the percentage of content versus the percentage of ads above the fold on a 1024×768 screen, as well as other metrics.
The guidance was that the percentage of ads could not be greater than the percentage of content. Then they detailed what they were defining as content, and it was fascinating reading. Of course, this was about his AdSense account getting banned, but you can expect that they would be doing something very similar in organic SEO evaluation. Another thing I heard they look at is the click-through rate on the ads.
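(Note: a hedged sketch of the above-the-fold check described here: measure what share of a 1024×768 viewport is covered by ads versus content. The element boxes are made up; in practice they would come from a rendered page.)

```python
# Compare ad vs. content pixel coverage inside a 1024x768 viewport.
VIEWPORT_W, VIEWPORT_H = 1024, 768

def visible_area(box):
    """Area of an (x, y, width, height) box clipped to the viewport."""
    x, y, w, h = box
    vw = max(0, min(x + w, VIEWPORT_W) - max(x, 0))
    vh = max(0, min(y + h, VIEWPORT_H) - max(y, 0))
    return vw * vh

ads = [(0, 0, 728, 90), (774, 100, 250, 600)]  # leaderboard + skyscraper units
content = [(20, 100, 700, 900)]                # main article column

ad_px = sum(visible_area(b) for b in ads)
content_px = sum(visible_area(b) for b in content)
total = VIEWPORT_W * VIEWPORT_H

print(f"ads above the fold: {ad_px / total:.0%}, content: {content_px / total:.0%}")
print("fails the guideline" if ad_px > content_px else "passes the guideline")
```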
Rand Fishkin: If the click-through rate on the ads is high, that’s an indication of one of two things: you either have high-quality, relevant advertising or, on the other side, you do not have worthwhile content on your page and all people can do is click your ads. I think they look for manipulation in the advertising, for example popovers or ads that blend in and fool you into thinking they are part of the links in the content, but they will also look to see if this is a good AdSense publisher.
Eric Enge: So a high CTR could be a good signal or a bad signal.
Rand Fishkin: Yes.
Eric Enge: That’s clearly the case. But most publishers have to wonder: if they have AdSense well below the fold on their page and it is getting two-tenths of a percent click-through rate, is that a good thing?
Rand Fishkin: It depends.
Eric Enge: I saw another post, again AdSense-related but it gives signals as to how Google thinks about quality, and it said that when the click-through rate on a page is low, your payout can vary dramatically. For example, imagine you have four ads on a page: one gets a two percent click-through rate and the other three get 0.2% each, so the page’s overall rate is well under one percent. If you remove the three weak ads, you lift the page’s click-through rate to two percent, and you might actually make more money even though the remaining ad still gets the same number of clicks. The other three ads were simply dragging down your payout per click.
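(Note: the arithmetic of that hypothetical, worked through in a short sketch. Impression counts and the payout assumption are illustrative; the exact AdSense payout formula is not public.)

```python
# Worked version of the four-ad hypothetical above; all numbers are invented.
impressions = 1000  # impressions per ad unit

ads = {"ad1": 0.02, "ad2": 0.002, "ad3": 0.002, "ad4": 0.002}  # CTR per unit

clicks_before = sum(ctr * impressions for ctr in ads.values())  # 26 clicks
ctr_before = clicks_before / (impressions * len(ads))           # 0.65% page CTR

# Remove the three 0.2% units; the strong ad keeps its own clicks.
clicks_after = ads["ad1"] * impressions  # 20 clicks
ctr_after = clicks_after / impressions   # 2.00% page CTR

print(f"before: {clicks_before:.0f} clicks at a {ctr_before:.2%} page CTR")
print(f"after:  {clicks_after:.0f} clicks at a {ctr_after:.2%} page CTR")
# If payout per click rises with page CTR (the premise of the post Eric cites),
# the leaner page can earn more despite slightly fewer total clicks.
```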
What Makes a Good Quality Page?
Eric Enge: Can you talk about the makeup of a high-quality page?
Rand Fishkin: One component is content block formatting. This is where they look at the usability and the user experience of how content is formatted on a page. For example, advertising or other things might be interrupting the content. Having that content be easily consumable in a friendly way seems to be a positive signal. I don’t know if that is a feature of the machine learning and the quality rater stuff or something they independently grade, but it definitely seems to be a part of it.
Eric Enge: You could imagine that human quality raters would respond well to that. It may not explicitly be part of the algorithm but it may fall out of the algorithm.
Rand Fishkin: That’s what is so interesting now and why SEO takes on a broader focus if Google is going to continue going in this direction. Essentially, everything that makes your page good for humans will help it to rank better and that is really exciting.
Eric Enge: It is exciting. Another thing is that affiliate links aren’t inherently bad, but pages where a large percentage of the links are affiliate links seem to correlate negatively with rankings.
Rand Fishkin: Unfortunately, that wasn’t something we could measure, but I know there has been a lot of circumstantial evidence around it. In my opinion, there is a high correlation between affiliate websites, which often have generic, similar, low-quality, low-value content, and lower rankings in this update.
Eric Enge: There are definitely some issues for many people in those businesses. At one point Google said something about a weak set of pages on your site potentially dragging down the whole site. I think it was Amit Singhal who said that.
Rand Fishkin: We have seen that a lot. In fact, a couple of Seattle startups were on a thread in the SEOmoz Q&A recently and talked about how they lost a lot of Google traffic, but weirdly the traffic they lost was to their lowest-converting, lowest-value pages.
In one case the hit was due to display advertising. In the other case, which was more of a direct conversion path, they hadn’t lost that much. Google might take thirty percent of your traffic, but for some, it might be the thirty percent they didn’t really need.
Eric Enge: That is certainly not harmful, but it suggests if you have a chunk of weak pages dragging down the site, you should NoIndex them and that may help you recover.
Rand Fishkin: I would be interested to see that because I am not sure if Google is sophisticated enough to do the separation between what is indexed and not, and to distinguish what is intentionally put on a site and not. They might be dragging down sites that have these pages even if they are not indexed.
You might have to block them with robots.txt, password-protect them, or find another way to keep the robot from getting to them. If I were running one of those businesses, I would be doing some testing.
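(Note: a minimal sketch of the robots.txt approach, using Python’s built-in parser to confirm that a hypothetical weak section is blocked from crawlers; the NoIndex alternative mentioned earlier would be a <meta name="robots" content="noindex"> tag on each page. The paths and rules here are assumptions for illustration.)

```python
# Verify that a weak section is blocked from crawling by robots.txt rules.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /weak-section/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The wildcard rule applies to Googlebot, so the first check prints False.
print(parser.can_fetch("Googlebot", "https://example.com/weak-section/page.html"))
print(parser.can_fetch("Googlebot", "https://example.com/strong-page.html"))
```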
Eric Enge: I think that’s a good idea. I am also concerned about this notion that you need to find some weak pages and fix them because I think most of the people that were hit by this have a broader problem than a cluster of weak pages. They are not all going to be eHows where, as Aaron Wall suggests, it is a branding issue and the algorithm had to be tweaked to figure out how to get them.
The eHow business model was clearly rooted in a certain amount of manipulation. For example, you don’t need twelve articles on the same topic with slight variations in the search phrases written by different people. You could see why that would be objectionable from a search quality perspective.
Rand Fishkin: Absolutely.
Does User-Generated Content Help a Page Rank Better?
Eric Enge: You published an article on SEOmoz about user-generated content sites faring reasonably well in Panda, even in the absence of much of a link profile.
Rand Fishkin: That was not from our correlation data. I believe it came from an analysis that Tom Critchlow wrote about from personal experience. That is what I have been seeing and feeling for the most part with a couple of exceptions. I have seen places where people have what I have to call thin UGC, and they seemed to get hit.
LinkedIn is a great example of a big winner in the Panda update with its relatively robust user-generated content. That has been a great way to do long tail for a long time, and I think it will continue going forward, but there is a quality bar that has to be met.
Eric Enge: I think one could try to simulate user-generated content, but you probably wouldn’t get the desired effect, because the real-time stream of activity is important; that has to be part of the mix, don’t you think? You can’t just throw up ten user-generated content samples on a bunch of pages, say you have user-generated content, and then put nothing else up for another six months.
Rand Fishkin: There needs to be some naturalness to those signals. Think about the content that gets contributed on LinkedIn, Quora, Facebook or Flyshare. There is an authenticity that is connected to real people who have real profiles elsewhere. A number of sharing activities go on in these little communities, and some percentage of the content receives comments from other people, so there are many associated signals the engines can be looking at.
What’s Not in Panda
Eric Enge: What else might they be looking at that we haven’t discussed?
Rand Fishkin: One thing that is interesting and extremely frustrating is what Google left out of the update, which is any type of devaluation of manipulative linking.
Eric Enge: Yes, my last column on Search Engine Land, called Speculating on the Next Shift in Google Search Algorithms, was about exactly that.
Rand Fishkin: I think it infuriates both of us that black hat and grey hat manipulative link acquisition strategies work tremendously well in tons and tons of areas, and they are becoming bread and butter. After the aggressive spam policing and punishment of link buying that Google went through from ’06 to ’08, the last two to three years have been a wasteland in terms of addressing paid links dressed up as directories or blog sidebars.
Eric Enge: Exactly. In my article, I did provide details of the kind of links one site in the coupon space was using. They are prospering: they have hit counters, they have blog posts with the classic three anchor-text-rich links, and I actually show a sample of a footer link where the anchor-text-rich link is literally six inches below the last usable text on the page.
Rand Fishkin: It is really disturbing. I think the most frustrating thing is that people who are new to SEO, or who have been doing SEO for a little while, might think: I don’t want to use any black hat strategies; however, everyone in the top ten is doing it, the top three seem to be really effective at it, and maybe only a few brands here and there do white hat and still succeed, but my clients are asking me for results, so I have to do this.
It is easy to find your competitors’ link profiles, see that those links are working, and test them out. It’s horrifying because it creates a bunch of people who believe in the value and power of black and grey hat SEO, and a bunch of rich people who are selling those links. It creates a whole marketplace around this type of stuff which, in my opinion, tends to attract the operators who are the least ethical, least trustworthy and least reliable.
This means everyone in the search and SEO spaces suffers from unreliable, negative operators being the norm in the industry. It creates a terrible perception among marketing managers that SEO is just a black hat wasteland. How can you build a great business, a great reputation, or a great career in this field if that’s what works?
If You Have Been Hit by Panda, What Do You Do?
Eric Enge: With all these things in mind, what does someone who has been hit by Panda do?
Rand Fishkin: I think one of the best things you can do is determine if there are pages on your site that are performing very well versus ones that aren’t and look at the difference between them. Almost always, you will find a significant difference.
If you have been hit across the board and nothing on your site is ranking, look at what is now ranking in your space. You should especially look at content that hasn’t earned good links but is ranking; obviously, exclude anything that’s been manipulated with black or grey hat tactics, and any pages that have been heavily linked to or socially promoted.
One thing you should look at is the long tail content in your industry that is performing well purely on the basis of its good content. Look at the formatting, how they use advertising, the content blocks, the layout, the UI and the UX, what the content is, the experience it creates for people, and how it was generated. All of those things can give you the signals you need to do the right thing to get out of the Panda box.
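(Note: a small diagnostic sketch in the spirit of this advice: compare pages that held their rankings against pages that lost traffic, metric by metric. The CSV file and column names are hypothetical; use whatever analytics export you actually have.)

```python
# Compare Panda-hit pages to unaffected pages across a few on-page metrics.
import pandas as pd

pages = pd.read_csv("page_metrics.csv")  # one row per URL, hypothetical export

hit = pages[pages["lost_traffic"] == 1]   # pages that lost Google traffic
held = pages[pages["lost_traffic"] == 0]  # pages that kept ranking

for col in ["ad_coverage", "time_on_page", "bounce_rate", "word_count"]:
    print(f"{col:>14}: hit avg = {hit[col].mean():.2f}, "
          f"held avg = {held[col].mean():.2f}")
```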
Eric Enge: So the great insight you are pointing out is to find something that ranks even though it isn’t well-linked. That’s a great approach, because the lack of inbound links to the content says that all the other signals are doing the right things.
Rand Fishkin: Yes.
Eric Enge: Of course, if you have bad user engagement historically, you may have to wait some time for those signals to be seen by Google and acted upon.
Rand Fishkin: This is one of the things that is frustrating for a lot of people: it is going to take time to recover from this.
Eric Enge: You may be waiting six months before you come out from under this problem. I just made up the six months, but you shouldn’t be thinking six weeks.
Rand Fishkin: Agreed. I know people whose board is breathing down their neck, but you need to tell them, essentially: we were earning traffic we didn’t quite deserve, or we were delivering a low-quality user experience, and we are going to have to take that up a bunch of notches; over the next few months we can hope to recover.
The first thing I would do is start publishing things in a different subfolder and in a different format, and see whether I could get those pages indexed and ranking well; if not, it might mean that something is going on domain-wide. Then you need to do what you spoke of earlier: consider removing many of your pages from the search indices so that your site can regain the domain authority and rankings it deserves.
Eric Enge: For many people, it may be easier to start over.
Rand Fishkin: Yes.
Eric Enge: Thanks Rand!