Paul Yiu is the Principal Group Program Manager for Bing Social Search. The team works on integrating social content, such as Twitter and Facebook, to improve search quality and make the experience more personally relevant. Prior to joining the Bing team, Paul was at Yahoo working on Web search and mobile search, most recently as Senior Director of Product Management at Yahoo Search. Prior to Yahoo, Paul had 15 years of software/internet experience ranging from online exchanges, mobile commerce, and multimedia to enterprise software, at companies big and small.
Key Points from Interview with Paul Yiu
Tons of discussion in here about how social media can impact search results. While the author authority discussion is near the end I am going to highlight it first here (Twitter was the specific social network discussed), because there is a LOT of confirmation of how author authority works:
- Bing does look at the number of followers you have.
- They also look at the number of people you follow. If you follow 300 people and 8,000 follow you, this might indicate more authority than if you follow 9,000 people and 8,000 follow you.
- “We can actually analyze the follow graph and tell if you are trying to game the system.”
- Relevance of the followers is used as a signal: “who you are connected to says something about you. You don’t want to get into the wrong crowd; It’s not good if you hang out with the bad group at the high school.”
- The relevance of who you follow also matters.
- You don’t hurt your Twitter stream by talking about irrelevant stuff. What matters more is what happens to your relevant stuff.
- Retweeting patterns are tracked and used as a signal – especially for your relevant tweets.
- The relevance of the re-tweeters matters too.
- There are many iterations of signals that could be tracked, but as you get deeper and deeper into it the strength of the signal diminishes, so a limited number of factors (such as those above) are considered.
Here are some of the other key points from the interview:
- (re: social media): “The behind the scenes signals are pretty useful for us, as search engines always need to find fresh content, and it’s always hard to rank fresh content.”
- “We are trying to merge a little bit of the search and browsing intent into one, and have your friends help you navigate the web a little bit better. In a way we are bringing the office water cooler to the search engine.” (the emphasis is mine).
- (regarding using “wisdom of the crowd” to move content higher in the results): “… people tend to like gossipy things, such as who got pregnant or was in a scandal, or something like that, so it tends to work in those cases, but not so much in the case of navigational searches …”
- “If the content doesn’t earn its spot its placement gets modified” (confirming again that Bing uses CTR, which was confirmed for the first time in this interview with Duane Forrester).
- As of February 22, 2012, users could associate articles of interest with people they know (the “subject” person). The subject can then use Facebook to decide if they want that article associated with them or not. If they let it be associated, that article will now be highlighted in the search results for friends of the subject.
- Social media can provide some useful enhancements to search, but currently is not in danger of reshaping the structure of search (my paraphrase of a conversation below).
- Bing currently does not analyze Facebook updates to collect information that could be used to personalize search results. For one thing, there are serious privacy concerns with this.
- “The typical network on Twitter has characteristics that are hard for people to emulate artificially. These (artificial networks) are unnatural, and when we see networks like this you can tell these people are trying to sell teeth whitening or whatever.”
- “… when you say stuff where people tend to re-tweet you it behaves a bit like a link.”
- The level of effort to make a social media action affects the signal strength. For example, a Like is very little effort, and a Share requires a bit more effort.
Full Interview Transcript
Eric Enge: Can you start off with an overview of Bing and social to date?
Paul Yiu: Over two years ago we went down this path of integrating Twitter into search. Much of what we’ve done with Twitter is actually really interesting even though you may not visibly see everything. Here is what the UI looks like.
We use Twitter and public Facebook information to improve our search engine. You can see a list of topics that are trending and based on the topic what are the links people are showing on the web, and what are the updates. It could be a tweet, or a Facebook update.
The behind the scenes signals are pretty useful for us, as search engines always need to find fresh content, and it’s always hard to rank fresh content. When I worked at Yahoo three years ago, we were pretty happy if a user searched for up-to-date news, such as an earthquake, and they got the right page back within a day of the event. Now, users are a lot more demanding. They want up-to-the-minute accurate information, and we use Twitter and Facebook to help us provide that.
It really helps us in terms of finding pages that people are sharing out there, and getting the right content and even ranking. Recently, there was a Premier League championship game and someone kicked this great goal. Without social signals, we would have likely shown more of the facts of who won the game. But, with social signals we were able to show the video of that really beautiful shot that everybody on the web was sharing. Because of these signals, our results can better reflect what people are doing on the web.
So today, the algorithm is flavored by people. That was our first step. As we worked more with Facebook the conversation became more and more interesting. We came up with some neat ideas to make search a little more personalized. What if you could see what your friends are paying attention to when you are searching? Just as if you are having a conversation in a coffee shop, and someone walks in, the natural tendency is all of us will look. We want to mimic that in search. We started a project a year and a half ago that we called Sergeant Pepper, because the idea was that you want to get by with a little help from your friends (that was one of the songs on the Sergeant Pepper album by the Beatles).
Suppose you search for YouTube. One guess might be that your likely intent is to check out cool stuff to look at when you get there. But maybe I have a friend in Indonesia who shared this other link on YouTube, and maybe we should let you know that. We are trying to merge a little bit of the search and browsing intent into one, and have your friends help you navigate the web a little bit better. In a way, we are bringing the office water cooler to the search engine.
Eric Enge: A few months back I did a search on the New York Post, and what showed up was a wisdom of the crowd based result, which was the content that had received the most likes on the Post.
I believe that Bing did this because I didn’t actually have a direct friend that had liked something, so it was trying to show me the most popular content on the site. However, that no longer happens when I do that search now.
Paul Yiu: Yes, you can still see those types of results, but we did some tuning of the way that works.
In some cases people really engaged in those types of results, so we still show those, and in others they did not, so we stopped showing it in those cases. For example, people tend to like gossipy things, such as who got pregnant or was in a scandal, or something like that, so it tends to work in those cases, but not so much in the case of navigational searches as in your example.
Eric Enge: So, in the case of the New York Post, people aren’t really as interested in the most liked articles.
Paul Yiu: Yes. The way we think about it is not different from any other search engine. If the content doesn’t earn its spot its placement gets modified.
Let’s look at a case where the intent is a little more specific, such as a search on “family-friendly hotels in Maui”.
I am going to Maui in two weeks. Some of our friends liked some family-friendly hotels. The intent here is for people to help me make decisions where all these results are great, but maybe my take on the web is different from your take on the web, just as my friends are different; I’ve got lots of friends who are parents.
Another scenario is when people search on their own name. Now, why is that interesting? Well, people actually do search for themselves online quite a bit for our sense of vanity. When I do that on my name there is this cool math professor that comes up, that’s definitely not me. So, one of the things we started thinking about is wouldn’t it be cool if I can actually alter this result for me and maybe my friends who likely are looking for the Paul Yiu that works on Bing. Then there is Harry Shum, he is my boss, and he shares a name with the guy on Glee.
You get a mixture of the one that’s the PhD that’s with Microsoft, you get one that’s a great dancer, that’s younger, and more handsome, and prettier. So then, how do we make this a little better? On February 22, 2012, we released a way to improve those results for yourself.
Let’s say I search on Harry Shum, and I see the Glee guy, but I also see that the one I know won an award as an Asian-American exec of the year. The Harry Shum I know is not going to share it on Facebook, because he is a humble guy. What I can do is link that to him.
Now, the idea here is that as soon as I do that, it actually links over to Facebook and it shows up in his timeline. Facebook will tell him that I linked him to this article, and Harry has the choice to indicate if he prefers to still not share that with people, or he can just let it be. If he lets it be, any of us that know Harry Shum, and search on his name, will get this particular link.
As more friends do this, the more we can make this search result page about him. The Glee guy, or his buddies, can also pick and choose things that they think make him shine as well. For his friends, his search results would be about the Glee guy, and stuff more about him. This helps people find information on their friends more quickly. We are hoping this will change the way people do people search.
Eric Enge: This is good stuff. It’s a great example of what you can do with social data. Do you have another example?
Paul Yiu: Yes, Bing knows that Sean and I are friends on Facebook, for example.
As you can see the Scott I know gets premium placement. In fact, friends, and friends of friends will get a special treatment right now.
We can do that because we are pretty confident. When you go beyond that, the signal gets a little weaker.
It’s really interesting when you stumble across that person that you knew in high school, that you haven’t seen for a long time, and you end up friending him/her on Facebook, and then you wonder what they have been up to. We can help you find that out quickly.
Paul Yiu: Exactly. Harry’s award result might have ranked #15 in the results naturally, because many of the results are being taken up by the Glee guy. But now, for Harry, and his friends something that would have been ranked at #15 will show up higher.
Eric Enge: Of course, sometimes you really do mean the guy from Glee, even though you know Harry at Microsoft as well.
Paul Yiu: You are totally right. How we handle that is that we limit the number of items that can be promoted this way to three.
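The cap Paul describes can be sketched in a few lines. This is only an illustration, not Bing's implementation: the function name `rerank`, the result labels, and the choice to move promoted items all the way to the top are assumptions; the interview only says that friend-linked results show up higher and that at most three can be promoted.

```python
MAX_PROMOTED = 3  # the cap stated in the interview


def rerank(results: list[str], friend_linked: set[str]) -> list[str]:
    """Move up to MAX_PROMOTED friend-linked results ahead of the rest.

    Placing them at the very top is an assumption made for this sketch.
    """
    # Keep the natural order among friend-linked results, then apply the cap.
    promoted = [r for r in results if r in friend_linked][:MAX_PROMOTED]
    # Everything else, including friend-linked items beyond the cap,
    # stays in its natural order.
    rest = [r for r in results if r not in promoted]
    return promoted + rest


# Hypothetical result labels for a name shared by two people.
natural = ["glee-1", "glee-2", "award", "glee-3", "bio", "talk", "patent"]
linked = {"award", "bio", "talk", "patent"}
reranked = rerank(natural, linked)
```

Because only three items move, the results for the other person with the same name still appear directly below the promoted ones.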
Eric Enge: Let’s talk about the richness of the dataset. Obviously, there is an enormous amount of data available, but it seems to me that we have 50% of the US population on Facebook, which means that 50% are not on it. Of those that are on it, it’s probably 10% that are very active. Perhaps 20% dabble and contribute content or Likes from time to time, and 70% who do more or less nothing at least in terms of sharing content or Liking things.
Paul Yiu: I would have to ask Facebook that question. The reason I say that is we have by and large the publicly shared information on Facebook, so your Likes and your public shares and your profile information, and your friends. We don’t have all the private stuff, such as the things I only shared with my mom. So, it’s hard for us to gauge whether or not it’s ten, twenty, or seventy.
However, Twitter publishes data like that, and a tiny percentage of people on Twitter account for a huge chunk of the content.
Eric Enge: What I am getting at is that the limited data does require you to look for different scenarios where you can leverage the data to tell you something unique and useful. Social media is not in danger of replacing web search algorithms just yet. What you can do is leverage Facebook to provide you with enhancements to the search results. Does that make sense?
Paul Yiu: Yes.
Eric Enge: One interesting scenario is when your friends Like some content. Let’s say you are looking for a restaurant in the Bellevue area. The search results might show me that my friend Liked a particular Italian restaurant, and maybe I’ll ask him what it’s about.
Paul Yiu: That’s a great example. I live in California, and all the guys I know that live around here have restaurants they have Liked, and it’s super helpful. News is an interesting example because it offers a hybrid between search and browse. A friend of mine Liked this article in the New York Times on decision fatigue, how we get worn out by the end of the day. When I searched for the New York Times I saw that, and it’s what he paid attention to.
In a way, it’s like the water cooler was brought to my search result page. That’s a hybrid between search and browse that can be super helpful.
Another is travel and shopping. If I go on vacation and have a good experience, we may indicate that we Like the hotels and places to go. Perhaps for family-friendly hotels in Maui.
Doing this in search mimics what we do in the non-search world. Maybe there are things you happen to want to ask your friends about. Of course, how this adds value to you depends on what your friends actually do.
My friends are mostly nerds. When these crazy thin ultra notebooks were all the rage, I searched for that and I saw a lot of recommendations right in the search results, so that was really cool for me.
If I were in the alternative music scene, I might see different things. So, it depends on what my friends are into. A lot of my friends are in tech, so that influences my queries. Sports queries are interesting too. I may want more than the super unbiased ESPN article about the big game. I may want to know from the Alabama perspective what was great about the game.
This all acts to tweak the web to give it a little more flavor and personality.
Eric Enge: You probably serve the unbiased article and the Stanford article.
Paul Yiu: Yes, you still get the regular news.
Eric Enge: What if someone types in “hotels” and you see that they’ve been talking to their friends about going to Rome in Facebook updates or Twitter tweets. Is that a possible source of data at some point, do you think?
Paul Yiu: Not currently. We do something based on your IP address if you search on things like “restaurants”, we try to find restaurants that are near your location. We haven’t done what you are suggesting yet. It would be interesting to do, but Microsoft is super careful from the privacy and user control perspective. We would have to be super transparent about that, and even then, there might be many people who object about the way their data was being used.
Eric Enge: It really does get down to all of these scenarios. It speaks to how complex the problem is because you have all these data sources and Facebook is a disjoint data source from search data, and, you are trying to find points of intersection. It will probably be a systematic process that will unfold over many years. It will also evolve as the younger generation gets older and more people are engaged with social sites.
Paul Yiu: Yes, there is a long way to go. We’ve been at this just for a couple of years, and Google just got started.
Eric Enge: What about the challenges that you have with the scale of the data. You have this enormous scaling problem with having a search engine, and, now you have this other enormous database of social information.
Paul Yiu: Yes, it’s hard. When we did it with Twitter it was really challenging because the whole idea of Twitter is being fresh. You can’t wait for hours for a Tweet to make its way onto our search result page. So, that really pushed us.
One of the advantages we have working at a company that believes in search like Microsoft is that we have really good infrastructure and a really good platform.
As a result, we have built the right infrastructure for this. When you create a Tweet, or a Facebook update, we want to reflect that in our system very, very quickly. We did not have to build a whole lot of new infrastructure, as Bing already knows how to index content and serve it and rank it very quickly.
In addition, how you think about ranking is also different. But fortunately, we work for a company that believes in search. There are a lot of people here that we can collaborate with to make this thing happen.
Eric Enge: Can you outline a bit about how you determine the value of a tweet?
Paul Yiu: One metric is how many people follow you, but that can be gamed a little bit. We can actually analyze the follow graph and tell if you are trying to game the system, because your network on Twitter looks disjointed.
The typical network on Twitter has characteristics that are hard for people to emulate artificially. These are unnatural, and when we see networks like this you can tell these people are trying to sell teeth whitening or whatever.
We look at the way people are connected, and often we correlate that to the quality of a Tweet. We can also analyze the content the Twitter account links to. What does that mix look like, and how do people interact with the content you are tweeting. That’s just on the Twitter side of things.
On the Facebook side, in a way we are still working on it; with Facebook most of the time it’s your true identity. On Facebook right now it is just stuff from your friends, so it’s a different problem.
Eric Enge: One additional example metric for Twitter: if an account has 8,000 followers but follows 9,000 people, that looks like an awful lot of swapped following. In contrast, you have another account where someone has 8,000 followers and only follows 300.
Paul Yiu: Yes. We look at that.
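The comparison in this exchange can be made concrete with a simple ratio. This is a minimal sketch using the two accounts from the question; the function name `follower_ratio` is hypothetical, and the interview confirms only that Bing looks at both numbers, not that it computes this particular ratio.

```python
def follower_ratio(followers: int, following: int) -> float:
    """Followers per account followed.

    A higher value hints that follows were earned rather than swapped.
    Using a plain ratio is an assumption made for illustration.
    """
    # Guard against division by zero for accounts that follow nobody.
    return followers / max(following, 1)


swapped = follower_ratio(8_000, 9_000)  # follows more people than follow back
earned = follower_ratio(8_000, 300)     # large audience, selective following
```

The second account scores roughly thirty times higher than the first, which matches the intuition in the question above.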
Eric Enge: What about the relevance of the followers?
Paul Yiu: Yes that matters too. Just like it is in the real world, who you are connected to says something about you. You don’t want to get into the wrong crowd; It’s not good if you hang out with the bad group at the high school.
Eric Enge: I suspect their history of tweeting matters too. Maybe they have a history of tweeting about country music singers, and so it would matter that they have a good following and they only follow relevant people themselves.
Paul Yiu: Correct.
Eric Enge: What about ReTweets?
Paul Yiu: Yes. We actually look at how often those tweets make their way beyond just you. That seems to be a good signal.
Eric Enge: So, you don’t hurt yourself if 80% of your tweets are talking about what you had for lunch today, and those are just more or less ignored as long as the ones you do about country music get a lot of action on average.
Paul Yiu: That’s right. You can talk about the weather all day long.
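One way to picture why off-topic chatter does no harm is to measure retweets only over the tweets that are on topic. The sketch below assumes that each tweet has already been labeled on-topic or not; the `Tweet` class and `on_topic_retweet_rate` are hypothetical names, and the averaging is an illustrative choice rather than anything confirmed in the interview.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    on_topic: bool  # assumed to come from some earlier relevance step
    retweets: int


def on_topic_retweet_rate(tweets: list[Tweet]) -> float:
    """Average retweets per on-topic tweet; off-topic tweets are ignored."""
    relevant = [t for t in tweets if t.on_topic]
    if not relevant:
        return 0.0
    return sum(t.retweets for t in relevant) / len(relevant)


# Eight ignored lunch tweets plus two country music tweets that got shared.
timeline = [Tweet(False, 0)] * 8 + [Tweet(True, 40), Tweet(True, 60)]
rate = on_topic_retweet_rate(timeline)
```

Adding more lunch tweets to `timeline` leaves `rate` unchanged, which is the point made in the exchange above.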
Eric Enge: I assume the authority and the relevance of the people who are doing the re-tweeting matters too?
Paul Yiu: Yes, there are almost an infinite number of things you can consider.
Eric Enge: This kind of thing goes on and on. Just like standing in front of a mirror with another mirror directly behind you. The eighth mirror doesn’t matter so much, right?
Paul Yiu: That’s a great analogy, the mirror in mirror thing.
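The mirror analogy can be illustrated with a weight that shrinks at each level of indirection. The halving factor and the name `decayed_weights` are assumptions chosen purely for illustration; the interview says only that deeper signals get weaker.

```python
def decayed_weights(levels: int, decay: float = 0.5) -> list[float]:
    """Weight for each level of indirection, shrinking geometrically.

    Level 0 is the account itself, level 1 its re-tweeters, and so on.
    The decay factor of 0.5 is a hypothetical value.
    """
    return [decay ** depth for depth in range(levels)]


weights = decayed_weights(8)
# Fraction of the total weight contributed by the eighth "mirror".
share_of_eighth = weights[-1] / sum(weights)
```

With these assumed numbers the eighth level contributes well under one percent of the total, so stopping after a handful of levels loses very little.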
Eric Enge: Ultimately, it matters that you have good followers, that you follow good things, that you tweet relevant things to a topic area and get a good response from people who are relevant themselves, right?
Paul Yiu: Yes.
Eric Enge: It’s all self-reinforcing.
Paul Yiu: Yes. You don’t want to be connected to a bunch of junk, and when you say stuff where people tend to re-tweet you it behaves a bit like a link.
Eric Enge: I also believe that the level of effort in the action matters. So, clicking on a Like button does indicate interest, but it’s a very low level of effort.
Paul Yiu: That’s right, it’s easy to Like something.
Eric Enge: Compared to a share, which is more effort. Then there are links implemented to your web pages from other web pages. These still carry more weight because you actually have to own and operate a website.
Paul Yiu: Yes, that’s right.
Eric Enge: Of course, this assumes all other factors are equal and I understand there are a lot of moving parts in that conversation.
Paul Yiu: Yes. That’s right.
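The effort ordering discussed here (a Like is easier than a share, which is easier than a link from another website) can be sketched as a weighted sum. The weights and the names `EFFORT_WEIGHT` and `engagement_score` are hypothetical; only the ordering comes from the conversation, and it holds only when all other factors are equal.

```python
# Hypothetical weights: only the ordering like < share < link is from the interview.
EFFORT_WEIGHT = {"like": 1.0, "share": 3.0, "link": 10.0}


def engagement_score(counts: dict[str, int]) -> float:
    """Sum of actions, each weighted by the effort it takes to perform."""
    return sum(EFFORT_WEIGHT[action] * n for action, n in counts.items())


many_likes = engagement_score({"like": 10})
one_link = engagement_score({"link": 1})
mixed = engagement_score({"like": 5, "share": 2, "link": 1})
```

Under these assumed weights a single link counts as much as ten Likes.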
Eric Enge: Thanks Paul!
Paul Yiu: Thank you Eric!