One fascinating area of search is the process by which search engines break down the content of a search query and figure out how to process it. In my search to learn about this, I was given the opportunity to chat with Bing’s Andy McGovern about how they look at this challenge. The following Q&A captures the bulk of that dialog:
Eric Enge: 1. In today’s world of search, one of the challenging aspects of handling arbitrary user search queries is classifying how to treat them, which involves doing your best to extract the user’s actual intent. For example, some of these can lead to “rich answers”, while for others, the best response is traditional web search results. Can you talk about the general process for how this is handled?
Andy McGovern: Bing has lots of “Instant Answers” (or just “Answers” for short) that show up along with web results on the Bing SERP. For example, there are some standard Answers for domains like Sports, Weather, Stocks, Images, Videos, and some special seasonal Answers like Elections and the Super Bowl. These experiences need to show up at the right time depending on what the user is interested in. We make these routing decisions in part using query classifiers.
These query classifiers use machine learning, trained on data from past users’ searches and clicks as well as input from trained evaluators, to determine what your query is about. Some queries have explicit intent, like “pictures of cute puppies”, but we are also able to determine implicit intent: if the user just queries for “cute puppies”, we still know that the user probably wants to see images (or videos). We execute lots of these classifiers for each query, and we use the output to help decide what to show on the SERP.
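To make the idea of a query classifier a bit more concrete, here is a minimal sketch of one way implicit intent might be scored from the query text alone. This is purely illustrative: the tiny training set, the labels, and the use of scikit-learn are assumptions for the example, not Bing’s actual models or data.

```python
# Toy query-intent classifier: purely illustrative, not Bing's implementation.
# Assumes scikit-learn is available; the training data below is made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled queries. In practice, labels would come from
# aggregated user behavior and trained evaluators, at a much larger scale.
queries = [
    "pictures of cute puppies", "photos of sunsets", "kitten images",
    "mortgage rates today", "python list comprehension", "city council meeting agenda",
]
intents = ["images", "images", "images", "web", "web", "web"]

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(queries, intents)

# Even without an explicit word like "pictures", the classifier assigns
# a probability to image intent for "cute puppies".
for query in ["cute puppies", "cheap flights to denver"]:
    probs = dict(zip(model.classes_, model.predict_proba([query])[0]))
    print(query, probs)
```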
Eric Enge: 2. Can you identify the different major types of classifications? I.e., we have queries that deserve rich answers and queries that get simple web search results, but are there other gradients in between?
Andy McGovern: We run lots of query classifiers across lots of domains. When it comes down to it, nearly all SERPs represent an in-between gradient. Like with the “cute puppies” example, some people may want images, and others may want videos. Some people might like to click on the image and video thumbnails that we put directly on the SERP, while other people might want to visit a website that is dedicated to celebrating how cute puppies are. Our goal is to return all the result types that people might want, including both web results and rich answer experiences, and to rank them on the page.
Eric Enge: 3. I imagine (tell me if I am wrong) that we also have other factors that come into play, such as “Query Deserves Diversity” (QDD) scenarios for queries such as Jaguar. How does that factor into the process?
Andy McGovern: Yes, diversity is very important, both in the types of results that a user might want (Answers vs. web results, for example), and also when there is ambiguity in the query, like in the case of Jaguar, which could be a car, or an animal, or something else. We handle this by first gathering as much content from different sources as possible for a given query, and then making the best decision possible about how to compose the content on the page. If there is more than one viable interpretation of a query like Jaguar, then we compose the SERP in a way where these different intents are represented, with the stronger intent on top. In many cases, we also offer users the opportunity to disambiguate their search via Related Searches or in the Context Pane. Search for a common name like “john smith” and you’ll see a section in the Context Pane that shows a list of John Smiths that you might have been interested in.
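As a rough illustration of what composing the SERP with multiple intents might look like, here is a small sketch that cycles through the viable interpretations of an ambiguous query, strongest intent first, so that weaker but plausible interpretations still appear on the page. The interpretations, probabilities, and results below are invented for the example; this is not how Bing actually composes its pages.

```python
# Sketch of composing a SERP when a query has multiple viable interpretations.
# All interpretations, probabilities, and results here are hypothetical.
from typing import Dict, List

def compose_serp(interpretations: Dict[str, float],
                 results_by_intent: Dict[str, List[str]],
                 slots: int = 6) -> List[str]:
    """Order content so the stronger intent leads, while weaker viable
    interpretations are still represented on the page."""
    ranked_intents = sorted(interpretations, key=interpretations.get, reverse=True)
    page: List[str] = []
    depth = 0
    # Round-robin through intents, strongest first, until the slots are full.
    while len(page) < slots:
        added = False
        for intent in ranked_intents:
            results = results_by_intent.get(intent, [])
            if depth < len(results) and len(page) < slots:
                page.append(results[depth])
                added = True
        if not added:
            break
        depth += 1
    return page

interpretations = {"car": 0.55, "animal": 0.35, "software": 0.10}  # hypothetical
results_by_intent = {
    "car": ["jaguar.com", "jaguar dealers near you"],
    "animal": ["jaguar (animal) encyclopedia entry", "jaguar habitat facts"],
    "software": ["history of Mac OS X 10.2 Jaguar"],
}
print(compose_serp(interpretations, results_by_intent))
```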
Eric Enge: 4. How are these decisions calculated and made? Is this all determined by a probability analysis of some sort? I.e., do you only serve a rich answer if you believe you have greater than “X percent” chance of being right?
Andy McGovern: At a basic level, all decisions that Bing makes about what to show its users are made using probabilities, based on aggregated feedback from past users, and also based on results from trained evaluators. But it really comes down to an ordering problem. Think of each result on the SERP, whether it’s a web result or an Answer or something else, as occupying a slot on the page, from 1 to N. Our job is to fill those slots with the best results in the best order.
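A toy version of that ordering problem might look like the sketch below: every candidate, whether an Answer or a web result, carries a probability-like usefulness score, and the page slots are filled in descending score order. The candidates and scores are made up for illustration.

```python
# Minimal sketch of the "ordering problem": score every candidate result,
# then fill slots 1..N in descending score order. Scores are invented.
from dataclasses import dataclass

@dataclass
class Candidate:
    kind: str       # "answer" or "web"
    content: str
    score: float    # estimated probability that the user wants this result

def fill_slots(candidates: list[Candidate], n_slots: int) -> list[Candidate]:
    # Sort all candidates together, regardless of type, and keep the top N.
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:n_slots]

candidates = [
    Candidate("answer", "Image answer: cute puppies", 0.82),
    Candidate("web", "puppy adoption site", 0.74),
    Candidate("answer", "Video answer: puppy videos", 0.61),
    Candidate("web", "dog breed encyclopedia", 0.58),
]
for slot, c in enumerate(fill_slots(candidates, n_slots=3), start=1):
    print(slot, c.kind, c.content)
```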
Eric Enge: 5. Do you leverage aggregated historical user interaction data to refine how these probabilities are assessed? Or how does aggregated historical data play a role in the overall process?
Andy McGovern: Yes, we do. Past users’ queries and clicks, used in an aggregated, anonymized way, help teach the Bing engine which results it should show, and in which order, for future queries. This happens primarily by using these queries as training data for our machine learning models, which are then used to classify queries and rank the results that show on the page.
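Here is a simplified sketch of how aggregated, anonymized query/click logs could be turned into training labels for such models. The log rows and the preference threshold are hypothetical; a real pipeline would be far more involved.

```python
# Sketch of deriving training labels from aggregated, anonymized click logs.
# The log entries and the 25% preference threshold are made-up examples.
from collections import Counter, defaultdict

# (query, clicked_result_type) pairs aggregated across many users.
click_log = [
    ("cute puppies", "image"), ("cute puppies", "image"),
    ("cute puppies", "video"), ("cute puppies", "web"),
    ("mortgage rates", "web"), ("mortgage rates", "web"),
]

counts = defaultdict(Counter)
for query, result_type in click_log:
    counts[query][result_type] += 1

# Label each query with the result types users actually preferred; these
# (query, label) pairs then become training data for the query classifiers.
training_examples = []
for query, by_type in counts.items():
    total = sum(by_type.values())
    for result_type, n in by_type.items():
        if n / total >= 0.25:   # arbitrary preference threshold for the sketch
            training_examples.append((query, result_type))

print(training_examples)
```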
Eric Enge: 6. What role does personalization play in the overall process? What other factors come into play? Factors like the searcher’s location and language, for example.
Andy McGovern: Personalization factors into the search experience in various ways, some more subtle than others, depending on the circumstance. Your past queries, current location, and other context might be taken into account in order to give you a good set of personalized results. One of my favorite examples is giving good results for ambiguous acronyms. If a user queries for “wdot”, it could refer to the Department of Transportation for Wisconsin, Wyoming, or Washington. There’s a good chance that your current location will play a role in our decision of which result to put on top for that query. If you search for “movies” or “weather” or “sushi restaurants”, then we will use your location to give you results near you. If you search in German, then you’ll get great German results for your query. And so on. We run lots of experiments on personalization, and we’re adding more and more personalized features over time.
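The “wdot” example can be sketched as a simple location-aware re-ranking step: candidates whose region matches the searcher’s location get a boost before the final sort. The candidate sites, base scores, and boost value below are invented for illustration and are not how Bing weighs location.

```python
# Sketch of location-aware disambiguation for an ambiguous acronym like "wdot".
# Candidates, base scores, and the boost value are hypothetical.
def rank_with_location(candidates, user_state, location_boost=0.3):
    """Boost candidates whose region matches the user's location,
    then sort by the adjusted score."""
    def adjusted(c):
        return c["base_score"] + (location_boost if c["state"] == user_state else 0.0)
    return sorted(candidates, key=adjusted, reverse=True)

candidates = [
    {"title": "Wisconsin DOT", "state": "WI", "base_score": 0.40},
    {"title": "Washington State DOT", "state": "WA", "base_score": 0.45},
    {"title": "Wyoming DOT", "state": "WY", "base_score": 0.30},
]

# A searcher in Wisconsin sees the Wisconsin DOT ranked first.
for c in rank_with_location(candidates, user_state="WI"):
    print(c["title"])
```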
Eric Enge: 7. In our call the other day, you described how this is done in an initial processing step, and then passed on to “downstream” components. Can you provide an example scenario that illustrates the concept?
Andy McGovern: I think you are referring to Answers, right? If so, I basically described this in previous responses above. “Downstream” refers to components that are invoked later chronologically in the Bing runtime stack. So in the “cute puppies” example, we first determine that the query has (implicit) intent for images and videos, and then we send the query to the Bing Image and Video and Web services to get the results, and then we make a decision on which content types are worth showing, and in which order. In the “cute puppies” case, when I just ran the search on Bing, I saw images, then a couple of web results, then videos, and then more web results.
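Putting the pieces together, the “downstream” flow for a query like “cute puppies” might be sketched as: classify intent first, fan the query out to the relevant result services, then compose the page from whatever comes back. The classifier output, the service stubs, and the thresholds below are all hypothetical stand-ins, not Bing’s runtime stack.

```python
# End-to-end sketch of the "downstream" flow: classify, fan out, compose.
# Classifier output, service stubs, and thresholds are invented for the example.
def classify(query):
    # Pretend classifier output: probability of each intent.
    return {"image": 0.7, "video": 0.5, "web": 0.9}

def fetch_results(query, intent):
    # Stand-in for calling the downstream image / video / web services.
    return [f"{intent} result {i} for '{query}'" for i in range(1, 3)]

def compose(query, threshold=0.4, slots=5):
    intents = classify(query)
    scored = []
    for intent, prob in sorted(intents.items(), key=lambda kv: kv[1], reverse=True):
        if prob >= threshold:                 # only invoke likely intents
            scored.extend((prob, r) for r in fetch_results(query, intent))
    # Final ordering across content types, strongest signals first.
    return [r for _, r in sorted(scored, key=lambda pr: pr[0], reverse=True)[:slots]]

for line in compose("cute puppies"):
    print(line)
```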
About Andy McGovern
Andy McGovern is a Principal Program Manager at Microsoft working on Bing and Cortana. He has been on the Bing Relevance team for the past several years. Prior to that, he worked on various projects within Bing related to Local, Maps, Global Relevance, and Infrastructure. In his free time, he likes jogging, water activities, and volunteering in South America.