A Mathematical Model for Assessing Page Quality

Guest Post by Ashutosh Garg

In this article, by page quality, we refer to the quality of a page with respect to a search query and a user who issued the query.
Page quality is a broad concept, and the actual algorithm will vary with the context in which the quality score is used. Instead of going into a specific algorithm, this article presents a framework for thinking about page quality and for adapting it to fit one’s own situation.
Some of the situations where page quality is used are:

  1. Search Engines – search engines score a page with respect to a query and use this signal to judge whether the page is relevant to the user’s query. Additionally, a numeric score makes it possible to tell whether one page is “relatively” better than another.
  2. Ad Targeting – when showing a particular ad to a user, an ad network may score the ad and its landing page against the user-issued query and use the score to decide whether the ad is indeed relevant to what the user is looking for.
  3. Discovery – a page can be evaluated, even in the absence of a query, to understand its quality and thus decide whether it should be recommended to the end user.

In this article we will consider the different algorithms used to assess page quality.
The first set of algorithms computes the score of the document as a function of the actual query issued by the user.
IR score – The Information Retrieval community has long researched how to compute the best possible score for a page given a query. This is probably the most important score one can use in evaluating a page. Open source search engines such as Lucene implement this kind of scoring. Given a query Q = {q1, q2, q3} containing three words and a page P, the steps used in computing the score of the page are:

  1. Come up with a relative weight for each section of the page – A typical web page can be divided into components such as the title, headings (H1, H2, H3, ...), body, bold text, large text, small text (based on font sizes), text above the fold of the page (assuming a certain display), anchor text, boilerplate text, text on pages being pointed to, text on pages the user visited prior to visiting this page, text present in images on the page, URL text, etc. Depending upon the application, one can assign different weights to different elements of the page. One rule of thumb is to consider how people are going to discover the page and form their first impression. If it is search, people will discover the page by reading the title and snippet displayed by search engines, and they will form their first opinions by reading the text above the fold.
  2. Generate features based on the query – Take the query and break it down into n-grams (for example, the bigrams are all phrases of length two). Then assign a weight to each of these n-grams. For example, consider the query “canon digital camera”. In this query, “canon” is an important unigram as it refers to the brand. “canon digital” is a bad phrase while “digital camera” is a good phrase. Traditionally people have used TF-IDF (tf*idf) to come up with a weighting. One thing to be cautious about is which dataset is used to compute TF-IDF: it should very closely resemble the dataset where the weighting is applied.
  3. Document quality for computing TF-IDF score – A document that consists of the content of all the pages on the web will match any query. However, it is not a great experience to come across a very large document. At the same time, a document that is identical to the query is bad, as a user won’t learn anything new when (s)he lands on the page. Review which platform most of the visitors to your website use. If they are using smartphones, the ideal document length should be less than 500 words; on tablets, about 1K words; on laptops, even longer given the presentation. Some way of normalizing the score by document length should be used; various papers in the IR community address normalizing the IR score based on document length.
  4. A simple scoring of a document can be

Page P consists of fields d_i with weights w_i, and query Q consists of words q_k. The length of the page is L, the number of phrases in the query is N_q, and f is a normalization function based on document length.
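One form such a score could take, consistent with the definitions above, is sketched below. The term-count function tf(q_k, d_i) (how often phrase q_k occurs in field d_i) and the per-phrase weight u_k are assumptions introduced here for illustration; this is not necessarily the author’s exact expression.

```latex
S(P, Q) \;=\; \frac{f(L)}{N_q} \sum_{k=1}^{N_q} u_k \sum_{i} w_i \,\mathrm{tf}(q_k, d_i)
```

Read from the inside out: each phrase’s occurrences are counted per field and weighted by how important that field is, each phrase’s total is weighted by how important the phrase is, and the result is averaged over the phrases and scaled by the length normalization f(L).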
Which page has a higher IR Score for “Canon digital camera”?

Both product pages above are for Canon digital cameras, but one of them has a much higher IR score. Can you tell which page’s score is depicted in the table below?

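| Query word/phrase    | Title | H1 | Body | Bold | Weight |
|----------------------|-------|----|------|------|--------|
| Canon                | 1     | 1  | 4    | 0    | 1      |
| Digital              | 1     | 1  | 2    | 0    | 1      |
| Camera               | 1     | 1  | 7    | 0    | 1      |
| Canon digital        | 1     | 1  | 0    | 0    | 2      |
| Digital camera       | 1     | 1  | 2    | 0    | 2      |
| Canon digital camera | 1     | 1  | 0    | 0    | 3      |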
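As a sketch of how the table above could turn into a single number, the Python below generates the query’s n-grams and sums phrase weight × field weight × count. The field weights (title 3, H1 2, body 1, bold 1) are assumptions made for illustration, since the article does not give them. The phrase weights follow the table’s Weight column, which equals the phrase length in words. No length normalization is applied here.

```python
from typing import Dict, List

# Assumed field weights (not given in the article); tune for your application.
FIELD_WEIGHTS: Dict[str, float] = {"title": 3.0, "h1": 2.0, "body": 1.0, "bold": 1.0}

# Occurrence counts per field, copied from the table above.
COUNTS: Dict[str, Dict[str, int]] = {
    "canon": {"title": 1, "h1": 1, "body": 4, "bold": 0},
    "digital": {"title": 1, "h1": 1, "body": 2, "bold": 0},
    "camera": {"title": 1, "h1": 1, "body": 7, "bold": 0},
    "canon digital": {"title": 1, "h1": 1, "body": 0, "bold": 0},
    "digital camera": {"title": 1, "h1": 1, "body": 2, "bold": 0},
    "canon digital camera": {"title": 1, "h1": 1, "body": 0, "bold": 0},
}


def ngrams(query: str) -> List[str]:
    """Return every contiguous word n-gram of the query, shortest first."""
    words = query.lower().split()
    return [
        " ".join(words[i:i + n])
        for n in range(1, len(words) + 1)
        for i in range(len(words) - n + 1)
    ]


def phrase_weight(phrase: str) -> float:
    """Weight a phrase by its length in words, as in the table's Weight column."""
    return float(len(phrase.split()))


def ir_score(query: str) -> float:
    """Sum phrase weight * field weight * count over all n-grams and fields."""
    total = 0.0
    for phrase in ngrams(query):
        field_counts = COUNTS.get(phrase, {})
        field_sum = sum(FIELD_WEIGHTS[f] * c for f, c in field_counts.items())
        total += phrase_weight(phrase) * field_sum
    return total


if __name__ == "__main__":
    print(ngrams("Canon digital camera"))
    print(ir_score("Canon digital camera"))  # 67.0 with the assumed field weights
```

In practice the length-based phrase weight would be replaced by something like TF-IDF, so that a weak phrase such as “canon digital” does not count as much as “digital camera”.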

Query Behavior Score – This score is based on how people interact with the page: how often visitors who arrive on a given query find the page interesting. Most websites have a way of defining success (also known as conversion). For e-commerce websites, conversion is the purchase of a product or service. For lead-gen websites, it is filling out a form. For media websites, it could be interaction with some media element, such as playing a video, or the number of page views. For a query, one can compute the conversion rate and use that directly as the behavior score. The challenge is that this data is typically very sparse. On an e-commerce website, the conversion rate can be as low as 0.5%, which means that for every 200 visits for a given query, one conversion will be observed on average. Long tail queries, by definition, have low volume, which makes this computation impractical for them. There are two ways to address this concern:

  1. Query level generalization – instead of computing the score for the actual query, compute the score for an abstraction of the query. E.g. the query “canon digital camera” can be abstracted to the following:
    • Three-word query
    • Query with a brand name
    • Query with all words present in the title of the page
     One can then ask: what is the conversion rate of all queries that are of length three, have all their words in the title, and have a brand name as one of the words? This abstraction can be very general or very specific; based on the amount of data available, one can choose an appropriate level of abstraction.
  2. Alternatives to conversion such as bounce rate – While a conversion rate can be as low as 0.5% or lower, bounce rates are typically in the range of 20–80%. This means that you need significantly fewer visits to evaluate the quality of the page. One needs to be careful, as bounce rate may not always be highly correlated with conversion rate.
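Query level generalization can be sketched as a back-off: use the most specific abstraction that has enough visits behind it. The code below is a minimal illustration under stated assumptions. The brand list, the visit-log shape `(query, page_title, converted)`, the four abstraction levels, and the `min_visits` threshold are all hypothetical choices, not something the article prescribes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

BRANDS: Set[str] = {"canon", "nikon", "sony"}  # hypothetical brand list


def abstractions(query: str, title: str) -> List[Tuple]:
    """Return abstraction keys for a query, most specific first."""
    words = query.lower().split()
    title_words = set(title.lower().split())
    length = len(words)
    has_brand = any(w in BRANDS for w in words)
    all_in_title = all(w in title_words for w in words)
    return [
        ("query", " ".join(words)),                          # the exact query
        ("len+brand+title", length, has_brand, all_in_title),
        ("len+brand", length, has_brand),
        ("len", length),                                     # most general
    ]


def build_stats(visits: Iterable[Tuple[str, str, bool]]) -> Dict[Tuple, List[int]]:
    """Aggregate [visit count, conversion count] for every abstraction key."""
    stats: Dict[Tuple, List[int]] = defaultdict(lambda: [0, 0])
    for query, title, converted in visits:
        for key in abstractions(query, title):
            stats[key][0] += 1
            stats[key][1] += int(converted)
    return stats


def behavior_score(query: str, title: str, stats: Dict[Tuple, List[int]],
                   min_visits: int = 1000) -> Optional[float]:
    """Conversion rate at the most specific level with at least min_visits."""
    for key in abstractions(query, title):
        visit_count, conversions = stats.get(key, (0, 0))
        if visit_count >= min_visits:
            return conversions / visit_count
    return None  # not enough data at any level
```

A long tail query with a handful of visits falls through to a broader level such as “three-word brand query with all words in the title”, while a high-volume query is scored on its own data.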

There is a second set of scores for a page that are computed independently of the query. Some examples are:
Behavioral score of page – How people perceive a page is a big indicator of the quality of the page. This can be measured by analyzing the user behavior. Some of the factors that are traditionally used are –

  1. Conversion score – Compute the conversion rate of this page independent of the queries leading to this page
  2. Bounce rate – Compute the bounce rate of this page independent of the queries leading to this page
  3. Number of page views – How many pages are viewed in a session after viewing this page.
  4. Number of repeat visitors to this page – How many users keep coming back to this page.
  5. Add-to-cart rate – How many people add products to the cart after visiting this page.
  6. Time on page – The average amount of time spent on this page.

Behavioral signals cannot be analyzed in isolation. They have to be analyzed relative to other pages that are similar. For example, on an e-retailer’s website, one can compare the behavior of a product page with that of other product pages.
A simple way to compute the score would be

Here, f_i is the value of a feature (bounce rate etc.) and mf_i is the average value of feature f_i across all pages of the same type. w_i is the weight given to each feature – one may give a weight of 0.8 to conversion and only 0.1 to bounce rate, which is a very noisy feature. A more sophisticated approach is to look at the number of people who bounce off a website and then click on a different search result for the same search query.
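One way to turn these definitions into code is to score each feature by its relative lift over the average for pages of the same type and take the weighted sum. This is a sketch under assumptions, not necessarily the author’s formula: the relative-lift form (f_i − mf_i) / mf_i, the feature names, and the sign flip for features where lower is better (such as bounce rate) are all choices made here for illustration.

```python
from typing import Dict, FrozenSet


def behavioral_score(
    features: Dict[str, float],
    type_means: Dict[str, float],
    weights: Dict[str, float],
    lower_is_better: FrozenSet[str] = frozenset({"bounce_rate"}),
) -> float:
    """Weighted sum of each feature's relative lift over its page-type average.

    A page that matches the average on every feature scores 0.0;
    positive means better than similar pages, negative means worse.
    """
    score = 0.0
    for name, weight in weights.items():
        mean = type_means[name]
        if mean == 0:
            continue  # no meaningful baseline for this feature
        lift = (features[name] - mean) / mean
        if name in lower_is_better:
            lift = -lift  # a below-average bounce rate is a good sign
        score += weight * lift
    return score


if __name__ == "__main__":
    page = {"conversion_rate": 0.01, "bounce_rate": 0.4}
    product_page_means = {"conversion_rate": 0.005, "bounce_rate": 0.5}
    # Weights taken from the example in the text: 0.8 conversion, 0.1 bounce.
    weights = {"conversion_rate": 0.8, "bounce_rate": 0.1}
    print(behavioral_score(page, product_page_means, weights))
```

Here the page converts at twice the average rate for product pages and bounces somewhat less, so it scores roughly 0.8 × 1.0 + 0.1 × 0.2 = 0.82.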
Reputation of the page – PageRank is a great proxy for how reputed the page is with respect to other pages on the site. Another factor of reputation is how far the page is from the home page, that is, the number of hops required to reach it when navigating from the home page.
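Both reputation signals can be computed from the site’s internal link graph. The sketch below assumes the graph is available as a dictionary from each page to the pages it links to (the page names are hypothetical). It uses breadth-first search for hop distance and a plain power-iteration PageRank with the commonly used damping factor of 0.85; a production system would more likely rely on an existing graph library.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[str, List[str]]  # page -> pages it links to


def click_depth(links: Graph, home: str) -> Dict[str, int]:
    """Minimum number of hops from the home page to each reachable page."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def pagerank(links: Graph, damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over the site's internal link graph."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    site = {
        "home": ["category", "about"],
        "category": ["product"],
        "product": ["home"],
        "about": [],
    }
    print(click_depth(site, "home"))
    print(pagerank(site))
```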
Language quality of the page – One can build a language model over the content that people have liked on the site and score the page with respect to this language model. HMMs are typically used to model the page.
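To make the idea concrete, here is a deliberately simple stand-in: a unigram language model with add-one smoothing, trained on text from pages people liked, which scores a new page by its average per-word log-probability. This is not the HMM approach mentioned above, only a minimal sketch of “score the page against a model of liked content”. The whitespace tokenizer and the class name are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer; a real system would do more."""
    return text.lower().split()


class UnigramModel:
    """Unigram language model with add-one (Laplace) smoothing."""

    def __init__(self, liked_texts: Iterable[str]) -> None:
        self.counts = Counter(tok for text in liked_texts for tok in tokenize(text))
        self.total = sum(self.counts.values())
        # One extra vocabulary slot reserves probability mass for unseen words.
        self.vocab_size = len(self.counts) + 1

    def log_prob(self, token: str) -> float:
        return math.log((self.counts[token] + 1) / (self.total + self.vocab_size))

    def score(self, page_text: str) -> float:
        """Average log-probability per token; higher means closer to liked content."""
        tokens = tokenize(page_text)
        if not tokens:
            return float("-inf")
        return sum(self.log_prob(t) for t in tokens) / len(tokens)


if __name__ == "__main__":
    model = UnigramModel([
        "canon digital camera with zoom lens",
        "digital camera review",
    ])
    print(model.score("digital camera"))
    print(model.score("cheap pills online"))
```

Averaging per token keeps long pages from being penalized simply for having more words.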
Once all these scores have been computed, the next step is to combine these scores.
Let’s say the scores are:
IR (IR Score)
B (Behavioral score)
R (Reputation score – or page rank)
LM (Language Model Score)
A simple way to combine these scores would be
S = w_ir * IR + w_b * B + w_r * R + w_lm * LM
The weights can be adjusted to reflect how much weight you want to give to each feature. If the page is new, behavioral data will be minimal, so the behavioral score should get a small weight. However, if the page is old, the behavioral score should get a much higher weight.
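The weighted combination, including the idea of trusting behavioral data less on new pages, might look like the following sketch. The default weights, the `full_trust_visits` threshold, the linear ramp, and the renormalization step are all assumptions chosen for illustration; the four input scores are assumed to be scaled to comparable ranges beforehand.

```python
def combine_scores(
    ir: float,
    behavior: float,
    reputation: float,
    lm: float,
    visits: int,
    w_ir: float = 0.5,
    w_b: float = 0.3,
    w_r: float = 0.1,
    w_lm: float = 0.1,
    full_trust_visits: int = 1000,
) -> float:
    """Combine the four scores, shrinking the behavioral weight for new pages."""
    # Ramp the behavioral weight from 0 (no visits) up to w_b (enough visits).
    trust = min(1.0, visits / full_trust_visits)
    w_b_eff = w_b * trust
    # Rescale so the effective weights sum to the same total as the originals.
    scale = (w_ir + w_b + w_r + w_lm) / (w_ir + w_b_eff + w_r + w_lm)
    return scale * (w_ir * ir + w_b_eff * behavior + w_r * reputation + w_lm * lm)


if __name__ == "__main__":
    # Same page scores, but the first call treats the page as brand new.
    print(combine_scores(0.8, 0.0, 0.5, 0.5, visits=0))
    print(combine_scores(0.8, 0.0, 0.5, 0.5, visits=1000))
```

With the renormalization, a new page is not penalized merely for lacking behavioral data: its score is driven by the IR, reputation, and language model scores until enough visits accumulate.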
While the above gives a good perspective on how one can go about computing the score of a page with respect to a query, it requires a reasonable amount of IT investment, which may not be possible for an average marketer. In the next blog post, I will cover some of the methods that can be applied on top of Google Analytics to approximate these scores.

About Ashutosh

Ashutosh is the Chief Technology Officer (CTO) of BloomReach and a true guru of all things search, with 10 years of information retrieval, machine learning and search experience. Previously, he was a Staff Scientist at Google for 4+ years, which spanned 8 product launches. Prior to that, he was at IBM research. He is also a prolific publisher/inventor, with a book on machine learning, 30+ papers, and 50+ patents. Ashutosh holds a BTech from IIT-Delhi and a PhD from U of Illinois UC. Ashutosh has numerous awards, including best thesis award at IIT Delhi, IBM Fellowship and outstanding researcher award at UIUC.
