QueryTerms Result Processor to the rescue
The QueryTerms Result Processor is a QRServer query pipeline stage that, when enabled, returns the lemmatized forms of the original query terms in a dynamic result field named queryterms. The ESP Document Hit Highlighting Guide describes how to enable the QueryTerms Result Processor. The FAST documentation gives the following description of how the QueryTerms Result Processor works.
The QueryTerms result processor intercepts and parses the QRServer’s query tree (FQL) to determine the terms and phrases that make up the query as well as the fields to which they have been applied. Then it lemmatizes the terms and phrases based on the query language (using pre-configured expansion dictionaries) so that the query represents the full set of terms and phrases used by the underlying search engine to identify matching documents.
The result of all this work by the QRServer is that the lemmatized versions of the original query terms are returned in a field named queryterms. Below is an example of how that looks for a search for “heart”.
The area highlighted in green above is the queryterms field generated by the QueryTerms Result Processor. The semicolon-separated values in the queryterms field are the lemmas for “heart”. The area highlighted in yellow above is the out-of-the-box hit highlighting on the original query terms. If you are getting results through the FAST Search API, the original terms will be tagged with <b> tags instead of the <key> tag you see above. However, the “hearts” lemma highlighted in red above in the body field of the document did not get hit highlighted. This is where the custom code comes in to do the highlighting. Now that you have the lemmas for the original query terms, you simply iterate through your results in the UI and highlight the lemmas found in whichever fields you want to display with highlighted terms.
In the application that I worked on, we highlighted the original query terms in yellow. This was done by looking for the <b> tags (since we were using the Search API) in the results and applying the right UI styling to highlight them in yellow. To highlight the lemmas, we first tokenized the items returned in the queryterms field into a query terms list. However, we had to make a small tweak to this list. The queryterms field also contains the original query terms. In the above example, notice that the original query term of “heart” is included. Therefore, when we built the query terms list, we checked whether any of the original query terms were included in the list of values in the queryterms field. If an original query term was found in the list of values, we excluded it from the query terms list. Once the query terms list was built, we iterated through the documents in the results looking for a match for any of the items in the query terms list in the title and body fields. If we found a match, we highlighted the matching term in the title or body field in blue to indicate that the term was a lemma.
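The steps above can be sketched roughly as follows. This is a minimal illustration and not the code from that application: the function names, the CSS class name, and the use of a span tag are all assumptions made for the example, and it assumes the queryterms value arrives as a plain semicolon-separated string.

```python
import re

def build_lemma_list(queryterms_value, original_terms):
    """Tokenize the queryterms field and drop the original query terms."""
    originals = {t.strip().lower() for t in original_terms}
    lemmas = []
    for item in queryterms_value.split(";"):
        term = item.strip()
        # Skip blanks, original query terms, and duplicates.
        if term and term.lower() not in originals and term not in lemmas:
            lemmas.append(term)
    return lemmas

def highlight_terms(text, terms, css_class):
    """Wrap whole-word matches of any term in a span (hypothetical markup)."""
    if not terms:
        return text
    # Longest terms first so multi-word terms win over their prefixes.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(
        lambda m: '<span class="%s">%s</span>' % (css_class, m.group(1)), text
    )

lemmas = build_lemma_list("heart;hearts", ["heart"])
body = highlight_terms("Healthy hearts and a <b>heart</b>", lemmas, "lemma")
```

The whole-word match keeps “heart” inside the existing <b> tag untouched while “hearts” gets the lemma styling. A real UI would also need to guard against terms that happen to match markup in the field.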
What about synonyms?
What if you also wanted to highlight the synonyms for the original query terms? This was a requirement for that application I mentioned above. The requirement was that in addition to highlighting the original query terms and lemmas, synonyms of the original query terms were to be highlighted in green. Once again, we used the queryterms field to drive the highlighting.
An important fact to keep in mind is that the approach I’m describing below only works if query time synonym expansion is enabled. This will not work if document side synonym expansion is used.
To highlight synonyms differently than the original query terms and lemmas, the FastQT_Synonym query transformation information is needed. When query side synonym expansion is enabled, ESP will return the query transformation that took place to inject the synonyms into the query. Injection of the synonyms into the query will result in the queryterms field having both the lemmas and the synonyms included along with the original query term as illustrated in the example below.
I’ve highlighted several items above to illustrate the various elements involved in this example. This example builds upon the previous example with the addition of the FastQT_Synonym query transformation information highlighted in blue above. The queryterms are highlighted in green. However, unlike the previous example, the queryterms field in this example includes the synonyms for “heart” in addition to the original query term of “heart” and the “hearts” lemma. The out-of-the-box hit highlighting is in yellow. Again, unlike the previous example, which didn’t have query side synonym expansion enabled, you’ll notice (highlighted in red) that in addition to the original query term of “heart” in the title field, the synonyms “Cardiac” and “coronary” are also hit highlighted by the out-of-the-box functionality with the <key> tag in the body field, because matches for the synonyms were found there. You’ll also notice that the lemma of “hearts” in the body field is not hit highlighted, just like in the previous example.
The QUERY attribute of the FastQT_Synonym query transformation information can be used to build a list of synonyms for the original query terms. The information in the QUERY attribute contains the original term, followed by +>, followed by a comma-delimited list of synonyms for that term. If more than one original query term is synonym expanded, the pattern is repeated, separated by a semicolon. Here is an example of a multi-term synonym expansion for “heart attack” where the individual terms “heart” and “attack” were expanded: heart+>cardiac,cardiac structure,coronary,heart structure,hrt;attack+>attack behavior
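As a rough sketch, the QUERY attribute format described above could be parsed like this. The function name is made up for the example, and it assumes that synonyms never contain a comma or semicolon themselves, which the format as described does not guarantee.

```python
def parse_synonym_query(query_attr):
    """Parse 'term+>syn1,syn2;term2+>syn3' into {term: [synonyms]}."""
    synonyms_by_term = {}
    for group in query_attr.split(";"):
        original, separator, expansion = group.partition("+>")
        if not separator:
            # Not a 'term+>synonyms' group; ignore it.
            continue
        synonyms_by_term[original.strip()] = [
            s.strip() for s in expansion.split(",") if s.strip()
        ]
    return synonyms_by_term

expanded = parse_synonym_query(
    "heart+>cardiac,cardiac structure,coronary,heart structure,hrt;"
    "attack+>attack behavior"
)
```

For the “heart attack” example, this yields five synonyms for “heart” and one for “attack”, with the multi-word synonyms kept intact.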
Earlier, I described how to highlight the lemmas in the UI using the information provided by FAST in the queryterms field. The inclusion of the synonyms in the queryterms field muddies things up a bit when it comes to determining which of the items are lemmas versus synonyms versus the original query terms. However, it’s still pretty easy to determine which terms are synonyms and which terms are lemmas. As you have probably already guessed, we’ll use the information provided in the FastQT_Synonym query transformation. Use the information in the QUERY attribute of the FastQT_Synonym query transformation to build a list of synonyms. Then, just like the process described above, build a list of lemmas from the queryterms field, using the synonym list to separate the synonyms from the lemmas in the queryterms information. You’ll also need to eliminate the original query terms from the queryterms field value to get a true list of lemmas.
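One way to sketch that separation is shown below. The function name and argument shapes are assumptions for illustration. It assumes the queryterms value is a semicolon-separated string, as in the first example, and that the synonyms have already been collected from the QUERY attribute into a dictionary keyed by original term.

```python
def classify_query_terms(queryterms_value, original_terms, synonyms_by_term):
    """Split the queryterms values into (lemmas, synonyms)."""
    originals = {t.strip().lower() for t in original_terms}
    known_synonyms = {
        s.lower() for syns in synonyms_by_term.values() for s in syns
    }
    lemmas, synonyms = [], []
    for item in queryterms_value.split(";"):
        term = item.strip()
        key = term.lower()
        if not term or key in originals:
            # Original query terms are already highlighted out of the box.
            continue
        # Anything listed in the synonym expansion is a synonym;
        # whatever remains is treated as a lemma.
        target = synonyms if key in known_synonyms else lemmas
        if term not in target:
            target.append(term)
    return lemmas, synonyms

lemmas, synonyms = classify_query_terms(
    "heart;hearts;cardiac;coronary",
    ["heart"],
    {"heart": ["cardiac", "coronary", "hrt"]},
)
```

Note that a synonym only ends up in the synonym list if it also appears in the queryterms field, so “hrt” is left out in this example.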
Once the synonym and lemma lists have been built, it’s the same process as described above for custom hit highlighting the lemmas. Simply iterate through the documents in the results and use the synonym and lemma lists to determine which terms are going to be custom hit highlighted in the UI.
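A single pass over each field can apply both styles at once. The sketch below is illustrative only: the function name, the span markup, and the class names are assumptions, and mapping those classes to blue and green is left to the UI’s stylesheet.

```python
import re

def highlight_lemmas_and_synonyms(text, lemmas, synonyms):
    """Wrap lemma and synonym matches in spans with different classes."""
    # Map each term to its (hypothetical) CSS class; synonyms take
    # precedence if a term somehow appears in both lists.
    classes = {t.lower(): "lemma" for t in lemmas}
    classes.update({t.lower(): "synonym" for t in synonyms})
    if not classes:
        return text
    # Longest terms first so "cardiac structure" wins over "cardiac".
    ordered = sorted(classes, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )

    def wrap(match):
        css_class = classes[match.group(1).lower()]
        return '<span class="%s">%s</span>' % (css_class, match.group(1))

    return pattern.sub(wrap, text)

html = highlight_lemmas_and_synonyms(
    "Cardiac care for hearts", ["hearts"], ["cardiac"]
)
```

Doing the replacement in one pass avoids re-matching text inside the spans that were just inserted. Synonyms that ESP has already wrapped in <b> or <key> tags would end up with the span nested inside that tag.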
If you have any questions regarding this blog post, please feel free to email me at rem@pointbridge.com. I welcome feedback on this content and also greatly appreciate corrections for grammatical and/or spelling errors.