During the redesign of the user interface for a search tool, it became clear that the ranking of results had problems more serious than those with the interface itself.
The search gave access to financial research, published as a set of PDFs and Excel documents. Users were concerned that they were often unable to find older documents, even when using highly targeted search terms. When searching for regular publications (documents published daily or weekly, using the same title), they were confused to find that these would appear in an apparently scrambled order, rather than ordered by publication date. They were also surprised that searches often returned apparently irrelevant results.
I undertook a study to see if the ranking rules could be improved.
Research was presented as a set of documents held on a server. Metadata relating to these was stored in a database and accessed via an Autonomy search index. Autonomy provides several ranking tools, but one of these was of particular interest: DATEBIAS.
It was clear that publication date was highly important for this content set, with information becoming outdated quite quickly. However, presenting results in purely reverse chronological order was not an option, as this exposed the many irrelevant matches that would occur for each query (for example, searching for ‘euro’ might return a document that mentioned the word once, even though the document itself was not primarily about the currency). What was required was a blend of recency and match quality: a ranking that balanced how new a document was against how well it matched the query.
DATEBIAS provides a method of ranking documents based on their publication date, and its effect can be tuned at run time via two factors: the size of the boost (in percent) and the length of time over which it is applied (in days). The boost is applied in full to current documents and falls linearly for older documents, reaching zero at the end of the specified period. For example, a setting of 20% over 20 days would boost documents published today by 20%, documents published 10 days ago by 10%, and documents published more than 20 days ago would receive no boost.
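The linear decay can be sketched in a few lines. This is a sketch of the rule as described above; the actual arithmetic inside the Autonomy engine is not documented here:

```python
def datebias_boost(age_days: float, boost_pct: float, period_days: float) -> float:
    """Linear DATEBIAS decay: the full boost for a document published
    today, falling in a straight line to zero at the end of the period."""
    if age_days >= period_days:
        return 0.0
    return boost_pct * (1 - age_days / period_days)

# The 20% over 20 days example from the text:
datebias_boost(0, 20, 20)   # 20.0 -- published today
datebias_boost(10, 20, 20)  # 10.0 -- 10 days old
datebias_boost(25, 20, 20)  # 0.0  -- older than the period
```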
At the start of the study, the settings imposed a very strong boost over a relatively long period.
One of the challenges when working with search results is that, in reviewing a set of top 10 or 20 documents that are returned, you do not see which documents were not returned. This makes it hard to judge if the process is working well, or if more relevant documents were missed. Using a special URL to extract large numbers of results as XML, I developed a visualisation that showed all documents returned by a given query and how their final rank value was related to their age.
In this visualisation, each dot represents a document; their vertical location represents their final rank value; and the further they are to the right, the more recent they are.
This visualisation allowed me to examine the effect of different queries, test the effect of DATEBIAS, and look for optimum settings.
The effect of DATEBIAS can be seen by including the DATEBIAS boost in the value plotted on the vertical axis. The boost has the effect of pushing up the rank value of the newest documents. The five documents with the highest combined rank are highlighted in each case. These would be the top five documents appearing in the search results. The effect of DATEBIAS is clear, causing more recent documents to be selected by pushing them up the vertical axis.
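The selection of the top five by combined rank can be sketched as follows. The exact rule by which Autonomy combines match quality and boost is not stated in this study, so the multiplicative combination here is an assumption for illustration:

```python
def top_five(docs, boost_pct, period_days):
    """docs: list of (match_quality, age_days) pairs.

    Combined rank is taken here as match quality scaled up by the
    DATEBIAS boost (an assumed combination rule).  Returns the indices
    of the five documents that would head the results."""
    def combined(item):
        quality, age = item[1]
        boost = boost_pct * max(0.0, 1 - age / period_days) / 100
        return quality * (1 + boost)

    ranked = sorted(enumerate(docs), key=combined, reverse=True)
    return [i for i, _ in ranked[:5]]

# With no boost the best matches win regardless of age; with a strong
# boost over a short period, recent documents climb the ranking.
docs = [(0.9, 300), (0.8, 5), (0.5, 2), (0.45, 100), (0.3, 10), (0.2, 50)]
top_five(docs, 0, 30)   # best matches first: [0, 1, 2, 3, 4]
top_five(docs, 50, 30)  # recent doc 1 overtakes the older doc 0
```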
Whilst this plot is useful for understanding the effect of DATEBIAS, it is less helpful in finding the best settings, because it obscures the match quality of the documents that the boost affects.
In these charts, the vertical axis shows only the match quality, but the highlighted dots represent the top five documents according to the combined rank, as before.
The top image represents the case when no DATEBIAS is applied: as before the top documents are good matches, but drawn from a wide range of dates, which doesn’t make sense for this type of search. The middle image shows the effect of applying a very strong DATEBIAS over a short period – recent documents are returned at the top of the results, but this includes documents that were not a good match for the query. This is unhelpful.
The bottom image represents the ideal combination – only good matches are returned, but they are also recent in date.
Finding the Best Settings
Whilst it would be possible to tweak the boost and date settings and try to uncover the best combination by trial and error, I wanted to find a way of testing a large set of possible combinations against a number of different queries to see if an optimum setting could be found.
To do this, I built a VBA tool to automate the analysis. It extracted XML from the server for 10 different queries in one go, which allowed me to test 48 different DATEBIAS configurations. To assess the quality of the results returned in each case, I developed a scoring system that derived a single effectiveness score for each DATEBIAS combination. This worked by analysing the position of the top ten results relative to the rest.
Good results were considered to be those that maximised the product of Q (match quality) and R (recency): essentially, finding the documents that sat as far towards the top and right of the scatter plot as possible.
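A scoring function of this shape can be sketched as below. The recency horizon and the 0-to-1 normalisation are assumptions for illustration; the original tool was built in VBA and compared the top ten against the rest of the result set:

```python
def effectiveness_score(ranked_docs, horizon_days=365, top_n=10):
    """One number per DATEBIAS setting: the average of Q x R over the
    top-N results, where Q is match quality in 0..1 and R is a recency
    factor falling from 1 (published today) to 0 at `horizon_days`.
    `ranked_docs` is a list of (quality, age_days) pairs in the order
    the engine would return them."""
    top = ranked_docs[:top_n]
    return sum(q * max(0.0, 1 - age / horizon_days) for q, age in top) / len(top)

effectiveness_score([(1.0, 0)])              # 1.0 -- perfect match, brand new
effectiveness_score([(1.0, 365)])            # 0.0 -- perfect match, too old
effectiveness_score([(0.5, 0), (1.0, 365)])  # 0.25
```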
This score accorded well with subjective experience of good results.
The scores for the 48 different combinations of Boost and Date Range were plotted on a 3D surface chart in order to find the best settings. It revealed that the optimum settings used a very low level of Boost, spread over a long period of time, which ran counter to initial expectations when the study began.
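The sweep itself can be sketched as a grid search. The grid values, the combination rule, and the scoring normalisation below are all hypothetical stand-ins; the original analysis ran in VBA against live XML extracts from the server:

```python
from itertools import product

def rerank(docs, boost_pct, period_days):
    """Re-order (quality, age_days) pairs by quality scaled by a linear
    DATEBIAS boost (assumed multiplicative combination)."""
    key = lambda d: d[0] * (1 + boost_pct * max(0.0, 1 - d[1] / period_days) / 100)
    return sorted(docs, key=key, reverse=True)

def score(docs, horizon=365, top_n=10):
    """Average of Q x R over the top-N results (assumed normalisation)."""
    top = docs[:top_n]
    return sum(q * max(0.0, 1 - age / horizon) for q, age in top) / len(top)

def sweep(queries, boosts, periods):
    """Average the score across every query for each (boost, period)
    pair and return the best-scoring combination plus the full grid."""
    grid = {(b, p): sum(score(rerank(docs, b, p)) for docs in queries) / len(queries)
            for b, p in product(boosts, periods)}
    return max(grid, key=grid.get), grid

# A hypothetical 6 x 8 = 48-point grid, matching the size of the study:
boosts = [5, 10, 20, 40, 60, 80]                 # percent
periods = [7, 14, 30, 60, 90, 180, 365, 730]     # days
```

Plotting the resulting grid as a surface (boost on one axis, period on the other, score as height) is what revealed the low-boost, long-period optimum described above.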
The objective nature of this process gave business stakeholders the confidence to prioritise the implementation of the changes, resulting in an immediate improvement in the quality of results delivered to users.