Hacking Quepid to test search relevance on any website - Charlie Hull

If you know your search is broken (often because lots of users and colleagues are complaining about it), the first step is to measure just how bad it is. Only then should you start to fix it – perhaps by improving the source data, or tweaking the search engine configuration – as otherwise it’s hard to tell if you’re making things better or worse. You need to start actively testing search relevance, and ideally make it something you do on a regular basis.

Quepid is a free, open source tool that I first became involved with in 2017 while leading Flax, a UK search consulting company. We had a client who was testing Apache Solr as a potential upgrade from the Endeca search engine, and I worked with the team at OpenSource Connections (OSC) to turn Quepid from a tool used by a single developer into one used by whole teams. Later, after I joined OSC, Quepid became the centerpiece of the relevance testing process, and I continued to contribute to its development and use it in client projects.

Quepid lets you connect to a search engine, create test cases containing sets of queries, and manually rate how relevant the results for those queries are. Roll all these ratings up and you have a search metric – a number that shows you how good (or bad) search is on your site.

As Quepid was first built to work with the Apache Solr search engine, it usually relies on a direct connection to the search backend – which can take a while to set up. You might need permissions, tunnels, proxies and some developer time. The tool now works with many other search engines including OpenSearch, Elasticsearch, Vectara and Algolia, but you’ll still need help and the right access to get it working.

Mis-using Search Endpoints

However, if you don’t have access to the search engine backend, want to get started quickly, or you don’t even know who runs the search function for a particular company, what can you do? Donning my black hoodie, I’ve come up with a hacky way to connect Quepid to potentially any website.

Quepid lets you create search endpoints that connect to a particular search engine or to an HTTP Search API. Send a search request to this API and back should come some tidy JSON containing a set of search results. Many sites have a Search API just behind the front end – but again, you may need help to gain access to it.

What I’m going to demonstrate is how to utilise this Search API feature of Quepid to work directly with a website front end. It’s not pretty, it’s not a complete solution but it’s enough to get you started. It will involve some inspection of website traffic, some Javascript coding and a little guessing.

There are some other options, which would involve writing a lot more code and/or configuration:

Write a web scraper or use a commercial scraping platform to send queries, grab the search results and turn them into a CSV file, then import this into Quepid manually (CSV import is another Quepid search endpoint option, but this is an offline process and thus not so interactive).
Create or extend a web server that sends the search requests to the website, reformats the results, then presents them via a Search API that can easily be hooked up to Quepid. Here’s a blog on how to do this with .NET.
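As a rough illustration of the second option above, here is a minimal Node.js sketch of such a proxy. It is not a production server. It assumes Node 18+ (for the global fetch), a hypothetical TARGET_URL that uses a `text` query parameter like the Brakes example, and an `extract` function that you supply to do the site-specific parsing. The names `handleSearch` and `startProxy` are made up for this sketch.

```javascript
// Minimal Search API proxy sketch for Node.js 18+ (uses the global fetch).
// TARGET_URL and the "text" parameter are placeholders: substitute the real
// search page you are testing.
const TARGET_URL = "https://www.example.co.uk/search";

// Fetch the site's HTML results page for a query and return a JSON string
// holding whatever `extract` (your site-specific parser) pulls out of it.
async function handleSearch(query, fetchHtml, extract) {
  const html = await fetchHtml(`${TARGET_URL}?text=${encodeURIComponent(query)}`);
  return JSON.stringify(extract(html));
}

// Serve GET /?q=<query> as JSON. CORS is opened up in case Quepid calls
// the endpoint directly from your browser.
function startProxy(port, extract) {
  const http = require("http");
  const fetchHtml = async (url) => (await fetch(url)).text();
  const server = http.createServer(async (req, res) => {
    const query = new URL(req.url, "http://localhost").searchParams.get("q") || "";
    try {
      const body = await handleSearch(query, fetchHtml, extract);
      res.writeHead(200, {
        "Content-Type": "application/json",
        "Access-Control-Allow-Origin": "*",
      });
      res.end(body);
    } catch (err) {
      // Upstream fetch or parsing failed: report it as a bad gateway
      res.writeHead(502, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: String(err) }));
    }
  });
  server.listen(port);
  return server;
}
```

Passing `fetchHtml` and `extract` into `handleSearch` keeps the site-specific parts swappable, and it lets you check the request URL and output without touching the network.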

What’s going on when we search?

I’ll use a UK website as my testbed – Brakes, a major food distributor. Let’s start by taking a look at how search works on their site – my test query is loaf:

As you might expect, we get a set of results, in this case presented in a table. Somehow we need to get these results out of the page and into Quepid, ready to rate them.

Let’s first use the browser’s developer tools to inspect what’s going on when we make this search. Bringing up the tools (on Firefox for Windows it’s Ctrl-Shift-I) and selecting the Network tab, then refreshing the page, shows lots of traffic. We’re looking for some kind of search query – here it is, an HTTP GET request with the query word loaf:

If we click this line, select the Response tab on the right and toggle on the Raw switch, we can see the actual HTML sent back to the browser when the search happens. Somewhere in this should be our search results. Let’s click into this HTML, press Ctrl-F and search for ‘sidoli’, as that’s a fairly unique word from the first search result:
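If the raw response is long, Ctrl-F can get tedious. A small helper like the hypothetical `showContext` below can be pasted into the browser console. It returns the text surrounding every occurrence of a term. This is a sketch that assumes you already have the raw HTML as a string.

```javascript
// Return the text around each case-insensitive match of `term` in `html`,
// `radius` characters either side, to see how results are embedded in the page.
function showContext(html, term, radius = 80) {
  if (!term) return [];
  // Escape regex special characters so the term is matched literally
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const snippets = [];
  for (const match of html.matchAll(new RegExp(escaped, "gi"))) {
    const start = Math.max(0, match.index - radius);
    const end = Math.min(html.length, match.index + match[0].length + radius);
    snippets.push(html.slice(start, end));
  }
  return snippets;
}
```

On the results page, you could first run something like `const html = await (await fetch(location.href)).text();`. After that, `showContext(html, "sidoli").forEach((s) => console.log(s));` prints each snippet.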

[Screenshot: finding the search results in the HTML]

In this case, it looks like the search results are in a block of Javascript. This won’t always be the case of course – they could be formatted as HTML – but let’s see what we can do with this data.

Creating our endpoint

Let’s fire up Quepid and create a new search endpoint for this website. We’re going to pretend that Brakes’ website URL is actually the URL of a Search API, so look back at the test search we just did and copy everything before the ‘?’. We’re using HTTP GET requests here:

[Screenshot: creating a Quepid search endpoint]

Make a note of what comes after the ‘?’ – in this case text=loaf – we’ll need this later.
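You can split the URL by eye, but the standard `URL` class does the same job and also shows which parameter carries your query. Below is a sketch using a hypothetical Brakes-style URL with an invented `sort` parameter. Check the real URL in your Network tab.

```javascript
// Split a search-results URL into the base URL Quepid will treat as the
// Search API, and the query-string parameters. Also report which parameter
// carries the term you typed (e.g. "text" in text=loaf).
function splitSearchUrl(searchUrl, searchTerm) {
  const url = new URL(searchUrl);
  const endpoint = url.origin + url.pathname; // everything before the '?'
  const params = Object.fromEntries(url.searchParams);
  const queryParam = Object.keys(params).find((key) => params[key] === searchTerm);
  return { endpoint, params, queryParam };
}
```

For example, `splitSearchUrl("https://www.example.co.uk/search?text=loaf&sort=relevance", "loaf")` gives the endpoint `https://www.example.co.uk/search` and identifies `text` as the query parameter.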

Quepid uses Javascript to translate between what a Search API returns and how Quepid represents search results. Two template functions are provided: one returns a count of the search results and the other returns an array of result objects. You can see this in the template code provided, where data is the response from the API:

numberOfResultsMapper = function(data){
  return data.length;
};

docsMapper = function(data){
  let docs = [];
  for (let doc of data) {
    docs.push({
      id: doc.publication_id,
      title: doc.title,
    });
  }
  return docs;
};

Note that this code assumes that data is a nice tidy bit of JSON, containing a set of result documents – this is certainly not what we’re getting back from the Brakes website! We’re going to have to add a Javascript function to extract our search results from the raw HTML we saw above and turn them into a format Quepid understands.

Extracting search results – the fiddly bit

In this case I’ve used some simple string manipulation to chop the search results out of the returned HTML and then Javascript’s eval() function to turn them into a Javascript object. I can’t pretend to be a Javascript expert, so my code may not look pretty but it does work, and you can see the various steps.

When you try to do the same for another, different website, you’ll need to write your own code to reformat whatever HTML is returned – perhaps you’ll have to strip out certain characters, traverse the DOM or whatever. I leave that as an exercise for the reader!
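One clean-up step you might need is shown in the hypothetical `decodeEntities` helper below. It turns HTML entities such as `&amp;` or `&#39;` back into characters, which is one common source of odd characters in scraped titles. It handles numeric entities and only the few named ones in its own table, and it leaves anything else untouched. Inside a browser you could lean on the DOM instead, but this version also runs outside it.

```javascript
// Decode numeric HTML entities (&#39; or &#x27;) and a few common named ones.
// Unknown entities are left exactly as they were.
const NAMED_ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'", nbsp: "\u00a0" };

function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Out-of-range code points are left as the original text
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // hasOwnProperty avoids matching inherited names like "constructor"
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}
```

In a docsMapper, you would apply it to each title, for example `title: decodeEntities(doc.name)`.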

For debugging, Quepid provides a simple syntax check, with a red indicator at the top left of the Javascript box when something is wrong. If you need to trace what your code is doing when you use your endpoint in Quepid, add calls to Javascript’s console.log() function and view the output in Developer Tools. Here’s my Javascript with an extractor function:

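myExtractor = function(data){
	const starttoken = "window.productListObject = [";  // the start of the Javascript block containing search results
	const endtoken = "]"; // the end of this block (assumes the block contains no nested arrays)
	const str = String(data); // get all our HTML
	const start = str.indexOf(starttoken);
	if (start === -1) return []; // block not found: treat as no results
	const str2 = str.substring(start); // chop off the start
	const end = str2.indexOf(endtoken);
	if (end === -1) return [];
	const str3 = str2.substring(0, end + endtoken.length); // chop off the end
	const obj = eval(str3); // turn this into an object (eval runs the site's code, so only use it on sites you trust)
//	console.log(obj) // << use this to check the structure!
	return obj;
};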

numberOfResultsMapper = function(data){
  return myExtractor(data).length; // number of results extracted from the HTML
};

docsMapper = function(data){
  let docs = [];
  for (let doc of myExtractor(data)) {
    docs.push({
      id: doc.id,
      title: doc.name, // map the object names into Quepid fields
      price: doc.price,
      product_id: doc.id
    });
  }
  return docs;
};

To make sure you’ve created the object Quepid expects, you can even print it to the console (see the commented-out line above marked with <<, just before the end of my extractor function):

[Screenshot: checking we can import data into Quepid]
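You can also exercise the extraction logic outside Quepid, for example in Node.js, against a saved copy of the page or a cut-down sample. The sketch below uses an invented snippet shaped like the Brakes block, with the `id`, `name` and `price` fields my docsMapper reads. It swaps `eval` for `JSON.parse`, which is safer. However, it only works if the embedded array happens to be strict JSON, and sites often emit plain Javascript literals that JSON.parse will reject.

```javascript
// Offline check of the extraction logic in Node.js. Uses JSON.parse instead
// of eval, so it only works if the embedded array is strict JSON.
function extractProducts(html, startToken = "window.productListObject = ") {
  const start = html.indexOf(startToken);
  if (start === -1) return []; // results block not found
  const arrayStart = start + startToken.length; // position of the opening '['
  const arrayEnd = html.indexOf("]", arrayStart); // assumes no nested arrays
  if (arrayEnd === -1) return [];
  return JSON.parse(html.slice(arrayStart, arrayEnd + 1));
}

// Same field mapping as the docsMapper: name becomes Quepid's title
function toQuepidDocs(products) {
  return products.map((p) => ({ id: p.id, title: p.name, price: p.price, product_id: p.id }));
}

// Invented sample, shaped like the block found in the Network tab
const sampleHtml = `<script>
  window.productListObject = [{"id":"101","name":"White Sliced Loaf","price":"1.20"},
    {"id":"102","name":"Brown Loaf","price":"1.35"}];
</script>`;

const docs = toQuepidDocs(extractProducts(sampleHtml));
console.log(docs.length, docs.map((d) => d.title));
```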

Creating a Quepid Case

We can now create a test case that uses our new endpoint to talk to the Brakes website. From the Quepid top menu, select ‘Relevancy Cases’ and then ‘Create a case’ to start the wizard. Give your case a name and select the endpoint we just created, then choose fields that match the ones we used above:

Add loaf as our first query. You’ll now get a stern warning:

To fix this, click ‘Finish’ and then the button. We need to tell Quepid how to send queries to the Brakes website in the correct format. Remember we noted above that this is text=loaf, so we replace ‘loaf’ with a special token that Quepid will substitute with each query:
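Conceptually, Quepid swaps that token for each query’s text and sends the resulting parameters to the endpoint URL. The sketch below mimics this, so you can check the exact URL a query should produce in a browser console or Node. It assumes the token is `#$query##`, which as far as I know is the placeholder Quepid uses in its query templates. It also assumes the query is URL-encoded. Quepid’s own implementation may differ in the details.

```javascript
// Mimic how a query template turns into a request URL. The token "#$query##"
// is assumed to be Quepid's placeholder; check it against your Quepid version.
const QUERY_TOKEN = "#$query##";

function buildSearchUrl(endpoint, template, query) {
  // split/join replaces every occurrence without treating '$' as special
  const params = template.split(QUERY_TOKEN).join(encodeURIComponent(query));
  return `${endpoint}?${params}`;
}
```

Trying a query that contains spaces or an ‘&’, such as `buildSearchUrl("https://www.example.co.uk/search", "text=#$query##", "fish & chips")`, is a quick way to see how the encoding comes out.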

Click ‘Rerun my Searches!’ at the bottom and we should now see some search results ready for rating. If this doesn’t work and you get an error, the Search Endpoint isn’t working correctly and you’ll need to go back and fiddle with the Javascript; remember to pop open the developer tools to see more details of what happened. Here’s what you want to see:

Note that we’re only showing the information we could easily extract from the HTML – no pictures or links to actual products – and there’s no Explain data on the right to show us why this result matched, as some search engines such as Solr would provide if we were talking to them directly. There are also a few odd characters in the titles! However, we can now rate these results and get a metric for how good our ‘loaf’ search is (here we’re using nDCG), and also add more queries to this Case, which will be run automatically against the Brakes site:
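For readers who want to see what goes into such a score, here is a sketch of one common nDCG formulation. It uses graded ratings, a gain equal to the rating, and a log2 rank discount, cut off at k. Quepid’s built-in nDCG scorer may differ in its gain function, cutoff and handling of unrated documents, so treat this as an illustration rather than a reproduction of Quepid’s number. Ratings are keyed by document id, and that is what lets earlier ratings be reused when the result list changes.

```javascript
// DCG@k: gain = rating, discount = log2(rank + 1) with ranks starting at 1
function dcg(gains, k) {
  return gains.slice(0, k).reduce((sum, rel, i) => sum + rel / Math.log2(i + 2), 0);
}

// nDCG@k for one query. `ratings` is a Map of doc id -> graded rating (e.g. 0-3).
// Unrated documents count as 0; the ideal ranking uses every rating held for the query.
function ndcg(resultIds, ratings, k = 10) {
  const actual = resultIds.map((id) => ratings.get(id) ?? 0);
  const ideal = [...ratings.values()].sort((a, b) => b - a);
  const idealDcg = dcg(ideal, k);
  return idealDcg === 0 ? 0 : dcg(actual, k) / idealDcg;
}
```

To score a second search configuration, you make another call with its result ids and the same ratings Map. With ratings a=3, b=2, c=0, the ranking a, b, c scores 1 and the reversed ranking scores about 0.648.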

We can now try some different search configurations and see whether we can achieve better scores! Quepid can re-run all the tests automatically and reuse any previous ratings.

Enhancements

Those with better Javascript skills than mine should be able to come up with much better ways to extract the data Quepid needs. Pictures are very useful when rating, for example, but I couldn’t figure out where they were being returned on the Brakes site. Many sites will return search results in HTML blocks, which will require a lot more parsing, perhaps with the browser’s native DOMParser interface. Again, I suggest you use the developer console for debugging; you’re going to need it!
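For sites that return results as HTML, a DOMParser-based extractor might look roughly like the sketch below. The CSS selectors (`.product-row`, `.product-name`) and the `data-product-id` attribute are hypothetical, so find the real ones by inspecting the page. The sketch assumes it runs in a browser, where DOMParser is built in. My assumption is that Quepid’s mapper code runs there too, since its console.log output shows up in Developer Tools. Node.js has no DOMParser unless you add a library.

```javascript
// Extract results from an HTML results page using the browser's DOMParser.
// The selectors and attribute name below are hypothetical placeholders.
function extractFromHtml(html) {
  const doc = new DOMParser().parseFromString(html, "text/html");
  const rows = doc.querySelectorAll(".product-row");
  return Array.from(rows, (row) => {
    const nameEl = row.querySelector(".product-name");
    return {
      id: row.getAttribute("data-product-id"),
      title: nameEl ? nameEl.textContent.trim() : "", // tolerate missing names
    };
  });
}
```

A docsMapper could then loop over `extractFromHtml(data)` in the same way mine loops over `myExtractor(data)`.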

Do bear in mind that the documentation for Search Endpoints is rather thin. You may be able to get support in the #quepid channel in Relevance Slack where the Quepid developers hang out (thanks Eric Pugh for reviewing this blog and for stewarding Quepid).

Figuring out which queries to test is another issue. If you have access to query logs, or a list of problematic queries for a particular site, you should be able to build a test set; sampling from the logs is a good approach.
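The sketch below shows one simple sampling approach; it is my own illustration, not a Quepid feature. It keeps the most frequent ‘head’ queries, then adds a random sample from the remaining long tail so that rarer queries are represented too. The log format (an array of raw query strings) and the split sizes are hypothetical. The random function can be injected so that a run is repeatable.

```javascript
// Build a query test set from a log: the top `headSize` queries by frequency,
// plus up to `tailSize` distinct queries sampled at random from the rest.
function sampleQueries(logQueries, headSize, tailSize, random = Math.random) {
  const counts = new Map();
  for (const q of logQueries) {
    const key = q.trim().toLowerCase(); // normalise case and whitespace
    if (key) counts.set(key, (counts.get(key) || 0) + 1);
  }
  const byFrequency = [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([q]) => q);
  const head = byFrequency.slice(0, headSize);
  const tail = byFrequency.slice(headSize);
  // Partial Fisher-Yates shuffle: the first tailSize slots become a random sample
  for (let i = 0; i < Math.min(tailSize, tail.length); i++) {
    const j = i + Math.floor(random() * (tail.length - i));
    [tail[i], tail[j]] = [tail[j], tail[i]];
  }
  return head.concat(tail.slice(0, tailSize));
}
```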

Now that we can hook Quepid up to potentially any website, even AI-powered search is testable – we really don’t care whether search is lexical, semantic or a hybrid of both. We could even use Quepid to rate the results of a Retrieval Augmented Generation (RAG) system, where just a single answer is generated.

Remember that sending a large number of automated searches to a website may make you look like a bot, and risk you getting your or Quepid’s IP blocked – so let’s be careful out there!
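If you script searches yourself, for example with the scraping or proxy approaches mentioned earlier, one simple courtesy is to send them one at a time with a pause in between. The sketch below assumes `runQuery` is whatever function performs a single search. The one-second default delay is an arbitrary choice, not a known safe rate for any site.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run queries one after another, waiting delayMs between requests,
// and collect the results in the same order as the queries.
async function runThrottled(queries, runQuery, delayMs = 1000) {
  const results = [];
  for (let i = 0; i < queries.length; i++) {
    if (i > 0) await sleep(delayMs); // no wait before the first request
    results.push(await runQuery(queries[i]));
  }
  return results;
}
```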

Test search relevance on any website

I’ve shown how we can use Quepid’s Search Endpoints feature and a scrap of Javascript to test search relevance on potentially any website. If you’d like to start measuring how good, or bad, your search is – without bothering your developers – get in touch and I can show you how to develop effective processes and tools to iteratively improve search quality.

Hacker Vectors by Vecteezy
