Wait... Why is there an Eval Framework when we are building a hybrid optimizer for OpenSearch?
Recently I was asked where the Evaluation Framework we are building for OpenSearch came from, since our goal when we set out was to build an Optimizer for Hybrid Search!
For those of you who haven’t heard about the new hotness that is Hybrid, it’s basically a way to query your keyword-based index in parallel with your semantic (neural) index, and then bring the results together. Yes, this is basically what we used to do in the dark ages via Federated Search ;-). It’s another case of problems we faced in the past coming back around, only with some fresh eyes and a new name.
Lucas Jeanniot wrote a great blog post on how to do this blending in OpenSearch, An overview of Rank Normalization in Hybrid Search. I highly recommend you read it.
Today, when we deploy Hybrid Search we have a number of rules of thumb for balancing the weighting of keyword and neural search. You can play with these weightings, just like I do in this notebook: https://github.com/o19s/chorus-opensearch-edition/blob/45fc2c7041957adb20fc62e1a0d4195b1a8edc47/katas/005_2_run_a_hybrid_search.ipynb. But what values should you use?
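To make the weighting question concrete, here is a minimal sketch in plain Python of what blending looks like: min-max normalize each result set, then take a weighted average. This is an illustration of the idea only, not OpenSearch's implementation. The function names, the `keyword_weight` parameter, and the scores are all made up for the example.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: treat every document as equally strong.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def blend(keyword_scores, neural_scores, keyword_weight):
    """Weighted arithmetic mean of the two normalized score sets."""
    neural_weight = 1.0 - keyword_weight
    kw = min_max_normalize(keyword_scores)
    nn = min_max_normalize(neural_scores)
    combined = {
        # A document missing from one result set contributes 0 for that side.
        doc: keyword_weight * kw.get(doc, 0.0) + neural_weight * nn.get(doc, 0.0)
        for doc in set(kw) | set(nn)
    }
    # Highest combined score first; ties broken by doc id for stable output.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Invented scores, purely for illustration.
keyword_scores = {"doc_a": 12.0, "doc_b": 7.0, "doc_c": 2.0}   # e.g. BM25
neural_scores = {"doc_b": 0.91, "doc_c": 0.88, "doc_d": 0.40}  # e.g. vector similarity

ranking_keyword_heavy = blend(keyword_scores, neural_scores, keyword_weight=0.8)
ranking_neural_heavy = blend(keyword_scores, neural_scores, keyword_weight=0.2)
```

With these toy numbers, the keyword-heavy weighting puts doc_a first, while the neural-heavy weighting puts doc_b first. Same documents, same scores, different winner, which is exactly why the choice of weights matters.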
Oh wait… To optimize something we need an objective function. To tell whether we are getting closer or further away, we need to know if our Optimizer is actually improving search or not. Which means… drum roll please… we need a way of objectively evaluating our Search Quality, which means introducing some sort of Search Quality Evaluation Framework.
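Here is a rough sketch of what "objective function" means in practice, assuming we pick NDCG@k as the quality metric and simply try a handful of candidate weights. The metric choice, the `best_weight` helper, and the `fake_search` stand-in are all assumptions for illustration; a real run would call your search engine instead of the fake function.

```python
import math


def dcg(gains):
    """Discounted cumulative gain for a list of graded relevance values."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(ranked_doc_ids, judgments, k=10):
    """NDCG@k for one query; judgments maps doc_id -> graded relevance."""
    gains = [judgments.get(doc, 0.0) for doc in ranked_doc_ids[:k]]
    ideal = sorted(judgments.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def best_weight(candidate_weights, search_fn, judged_queries, k=10):
    """Return (weight, mean NDCG@k) for the weight with the highest mean NDCG@k."""
    scored = []
    for weight in candidate_weights:
        per_query = [
            ndcg_at_k(search_fn(query, weight), judgments, k)
            for query, judgments in judged_queries.items()
        ]
        scored.append((weight, sum(per_query) / len(per_query)))
    return max(scored, key=lambda pair: pair[1])


# Hypothetical stand-in for a real hybrid search call, for illustration only.
def fake_search(query, keyword_weight):
    if keyword_weight >= 0.5:
        return ["doc_a", "doc_b", "doc_c"]
    return ["doc_b", "doc_c", "doc_a"]


# Invented judgments: doc_b is the most relevant result for this query.
judged_queries = {"laptop": {"doc_a": 0.0, "doc_b": 3.0, "doc_c": 1.0}}

weight, score = best_weight([0.2, 0.5, 0.8], fake_search, judged_queries, k=3)
```

The important part is not the grid search, which is about as naive as optimizers get. It is that `best_weight` cannot do anything at all without `judged_queries`, and those judgments have to come from somewhere.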
We are in the throes of development. The code that takes User Behavior Insights data and converts it into Implicit Judgements is complete. We’ve also got some decent dashboards for looking at classic Information Retrieval Metrics applied over time to a set of queries. We’re getting closer to actually being able to run those query sets on a recurring basis to produce the quality metrics.
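If you are wondering what turning behavior data into implicit judgements might look like, here is a deliberately simplistic sketch that uses plain click-through rate per query and document. It is not the click model the framework uses, and the event fields (`query`, `doc_id`, `action`) are hypothetical names chosen for this example rather than the actual User Behavior Insights schema. Real click models also correct for things like position bias, which this one ignores.

```python
from collections import defaultdict


def implicit_judgments(events, min_impressions=2):
    """Map (query, doc_id) -> click-through rate, skipping rarely seen pairs."""
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for event in events:
        key = (event["query"], event["doc_id"])
        if event["action"] == "impression":
            impressions[key] += 1
        elif event["action"] == "click":
            clicks[key] += 1
    return {
        key: clicks[key] / seen
        for key, seen in impressions.items()
        # Too few impressions makes the rate too noisy to trust.
        if seen >= min_impressions
    }


# Invented events, using the hypothetical field names described above.
events = [
    {"query": "laptop", "doc_id": "doc_a", "action": "impression"},
    {"query": "laptop", "doc_id": "doc_a", "action": "click"},
    {"query": "laptop", "doc_id": "doc_a", "action": "impression"},
    {"query": "laptop", "doc_id": "doc_b", "action": "impression"},
    {"query": "laptop", "doc_id": "doc_b", "action": "impression"},
    {"query": "laptop", "doc_id": "doc_c", "action": "impression"},
]

judgments = implicit_judgments(events)
```

Here doc_a gets a judgement of 0.5, doc_b gets 0.0, and doc_c is dropped because a single impression is not enough evidence either way.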
This framework won’t be everything we want it to be in this first phase. It’ll be limited by the existing user interface elements available to us in OpenSearch Dashboards. It’ll also be somewhat hampered by the lack of a truly robust job processing capability built into OpenSearch. (And yes, I’ve heard the “just integrate with Spark” suggestion, but I want an out-of-the-box capability!) However, it WILL mean that in 2.19 you will be able to generate implicit judgements and use them to monitor your search quality over time, oh yes, and also use them to optimize your hybrid search weightings!