[FEATURE][RFC] Querqy Refactor with Querqy Unplugged & Search Pipelines #184
Notes from our 2023-07-26 Public Meeting
The core benefit that Querqy gives its users is the ability to maintain rules at the query level and to build a query tree that leads to clean scores. Parts of this are described here: https://opensourceconnections.com/blog/2021/10/19/fundamentals-of-query-rewriting-part-1-introduction-to-query-expansion/ In retail, clean scores are very important. For instance, it should not make a difference whether a multiword synonym has more or fewer terms (e.g. apple smartphone vs. iphone). Retailers normally have thousands of business rules for many different reasons, which cannot be implemented in a generic manner, such as
Retail search is quite specific regarding two aspects:
There are a lot of quick wins in this area if you have the flexibility to deal specifically with short-head queries.
Thanks @JohannesDaniel. I think I get what you're saying. I was thinking about this in terms of how we implement a search processor and less about the specifics of retail query rewriting. For example, there are many ways to rerank search results: multiple services, scripting inside OpenSearch, Learning-to-Rank. But the core pattern is the same: take the search results, transform the OpenSearch hits if the reranker needs them in a different shape, rerank, return the results, transform again if needed, and then profit. ;) Of course, something like Learning-to-Rank also requires judgements, feature generation, and feature logging, but those can be handled elsewhere and integrated with clean APIs. So, when we think about how anything should be integrated into a Search Pipeline, I want to understand whether there is a layer of abstraction we can introduce that makes different types of replace rewriters possible, so that OpenSearch can accommodate other types of rewriters for other use cases that may not require specific rulesets. This is something we can discover in the design/prototyping of the processors themselves.
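To make the reranking pattern described above concrete, here is a minimal Python sketch. It is not an OpenSearch API: `rerank_hits`, `score_fn`, and `to_doc` are hypothetical names, and the hit shape (`_source`, `_score`) only mirrors the usual layout of an OpenSearch search response.

```python
def rerank_hits(hits, score_fn, to_doc=lambda hit: hit["_source"]):
    """Transform hits, score them with an external reranker, and re-sort.

    hits     -- list of hit dicts shaped like OpenSearch hits (assumption)
    score_fn -- hypothetical reranker: takes one document, returns a float
    to_doc   -- hypothetical transform from a hit to the reranker's input
    """
    # 1. Transform hits into whatever the reranker expects.
    docs = [to_doc(hit) for hit in hits]
    # 2. Rerank: compute a new score per document.
    scores = [score_fn(doc) for doc in docs]
    # 3. Sort by the new score (highest first); ties keep their original order.
    ranked = sorted(zip(hits, scores), key=lambda pair: pair[1], reverse=True)
    # 4. Transform back: return hits carrying the new score.
    return [{**hit, "_score": score} for hit, score in ranked]


hits = [
    {"_id": "a", "_score": 1.0, "_source": {"price": 30}},
    {"_id": "b", "_score": 0.5, "_source": {"price": 10}},
]
# Toy reranker that prefers cheaper items.
reranked = rerank_hits(hits, score_fn=lambda doc: -doc["price"])
```

The same four steps would apply whether the scoring function wraps a remote service, a script, or a Learning-to-Rank model; only `score_fn` and `to_doc` change.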
You cannot maintain tons of business rules with Painless scripts. Furthermore, Querqy checks each query against hundreds or thousands of rules to see whether it meets certain conditions (retail companies usually maintain that many business rules). This requires specific optimizations and proper rewriting.
When a user enters a search, they have an intent in mind about what they want to find. This intent is typed in their own words and may not match the text in the search index. Query understanding is the area of search meant to assist with this interpretation, and query rewriting is one of its techniques. Before the index is searched, the query is examined to provide the search with more context, and the query is then rewritten with this new context. This RFC suggests ways to integrate a specific library used for query rewriting and also attempts to define proposals for more generic interfaces for query rewriting in search pipelines so that builders can bring their own rewriting logic while still taking advantage of the benefits of search pipelines - logical separation,
Creating rules to refine queries in search applications is a standard practice. Users enter free text search queries with the intent to find something specific. For example, a search query on a site selling home goods could be “gas grill weber.” Through query rewriting, the engine could interpret “weber” as the brand Weber and rewrite the query to boost “gas grill” matches where the Brand field in the index is “Weber.” My assumption and my experience tell me that many search application builders either do this with custom code or don’t know that they could do this type of rewrite at all. Querqy was developed as a plugin for Elasticsearch and Solr to help centralize rewriting and reduce its complexity. Later it was ported to OpenSearch. The plugin currently lives in the querqy GitHub repo and does not get upgraded with each release, because this is difficult to do unless the plugin is in the opensearch-project org and has access to the same CI infrastructure as other plugins.
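The “gas grill weber” rewrite above can be sketched in a few lines of Python. This is an illustration only, not how Querqy implements it: the function name, the `title` and `brand` field names, the brand lookup table, and the boost value are all assumptions, and `brand` is assumed to be an exact-match (keyword) field.

```python
def rewrite_brand_query(user_query, known_brands, boost=2.0):
    """Move recognised brand tokens out of the text query into a boost clause.

    known_brands maps a lowercased query token to the value stored in the
    (hypothetical) `brand` field, e.g. {"weber": "Weber"}.
    """
    tokens = user_query.lower().split()
    brands = [known_brands[t] for t in tokens if t in known_brands]
    remaining = [t for t in tokens if t not in known_brands]

    # The non-brand terms must match; fall back to match_all if none remain.
    if remaining:
        must = [{"match": {"title": " ".join(remaining)}}]
    else:
        must = [{"match_all": {}}]

    query = {"bool": {"must": must}}
    if brands:
        # Brand matches are optional but boost the score of matching documents.
        query["bool"]["should"] = [
            {"term": {"brand": {"value": b, "boost": boost}}} for b in brands
        ]
    return {"query": query}


body = rewrite_brand_query("gas grill weber", {"weber": "Weber"})
```

Here `body` is a query DSL request in which “gas grill” must match and documents whose brand is “Weber” are boosted. Maintaining thousands of such rules by hand in application code is the complexity that a rules-based rewriter is meant to centralize.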
Querqy comes with these rewriters, each of which may be usable when implemented as a SearchRequestProcessor:
(copied & pasted from https://docs.querqy.org/querqy/rewriters/common-rules.html)
Common Rules Rewriter
Query-dependent rules for synonyms, result boosting (up/down), filters; ‘decorate’ result with additional information.
Replace Rewriter
Replace query terms. Used as a query normalisation step, usually applied before the query is processed further, for example, before the Common Rules Rewriter is applied
Word Break Rewriter
(De)compounds query tokens. Splits compound words or creates compounds from separate tokens.
Number-Unit Rewriter
Recognises numerical values and units of measurement in the query and matches them with indexed fields. Allows for range matches and boosting of the exactly matching value.
Shingle Rewriter
Creates shingles (compounds) from adjacent query tokens and adds them as synonyms.
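As a rough illustration of the last rewriter in this list, the following Python sketch builds shingles from adjacent tokens. It only shows the idea of compounding neighbours; `shingle_synonyms` is a hypothetical helper, and how Querqy attaches the shingles to its query tree as synonyms is not modelled here.

```python
def shingle_synonyms(tokens):
    """Return (position, shingle) pairs, one per pair of adjacent tokens.

    The position is the index of the first token of the pair, so a caller
    could attach the shingle as a synonym covering tokens i and i + 1.
    """
    return [(i, tokens[i] + tokens[i + 1]) for i in range(len(tokens) - 1)]


shingles = shingle_synonyms(["lap", "top", "bag"])
# shingles == [(0, "laptop"), (1, "topbag")]
```

A single-token query produces no shingles, so the query would pass through unchanged.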
I propose that OpenSearch's Search Pipelines feature (https://opensearch.org/docs/latest/search-plugins/search-pipelines/index/) be used in combination with Querqy's library-based implementation, Querqy Unplugged (https://github.com/querqy/querqy-unplugged), to integrate multiple query rewriting components as processors. This could also reveal a clearer way to bring backend functionality into OpenSearch without having to move repositories into the project itself:
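The idea of chaining rewriters as request processors can be sketched as follows. This is a plain-Python model of the pattern, not the Search Pipelines or Querqy Unplugged API: the request shape (`q`, `size`) and the names `lowercase`, `replace_terms`, and `run_pipeline` are all hypothetical.

```python
from functools import reduce


def lowercase(request):
    """Request processor: normalise the query text to lower case."""
    return {**request, "q": request["q"].lower()}


def replace_terms(replacements):
    """Build a request processor that swaps single terms via a lookup table."""
    def processor(request):
        tokens = request["q"].split()
        rewritten = " ".join(replacements.get(t, t) for t in tokens)
        return {**request, "q": rewritten}
    return processor


def run_pipeline(processors, request):
    """Apply each processor in order; each gets the previous one's output."""
    return reduce(lambda req, proc: proc(req), processors, request)


pipeline = [lowercase, replace_terms({"mobiles": "smartphone"})]
result = run_pipeline(pipeline, {"q": "Cheap Mobiles", "size": 10})
# result == {"q": "cheap smartphone", "size": 10}
```

Because every processor has the same signature (request in, request out), rules-based rewriters and other kinds of rewriters could sit side by side in one chain, and their order is explicit.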
Benefits
Drawbacks
Other possibilities
Questions: