Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Template query #16818

Merged
merged 7 commits into from
Jan 23, 2025
Merged

Conversation

mingshl
Copy link
Contributor

@mingshl mingshl commented Dec 9, 2024

Description

This Pull Request introduces the following key enhancements:

1. New 'Template' Query Type:

Implements a new query type that holds an object of query content.
Supports placeholders within the query content.
Passes the query content to search request processing.

2. Query Rewrite Context Refactoring:

Query Rewrite Context now carries the pipelineContext variables after all search request processors processed.
Template queries resolve placeholders during query rewrites.
Inner queries are constructed dynamically based on resolved placeholders.

Related Issues

#16823
opensearch-project/ml-commons#3054

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@mingshl
Copy link
Contributor Author

mingshl commented Dec 9, 2024

@reta this PR addressed the issue #16823, can you please help move the issue to OpenSearch Project?

Copy link
Contributor

github-actions bot commented Dec 9, 2024

❌ Gradle check result for ff1acc0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@mingshl
Copy link
Contributor Author

mingshl commented Dec 10, 2024

updated git rebase HEAD~3 --signoff

Copy link
Contributor

❌ Gradle check result for cf97416: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@reta
Copy link
Collaborator

reta commented Dec 10, 2024

@reta this PR addressed the issue opensearch-project/OpenSearch#16823, can you please help move the issue to OpenSearch Project?

@mingshl I sadly cannot do it (not a maintainers on ml-commons), @dblock @andrross could you?

@mingshl
Copy link
Contributor Author

mingshl commented Dec 10, 2024

@reta this PR addressed the issue opensearch-project/OpenSearch#16823, can you please help move the issue to OpenSearch Project?

@mingshl I sadly cannot do it (not a maintainers on ml-commons), @dblock @andrross could you?

Thanks @reta. I will reach out.

@mingshl
Copy link
Contributor Author

mingshl commented Jan 14, 2025

resolved changelog conflict.

Copy link
Contributor

❌ Gradle check result for 86a56e1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@rishabhmaurya
Copy link
Contributor

copying my comment from RFC
@mingshl Thanks for adding this support. I do agree on its utility in core, however, the ways it is exposed looks a little complicated to me -

GET my-keyword/_search
{
  "query": {
    "template": {
      "term": {
        "text": {
          "value": "${ext.ml_inference.params.inference_result}", // this is the field generated from ml_inference search request processor 
        }
      }
    }
  },
  "ext": {
    "ml_inference": {
      "params": {
        "text": "sunny"
      }
    }
},
  1. is ext some reserved keyword in search pipeline world? If not, we can be more precise say search_pipeline_ext.
  2. Assuming a template query can be associated with one or more request processors, for placeholders, we can probably remove redundant information and directly use something like - ${<processor_id>.<output_param>}, so this example will become ${ml_inference.inference_result}.
  3. For inputs params, we probably need type information too?
"search_pipeline_ext": {
  "<processor_id>": {
    "params": {
       "<param_name>": {
          "value": <value>,
          "type": <type>
        },
        // OR shorthand notion when type isn't provided explicitly
        "<param_name> : <param_value>"
     }
  }
}

@mingshl
Copy link
Contributor Author

mingshl commented Jan 14, 2025

copying my comment from RFC @mingshl Thanks for adding this support. I do agree on its utility in core, however, the ways it is exposed looks a little complicated to me -

GET my-keyword/_search
{
  "query": {
    "template": {
      "term": {
        "text": {
          "value": "${ext.ml_inference.params.inference_result}", // this is the field generated from ml_inference search request processor 
        }
      }
    }
  },
  "ext": {
    "ml_inference": {
      "params": {
        "text": "sunny"
      }
    }
},
  1. is ext some reserved keyword in search pipeline world? If not, we can be more precise say search_pipeline_ext.
  2. Assuming a template query can be associated with one or more request processors, for placeholders, we can probably remove redundant information and directly use something like - ${<processor_id>.<output_param>}, so this example will become ${ml_inference.inference_result}.
  3. For inputs params, we probably need type information too?
"search_pipeline_ext": {
  "<processor_id>": {
    "params": {
       "<param_name>": {
          "value": <value>,
          "type": <type>
        },
        // OR shorthand notion when type isn't provided explicitly
        "<param_name> : <param_value>"
     }
  }
}

I think the ext is not reserved for search pipeline only. Different plugins have different search extensions. for ml_inference search extension is living in ml-commons repo.

And theoretically the field in search extensions can be used for multiple processors.
for example, ext.ml_inference.params.text field can be used for one ml inference processor to generate embedding, and the second ml inference processor to detect language. It's not linked with one processor.

But I like the idea that in the ml_inference search extension, in the future, we can let user define the interface of variables,but probably the definition of search extension can be part of model interface. Because it's likely that different model use the same variable name but in different types. That can be a future enhancement. tagging @b4sjoo here as he is commiter for model interface in ml-commons. cc @ylwu-amzn

Copy link
Contributor

✅ Gradle check result for e5784fd: SUCCESS

Copy link

codecov bot commented Jan 15, 2025

Codecov Report

Attention: Patch coverage is 82.64463% with 21 lines in your changes missing coverage. Please review.

Project coverage is 72.25%. Comparing base (c6dfc65) to head (fcc44d5).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
...pensearch/index/query/QueryCoordinatorContext.java 58.82% 7 Missing ⚠️
...g/opensearch/index/query/TemplateQueryBuilder.java 90.16% 3 Missing and 3 partials ⚠️
...pensearch/index/query/BaseQueryRewriteContext.java 88.57% 2 Missing and 2 partials ⚠️
...java/org/opensearch/index/query/QueryBuilders.java 0.00% 1 Missing ⚠️
...rg/opensearch/index/query/QueryRewriteContext.java 0.00% 1 Missing ⚠️
...rch/search/pipeline/PipelineProcessingContext.java 0.00% 1 Missing ⚠️
...g/opensearch/search/pipeline/PipelinedRequest.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##               main   #16818   +/-   ##
=========================================
  Coverage     72.24%   72.25%           
+ Complexity    65421    65403   -18     
=========================================
  Files          5306     5309    +3     
  Lines        304215   304300   +85     
  Branches      44118    44124    +6     
=========================================
+ Hits         219788   219866   +78     
+ Misses        66414    66377   -37     
- Partials      18013    18057   +44     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@dbwiddis dbwiddis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some concerns here:

  • breaking API
  • synchronization issues
  • I'm not sure your substitution is doing what you think it's doing (or it's doing something not expected by a casual reader)

msfroh and others added 7 commits January 23, 2025 11:18
Make QueryRewriteContext an interface. Rename existing impl to
BaseQueryRewriteContext. Create coordinator-level context called
QueryCoordinatorContext. It can expose pipelined search request,
including PipelineProcessingContext to rewrite methods.

Signed-off-by: Mingshi Liu <[email protected]>
Introduce template query that holds the content of query which can contain placeholders and can be filled by the variables from PipelineProcessingContext produced by search processors. This allows query rewrite by the search processors.
Signed-off-by: Mingshi Liu <[email protected]>
Signed-off-by: Mingshi Liu <[email protected]>

Add changelog

Signed-off-by: Mingshi Liu <[email protected]>
Signed-off-by: Mingshi Liu <[email protected]>
Signed-off-by: Mingshi Liu <[email protected]>
Signed-off-by: Mingshi Liu <[email protected]>
Copy link
Contributor

❌ Gradle check result for fcc44d5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❕ Gradle check result for fcc44d5: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link
Member

@dbwiddis dbwiddis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks for your patience!

@msfroh msfroh merged commit 13ab4ec into opensearch-project:main Jan 23, 2025
33 of 37 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jan 23, 2025
Introduce template query that holds the content of query which can contain placeholders and can be filled by the variables from PipelineProcessingContext produced by search processors. This allows query rewrite by the search processors.

---------

Signed-off-by: Mingshi Liu <[email protected]>
Co-authored-by: Michael Froh <[email protected]>
(cherry picked from commit 13ab4ec)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants