Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Concurrent executing query in InnerHitsPhase #16878

Open
kkewwei opened this issue Dec 18, 2024 · 8 comments · May be fixed by #16937
Open

[Feature Request] Concurrent executing query in InnerHitsPhase #16878

kkewwei opened this issue Dec 18, 2024 · 8 comments · May be fixed by #16937
Assignees
Labels
enhancement Enhancement or improvement to existing feature or request Search:Performance

Comments

@kkewwei
Copy link
Contributor

kkewwei commented Dec 18, 2024

Is your feature request related to a problem? Please describe

In our product environment, we utilize the nested field and inner_hits, the source contains excessive number of fields, and inner_hits query will match about 80 sub documents within a parent document. we find that the FetchPhase is costly, it will cost 3s+, but when we delete the inner_hits, it will only cost 700ms.

In FetchPhase, each document will execute the InnerHitsPhase serially (regards as sub query phase), if we need to
fetch values from multi documents in FetchPahse, then the overall fetch phase will be very slow.

Describe the solution you'd like

Concurrent executing inner_hits in FetchPhase.

In some case, if the source is too large, it will also cost much time to fetch document in FetchPhase, It also seem necessary to fetch doc concurrently in some scenarios.

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response

@kkewwei kkewwei added enhancement Enhancement or improvement to existing feature or request untriaged labels Dec 18, 2024
@sandeshkr419
Copy link
Contributor

@jed326 / @sohami - Do you guys have some thoughts here?

@jed326
Copy link
Collaborator

jed326 commented Dec 19, 2024

Thanks for the feature request @kkewwei!

Concurrent fetch is something I had briefly discussed with @sohami before, but it's not something we saw any use cases for (until now). When we were looking at concurrent search the performance on aggregations is what we were focusing on, and in a lot of cases users will do aggregations with either size=0 or just the default size so in those so concurrent fetch was not really needed or useful.

I'd be happy to review any designs or PRs for this!

@kkewwei
Copy link
Contributor Author

kkewwei commented Dec 19, 2024

@jed326 / @sohami - Do you guys have some thoughts here?

Thanks for the feature request @kkewwei!

Concurrent fetch is something I had briefly discussed with @sohami before, but it's not something we saw any use cases for (until now). When we were looking at concurrent search the performance on aggregations is what we were focusing on, and in a lot of cases users will do aggregations with either size=0 or just the default size so in those so concurrent fetch was not really needed or useful.

I'd be happy to review any designs or PRs for this!

@jed326, If you haven't started on it, can you assign it to me? I'm pleased to have a try. Of course, I will make a draft first

@jed326
Copy link
Collaborator

jed326 commented Dec 19, 2024

can you assign it to me?

@kkewwei done!

@kkewwei
Copy link
Contributor Author

kkewwei commented Dec 26, 2024

It seems that serial execution of each document's innerhit can speed up due to querycache, while concurrent executing will result in performance degradation. I'm still testing.

@kkewwei
Copy link
Contributor Author

kkewwei commented Dec 28, 2024

@jed326 @sohami, during I experimentation with concurrent execution in the innerhit phase, I discovered another logic that requires optimization.

It is known that TermQuery is never cached in QueryCache, but innerhit phase extensively utilizes TermQuery, resulting in inefficient(https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/UsageTrackingQueryCachingPolicy.java#L57). The following improvements can be made:

  1. In findNestedObjectMapper, if context is stance of InnerHitSubContext, we are able to rapidly identify the ObjectMapper, thus reducing the number of TermQuery executions.
  2. In getInternalNestedIdentity, childFilter, which is a simple TermQuery, will be cached by bitsetFilterCache, which
    can also help reduce redundant query executions.

The details is as follows, I will pull a pr about this.
kkewwei@9a47717de54

I have test several times, and restarted the node each time to eliminate cache interference. The detail tooks(ms) is as follows:
Baseline(main branch), avg: 836ms
841
853
828
774
811
914

Optimized:, avg: 551ms
569
506
666
522
526
518

Took descrease 34%

@jed326
Copy link
Collaborator

jed326 commented Dec 30, 2024

Thanks @kkewwei! Feel free to trigger some performance benchmarks on your PR as well, see: https://github.com/opensearch-project/OpenSearch/blob/main/PERFORMANCE_BENCHMARKS.md

@kkewwei kkewwei linked a pull request Jan 3, 2025 that will close this issue
3 tasks
@kkewwei
Copy link
Contributor Author

kkewwei commented Jan 7, 2025

In our product, offset in innerhit serves no practical purpose. When dealing with complex nested structures, computing it will cost much time, In the next step, I will test to gauge the potential time savings by omitting the offset computation. If the performance improves a lot, I will try to adding parameter to bypass computing offset

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Search:Performance
Projects
Status: 🆕 New
Development

Successfully merging a pull request may close this issue.

4 participants