Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-46248: Few small improvements for the new query system #1146

Merged
merged 5 commits into from
Jan 31, 2025

Conversation

andy-slac
Copy link
Contributor

Checklist

  • ran Jenkins
  • added a release note for user-visible changes to doc/changes
  • (if changing dimensions.yaml) make a copy of dimensions.yaml in configs/old_dimensions

When building queries on top of materialized tables we can avoid
joining dataset searches that are already included in materializations,
except when we need something from tags/calibs tables.
The indices are created on combinations of dimensions. This should help
with the performance of follow-up queries in graph builder.
Copy link

codecov bot commented Jan 23, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.47%. Comparing base (d858534) to head (5903c60).
Report is 6 commits behind head on main.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1146      +/-   ##
==========================================
+ Coverage   89.46%   89.47%   +0.01%     
==========================================
  Files         366      366              
  Lines       49068    49115      +47     
  Branches     5950     5955       +5     
==========================================
+ Hits        43897    43944      +47     
  Misses       3753     3753              
  Partials     1418     1418              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

This is a non-public API for now, solely for graph builder use.
Copy link
Contributor

@dhirving dhirving left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The client-server stuff and model changes look fine to me.

self.db.temporary_table(make_table_spec(plan.final_columns, self.db, plan.postprocessing))
self.db.temporary_table(
make_table_spec(plan.final_columns, self.db, plan.postprocessing, make_indices=True)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall you saying earlier that adding the indexes before doing the inserts sometimes made the inserts much more expensive. Did that get resolved by other changes, or is it just an overall win even if it's occasionally worse.

And would it be worth it to try to move index creation to just after insert in this method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a false alarm - I have added indices and filling temp table became terribly slow, the real reason was that query planner went into a dumb state on the same exact query (this was before I dropped DISTINCT). With disabled DISTINCT I have not noticed any significant slow down due to indices.
It is true that for bulk data insert it is better to load the data before adding indices. I tried to implement it, but it is not trivial with our code now, the simple idea did not work, and I did not want to start restructuring a bunch of code just for that. Still, if we see it later, it should be possible to delay index creation until after filling the data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case we want to do index creation separately later - my initial idea was that I run the same _convertTableSpec but do not add Index instances to Table constructor arguments, instead the indices are returned as a separate list so you can later iterate and create them manually. It happens, that even if you do not add an Index to the Table() arguments, the metadata already knows about all indices for that table (as Index is created for a specific table) and will create that index anyways. So delaying just did not work.

@andy-slac andy-slac merged commit 440f5ac into main Jan 31, 2025
19 checks passed
@andy-slac andy-slac deleted the tickets/DM-46248 branch January 31, 2025 02:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants