-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-46248: Few small improvements for the new query system #1146
Conversation
When building queries on top of materialized tables we can avoid joining dataset searches that are already included in materializations, except when we need something from tags/calibs tables.
The indices are created on combinations of dimensions. This should help with the performance of follow-up queries in graph builder.
Codecov ReportAll modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found. Additional details and impacted files@@ Coverage Diff @@
## main #1146 +/- ##
==========================================
+ Coverage 89.46% 89.47% +0.01%
==========================================
Files 366 366
Lines 49068 49115 +47
Branches 5950 5955 +5
==========================================
+ Hits 43897 43944 +47
Misses 3753 3753
Partials 1418 1418 ☔ View full report in Codecov by Sentry. |
This is a non-public API for now, solely for graph builder use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The client-server stuff and model changes look fine to me.
self.db.temporary_table(make_table_spec(plan.final_columns, self.db, plan.postprocessing)) | ||
self.db.temporary_table( | ||
make_table_spec(plan.final_columns, self.db, plan.postprocessing, make_indices=True) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recall you saying earlier that adding the indexes before doing the inserts sometimes made the inserts much more expensive. Did that get resolved by other changes, or is it just an overall win even if it's occasionally worse.
And would it be worth it to try to move index creation to just after insert in this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was a false alarm - I have added indices and filling temp table became terribly slow, the real reason was that query planner went into a dumb state on the same exact query (this was before I dropped DISTINCT). With disabled DISTINCT I have not noticed any significant slow down due to indices.
It is true that for bulk data insert it is better to load the data before adding indices. I tried to implement it, but it is not trivial with our code now, the simple idea did not work, and I did not want to start restructuring a bunch of code just for that. Still, if we see it later, it should be possible to delay index creation until after filling the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in case we want to do index creation separately later - my initial idea was that I run the same _convertTableSpec
but do not add Index
instances to Table
constructor arguments, instead the indices are returned as a separate list so you can later iterate and create them manually. It happens, that even if you do not add an Index
to the Table()
arguments, the metadata already knows about all indices for that table (as Index
is created for a specific table) and will create that index anyways. So delaying just did not work.
Checklist
doc/changes
configs/old_dimensions