-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
replace UUIDs, implement batch insert #7
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was looking at individual commits, and left a bunch of suggestions and questions scattered around.
Some might be irrelevant given more context in followup commits though.
pub trait Insertable<const FIELD_COUNT: usize> { | ||
const INSERT_QUERY_PRELUDE: &'static str; | ||
|
||
const INSERT_PLACEHOLDER: &'static str; | ||
|
||
fn param_bindings(&self) -> [&dyn rusqlite::ToSql; FIELD_COUNT]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The FIELD_COUNT
is an input generic, which makes it a bit weird when you want to use it, as you have to explicitly specify <Ty as Insertable<XXX>>::insert(&value, …)
.
I believe if you turn the FIELD_COUNT
into an output associated const, similar to the other str
s, you should be able to just refer to it inside the fn like so:
fn param_bindings(&self) -> [&dyn rusqlite::ToSql; Self::FIELD_COUNT];
Although I’m a bit unsure if that actually works that way, maybe its not possible to use the associated const in that position.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like #![feature(generic_const_exprs)]
would allow this, but idk if we wanna commit harder to requiring nightly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, I believe generic_const_exprs
still has a long and bumpy road ahead of itself, so might be fair to avoid it.
Otherwise, we don’t have to optimize for perf that hard vs developer ergonomics, so its perfectly fine to allocate and use Vec<&dyn T>
.
-- This should be set to the hash of the `name` column so that we can | ||
-- distribute processing across multiple different hosts and they will | ||
-- all come up with the same ID. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mentioned concerns around id collisions. Both hashes and random ints have potential for collision (although negligibly small).
Here in particular, context_type
should ideally be part of the hash as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you're right, but as a separate change (since this PR is pretty big) we probably want to get rid of ContextType
and rename context
to label
or test_case
or something which you also observed
c26f8f3
to
c434444
Compare
Codecov ReportAttention: Patch coverage is
✅ All tests successful. No failed tests found.
Additional details and impacted files@@ Coverage Diff @@
## main #7 +/- ##
==========================================
+ Coverage 98.10% 99.21% +1.10%
==========================================
Files 17 17
Lines 6171 7249 +1078
==========================================
+ Hits 6054 7192 +1138
+ Misses 117 57 -60 ☔ View full report in Codecov by Sentry. |
i think eliminating the suggestions were all great, the ones i didn't take were because i couldn't get them building after fussing with them for a minute or some specific circumstance means i think the code should stay as it is. |
let mut models: Vec<LineSessionModels> = report_lines | ||
.iter() | ||
.flat_map(|line| create_model_sets_for_report_line(line, ctx)) | ||
.collect::<Vec<LineSessionModels>>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You already have a type annotation on models
, so no need to repeat that in the call to collect
. It would also be enough to leave a placeholder like so: models: Vec<_>
, as the collect
call only needs to know which collection to collect into, the inner T
can be inferred.
you can look at the individual commits to make reviewing easier
there are four main changes in this stack:
Insertable
traitUpload
context type /UploadDetails
table is replaced with a dedicatedRawUpload
modelas a result, my sample report went from taking 30s to process to taking 8s
bulk inserting UUIDs is painful because you can't benefit from any locality when updating indices for each new record. measurement models now have a joint PK instead:
raw_upload_id
, which is effectively constant, andlocal_*_id
, which is auto-incremented. it's not a UUID anymore, but the new PK is still unique enough that we can merge reports by blindly concatenating tables, and it's much faster to insertthere's more to do for perf; journaling/synchronization, WITHOUT ROWID, blah blah blah
sorry this is a mess; i started this stack before anyone else would be looking at this project. with @Swatinem coming aboard, i'll organize changes to be more reviewable in the future