-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up BIDSLayout.get(return_type='id', ...) #942
Conversation
Codecov ReportBase: 86.32% // Head: 86.35% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #942 +/- ##
==========================================
+ Coverage 86.32% 86.35% +0.02%
==========================================
Files 35 35
Lines 4022 4023 +1
Branches 973 974 +1
==========================================
+ Hits 3472 3474 +2
Misses 355 355
+ Partials 195 194 -1
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks entirely sensible. Coverage looks reasonable. I assume you've tested it with non-entities like get_RepetitionTimes()
? Or, even better, we have tests that cover that case?
Co-authored-by: Chris Markiewicz <[email protected]>
…nto enh/speed_up_rtid
I just tested it and it works, but I added an additional Will merge once tests pass. |
Fixes #940
The issue is that
BIDSFile.get_entities
is somewhat slow, because it has to do a SQL query. A single query is fine, but in this instance its doing it times the number of files in the result (which with no other argument, is the total number of files indexed).Instead, I use
BIDSFile.entities
, which is much faster, because its anassociation_proxy
. Essentially, instead of doing a complex query that filters, it just pulls all the entities for every file.This results in 36x speedup for
get_subjects
in my example dataset (660ms -> 18ms)The crazy thing is this can be sped up even further, by doing a direct SQL query such as:
This only takes 2.14ms, which is over 300x faster.
The only issue is the latter query bypasses a lot of the Python logic built into pybids classes, which do somewhat important things such as formatting run ids as
PaddedInt
and ensuring the are returned in the correct type, so I was not able to use this implementation.On the whole, this makes me think that a lot of things that are "slow with pybids" need not be so...