Poor performance of get_subjects() even when using indexing without metadata #940
Comments
We just merged a bug-fix to the indexer (#936). Would you be willing to install master and see if the issue remains? The bug was related to traversing ignored directories despite them being ignored.
I just tried and unfortunately get_subjects() is still taking the same amount of time.
Can you elaborate on what you mean by "querying for a single subject"? I do recall this being rather slow at times, so I'll take a look. My guess is that it has to do with taking the "unique" set of subjects, and that it does too much pre-querying to do so.
So on a random dataset I had, I couldn't quite replicate your issue, but I do see that it takes two orders of magnitude longer to do
The reason for this is that I'll see if there are any quick fixes for this, but if not, there is a major refactor coming up which will speed up pybids significantly: #863
Yup, so by querying a single subject, I mean the same thing as in your timing example - using As far as replicating my issue, I believe your test showing a two order of magnitude difference in I appreciate you looking into this for me!
Maybe there is some way for me to index on 'id'?
Looks like it's slow because of this line: Line 679 in 1ba4f66
Essentially it's asking for the metadata of every indexed file, to later filter by only those that have the target (in this case
BTW, you are totally right that this can be instantaneous. If you run this query, you can get it in 2 ms:
The problem is that I'm about to submit a patch that at least speeds it up 20x, by getting each file's entity using a faster method.
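To illustrate the difference described above, here is a minimal sketch using Python's built-in sqlite3 module. The `tags` table, its columns, and both function names are hypothetical and chosen only for illustration; this is not pybids's actual schema, query, or patch. It contrasts loading every file's entities and filtering afterwards with asking the database directly for the distinct subject values.

```python
import sqlite3

# Toy in-memory index. The table and column names are assumptions made
# for this sketch, not pybids's real database layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (file_path TEXT, entity TEXT, value TEXT)")

rows = []
for sub in ("01", "02", "03"):
    for run in ("1", "2"):
        path = f"sub-{sub}/func/sub-{sub}_run-{run}_bold.nii.gz"
        rows.append((path, "subject", sub))
        rows.append((path, "run", run))
conn.executemany("INSERT INTO tags VALUES (?, ?, ?)", rows)


def subjects_per_file(conn):
    """Slow pattern: fetch every file's entities, then filter for 'subject'."""
    subjects = set()
    paths = [r[0] for r in conn.execute("SELECT DISTINCT file_path FROM tags")]
    for path in paths:
        # One extra query per indexed file.
        entities = dict(
            conn.execute(
                "SELECT entity, value FROM tags WHERE file_path = ?", (path,)
            )
        )
        if "subject" in entities:
            subjects.add(entities["subject"])
    return sorted(subjects)


def subjects_direct(conn):
    """Fast pattern: a single DISTINCT query on the entity of interest."""
    cur = conn.execute(
        "SELECT DISTINCT value FROM tags WHERE entity = 'subject'"
    )
    return sorted(r[0] for r in cur)
```

Both functions return the same subject labels; the first issues one query per file, so its cost grows with the number of indexed files, while the second issues a single query regardless of dataset size.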
This is excellent, thanks so much for diving into this, explaining the situation, and providing a patch. I have been working to transition a mid-size application away from interacting with BIDS directories through the file system, and instead through pybids exclusively. However, in the use case of getting all subject ids (a very frequent case), a simple file traversal had been much faster than pybids. Speeding up this crucial operation will smooth out the user experience where we are using it, and make me much more enthusiastic about implementing pybids where we haven't yet.
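For reference, the kind of simple file traversal mentioned above could look roughly like the sketch below. It assumes subjects are stored as top-level `sub-<label>` directories, as in a standard BIDS layout; the function name `list_subject_ids` is made up for this example, and it does no validation of the dataset.

```python
from pathlib import Path
import tempfile


def list_subject_ids(bids_root):
    """Return sorted subject labels taken from top-level sub-* directories."""
    root = Path(bids_root)
    return sorted(
        p.name[len("sub-"):]      # strip the "sub-" prefix, keep the label
        for p in root.glob("sub-*")
        if p.is_dir()             # ignore stray files that match the pattern
    )


if __name__ == "__main__":
    # Build a throwaway directory tree to demonstrate the function.
    with tempfile.TemporaryDirectory() as tmp:
        for label in ("01", "02", "10"):
            (Path(tmp) / f"sub-{label}" / "anat").mkdir(parents=True)
        (Path(tmp) / "dataset_description.json").write_text("{}")
        print(list_subject_ids(tmp))
```

This only reads one directory level, which is why it is fast, but it also skips everything an index provides (entity parsing, ignore rules, derivatives).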
I'm glad. Like I said though, we are working on a major refactor that should make an even bigger difference, so stay tuned.
I am puzzled by how slow get_subjects() is for me. It takes about 10 seconds to retrieve the subject ids for a 67-subject BIDS directory. This is after some improvement achieved by using a saved index file and not indexing metadata, two of the most common performance suggestions I see here.
I don't understand why gathering the subject IDs in such a small database wouldn't be almost instantaneous. Meanwhile, when I query for a single subject, the result is as quick as you'd expect.
Here is how I fetch my layout, using version 15.5:
Is there some way I can configure the indexer to improve my performance here, or some other solution?
Thanks!