-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hotfix cggnn scripts, streamline calls #208
Conversation
@jimmymathews I don't think When calling a single specimen, it seems to pull sparse entries from every study in the database, and then tries to match the specimen against ever study it just pulled, resulting in an error. (I'm not wholly confident on the latter part of that statement, but something's erroring.) This may be a relic of when FME supported extraction from every study at once? I'll look into it. (The rest of Here's the full output:
|
I made two changes to A couple of notes:
I'm not planning to make any changes for now since it all Just Works™️, but it might be better for maintainability to so I will if I'm asked to. |
I think this is due to some relatively recently added optimizations to the SparseMatrixPuller and friends system. |
One more issue: somewhere in FME or |
continuous_also: bool = False, | ||
) -> CompressedDataArrays: | ||
if specimen is not None: | ||
study = StudyAccess(self.cursor).get_study_from_specimen(specimen) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should work to specialize to just the one study in case a specimen is provided.
Looks like FME doesn't preserve the I'd like to change it for index by HSID. Would that break anything downstream? SPT/spatialprofilingtoolbox/db/feature_matrix_extractor.py Lines 178 to 205 in eb3e928
|
I seem to remember deliberately omitting this ID from the feature matrix output. Seemed like unnecessary coupling between database internal format and the output data structure. |
I need the histological structure IDs for cell importance scoring because otherwise there's no way to map the CG-GNN output back to the cell it came from. So we're good to reverse this intentional omission? Or add a histological structure ID column instead of indexing on it. That might break even more things, though. |
Yeah, give it a try. |
Do you happen to know if you dropped the HSID in EDIT: I think it's in |
The cells are supposed to be sorted by histological structure ID, because of this line:
( So you could just pull all the histological structure IDs for a specimen and sort them. |
I'm leveraging this assumption for this but we've already used it elsewhere (e.g., StudyAccess): each specimen string is uniquely identified to a study. In other words, two studies in the same database are not allowed to have specimens with the same name. This could be a problem in the future. |
The new commit assumes that the cells are in the order specified in the Puller and assumes that specimen name strings are uniquely attached to a single study. I have my reservations about doing it this way instead of editing the Puller to retain HSIDs instead of discarding them, but this Just Works™️ so maybe we'll worry about it later. |
Editing the Puller to retain HSIDs as you say is fine with me, just might not be worth the effort at the moment. All tests pass for me, I'm ready to merge. |
While using
set cggnn explore_classes
andextract
for the first time, I noticed a few fixes and improvements I could make.explore_classes
didn't pass the config file location to the cursor correctly, resulting in an error. This is fixed.extract
now organizes files it extracts related to a study into a directory with that study's name, for easy organization.extract
does nothing if the files it intends to create are already created.extract
is amended to only have FME pull specimens that are of the strata the user requested, instead of all specimens from a study