-
Notifications
You must be signed in to change notification settings - Fork 927
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support to handle .parquet
output from Vizgen
#7190
base: develop
Are you sure you want to change the base?
Conversation
Thanks for the comment.
|
..just to confirm: |
R/preprocessing.R
Outdated
workers.total = 12, | ||
DTthreads.pct = NULL, | ||
coord.space = "micron", | ||
use.cellpose.out = TRUE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we can infer use.cellpose.out
based on the presence of cell_boundaries.parquet
files in data.dir
? I want to make sure all of the older Vizgen outputs are still supported without requiring this parameter to be set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should be possible.
Basically it will read cellpose_micron_space.parquet
or cellpose_mosaic_space.parquet
, else the older Vizgen output, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like that, but my understanding was the cellpose outputs generated by Vizgen would are always named cell_boundaries.parquet
based on their user guide: https://vizgen.com/wp-content/uploads/2022/12/91600001_MERSCOPE-Instrument-User-Guide_Rev-F.pdf (see page 45 and 46). Are your parquet output files generated by Vizgen, and if so, do you know what version of their software was used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our data was generated by Vizgen, then we got the link to download them. I will check with them again but it's probably the cell_boundaries.parquet
and cell_by_gene.csv
and cell_metadata.csv
files would be the official one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will update the things we discussed here, plus some additional sanity checks are needed on cell_boundaries.parquet
see recent discussion #7080
thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @alikhuseynov! Vizgen has also agreed to review this PR and make any necessary changes so users can handle cellpose segmentations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presence of a parquet file doesn't mean its a cellpose segmentation. Parquet files could be generated by either of the segmentation algorithms (see pg 29 of guide above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presence of a parquet file doesn't mean its a cellpose segmentation. Parquet files could be generated by either of the segmentation algorithms (see pg 29 of guide above).
yes, the idea is to allow users to load segmentation data (Cellpose or not) if .parquet
is present. Some may have older data, ie .hdf5
files.
R/convenience.R
Outdated
) | ||
# add coords to seurat object | ||
LoadVizgen <- function(data.dir, fov = "vz", assay = 'Vizgen', | ||
add.zIndex = TRUE, update.object = TRUE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would useful to include the filter = '^Blank-' to filter out the blank genes.
Also, need to have mol.type = 'microns' as an input variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
..could add optionallyfilter = '^Blank-'
, some users might want to keep the background genes for later checks.
..mol.type
arg can be used in LoadVizgen()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree some users may want to look at them, but they should be removed before downstream analysis (and this way it works with the current Seurat Vizgen tutorial).
mol.type is also called outside of ReadVizgen() and it's producing errors for me when not loaded into the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good, will add filter = "^Blank-"
as default, if users want to keep them, then they should set filter = NA_character_
before loading data.
..will make sure that mol.type
works without errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Working off your latest commit I added it mol.type = 'microns
to LoadVizgen() arguments and it worked for all my test cases so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! For the next commit, I will make sure mol.type
arg. work without specifying it in LoadVizgen()
..reads old-new data, optimized parallelization, few bugs fixed
..supports args: `mol.type`, `filter`, `z` added `verbose` for optional verbosity
The last commit should fix most of the bugs, supports old and new Vizgen data outputs, and optimized parallelization Test run would be this
sessionInfo()
|
This worked on all my test data and scenarios. The only other suggestion I would have is making |
.parquet
from Vizgen.parquet
output from Vizgen
Great, thanks! |
Any updates for this pull request? |
..sorry I was a bit busy, will be adding a new arg |
I think it looks great and looking forward to it's use |
..resolve some small conflict for `Read10X_Image()`
..adding of molecules to the object is optional `add.molecules = TRUE`
I did last updates, tested loading functions and tried to resolve some conflicts in |
Dear alikhuseynov,
|
Hi there, I haven't done any changes to Alternatively, plot your data first with |
Thank you for your kind suggestion. I've tried to swap the |
Hi alikhuseynov,
Maybe this pull request should be rebased on the 'seurat5' branch? |
All PRs go to the |
few small edits for cleaner code.
..inspired by [`SpatialFeatureExperiment`](https://github.com/pachterlab/SpatialFeatureExperiment) using`sf` package to work/filter segmentation geometries. Updates: - adding support to efficiently filter segmentation polygons (removing small artifacts etc..) - added helper function `.filter_polygons()` (a bit modified version from `SpatialFeatureExperiment:::.filter_polygons`) for filtering given `min.area` - minimal polygon area as a threshold - allow for multiple segmentation polygons per cell (..still using single z-plane)
..optimized parallelization `BiocParallel` for `.filter_polygons()`, ie using 2 workers less that total provided workers
Thanks @lambdamoses, I got inspired by Updates for this PR:
How to use example -> make Seurat objects (essentially the same with a small update) @dfhannum if you would like re-run it on some of your Vizgen data as before, to ensure the robustness, would be awesome. |
Hi Seurat Team, I am with Vizgen. I just tested @AustinHartman @dfhannum @alikhuseynov @tangboyun Thanks, |
Thanks @swangvzg! I'm glad it works.. It is still on @swangvzg & @dfhannum, btw, we are developing Bioc. |
I can confirm that it work with |
R/convenience.R
Outdated
@@ -20,11 +20,11 @@ NULL | |||
#' @rdname ReadAkoya | |||
#' | |||
LoadAkoya <- function( | |||
filename, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend removing all of the space-only changes so that reviewers can focus on changes that actually make a functional difference. Thank you so much for all of your work on this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I think most of the space-only changes are now removed. Thanks
Dear Seurat Team,
Here is the PR to handle parquet files (see #7080)
Example to make Seurat objects from Cellpose output of Vizgen data..
Let me know if additional edits are needed..
Thanks
A.