
Add support for alevin-fry #579

Open
rob-p opened this issue Apr 2, 2022 · 5 comments

Comments

@rob-p

rob-p commented Apr 2, 2022

Hi,

Thank you for developing this useful tool. I just came across the paper in Nature Communications; congratulations! My lab develops the alevin tool, which is currently supported in singleCellTK. We have been working for some time on a successor to alevin called alevin-fry, and the published version of the corresponding paper has (very) recently appeared.

It would be great to have support for alevin-fry in singleCellTK. Since alevin-fry reports base-level quality control metrics in a format that should be easy to parse and consume, this will hopefully be straightforward. Further, some of the other alevin-fry developers (namely the lead author, @DongzeHE) and I would be happy to help implement alevin-fry support in singleCellTK. Is there any developer documentation or guidance on how to write a module that lets singleCellTK accept input from a new preprocessing tool? Thanks, and congrats again on the paper and tool!

Best,
Rob

@joshua-d-campbell
Collaborator

Hi @rob-p, thanks for reaching out, and congrats on the new version of alevin and the corresponding manuscript! We would be happy and excited to work with you on this. Our current function for importing alevin data, importAlevin(), can be used as a starting template. We will try to run alevin-fry to examine its output in more detail. If you can point us to example output, that would also be helpful.

Where are the QC metrics written? Are they all cell-level QC metrics, or do you also have sample-level metrics (e.g., percentage of aligned reads)? In our recent release, we started importing sample-level QC metrics from CellRanger and STARsolo output. They are stored in the metadata slot of the SCE so that they can also be plotted in the QC reports.

Josh

@rob-p
Author

rob-p commented Apr 5, 2022

Hi @joshua-d-campbell,

Thanks; I'm glad to hear this! Right now, the best place to see how to collect quality metrics from alevin-fry is probably @csoneson's alevinQC package, specifically the .readAlevinFryQC_v0.5.0 function. It reads in nearly all of the relevant QC information output by a full end-to-end run of salmon alevin -> alevin-fry generate-permit-list -> alevin-fry collate -> alevin-fry quant. The mapping directory also contains some sample-level metrics in addition to the cell-level metrics that fry outputs during quantification. If you have a chance to look at what is there, let us know what you think the best way to proceed is.

Thanks!
Rob
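For the sample-level metrics discussed above (e.g., percentage of aligned reads), here is a minimal Python sketch of how such a value could be derived from the mapping directory's metadata. It assumes the metadata is a JSON file exposing integer fields named "num_processed" and "num_mapped"; those key names and the example path are assumptions to check against real salmon alevin output, not a documented contract.

```python
import json


def mapping_rate_from_meta(meta):
    """Compute sample-level mapping statistics from a parsed metadata dict.

    ASSUMPTION: the metadata exposes integer fields "num_processed" and
    "num_mapped"; verify these key names against the actual mapping output.
    """
    processed = int(meta["num_processed"])
    mapped = int(meta["num_mapped"])
    # Avoid dividing by zero if no reads were processed.
    percent = 100.0 * mapped / processed if processed else 0.0
    return {
        "num_processed": processed,
        "num_mapped": mapped,
        "percent_mapped": percent,
    }


def load_mapping_rate(meta_info_path):
    """Read a JSON metadata file from the mapping directory.

    The path is hypothetical, e.g. "<map_dir>/aux_info/meta_info.json".
    """
    with open(meta_info_path) as fh:
        return mapping_rate_from_meta(json.load(fh))
```

Keeping the calculation in a pure function separate from file reading makes it easy to test and to reuse whichever file the metric ultimately comes from; the resulting dict could then be stored alongside other sample-level QC in the SCE metadata.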

@csoneson

csoneson commented Apr 5, 2022

Hello! This looks great; I'd be happy to make adaptations to alevinQC as well if needed. Just a minor note: the top-level, user-facing (exported) function for reading alevin-fry output is readAlevinFryQC. It should automatically figure out which version of alevin-fry was run and call the appropriate low-level reading function. There is a corresponding function for reading alevin QC output as well (readAlevinQC).
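The dispatch pattern described above (one exported entry point that picks a version-specific reader) can be sketched in Python as follows. This is not alevinQC's code: the reader functions and the 0.5.0 cutoff for the older branch are hypothetical, and the version is passed in explicitly rather than detected from the output files.

```python
def parse_version(version_string):
    """Turn a purely numeric dotted version like "0.5.0" into a tuple (0, 5, 0)."""
    return tuple(int(part) for part in version_string.split("."))


def _read_qc_pre_0_5(quant_dir):
    # Hypothetical low-level reader for output layouts older than 0.5.0.
    return {"reader": "pre-0.5", "dir": quant_dir}


def _read_qc_v0_5(quant_dir):
    # Hypothetical low-level reader for 0.5.0 and later output layouts.
    return {"reader": "v0.5", "dir": quant_dir}


def read_fry_qc(quant_dir, version_string):
    """Top-level entry point: choose the low-level reader by tool version.

    Tuple comparison handles multi-digit parts correctly,
    e.g. (0, 10, 0) >= (0, 5, 0).
    """
    if parse_version(version_string) >= (0, 5, 0):
        return _read_qc_v0_5(quant_dir)
    return _read_qc_pre_0_5(quant_dir)
```

Comparing parsed tuples rather than raw strings avoids the classic bug where "0.10.0" sorts before "0.5.0" lexically.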

@DongzeHE

DongzeHE commented Apr 5, 2022

Hey! For importing alevin-fry data, we provide the loadFry() R function in the fishpond R package. It takes an output folder generated by the alevin-fry quant command as its required input and returns a SingleCellExperiment object similar to the one returned by importAlevin() in singleCellTK. If alevin-fry is run in USA (short for "Unspliced-Spliced-Ambiguous") mode, in which it infers separate unspliced, spliced, and ambiguous counts for each gene in each cell, this function can also prepare the appropriate output format for single-cell, single-nucleus, or RNA velocity analysis via the optional argument outputFormat. If needed, I would also be happy to help adapt the loadFry function for singleCellTK. Thanks!
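To illustrate what preparing an output format from USA-mode counts involves, here is a minimal Python sketch that sums the U, S, and A counts of each gene into the layers a given analysis needs. The format names echo the analyses mentioned above, but the combination rules in the code are illustrative assumptions; the rules loadFry() actually applies for each outputFormat value should be taken from the fishpond documentation.

```python
def combine_usa(counts, output_format):
    """Combine per-gene U/S/A counts for one cell into output layers.

    counts: dict mapping gene -> {"U": int, "S": int, "A": int}.
    ASSUMPTION: the rules below are illustrative, not loadFry()'s definitions.
    """
    rules = {
        # Hypothetical rules: which count categories are summed into each layer.
        "scRNA": {"counts": ("S", "A")},
        "snRNA": {"counts": ("U", "S", "A")},
        "velocity": {"spliced": ("S", "A"), "unspliced": ("U",)},
    }
    if output_format not in rules:
        raise ValueError(f"unknown output format: {output_format!r}")
    layers = {}
    for layer, categories in rules[output_format].items():
        # Sum the selected categories gene by gene.
        layers[layer] = {
            gene: sum(c[cat] for cat in categories) for gene, c in counts.items()
        }
    return layers
```

For example, a gene with U=2, S=5, A=1 yields 8 under the "snRNA" rule here, while the "velocity" rule produces spliced=6 and unspliced=2 as separate layers.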

@rob-p
Author

rob-p commented Apr 5, 2022

Thanks @DongzeHE, but the use case here is getting at the QC metrics more than just the final quantifications, so it will need access to some of the extra files that alevinQC uses.

4 participants