Beneficial to pool .bam files from separate 10x single-cell lanes instead of demultiplexing separately? #89
Comments
Hi, Good idea - it may enhance the estimation of the donors' genotype and link them across samples. On the other hand, as you mentioned the coverage for each cell won't change; therefore, it is usually unnecessary to pool multiple samples if the n_cells in each pool is reasonable (e.g., >3K). Yuanhua |
Yes, so that was my thought - I have prepared the pooled .bam files, so will go ahead and try it and let you know the results. Do you have a feeling for which mode of cellsnp-lite would yield the most improvement when doing this, i.e. SNP-based or chromosome-pileup? And how might the --forceLearnGT parameter in Vireo play in here? |
@koefoeden Did it go well? Do you have an example script to merge the bam files and rewrite the barcode IDs? Thanks in advance! |
It did not seem to change the results a lot, so we never went further with it. But in case you want to try it out, here are the two scripts I wrote. The first uses awk, sed and samtools to modify the barcodes.tsv.gz and atac_possorted_bam.bam files (and save them as new files) in the cellranger-arc output directory, i.e. argument 1, to instead be suffixed by the given $number parameter, i.e. argument 2, to avoid barcode clashing:
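The original awk/sed/samtools script is not reproduced here. As a rough illustration of the barcode-suffixing idea only, here is a minimal Python sketch. It assumes barcodes look like `AAACCTGAGAAACCAT-1` and that reads carry the barcode in a `CB:Z:` tag; the function names (`resuffix_barcode`, `resuffix_sam_line`, `rewrite_stream`) are made up for this example and are not part of cellsnp-lite, Vireo or samtools.

```python
import re

# Matches a tab-separated CB tag whose value ends in "-<digits>" (assumed layout).
CB_TAG = re.compile(r"(\tCB:Z:[ACGTN]+)-\d+(?=\t|$)")


def resuffix_barcode(barcode: str, number: int) -> str:
    """Replace the trailing '-<suffix>' of a barcode with '-<number>'."""
    core, sep, _old = barcode.rpartition("-")
    if not sep:
        # No suffix present: simply append one.
        return f"{barcode}-{number}"
    return f"{core}-{number}"


def resuffix_sam_line(line: str, number: int) -> str:
    """Rewrite the CB tag of one SAM text line; header lines pass through."""
    if line.startswith("@"):
        return line
    return CB_TAG.sub(rf"\g<1>-{number}", line, count=1)


def rewrite_stream(src, dst, number: int) -> None:
    """Copy SAM text from src to dst, re-suffixing every CB tag."""
    for line in src:
        dst.write(resuffix_sam_line(line, number))
```

One way to use it would be to call `rewrite_stream(sys.stdin, sys.stdout, number)` from a small script and pipe SAM text through it, e.g. `samtools view -h in.bam | python resuffix.py | samtools view -b -o out.bam` (the script name is hypothetical). The lines of the decompressed barcodes.tsv.gz can be passed through `resuffix_barcode` with the same number so both files stay consistent.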
The second script basically merges the modified barcode and bam files, assuming that each cellranger output directory is labelled 1, 2, 3, 4, 5, 6, 7, 8 in $cellranger_outs_dir, i.e. the first argument.
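The original merge script is likewise not reproduced here. A minimal sketch of the merge step in Python could look like the following; it assumes the per-sample directories are labelled 1 to 8 as described, and the helper names and the `bam_name` / `out_bam` file names are placeholders chosen for illustration. It only builds the `samtools merge` command line (output file first, then inputs) and does not run it.

```python
from pathlib import PurePosixPath


def merge_barcodes(per_sample):
    """Concatenate per-sample barcode lists, failing on any clash."""
    merged, seen = [], set()
    for barcodes in per_sample:
        for bc in barcodes:
            if bc in seen:
                # A clash means the suffixing step was skipped or reused a number.
                raise ValueError(f"barcode clash: {bc}")
            seen.add(bc)
            merged.append(bc)
    return merged


def samtools_merge_command(outs_dir, labels, bam_name, out_bam):
    """Build 'samtools merge <out> <in1> <in2> ...' for the labelled directories."""
    inputs = [str(PurePosixPath(outs_dir) / str(label) / bam_name) for label in labels]
    return ["samtools", "merge", str(out_bam), *inputs]
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. The clash check is cheap and gives an early error if two lanes ended up with the same suffix.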
And then finally it should be possible to use Vireo/cellSNP as normal on these files. |
If you want something faster, you can use
Changing the previous script a bit:
Modifying cell barcodes in a 6.3G BAM file took 164 seconds |
ah, great - thanks for the tip! :) |
Thanks all for the tip! |
Hi - thanks for a nice tool! I have a quick question regarding the best approach for using cellsnp/vireo in our experimental setup:
We have generated data from 8 separate 10x Genomics single-cell reactions, which are all made from pools of the same 8 donors.
My idea is that by pooling the .bam files (and making sure that the barcodes are differentiated appropriately to avoid clashing) before demultiplexing, the demultiplexing will have access to 8x sequencing depth per individual. However, the sequencing depth per cell will of course stay the same, as we have 8x reads and 8x cells. My question is thus whether there is some kind of information sharing between the cells, which enables a better demultiplexing from merely having more cells from the same number of possible donors.
I hope my question makes sense!