Proper steps for cellSNP and Vireo for large dataset #61
Comments
Hi, thanks for the questions. Q1: please use cellsnp-lite; it is re-implemented in C/C++ and is much faster and more memory-efficient. Q2: vireo supports loading the sparse matrices directly, so won't touch the large Hope these help.
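The two suggestions in this reply could be sketched roughly as follows. This is a minimal Python helper that only assembles the two command lines and does not run them. The file names, thread count, and filtering thresholds are placeholder assumptions, not values from this thread. It also assumes that vireo's `-c` is pointed at the cellsnp-lite output directory, so that the sparse matrices are read instead of a large per-cell VCF.

```python
import shlex


def cellsnp_lite_cmd(bam, barcodes, region_vcf, out_dir, threads=10):
    """Assemble a cellsnp-lite call that genotypes a list of known SNPs."""
    return [
        "cellsnp-lite",
        "-s", bam,            # indexed BAM carrying cell barcodes
        "-b", barcodes,       # list of cell barcodes
        "-R", region_vcf,     # candidate SNP positions
        "-O", out_dir,        # output directory (sparse matrices + base VCF)
        "-p", str(threads),   # number of threads
        "--minMAF", "0.1",    # placeholder threshold, tune for your data
        "--minCOUNT", "20",   # placeholder threshold, tune for your data
        "--gzip",
    ]


def vireo_cmd(cellsnp_dir, donor_vcf, out_dir):
    """Assemble a vireo call that reads the cellsnp-lite output directory."""
    return ["vireo", "-c", cellsnp_dir, "-d", donor_vcf, "-o", out_dir]


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the shape of the commands.
    print(shlex.join(cellsnp_lite_cmd(
        "possorted.bam", "barcodes.tsv", "snps.vcf.gz", "cellsnp_out")))
    print(shlex.join(vireo_cmd("cellsnp_out", "donors.vcf.gz", "vireo_out")))
```

Printing the commands first makes it easy to check the paths before submitting a long-running job.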
Hi, things seem to work well now after using cellsnp-lite instead. Thanks!
Hello, I hit a "MemoryError" when I use vireo mode 2. My command is vireo -c
Hi, thanks for sharing the issue. It looks similar to Q2 above, so try not using the sc_vcf but use the
Hi. I have been working on a dataset with 20K cells, and I have a few questions about how to approach it.
Q1: Running cellSNP on the whole dataset was taking forever (more than 15 days), so I followed a suggestion I saw, split the BAM file by chromosome, and got per-chromosome cellSNP outputs, which I then merged. I wonder if there is a better or preferred way to merge them for Vireo.
What I am currently doing is: bcftools merge, then bcftools sort
Q2: For Vireo, I used the merged VCF file (1.8 GB) mentioned above as $CELL_DATA, along with a $DONOR_GT_FILE (744 KB) that I subset with bcftools view, as suggested. The issue is that it seems to use a lot of memory, and it is hard for me to estimate how much memory I need to reserve.
The command I used is: vireo -c $CELL_DATA -d $DONOR_GT_FILE -o $OUT_DIR
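For a rough sense of why memory is hard to predict here, the following back-of-the-envelope sketch compares storing every variant-by-cell entry with storing only the non-zero entries. It is not a description of how vireo allocates memory. The variant count, the matrix density, and the 8-byte values with 4-byte indices are all assumptions chosen for illustration; only the 20K cell count comes from the post.

```python
def dense_matrix_bytes(n_variants, n_cells, n_matrices=2, bytes_per_value=8):
    """Bytes needed if every variant x cell entry is stored.

    Two matrices are assumed by default (alternative-allele and total depth).
    """
    return n_variants * n_cells * n_matrices * bytes_per_value


def sparse_matrix_bytes(n_nonzero, n_matrices=2, bytes_per_value=8,
                        bytes_per_index=4):
    """Rough coordinate-format size: one value plus a row and a column index
    for each non-zero entry."""
    return n_nonzero * n_matrices * (bytes_per_value + 2 * bytes_per_index)


def to_gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    n_cells = 20_000        # from the post
    n_variants = 100_000    # hypothetical, count the records in your own VCF
    density = 0.01          # hypothetical fraction of entries with coverage
    n_nonzero = int(n_variants * n_cells * density)
    print(f"dense:  {to_gib(dense_matrix_bytes(n_variants, n_cells)):.1f} GiB")
    print(f"sparse: {to_gib(sparse_matrix_bytes(n_nonzero)):.2f} GiB")
```

The dense estimate grows with the product of variants and cells regardless of coverage, which is one plausible reason a modest VCF file on disk can still need far more memory once expanded.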
Please advise. Thanks!