Include barcode integer suffix in index. #8
Comments
Hi,

Thank you for reporting this issue. When a single BAM contains different suffix numbers, the expected behaviour of LRez would be to treat two barcodes that share the same nucleotide sequence but have different suffix numbers as two distinct barcodes, is that correct?

This may not be straightforward to implement in LRez, since LRez assumes all barcodes are purely nucleotide sequences and encodes them into integers with a 2-bit encoding. The suffix numbers could be converted to nucleotide words appended to the barcodes, but this would cost extra space for the vast majority of datasets, which only have the "-1" suffix. To keep that extra space small, we would need to know in advance the maximal number of different integer suffixes for the given sample.

Do you have an idea of this maximal number of different integer suffixes in practice? In your opinion, does this situation (BAMs with multiple 10X libraries) occur frequently?

Note that a temporary (though not very neat or practical) solution is to pre-process the BAM by replacing the -X suffixes with short nucleotide words specific to each library.

Best,
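To make the space trade-off concrete, here is a minimal Python sketch of the idea discussed above: the integer suffix is turned into a fixed-width nucleotide word (base 4) and appended to the barcode before 2-bit encoding. This is not LRez's actual code. The names `suffix_to_word`, `encode_2bit`, `rewrite_tag`, `encode_barcode` and the `width` parameter are hypothetical, and treating a missing suffix as library 1 is an assumption.

```python
# Illustrative sketch only; not taken from the LRez source.
NUC_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_NUC = "ACGT"


def suffix_to_word(suffix: int, width: int = 2) -> str:
    """Map a 1-based library suffix to a fixed-width nucleotide word (base 4)."""
    value = suffix - 1
    if not 0 <= value < 4 ** width:
        raise ValueError(f"suffix {suffix} does not fit in {width} nucleotides")
    digits = []
    for _ in range(width):
        digits.append(BITS_TO_NUC[value % 4])
        value //= 4
    return "".join(reversed(digits))


def encode_2bit(seq: str) -> int:
    """Pack a nucleotide sequence into an integer, 2 bits per base."""
    code = 0
    for base in seq:
        code = (code << 2) | NUC_TO_BITS[base]
    return code


def rewrite_tag(tag: str, width: int = 2) -> str:
    """Replace the '-X' suffix of a barcode tag with a nucleotide word."""
    seq, _, suffix = tag.partition("-")
    # Assumption: a tag without a suffix is treated as library 1.
    return seq + suffix_to_word(int(suffix or 1), width)


def encode_barcode(tag: str, width: int = 2) -> int:
    """2-bit encode a barcode together with its library suffix."""
    return encode_2bit(rewrite_tag(tag, width))
```

With `width=2` the suffix word costs 4 extra bits per barcode and can distinguish up to 16 libraries, which would cover the estimate of about 10 suffixes given in the reply below. `rewrite_tag` corresponds to the pre-processing workaround; whether LRez accepts barcodes rewritten this way has not been checked here.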
Thanks for the quick reply!
Yes, this is correct, as the same nucleotide barcode sequence could have been sampled in multiple library preparations.
I expected this might be an issue. As for the maximal number of expected suffixes, I don't have a good answer. Clearly most people only use BAMs with one suffix ("-1"). I have merged as many as 6 different libraries, that is, 6 different suffixes in one BAM. I am, however, not sure how common this is for other people. A pretty safe estimate for the maximum number of integer suffixes would probably be around 10.
Yes, I suppose this would be a solution. An even simpler solution, and more practical for now, would be to just ignore any barcode suffix in the index. I think this would be an OK solution for now, as I am not sure this is a big problem for other users. Also, one could always confirm which suffix is present on an alignment after accessing it.
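The "ignore the suffix, then confirm it on the alignment" approach could look roughly like the Python sketch below. It is only an illustration, not how LRez builds its index: alignments are modelled as hypothetical `(barcode_tag, read_name)` tuples, and `strip_suffix`, `build_index` and `query` are made-up names.

```python
from collections import defaultdict


def strip_suffix(tag: str) -> str:
    """Drop the '-X' library suffix, keeping only the nucleotide barcode."""
    return tag.split("-", 1)[0]


def build_index(records):
    """Index record positions by barcode sequence, ignoring the suffix."""
    index = defaultdict(list)
    for pos, (tag, _read_name) in enumerate(records):
        index[strip_suffix(tag)].append(pos)
    return index


def query(index, records, tag: str):
    """Fetch candidates by barcode sequence, then keep only exact tag matches."""
    candidates = [records[i] for i in index.get(strip_suffix(tag), [])]
    # The suffix is confirmed only here, after the alignments were accessed.
    return [rec for rec in candidates if rec[0] == tag]
```

The index stays purely nucleotide-based, so its size does not change. The cost is that a query may fetch alignments from other libraries sharing the same barcode sequence, which are then filtered out.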
Relates to #6.
As noted in the longranger docs (below), the suffix number can be any integer, not just "-1", as it is meant to allow merging different 10X libraries into the same BAM.
I ran into this issue when trying to run `LRez index bam` on a BAM with multiple libraries, which resulted in the following error:

From what I can understand from the code, this suffix is currently not included in the index. For LRez to work with BAMs that contain multiple libraries, this would need to be fixed.