Pick out best reference from a sam file #65
Comments
You can sort them, I think.
Having complete coverage is the most important thing, I'm pretty sure. |
Yes. You could pick the best reference (by a wide margin), re-run the mapping automatically, but still save all the data from the original mapping so the PI can confirm the choice. |
for pileup: refId position(1-based) reference depth readBases baseQuals
then: average depths (using sequenceLength)
for idxstats: refId sequenceLength mappedReadCount unmappedReadCount |
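A rough sketch of how those two outputs could be combined, in case it helps. It assumes tab-separated text in the column order listed above (as produced by `samtools idxstats` and `samtools mpileup`), and the function names are made up for illustration. Since pileup output normally skips positions with zero coverage, the sum of depths is divided by sequenceLength from idxstats rather than by the number of pileup rows.

```python
from collections import defaultdict


def parse_idxstats(text):
    """Map refId -> (sequenceLength, mappedReadCount, unmappedReadCount)."""
    stats = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        ref_id, length, mapped, unmapped = line.split("\t")[:4]
        if ref_id == "*":
            # Summary row for reads that were not assigned to any reference.
            continue
        stats[ref_id] = (int(length), int(mapped), int(unmapped))
    return stats


def sum_pileup_depth(text):
    """Map refId -> sum of the depth column over all pileup rows."""
    total = defaultdict(int)
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        # Columns: refId, position (1-based), reference base, depth, ...
        total[fields[0]] += int(fields[3])
    return total


def average_depths(idxstats_text, pileup_text):
    """Map refId -> average depth, using sequenceLength as the denominator."""
    stats = parse_idxstats(idxstats_text)
    total = sum_pileup_depth(pileup_text)
    return {
        ref_id: total.get(ref_id, 0) / length
        for ref_id, (length, _mapped, _unmapped) in stats.items()
        if length > 0
    }
```

References that never appear in the pileup text get an average depth of 0 instead of being dropped, so they can still be ranked.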
Seems that you could utilize
|
I'm not sure if it makes sense to include both avg. depth and # mapped reads, because they represent basically the same information (assuming all references are about the same length). I'd propose the following weighted equation for deciding which is the best: (coverageRatio * 1.5) * (mappedReads / totalReads)
@mmelendrez thoughts? |
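For discussion, here is a minimal sketch of the weighted equation proposed above. It assumes coverageRatio means the fraction of reference positions covered by at least one read (the thread does not define it), and the function and parameter names are hypothetical.

```python
def reference_score(coverage_ratio, mapped_reads, total_reads, coverage_weight=1.5):
    """(coverageRatio * 1.5) * (mappedReads / totalReads), as proposed above."""
    if total_reads == 0:
        return 0.0
    return (coverage_ratio * coverage_weight) * (mapped_reads / total_reads)


def pick_best_reference(candidates, total_reads):
    """Return (best refId, all scores).

    candidates maps refId -> (coverage_ratio, mapped_reads).
    """
    if not candidates:
        raise ValueError("no candidate references given")
    scores = {
        ref_id: reference_score(coverage_ratio, mapped_reads, total_reads)
        for ref_id, (coverage_ratio, mapped_reads) in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Returning every score alongside the winner keeps the data available for someone to confirm the choice. One thing to note: because the two terms are multiplied, the 1.5 scales every score by the same factor and does not change which reference ranks first.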
What would be a reasonable metric for choosing the best-fitting reference from a mapping?
I am thinking something like average # reads mapped at each position + total reads mapped - unmapped positions. We could also take into account the quality and mapping quality of reads. We could use a dataframe for this:
www.github.com/averagehat/bioframes
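A minimal sketch of that additive metric in plain Python, without the dataframe. It assumes "unmapped positions" means reference positions with zero depth, and `additive_score` is a name made up for this example. It leaves out base quality and mapping quality.

```python
def additive_score(depth_by_position, sequence_length, mapped_reads):
    """average depth + total reads mapped - number of uncovered positions.

    depth_by_position maps a 1-based position to its read depth; positions
    that are missing from the mapping are treated as depth 0.
    """
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    covered = sum(1 for depth in depth_by_position.values() if depth > 0)
    unmapped_positions = sequence_length - covered
    average_depth = sum(depth_by_position.values()) / sequence_length
    return average_depth + mapped_reads - unmapped_positions
```

The three terms are in different units (depth, reads, positions), so whichever is numerically largest will probably dominate the sum unless the terms are normalized first.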