Exome signature analysis #4

Open
drtamermansour opened this issue Oct 9, 2024 · 3 comments

@drtamermansour

We are currently using canine cDNA to create a generic exome sketch. Each exome kit seems to have its own range of achievable enrichment, e.g. Agilent SureSelect achieves 45-75% coverage, Nimblegen Exome-1 achieves 30-60% coverage, and Nimblegen Exome-plus achieves 75-85% coverage. These numbers are always inflated by the inclusion of off-target sequences (which should be proportional to the off-target sequences of non-exome sequences as well). In addition, these numbers might miss regions that a kit was designed to target but that lie outside the cDNA (e.g., some kits include regions from GWAS peaks, regulatory loci, disease-causing markers, etc.). Unfortunately, the BED files of these kits, especially the discontinued ones, are not always available. Can we use Snipe and the available annotation of some bioprojects to make these BED files?

@drtamermansour

  1. Handling off-targets:
    In WGS, the mean and median abundances are almost identical on both the genome and exome scale because sequencing is random.
    In WXS, on the other hand, the genomic mean abundance is much higher than the genomic median, while the two are very close for the exome. I think the genomic median abundance represents the off-target abundance. We can use the median-trimmed sketches to represent the target captures.
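A minimal sketch of the trimming idea, assuming a sample sketch can be viewed as a plain mapping from k-mer hash to abundance. The function name `median_trim` and the strict "greater than the median" cutoff are illustrative assumptions, not Snipe's API or its exact trimming rule:

```python
from statistics import mean, median


def median_trim(abundances):
    """Keep only hashes whose abundance is strictly above the sketch-wide median.

    `abundances` maps k-mer hash -> abundance (a hypothetical representation).
    The median is treated as the off-target background level.
    """
    if not abundances:
        return {}
    cutoff = median(abundances.values())
    return {h: a for h, a in abundances.items() if a > cutoff}


# Toy WXS-like sketch: many low-abundance off-target hashes, a few enriched ones.
toy = {h: 1 for h in range(8)}
toy.update({100: 40, 101: 55, 102: 60})

trimmed = median_trim(toy)
print("mean:", round(mean(toy.values()), 1), "median:", median(toy.values()))
print("hashes kept after trimming:", sorted(trimmed))
```

In this toy example the mean sits far above the median, as described for WXS, and only the enriched hashes survive the trim.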

@drtamermansour

  1. Pairwise comparison of median-trimmed sketches should show a cluster for each kit. We can pick random representatives from bioprojects with known kit annotations and run a test to see whether the clusters appear.
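One way the pairwise test could be prototyped, assuming each median-trimmed sketch is reduced to a set of hashes. Jaccard similarity, the single-linkage grouping, the 0.5 threshold, and the sample names are all illustrative choices rather than anything prescribed above:

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of hashes."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cluster(sketches, threshold=0.5):
    """Single-linkage grouping: link samples whose Jaccard >= threshold."""
    parent = {name: name for name in sketches}

    def find(x):
        # Follow parent links to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(sketches, 2):
        if jaccard(sketches[a], sketches[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in sketches:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=sorted)


# Hypothetical trimmed sketches from samples with known kit annotation.
samples = {
    "kitA_1": {1, 2, 3, 4},
    "kitA_2": {1, 2, 3, 5},
    "kitB_1": {10, 11, 12},
    "kitB_2": {10, 11, 12, 13},
}
clusters = cluster(samples)
print(clusters)
```

If the kit annotation is known for the chosen representatives, the resulting groups can be compared directly against the labels to check whether each kit forms its own cluster.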

@drtamermansour

  1. Once we have clusters, we can use them to build pan-exome sketches at different levels of consensus (99%, 95%, 50%). Agilent SureSelect has a published BED file, so we can use it to assess the performance.
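A rough sketch of the consensus step, assuming each cluster member is a set of hashes and that the published BED file has already been converted into a reference hash set by some separate step not shown here. The helper names and the precision/recall scoring are assumptions for illustration:

```python
import math
from collections import Counter


def consensus_sketch(members, fraction):
    """Hashes present in at least `fraction` of the cluster's member sketches."""
    counts = Counter(h for sketch in members for h in sketch)
    needed = max(1, math.ceil(fraction * len(members)))
    return {h for h, c in counts.items() if c >= needed}


def precision_recall(predicted, reference):
    """Score a consensus sketch against a reference hash set."""
    tp = len(predicted & reference)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical cluster of four trimmed sketches from the same kit.
members = [{1, 2, 3, 4}, {1, 2, 3}, {1, 2, 5}, {1, 2, 3, 6}]
# Hypothetical reference hashes derived from a published BED file.
reference = {1, 2, 3, 7}

for level in (0.99, 0.95, 0.50):
    pan = consensus_sketch(members, level)
    print(level, sorted(pan), precision_recall(pan, reference))
```

Stricter consensus levels shrink the pan-exome sketch toward hashes shared by nearly every sample, which would be expected to trade recall for precision against the reference.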
