This repository contains the data and files needed to generate the second version of the constrained coding regions (CCRs) model. This version includes variants from gnomAD exome and whole-genome sequencing samples.
The goal of this project is to build a new CCR model that accounts for different SIFT/PolyPhen scores and VAF, so that the model can identify regions enriched in ClinVar pathogenic variants.
Required files include:
- new_CCR_jason.ipynb = Jupyter notebook for generation of new CCR models
- exter.py = function to read the GFF file
- pathoscore.py = function from the PathoScore repository
- PathoScore ClinVar truth sets: see here to make the ClinVar truth-set directory
- gnomAD exome and WGS VCF files
- GFF transcript file
Output files:
- PathoScore results are written to the pathoscore_results output directory
- new_CCR.bed.gz = compressed BED file with the CCR windows
- new_CCR.bed.gz.tbi = tabix index for new_CCR.bed.gz
- new_CCR.txt = text file with the CCR windows
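
To inspect the CCR windows outside the notebook, the compressed BED output can be read with the Python standard library. The sketch below is illustrative only and is not part of the repository: `Window` and `read_bed_windows` are hypothetical names, and the code assumes only the standard BED layout (tab-separated, with chromosome, start, and end in the first three columns). Any further columns are kept as uninterpreted strings, because their meaning is not documented here.

```python
import gzip
from typing import Iterator, NamedTuple


class Window(NamedTuple):
    """One BED record (hypothetical helper type, not from the repository)."""
    chrom: str
    start: int    # 0-based, inclusive (BED convention)
    end: int      # 0-based, exclusive (BED convention)
    extra: tuple  # remaining columns, left uninterpreted


def read_bed_windows(path: str) -> Iterator[Window]:
    """Yield one Window per data line of a gzip- or bgzip-compressed BED file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            # Skip blank lines and common BED header/comment lines.
            if not line or line.startswith(("#", "track", "browser")):
                continue
            fields = line.split("\t")
            yield Window(fields[0], int(fields[1]), int(fields[2]), tuple(fields[3:]))
```

For example, `sum(w.end - w.start for w in read_bed_windows("new_CCR.bed.gz"))` would give the total number of bases covered by the windows, assuming the file has been generated in the working directory.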