feature request: processing of VCFs from Delly #243

Open
jessmewald opened this issue Feb 26, 2024 · 1 comment

Comments

@jessmewald

Hi there,

We would like to process VCF output from the Delly caller with SvAnna, if possible. Below is a subset of the warnings we encounter:

    _____       ___
   / ___/_   __/   |  ____  ____  ____ _
   \__ \| | / / /| | / __ \/ __ \/ __ `/
  ___/ /| |/ / ___ |/ / / / / / / /_/ /
 /____/ |___/_/  |_/_/ /_/_/ /_/\__,_/
 
 Structural Variant Annotation and Analysis
                           :: v1.0.5-SNAPSHOT ::
 
 
 16:06:45.640 [main] INFO  o.m.svanna.cli.cmd.PrioritizeCommand - Using 4 phenotype features supplied via CLI
 16:06:45.645 [main] INFO  o.m.svanna.cli.cmd.SvAnnaCommand - Spooling up SvAnna v1.0.5-SNAPSHOT using resources in /svanna_db_2304_hg38
 16:06:54.489 [main] INFO  o.m.svanna.cli.cmd.PrioritizeCommand - Reading variants from `NA12878_hg38_pbmm2_delly.vcf.gz`
 16:06:54.584 [main] WARN  o.m.svanna.io.parse.VcfVariantParser - Invalid variant `chr1-10991221:(DUP00000246)`: Illegal DUP!changeLength:0. Should be > 0 given coordinates 1:10991222-10994549 -><DUP>
 16:06:54.605 [main] WARN  o.m.svanna.io.parse.VcfVariantParser - Invalid variant `chr1-33051198:(DUP00000472)`: Illegal DUP!changeLength:0. Should be > 0 given coordinates 1:33051199-64384519 -><DUP>
 16:06:54.606 [main] WARN  o.m.svanna.io.parse.VcfVariantParser - Invalid variant `chr1-34635820:(DEL00000486)`: Illegal DEL changeLength:0. Should be < 0 given coordinates  1:34635821-34646375 -><DEL>
 16:06:54.621 [main] WARN  o.m.svanna.io.parse.VcfVariantParser - Invalid variant `chr1-72300640:(DEL00000752)`: Illegal DEL changeLength:0. Should be < 0 given coordinates  1:72300641-72346156 -><DEL>
 16:06:54.622 [main] WARN  o.m.svanna.io.parse.VcfVariantParser - Invalid variant `chr1-73129298:(DUP00000758)`: Illegal DUP!changeLength:0. Should be > 0 given coordinates 1:73129299-155634876 -><DUP>
 16:06:54.622 [main] WARN  o.m.svanna.io.parse.VcfVariantParser - Invalid variant `chr1-73130124:(DEL00000759)`: Illegal DEL changeLength:0. Should be < 0 given coordinates  1:73130125-155627485 -><DEL>
 16:06:54.642 [main] WARN  o.m.svanna.io.parse.VcfVariantParser - Invalid variant `chr1-143184588:(DUP00001195)`: Illegal DUP!changeLength:0. Should be > 0 given coordinates 1:143184589-143202283 -><DUP>
 16:06:54.642 [main] WARN  o.m.svanna.io.parse.VcfVariantParser - Invalid variant `chr1-143184588:(DUP00001196)`: Illegal DUP!changeLength:0. Should be > 0 given coordinates 1:143184589-143207373 -><DUP>
 16:06:54.643 [main] WARN  o.m.svanna.io.parse.VcfVariantParser - Invalid variant `chr1-143184612:(DUP00001197)`: Illegal DUP!changeLength:0. Should be > 0 given coordinates 1:143184613-143200532 -><DUP>
 16:06:54.643 [main] WARN  o.m.svanna.io.parse.VcfVariantParser - Invalid variant `chr1-143184612:(DUP00001199)`: Illegal DUP!changeLength:0. Should be > 0 given coordinates 1:143184613-143221923 -><DUP>
 16:06:54.643 [main] WARN  o.m.svanna.io.parse.VcfVariantParser - Invalid variant `chr1-143184612:(DUP00001200)`: Illegal DUP!changeLength:0. Should be > 0 given coordinates 1:143184613-143241846 -><DUP>
 16:06:54.645 [main] WARN  o.m.svanna.io.parse.VcfVariantParser - Invalid variant `chr1-143191627:(DEL00001254)`: Illegal DEL changeLength:0. Should be < 0 given coordinates  1:143191628-143211409 -><DEL>
 16:06:54.648 [main] WARN  o.m.svanna.io.parse.VcfVariantParser - Invalid variant `chr1-143200193:(DUP00001327)`: Illegal DUP!changeLength:0. Should be > 0 given coordinates 1:143200194-143203550 -><DUP>
 16:06:54.650 [main] WARN  o.m.svanna.io.parse.VcfVariantParser - Invalid variant `chr1-143206014:(DUP00001359)`: Illegal DUP!changeLength:0. Should be > 0 given coordinates 1:143206015-143208676 -><DUP>
 .....
 16:06:55.589 [main] INFO  o.m.svanna.cli.cmd.PrioritizeCommand - Read 32,785 variants
 16:06:55.589 [main] INFO  o.m.svanna.cli.cmd.PrioritizeCommand - Filtering out the variants with reciprocal overlap >80.0% occurring in more than 1.0% probands
 16:06:55.589 [main] INFO  o.m.svanna.cli.cmd.PrioritizeCommand - Filtering out the variants where ALT allele is supported by less than 3 reads
 16:07:18.997 [main] INFO  o.m.svanna.cli.cmd.PrioritizeCommand - Prioritizing 32,785 variants on 2 threads
 16:07:19.017 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 1
 16:07:19.017 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 1
 16:07:19.017 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 1
 16:07:19.017 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 1
 16:07:19.372 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 1
 16:07:19.372 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 1
 16:07:19.373 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 1
 16:07:19.373 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 1
 16:07:19.373 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 1
 16:07:19.484 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 2
 16:07:19.484 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 2
 16:07:19.484 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 2
 16:07:19.484 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 2
 16:07:19.485 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 2
 16:07:19.485 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 2
 16:07:19.519 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 1
 16:07:19.519 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 1
 16:07:19.519 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 1
 16:07:19.519 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 1
 16:07:19.519 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 1
 16:07:19.519 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 1
 16:07:19.519 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 1
 16:07:19.519 [svanna-worker-2] WARN  o.m.s.c.p.a.i.GeneSequenceImpactCalculator - Bad insertion with nonzero length 1
 ...
 16:07:30.831 [main] INFO  o.m.svanna.cli.cmd.PrioritizeCommand - Prioritization finished in 0m 11s (11,833 ms) processing on average 2,770.64 items/s
 16:07:30.832 [main] INFO  o.m.svanna.cli.cmd.PrioritizeCommand - Writing out the results
 16:07:30.838 [main] INFO  o.m.s.cli.writer.vcf.VcfResultWriter - Writing VCF results into NA12878_hg38_delly_svanna.vcf.gz
 16:07:34.893 [main] INFO  o.m.s.c.w.t.TabularResultWriter - Writing tabular results into NA12878_hg38_delly_svanna.csv.gz
 16:07:35.677 [main] INFO  o.m.s.c.writer.html.HtmlResultWriter - Writing HTML results to NA12878_hg38_delly_svanna.html
 16:07:41.997 [main] INFO  o.m.svanna.cli.cmd.PrioritizeCommand - We're done, bye!

The header and a few of the calls from the Delly VCF are below:

 ##fileformat=VCFv4.2
 ##FILTER=<ID=PASS,Description="All filters passed">
 ##fileDate=20231201
 ##ALT=<ID=DEL,Description="Deletion">
 ##ALT=<ID=DUP,Description="Duplication">
 ##ALT=<ID=INV,Description="Inversion">
 ##ALT=<ID=BND,Description="Translocation">
 ##ALT=<ID=INS,Description="Insertion">
 ##FILTER=<ID=LowQual,Description="Poor quality and insufficient number of PEs and SRs.">
 ##INFO=<ID=CIEND,Number=2,Type=Integer,Description="PE confidence interval around END">
 ##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="PE confidence interval around POS">
 ##INFO=<ID=CHR2,Number=1,Type=String,Description="Chromosome for POS2 coordinate in case of an inter-chromosomal translocation">
 ##INFO=<ID=POS2,Number=1,Type=Integer,Description="Genomic position for CHR2 in case of an inter-chromosomal translocation">
 ##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the structural variant">
 ##INFO=<ID=PE,Number=1,Type=Integer,Description="Paired-end support of the structural variant">
 ##INFO=<ID=MAPQ,Number=1,Type=Integer,Description="Median mapping quality of paired-ends">
 ##INFO=<ID=SRMAPQ,Number=1,Type=Integer,Description="Median mapping quality of split-reads">
 ##INFO=<ID=SR,Number=1,Type=Integer,Description="Split-read support">
 ##INFO=<ID=SRQ,Number=1,Type=Float,Description="Split-read consensus alignment quality">
 ##INFO=<ID=CONSENSUS,Number=1,Type=String,Description="Split-read consensus sequence">
 ##INFO=<ID=CONSBP,Number=1,Type=Integer,Description="Consensus SV breakpoint position">
 ##INFO=<ID=CE,Number=1,Type=Float,Description="Consensus sequence entropy">
 ##INFO=<ID=CT,Number=1,Type=String,Description="Paired-end signature induced connection type">
 ##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Insertion length for SVTYPE=INS.">
 ##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">
 ##INFO=<ID=PRECISE,Number=0,Type=Flag,Description="Precise structural variation">
 ##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
 ##INFO=<ID=SVMETHOD,Number=1,Type=String,Description="Type of approach used to detect SV">
 ##INFO=<ID=INSLEN,Number=1,Type=Integer,Description="Predicted length of the insertion">
 ##INFO=<ID=HOMLEN,Number=1,Type=Integer,Description="Predicted microhomology length using a max. edit distance of 2">
 ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
 ##FORMAT=<ID=GL,Number=G,Type=Float,Description="Log10-scaled genotype likelihoods for RR,RA,AA genotypes">
 ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
 ##FORMAT=<ID=FT,Number=1,Type=String,Description="Per-sample genotype filter">
 ##FORMAT=<ID=RC,Number=1,Type=Integer,Description="Raw high-quality read counts or base counts for the SV">
 ##FORMAT=<ID=RCL,Number=1,Type=Integer,Description="Raw high-quality read counts or base counts for the left control region">
 ##FORMAT=<ID=RCR,Number=1,Type=Integer,Description="Raw high-quality read counts or base counts for the right control region">
 ##FORMAT=<ID=RDCN,Number=1,Type=Integer,Description="Read-depth based copy-number estimate for autosomal sites">
 ##FORMAT=<ID=DR,Number=1,Type=Integer,Description="# high-quality reference pairs">
 ##FORMAT=<ID=DV,Number=1,Type=Integer,Description="# high-quality variant pairs">
 ##FORMAT=<ID=RR,Number=1,Type=Integer,Description="# high-quality reference junction reads">
 ##FORMAT=<ID=RV,Number=1,Type=Integer,Description="# high-quality variant junction reads">
 ##reference=/projects/clia/clia-LRS/hg38_noalt/hg38.no_alt.fa
 ##contig=<ID=chr1,length=248956422>
 ##contig=<ID=chr10,length=133797422>
 ##contig=<ID=chr11,length=135086622>
 ##contig=<ID=chr11_KI270721v1_random,length=100316>
 ##contig=<ID=chr12,length=133275309>
 ##contig=<ID=chr13,length=114364328>
 ##contig=<ID=chr14,length=107043718>
 ##contig=<ID=chr14_GL000009v2_random,length=201709>
 ##contig=<ID=chr14_GL000225v1_random,length=211173>
 ##contig=<ID=chr14_KI270722v1_random,length=194050>
 ##contig=<ID=chr14_GL000194v1_random,length=191469>
 ##contig=<ID=chr14_KI270723v1_random,length=38115>
 ##contig=<ID=chr14_KI270724v1_random,length=39555>
 ##contig=<ID=chr14_KI270725v1_random,length=172810>
 ##contig=<ID=chr14_KI270726v1_random,length=43739>
 ##contig=<ID=chr15,length=101991189>
 ##contig=<ID=chr15_KI270727v1_random,length=448248>
 ##contig=<ID=chr16,length=90338345>
 ##contig=<ID=chr16_KI270728v1_random,length=1872759>
 ##contig=<ID=chr17,length=83257441>
 ##contig=<ID=chr17_GL000205v2_random,length=185591>
 ##contig=<ID=chr17_KI270729v1_random,length=280839>
 ##contig=<ID=chr17_KI270730v1_random,length=112551>
 ##contig=<ID=chr18,length=80373285>
 ##contig=<ID=chr19,length=58617616>
 ##contig=<ID=chr1_KI270706v1_random,length=175055>
 ##contig=<ID=chr1_KI270707v1_random,length=32032>
 ##contig=<ID=chr1_KI270708v1_random,length=127682>
 ##contig=<ID=chr1_KI270709v1_random,length=66860>
 ##contig=<ID=chr1_KI270710v1_random,length=40176>
 ##contig=<ID=chr1_KI270711v1_random,length=42210>
 ##contig=<ID=chr1_KI270712v1_random,length=176043>
 ##contig=<ID=chr1_KI270713v1_random,length=40745>
 ##contig=<ID=chr1_KI270714v1_random,length=41717>
 ##contig=<ID=chr2,length=242193529>
 ##contig=<ID=chr20,length=64444167>
 ##contig=<ID=chr21,length=46709983>
 ##contig=<ID=chr22,length=50818468>
 ##contig=<ID=chr22_KI270731v1_random,length=150754>
 ##contig=<ID=chr22_KI270732v1_random,length=41543>
 ##contig=<ID=chr22_KI270733v1_random,length=179772>
 ##contig=<ID=chr22_KI270734v1_random,length=165050>
 ##contig=<ID=chr22_KI270735v1_random,length=42811>
 ##contig=<ID=chr22_KI270736v1_random,length=181920>
 ##contig=<ID=chr22_KI270737v1_random,length=103838>
 ##contig=<ID=chr22_KI270738v1_random,length=99375>
 ##contig=<ID=chr22_KI270739v1_random,length=73985>
 ##contig=<ID=chr2_KI270715v1_random,length=161471>
 ##contig=<ID=chr2_KI270716v1_random,length=153799>
 ##contig=<ID=chr3,length=198295559>
 ##contig=<ID=chr3_GL000221v1_random,length=155397>
 ##contig=<ID=chr4,length=190214555>
 ##contig=<ID=chr4_GL000008v2_random,length=209709>
 ##contig=<ID=chr5,length=181538259>
 ##contig=<ID=chr5_GL000208v1_random,length=92689>
 ##contig=<ID=chr6,length=170805979>
 ##contig=<ID=chr7,length=159345973>
 ##contig=<ID=chr8,length=145138636>
 ##contig=<ID=chr9,length=138394717>
 ##contig=<ID=chr9_KI270717v1_random,length=40062>
 ##contig=<ID=chr9_KI270718v1_random,length=38054>
 ##contig=<ID=chr9_KI270719v1_random,length=176845>
 ##contig=<ID=chr9_KI270720v1_random,length=39050>
 ##contig=<ID=chrM,length=16569>
 ##contig=<ID=chrX,length=156040895>
 ##contig=<ID=chrY,length=57227415>
 ##contig=<ID=chrY_KI270740v1_random,length=37240>
 ##bcftools_viewVersion=1.9+htslib-1.9
 ##bcftools_viewCommand=view NA12878_hg38_pbmm2_delly.bcf; Date=Fri Dec  1 20:46:30 2023
 ##bcftools_viewCommand=view NA12878_hg38_pbmm2_delly.vcf.gz; Date=Mon Feb 26 16:20:51 2024
 #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	UnnamedSample
 chr1	10862	INS00000000	G	GCAGGCGCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCATGCTAGCGCGTCCAGGGGAGGAGGCGTGGCA	67	LowQual	PRECISE;SVTYPE=INS;SVMETHOD=EMBL.DELLYv1.1.7;END=10862;SVLEN=68;PE=0;MAPQ=0;CT=NtoN;CIPOS=-13,13;CIEND=-13,13;SRMAPQ=16;INSLEN=68;HOMLEN=15;SR=4;SRQ=0.931035;CONSENSUS=CTAACCCGAACCCGAACCCGAACCCGAACCCGAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCCAACCCCTAACCCTAACCCTAACCCTAACCCGAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAAACCTCACCCCAACCCCCACCCCCACCCCCACCCTCAACCCTCAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAAACCTCTACCCCCACCCCCACCCCCACCCCCACCCCAACCCCAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCCTCGCGGTACCCTCAGCCGGCCCGCCCGCCCGGGTCTGACCTGAGGAGAACTGTGCTCCGCCTTCAGAGTACCACCGAAATCTGTGCAGAGGACAACGCAGCTCCGCCCTCGCGGTGCTCTCCGGGTCTGTGCTGAGGAGAACGCAACTCCGCCGGCGCAGGCGCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCAGAGAGGCGCGCCGCGCCGGCGCAGGCGCATGCTAGCGCGTCCAGGGGAGGAGGCGTGGCACAGGCGCAGAGACACATGCTAGCGCGCCCAGGGGAGGAGGCGTGGCGCAGGCGCAGAGAGGCGCGCCGTGCTGCCGCAGGCGCAGAGACACATGCTAGCGCGTCCAGGGGGTGGAGGCGTGGCGCAGGCGCAGAGACGCACGCCTACGGGCGGGGTTGGGGGGGGCGTGTGTTACAGGAGCAAAGTCGCACGGCGCCGGGCTGGGGGCGGGGGGGGGGGGGCGCCGTGCACGCGCAGAAACTCACGTCACGGCGGCGCGGCGCAGAGACGGGTGGAACCTCAGTAATCCGAAACGCCGGGATCGACAGCCCCTTGCTTGCAGCCGGGCACTACAGGACCCGCTTGCTCACGGTGCTGTGCCAGGGCGCCCCCTGCTGGCGACTAGGGCAACTGCAGGGCTCTCTTGCTTAGAGTGGTGGCCAGCGCCCCCTGCTGGCGCCGGGGCACTGCAGGGCCCTCTTGCTTACTGTATAGTGTGGGGCACGCCGCCTGCTGGCAGCTAGGGACATTGCAGGGTCCTCTTGCTCAAGGAGTAGTGGCAGCACGCCCGCCTGCTGGCAGCTGGGGACACTGCCGGGCCCTCTTGCTCCAACAGTAGTGGCGGATTATAGGGAAACACCCGGAGCATATGCTGTTTGGTCTCAGTAGGCTCCTAAATATGGGATTCCTGGGTTTAAAAGTATAAAAT
AAATATGTTTAATTTGTTAACTGATTACCATCAGAATTGTACTGTTCTGTATCCCACCAGCAATGTCTAGGAATGCCTGTTTCTCCACAAAGTGTTTACTTTTGGATTTTTGCCAGTCTAACAGGTGAAGCCCTGGAGATTCTTATTAGTGATTTGGGCTGGGGCCTGGCCATGTGTATTTTTTTAAATTTCCACTGATGATTTTGCTGCATGGCCGGTGTTGAGAATGACTGCGCAAATTTGCCGGATTTCCTTTGCTGTTCCTGCATGTAGTTTAAACGAGATTGCCAGCACCGGGTATCATTCACCATTTTTATTTTCGTTAACTTGCCGTCAGC;CE=1.94211;CONSBP=969	GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV	0/1:-5.49588,0,-0.197838:4:LowQual:10342:15644:5302:2:0:0:0:4
 chr1	43177	DEL00000001	TAAATATGAAGAATATTATAAATCATATCAATAACCACAACATTCAAGCTGTCAGTTTGAATAGACAATGTAAATGACAAAACTACATACTCAACAAGATAACAGCAAACCAGCTTCGACAGCACGTTAAAGGGGTCATACAACATAATCGAGTAGAATTTATCTCTGAGATGCAAGAATGGTTCAAAATATGGAAACCAATAAATGTGATATGCCACACTAACAGAATAAAAAATAAAAATCATATTATCATCTCAATAGATGCAGAAAAAGCATTAACAAAAGTAAAC	T	48	LowQual	PRECISE;SVTYPE=DEL;SVMETHOD=EMBL.DELLYv1.1.7;END=43466;PE=0;MAPQ=0;CT=3to5;CIPOS=-2,2;CIEND=-2,2;SRMAPQ=12;INSLEN=0;HOMLEN=1;SR=4;SRQ=0.978193;CONSENSUS=GCTGAATTACCCATGCAAAACCTTAATACTTGACACTTATCACTACTTTATTCAAGAGCCTATTGTTCTCAACTCTGCTCATTAATACTATGCTTGGAGTATACAGTAAGATAAGAAACATAAATAAGAAGTGTACATTTGTTTCTTCCTGTTTTCTTCTGGCTATTGGATCAATTACATCCCATCTTAAGCTGACCCCTGTGTAATTAATCAATATCCGTTTTAAGCAGCAATCCATAGTTGTGCAGAAATTAGAAAACTGACCCACACAGAAAAACTAATTGTGAGAACCAATATTACACTAAATTCATTTGACAATTCTCAGCAAAGTGCTGGGTTGATCTCTATTTATGCTTTTCTTAAACACACAAAATACAAAAGTTAACCCATATGGAATGCAATGGAGGAAATCAATGACATATCAGATCTAGAAACTAATCAATTAGCAATCAGGAAGGAGTTGCGGTAGGAAGTCTGTGCTGTTGAATGTACACTAATCAATGATTCCTTAAATTATTCACAATAAAAAAAAAGATTAGAATAGTTTTTTTTTAAAAAAAGCCCAGAAACTAATCTAAGTTTTGTCTGGTAATAAAGGTATATTTTCAAAAGAGAGGTAAATAGATCCACATACTGTGGAGGGAATAAAATACTTTTTGAAAAACAAACAACAAGTTGGATTTTTAGACACATAGAAATTGAATATGTACATTTATAAATATTTTTGGATTGAACTATTTCAAAATTATACCATAAAATAACTTGTAAAAATGTAGGCAAAATGTATATAATTATGGCATGAGGTATGCAACTTTAAGCAAGGAAGCAAAAGCAGAAACCATGAAAAAAGTCTAAATTTTACCATATTGAATTTAAATTTTCAAAAACAAAAATAAAGACAAAGTGGGAAAAATATGTATGCTTCATGTGTGACAAGCCACTGATATTTTATTCTTTCATAATAAGACATCAGATAAAACAAATTAGGAATAGAAGGAATGTACCGCAACACAATAAAGGCCATATATAACAAGCCCACAGCTAACATCATAATAGTAAAATCATCACAATGGTAAAAAAAATGAAAGCTTTTCCTCTAAGGTCAGAAATAATATAAAGGTTCCCACTCTTGCTATTTCTATTCCATATAGTACTAAAAGTCCTAGCCAGGACAATTAGACAAAATAAAAATAAAAACACCCAAATTGGAAAGATAGAAGCAAACTTTTCTGTTTACAGATAACATAATCTTATATGTAGAAACCCCTTAAAACTTCAGCAAAAAAAAAAAAAAAAAAAAACTACAGAGCTAGTAAATTCAGTGAAGTTGCAGAATACAAAATCAACATACAAAAATCAGTAGTGTCTCTATACACTAATAAGGACTTAACAGAGAAAGAAGTTAAGAAAACAATACCACTAACAATAGAATCCAAAAAATAAAATACTTAGGAATAAATTTTACCAAACATCTGTACACTAAAAACTATAAAACATTGAAAAAAGAAGTTGAATAAGA
CACATATAAATAGAAAGCTATCTCATGTTAATAGATTAGAAAAAGTAATATTGTTAAGATGTCCTCACTACTTAAAGCAATTTATAGATCTAATGCATTTATTGCAATCTCTTCAAAATCCCAAAGGTATTTTTGACAGAAATAAAAAAAAAATTCTAAAATATGCATGAAACCACAAAAGACTGTGAATAGCTAAAGCAATCTTGAGCAAGATGAACAACACTGGAAGCATCACACTACCTTATTTCAAAATCTACTACAAAGCTATAGTGATCAAAGCAACATGATACTGTCATAAAAACACATAGATAAACCTATGGAATGGAATAAAGAGCACAGAAATAAGTCCACACATTTACATTCAATTGATTTTCAACAAC;CE=1.84833;CONSBP=950	GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV	1/1:-2.11242,-0.542892,0:6:LowQual:1006:872:1013:1:0:0:2:2
 chr1	54712	INS00000002	T	TTTTTTTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTC	14	LowQual	PRECISE;SVTYPE=INS;SVMETHOD=EMBL.DELLYv1.1.7;END=54712;SVLEN=53;PE=0;MAPQ=0;CT=NtoN;CIPOS=-7,7;CIEND=-7,7;SRMAPQ=1;INSLEN=53;HOMLEN=8;SR=10;SRQ=0.987275;CONSENSUS=TAAATAAAATGTGAACTTAGGCAAATTATAAATTAATAAAGTATATTTTTAAAATTTCCATTTTAATTTCTGTTTAAATTAGAATAAGAAACAAAAACAACTATGTAATACGTGTGCAAAGCCCTGAACTGAGATTTGACTTTACCTTGAGCTTTGTCAGTTTACGATGCTATTTCAGTTTTGTGCTCAGATTTGAGTGATTGCAGGAAGAGAATAAATTTCTTTAATGCTGTCAAGACTTTAAATAGATACAGACAGAGCATTTTCACTTTTTCCTGCATCTCTATTATTCTAAAAATGAGAACATTCCAAAAGTCAATCATCCAAGTTTATTCTAAATAGATGTGTAGAAATAACAGTTGTTTCACAGGAGACTAATCGCCCAAGGATATGTGTTTAGAGGTACTGGTTTCTTAAATAAGGTTTTCTAGTCAGGCAAAAGATTCCCTGGAGCTTATGCATCTGTGGTTGATATTTTGGGATAAGAATAAAGCTAGAAATGGTGAGGCATATTCAATTTCATTGAAGATTTCTGCATTCAAAATAAAAACTCTATTGAAGTTACACATACTTTTTTCATGTATTTGTTTCTACTGCTTTGTAAATTATAACAGCTCAATTAAGAGAAACCGTACCTATGCTGTTTTGTCCTGTGACTCTCCAAGAACCTTCCTAAGTTATTCTACTTAATTGCTTTATCACTCATATGAATGGGAATTTCTTCTCTTAATTGCTGCTAATCTCCCCCATCTTCAAATACTCTACCGGGCTTCTGGAACACCACAGCTTCCTGGCTTTTTCTCCTACCTCCTGGGCAAGTCCTTCCCTGTGTCTTTTGTTGAGTGTTCCTCATCTGCTTAACCACCAATCAACCTATTGCCCCTAATTTGATCTTTGGCCTGTTTTCACTTAGATTCTATCCCTACGTATCACCCATTCCCACAGCTTTAATTACCATCTAAACACTAGGGGCTCTCAAACCTTCTATTTTTTTTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTCTTCCTCCTTTTCTTTCCTTTTCTTTCTTTCATTCTTTCTTTCTTTTTTAAGGGGCAGGGTCTCACTATGTTACTGAGGCTGGTCTCAAACTCCTGACCTCAAGCAATCTGTCTGCTTCAGCCTCCCAAGTAGCTGAGAATACAGGGACAAGCCATTGCACCTGACCCTGGTACTATTTCTTGAGTTCCTGATCCACAGATCTAACCTCCTACTTTCCTGGATGCCACACAAGATCTTCCACTCAACAAGTCTGCAACTAAACTAGCCTTCCTCTTTTCAAACCTACTCTTCTTTCAGTGTTCTCAGTCACAAAAATTTGTACCAACTAGTTACCTAGTTGCACAACCCAAAATCTGGGAAAAATAATAGATTTCTTTCTCCATAGTACCCCAAAATCAATAAATCATCAAGTCTTATTCTACCTTCCAAAGAGCCTTACATATGTTCCTTTATTTTCATCTGTAACACCACTATTCCTGTCTAAGCCTACCTATGTCATTTTTGGAAGAGAATATAGTCACCTATGTGATCTTCCCACTTAAAATCCTATTATCTATGCTTCAGTAAAAGAAAAAAAATTTTTAATCTAAGTATGTAATTCTTTTGCTAAAGACACTTCACATGCTTCTGTGCCCTTAAACT
GGTATGTTATCATGGTATAGTAGGCCATCCAAGACCTGGCTTCCTTCCTTTTTTTCAGTCTCAGAGAATAACGTACTCTTTCCCTGCAACTCCAGATCCAATTTGGTTTTCTTTTACTTGCCTGGAAACTTCAAATTCTATCAACTCTGGGGCTTTCCACTAGCTAATCATTTTGTATACAATATTTGTCCTTCATGTTTTGCCTCTTAACATCTCAGCTTTCAGTTTCATCATTTTACCAGGGAGGCCTCCCAGAACCTGAGTCCAGAAGAGTTCCTTCCATTGTATATTCCTCTAGCACTACCTAT;CE=1.90976;CONSBP=989	GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV	0/0:0,-3.28138,-23.3388:33:PASS:10976:22465:11489:2:0:0:7:8
 chr1	66534	INS00000003	T	TATATATTATATAAATATAATATATATAATATATATTATATAAATATAATATATATAATATAATATATATTATATAAATATAATATATATTTTATTATATAATATAATATATATAATATAATATAAATTATATAAATATAATATATATTTTATTATATAATATAATATATATTATATAATATAATATATTTTATTATATAAATATATATTATATTCTATATAATATAATATATATTTTATTATATAATATATATTATATATTTATAGAATATAATATATATTTTATTATATAATATATATTATATAATATAATATATATTATATTTATATATAACATATATTATTATATAAAATATGTACTATATATTATATAAA	26	LowQual	PRECISE;SVTYPE=INS;SVMETHOD=EMBL.DELLYv1.1.7;END=66534;SVLEN=374;PE=0;MAPQ=0;CT=NtoN;CIPOS=-7,7;CIEND=-7,7;SRMAPQ=5;INSLEN=374;HOMLEN=7;SR=5;SRQ=0.944791;CONSENSUS=TTTTGCTGTGATTCTTTAAAAAGCACCTTTAGACTTAGTGAGATAGCAAAAATATCCAAATAGGCCAAAAAATTGTGGCAATGTCCTCTCACTCAGGAAAATTCTGTGTGTTTTCTCTAATGGCCAAGGGAAAACTTGTGAGACTATAAAAGTTAGTCTCAGTACACAAAGCTCAGACTGGCTATTCCCAGATCTCTTCAGGTACATCTAGTCCATTCATAAAGGGCTTTTAATTAACCAAGTGGTTTACTAAAAAGGACAATTCACTACATATTATTCTCTTACAGTTTTTATGCTTCATTCTGTGAAAATTGCTGTAGTCTCTTCCAGTTATGAAGAAGGTAGGTGGAAACAAAGATAAAACACATATATTAGAAGAATGAATGAAATTGTAGCATTTTATTGACAATGAGATGATTCTATTAGTAGGAATCTATTCTGCATAATTCCATTTTGTGTTTACCTTCTGGAAAAATGAAAGGATTCTGTATGGTTAACTTAAATACTTAGAGGAATTAATATGAATAATGTTAGCAAGAATAACCCTTGTTATAAGTATTATGCCGGCAACAATTGTCGAGTCCTCCTCCTCACTCTTCTGGGCTAATTTGTTCTTTTCTCCCCATTTAATAGTCCTTTGCCCCATCTTTCCCCAGGTCCGGTGTTTTCTTACCCACCTCCTTCCCTCCTTTTTATAATACCAGTGAAACTTGGTTTGGAGCATTTCTTTCACATAAAGGTACAAATCATACTGCTAGAGTTGTGAGGATTTTTAGAGCTTTTGAAAGAATAAACTCATTTTAAAAACAGGAAAGCTAAGGCCCAGAGATTTTTAAATGATATTCCCATGATCACACTGTGAATTTGTGCCAGAACCCAAATGCCTACTCCCATCTCACTGAGACTTACTATAAGGACATAAGGCATTTTTATATATATATATTATATATACTATATATTTATATATATTACATATTATATATATAATATTATATATATAATATATATTATATTATATATATAATATATATAATATAATATATTATATATATTATATATATAATATATATAATATATTATATATATTATATATATAATATATATAATATATATAATATAATATATTTTATATATATATATTATATAATATATATATATTTTATATATATATTATATAATATAATATATATATTATATATATAATATATATATAATATAATATAATATAATATATTATATTATATAATATATGATATAAATATAATATATATTTTATTATATAATATAATATATATAATATAATATATATTATATAAATATAATATATATAATATATATTATATAAATATAATATATATAATATAATATATATTATATAAATATAATATATATTTTATTATATAATATAATATATATAATATAA
TATAAATTATATAAATATAATATATATTTTATTATATAATATAATATATATTATATAATATAATATATTTTATTATATAAATATATATTATATTCTATATAATATAATATATATTTTATTATATAATATATATTATATATTTATAGAATATAATATATATTTTATTATATAATATATATTATATAATATAATATATATTATATTTATATATAACATATATTATTATATAAAATATGTACTATATATTATATAAATATATTTATATATTATATAAATATATTTATATATTATATAAATATATATATTATATAAATATATTTATATATTATATAAATATATATATTATATATAATTCTAATGGTTGAATTCCAAGAATAATCTATGGCATGAAAGATTTTACCTGTCAACAGTGGCTGGCTCTTCATGGTTGCTACAATGAGTGTGTAAGATTCTGAAGGACTCCTTTAATAAGCCTAAACTTAATGTTCAACTTAGAATAAATACAATTCTTCTAAATTTTTTTGAATAATTTTTGAAAAGTCAGAAATGAGCTTTGAAAGAATTATGGTGGTGAAGGATCCCCTCAGCAGCACAAATTCAGGAGAGAGATGTCTTAACTACGTTAGCAAGAAATTCCTTTTGCTAAAGAATAGCATTCCTGAATTCTTACTAACAGCCATGATAGAAAGTCTTTTGCTACAGATGAGAACCCTCGGGTCAACCTCATCCTTGGCATATTTCATGTGAAGATATAACTTCAAGATTGTCCTTGCCTATCAATGAAATGAATTAATTTTATGTCAATGCATATTTAAGGTCTATTCTAAATTGCACACTTTGATTCAAAAGAAACAGTCCAACCAACCAGTCAGGACAGAAATTATCTCACAATAAAAATCCTATCATTTGTACTGTCAATGATTAGTATGATTATATTTATTACCGTGCTAAG;CE=1.773;CONSBP=1303	GT:GL:GQ:FT:RCL:RC:RCR:RDCN:DR:DV:RR:RV	0/0:0,-0.480512,-9.23288:6:LowQual:7991:16482:8491:2:0:0:6:5

Let us know if this would be possible, and what additional information you need from us.
Thanks!

@ielis
Member

ielis commented Feb 29, 2024

Hi @jessmewald

I looked into this. However, based on the errors, there is probably little I can do, because I think the VCF does not follow the VCF 4.2 specification.

There seem to be some issues with the VCF - some SVLEN fields seem to be wrong. For instance, based on the output line

Invalid variant `chr1-10991221:(DUP00000246)`: Illegal DUP!changeLength:0. Should be > 0 given coordinates 1:10991222-10994549 -><DUP>

I expect the VCF to contain a symbolic duplication with the identifier DUP00000246 that has SVLEN=0 in the INFO field. This looks odd, since the coordinates 1:10,991,222-10,994,549 indicate a ~3.3 kb duplication. Therefore, the field should be something like SVLEN=3328.

This is because the definition of the SVLEN info field includes the following:

##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">

One value for each ALT allele. Longer ALT alleles (e.g. insertions) have positive values, shorter ALT alleles (e.g. deletions) have negative values.

So, again, SVLEN should be positive for a duplication, where the ALT allele is longer.
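
As a small worked illustration of this sign convention, here is a sketch that assumes the length of a DEL or DUP can be taken as END minus POS. The function name `expected_svlen` is hypothetical and is not part of SvAnna or Delly; the coordinates come from the warnings quoted in the issue, where the variant label carries POS.

```python
def expected_svlen(svtype: str, pos: int, end: int) -> int:
    """Return the SVLEN expected under the VCF 4.2 sign convention.

    Assumption: for DEL and DUP the affected span is END - POS, because
    POS is the base immediately before the event.
    """
    span = end - pos
    if svtype == "DEL":
        return -span  # ALT is shorter than REF
    if svtype == "DUP":
        return span   # ALT is longer than REF
    raise ValueError(f"SVLEN cannot be derived from POS/END for {svtype}")


# DUP00000246 from the log: POS=10991221, END=10994549
print(expected_svlen("DUP", 10991221, 10994549))  # 3328
# DEL00000486 from the log: POS=34635820, END=34646375
print(expected_svlen("DEL", 34635820, 34646375))  # -10555
```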

Moreover, Delly seems to use the SVLEN field for another purpose, just to store the length of an insertion:

##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="Insertion length for SVTYPE=INS.">

However, SVLEN is a reserved VCF field, so it should be used for its intended purpose: storing the length difference for all symbolic variants, not only for insertions while leaving meaningless values for the other variant types.

I am not sure that the SvAnna code base is the place to fix these errors. Hopefully, the Delly authors will fix this bug and produce valid VCF files.

So, to fix this in the short term, you'll probably need to write a Python script that sets the SVLEN field to the correct value calculated from the coordinates, and run the script as part of your pipeline, right after Delly variant calling. It should be possible to calculate the length from the POS and END fields for all symbolic variants except for INS.
I can help with checking the script; I've been staring at variant coordinates long enough to develop some skills.
Please let me know if I can help.
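
A minimal sketch of such a script could look like the following. It is only a starting point under stated assumptions, not a tested fix: it patches DEL and DUP records only, takes the length as END minus POS, and leaves INS, INV and BND records untouched. The names `SIGN` and `fix_svlen` and the example record are hypothetical; the example record reuses the DUP00000246 coordinates from the log, but its other columns are made up for illustration.

```python
# Sign of SVLEN per SV type (assumption: only DEL and DUP are patched).
SIGN = {"DEL": -1, "DUP": 1}


def fix_svlen(line: str) -> str:
    """Set SVLEN on one VCF data line from POS and END.

    Header lines, malformed lines, and records that are not DEL/DUP
    or lack END are returned unchanged.
    """
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return line
    info = [item for item in fields[7].split(";") if item]
    tags = dict(item.split("=", 1) for item in info if "=" in item)
    sign = SIGN.get(tags.get("SVTYPE"))
    if sign is None or "END" not in tags:
        return line
    svlen = sign * (int(tags["END"]) - int(fields[1]))
    # Drop any existing SVLEN entry and append the recalculated one.
    info = [item for item in info if not item.startswith("SVLEN=")]
    info.append(f"SVLEN={svlen}")
    fields[7] = ";".join(info)
    return "\t".join(fields) + "\n"


# Hypothetical record built from the DUP00000246 coordinates in the log.
record = "chr1\t10991221\tDUP00000246\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;END=10994549\tGT\t0/1\n"
print(fix_svlen(record), end="")
```

The function works on one uncompressed text line at a time, so it could be applied to each line of a decompressed VCF stream in the pipeline. The output should still be checked on a few records by hand before relying on it.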
