Hi,
Thanks for your software. I ran it on your example data and encountered this problem: vcf_annotation.gz cannot be read and processed to produce the TwP.txt file. Could you help me figure it out? Thank you.
Run command
$ nextflow run https://github.com/cgroza/GraffiTE --reference hs37d5.chr22.fa --assemblies assemblies.csv --reads reads.csv --TE_library human_DFAM3.6.fasta
Command error:
Parse outputs...
Cleanup...
Attaching package: ‘dplyr’
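The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
***** *** vcfR *** *****
This is vcfR 1.15.0
browseVignettes('vcfR') # Documentation
citation('vcfR') # Citation
***** ***** ***** *****
Warning message:
The x argument of as_tibble.matrix() must have unique column names if .name_repair is omitted as of tibble 2.0.0.
ℹ Using compatibility .name_repair.
Scanning file to determine attributes.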
File attributes:
meta lines: 34
header_line: 35
variant count: 259
column count: 10
Meta line 34 read in.
All meta lines processed.
gt matrix initialized.
Character matrix gt created.
Character matrix gt rows: 259
Character matrix gt cols: 10
skip: 0
nrows: 259
row_num: 0
Processed variant: 259
All variants processed
[1] "CHROM" "POS" "qry_id" "REF"
[5] "ALT" "n_hits" "fragmts" "match_lengths"
[9] "repeat_ids" "matching_classes" "strands" "RM_id"
compute repeat proportion for each SVs...
Mammalian filters OFF, writing vcf...
The tag "~ID" is not defined in vcf_annotation.gz
Failed to read from standard input: unknown file type
Work dir:
~/TE/GraffiTE/test/GraffiTE_testset/work/3c/39e0978978f03d2ce4a63556c48160
Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run
-- Check '.nextflow.log' file for details
Following the prompt, I ran bash .command.run in ~/TE/GraffiTE/test/GraffiTE_testset/work/3c/39e0978978f03d2ce4a63556c48160:
$ bash .command.run
Mammalian filters OFF, writing vcf...
The tag "~ID" is not defined in vcf_annotation.gz
Failed to read from standard input: unknown file type
Looking further into the vcf_annotation file, I found that it has no header:
22 16147398 HG002.mat.svim_asm.INS.1 G GCCTCAGCCTCCCAAAGTGCTGGGATTATAAGCGTGAGCCACTGTGCCCAACCGATTTTTTTGTATTTTTAGTAAAGATGGGGGTTTCATCATCTTGGCTAGGCTGGTCTTGAACTCCTGATCTCGTGATCCACCCA 1 1 136 AluSg4 SINE/Alu C 209
22 16212443 HG002.mat.svim_asm.DEL.1 TTCTGTGAGATGAATGCACACATCACAAAGAAGTTTCTCAGAATGCTTCTGTCTAGTTTTTATGTGAAGATATTCCCTTTTCCACCACAGGCCTCAAAGCGCTCCAAATATCCACTCGCGGTTTCTGCAAAAAGAGTGTTTCAAAACTTCTCAATCAAAAGAAAGGTTCAAC T 0 0 0 None None None None
22 16287420 HG002.mat.svim_asm.DEL.2 GGCCCACTTTGTTCTTGCCGCTCCCCCTGCAGCAGGGGAAGCAGTGGCAGCACCACTTGCCCATCTTGCTCCTGAGTGTCTTCATAGCAGAGTCGTCGTGGTCTCCAGAAGT G 0 0 0 None None None None
22 16658445 HG002.mat.svim_asm.INS.2 A ATCATTTATTTTCTTTTTTTTCCCAGATCTTCGTGTTTTTTTTTAGATTTTTTTTTTTTTTATTTTACTTTAAGCTTTAGTGTACATGTGCACAATGTGCAGGTTAGTTACATATGTATACATGTGCCATGCTGGTGCCCTGCACCCACTAACTCGTCATCTAGCATTAGGTATATCTCCCAATGCTATCCCTCCCCCCACCCCCACCCCATAACAGTCCCCAGAGGGGGATATTCCCCTTCCTGTTTCCTTGTGATCTCATTGTTCAATTCCCACCTATGATTGAGAATATGCGGTGTTTGGTTTTTTGTTCTTGCAATAGTTTACTGAGAATGATGATTTCCAATTTCATCCATGTCCCTACAAAGGACATGAACTCATCATTTTTTATGGCTGCATAGTATTCCATGGTGTATATGTGCCACATTTTCTTAATCCAGTCTATCGTTGTTGGACATTTGGTTTGGTTCCAAGTCTGTGCTATTGTGAATAATGCCACAATAAACATACGTGTGCATGTGTCTTTATAGCAGCATGATTTATAGTCCTTTGGGTATATATCCAGTAATGGGATGGCTGGGTCAAATGGTATTTCCAGCTCTGGATCCCTGAGGAATCGCCACACTGACTTCCACAATGGTTGAACTAGTTTCCAGACCCACCAACAGTGTAAAAGTGTTCCTATTTCTCCACATCCTCTCCAGCACCTGTTGTTTCCTGACTTTTTAATAATCGCCATTCCAACTGGTGTGAGATGGTATCTCATTGTGGTATTGATTTGCATTTCTGTGATGGCCAGTGATGATGAGCATTTTTTCATGGGTTTTTTGGGTGCATAAATGCCTTCTTTTGAGAAGTGCCTGTTCATGTCCTTCGCCCACATTTTGATGGGGTTGTTTGTTTTTTTCTTGTAAATTTGTTTTAGTTCATTGTAGATTCTGGATATTAGCCCTTTGTCAGATGAGTAGGTTGTGAAAATTTTCTCCCATTTTGTGGGTTGCCTGTTCACTCTGATGGTAGTTTCTTTTGCTGTGCAGAAGCTCTTTAGTTTCATTAGATCCCATTTGTCAATTTTGTCTTTTGTTGCCATTGCTTTTGGTGTTTTAGACATGAAGTCCTTGCCCATGCCTATGTCCTGAATGGTAATGCCTAGGTTTTCTTCTAGGGTTTTTATGGTTTTAGGTCTAACATTTAAGTCTTTAATCCATCTTGAATTGATTTTTGTATAAGGTGTAAGGAAGGGATCCAGTTTCAGCTCTCTACATATGGCTAGCCAGTTTTCCCAGCACCATTTATTAAATAGGGAATCTTTTCCCCATTGCTTGTTTTTCTCAGGTTTGTCAAAGATCAGATAGTTGTAGATATGCAGCGTTATTTCTGAGGGCTCTGTTCTGTTCCATTGATCTATATCTCTGTTTTGGTACCAGTACTGTGCTGTTTTGGTTACTGTAGCCGTGTAGTATAGTTTGAAGTCAGGTACCATCATGCCTCCAGCTTTGTTCTTTTGGCTTAGGATTGACTTGGTGATGTGGGCTCTTTTTTGGTTCCATATGAACTTTAAAGTAGTTTTTTCCAATTCTGTGAAGAAAGTCATTGGTAGCTTGATGGGGATGGCATTAAATCTATAAATTACCTTGGGCAGTATGGCCATTTTCACGATATTGATTCTTCCTACCCATGAGCATGGAATGTTCTTCCATTAGTTTGTATCCTCTTTTATTTCCTTGAGCAGTGGTTTGTAGTTCTCCTTGAAGAGGCCCTTCACATCCCCTTTAAGTTGGATTCCTAGGTATTTTATTCTCTTTGAAGCAATTGTGAATGGGAGTTGACTCATGATTTGGCTCTCTGTTTGTCTGTTGTTGGTGTATAAGAATGCTTGTGATTTTTGTACATTGATTTTGTATCCTGAGACTTTGCTGAAGTTTCTTATCAGCTTAAGGAGATTTTGGGCTGAGACGATG
GGGTTTTCTAGATATACAATGATGTCGTCTGCAAACAGGGGCAATTTGACTTCCTCTTTTCCTAATTGAATACCCTTTGTTTCCTTCTCCTGCGTAATTGCCCTGGCCAGAACTTCCAACACTATGTTGAATAGGAGTGGTGATAGAGGGCATCAATGTCTTGTGCCAGTTTTCAAAGGGAATGCTTCTAGTTTTTGCCCATTCATTATGATCTTGGCTGTGGGTTTGTCATAGATAGCTCTTATTATTTTGAAATACGTCCCATCAATACCTAATTTATTGAGAGTTTTTAGCATAAAGGGTTGTTGAATTTTGTCAAGGCCTTTTCTGCATCTATTGAGATAATCATGTGGTTTTTGTCTTTGGTTCTGTTTATATGCTGGTTTACATTTATTAATTTGTGTATATTGAACCAGCCTTGCATCCCAGGGATGAAGCCCACTTGATCATGGTGGATAAGCTTTTTGATGTGCTGCTTGATTCGTTTTGCCGGTATTTTATTGAGGATTTTTGCATCAGTGTTCATCAAGGATATTGGTCTAAAATTCTCTTTTTTGGTTGTGTCTCTGGCCGGCTTTGGTATCAGAATGATGCTGGCCTCATAAAATGAGTTAGGAAGGATTCCCTCTTTTTCTATTGATTGGAATAGTTTCAGAAGGAATGGTACCAGTTCCTCCTTGTACCTCTGGTAGAATTCGGCTGTGAATCCATCTGGTCCTGGACTCTTTTTGGTTGGTAAGCTTTTGATTATTGCCACAATTTCAGATCCTGTTATTGGTCTATTCAGAGATTCAACTTCTTCCTGGTTTAGTCTTGGGAGAGGGTATGTGTCCAGGAATTTATCCCTTTCTTCTAGATTTTCTAGTTTATTTGTGTAGAGGTGATTGTAGCATTCTGTGATGGTAGTTTGTATTTCTGTGGGATCGGTGGTGATATCCCCTTTATCAGTTTTTATTGCATCTATTTGATTCTTCTCTCTTTTTTTCTTTATTAGTCTTGCTAGCGGTCTATCAATTTTGTTGATCCTTTCAAAAAACCAGCTCCTGGATTCATTAATTTTTTGAAGGGTTTTTTGTGTCTGTATTTCCTTCAGTTCTGCTCTGATTTTAGTTATTTCTTGCCTTCTGCTAGCTTTTGAATGTGTTTGCTCTTGTTTTTCTAGTTCTTTTAATTGTGATGTTAGGGTGTCAATTTTGGATCTTTCCTGCTTTCTCTTGTGGGCATTTAGTGCTATAAATTTCCCTCTACACACTGCTTTGACTGCATCCCGGAAATTCTGGTATGTTGTATCTTTGTTCTCATTGGTTTCAAAGAACATCTTTATTTCTGCCTTCATTTCGTTATGTACCCAGTAGTCATTCAGGAGCATGTTGTTCATTTTCCATGTAGTTGAGCGGTTTTGAGTGAGTTTCTTAATCCTGAGTTCTAGTTTGATTGCACTGTGGTCTGAGAGATACTTTGTTATAATTTCTGTTCTTTTACATTTGCTGAGGAGAGCTTTACTTCCAAGTATGTGGTCAATTTTGGAATAGGTGTGGTGTGGTGCTGAAAAACATGTATATTCTGTTGATTTGGGGTGGAGAGTTCTGTAGATGTCAATTAGGTCCGCTTGGTGCAGAGCTGAGTTCAATTCCTGGGTGTCCTTGTTGACTTTCTGTCTCGTTGATCTGTCTAATGTTGACAGTGGGGTGTTAAAGTCTCCCATTATTAATGTGTGGGAGTCTAAATCTCTTTGTAGGTCACTCAGGACTTGCTTTATGAATCTGGGTGCTCCTGTATTGTGTGCATATATATTTAAGATAGTTAGCTCTTCTTCTTGAATTGATCCCTTGACCATTATGTAATGGCCTTCTTTGTCTCTTTTGATCTTTGTTGGTTTAAAGTCTGTTTTACCAGAGACTAGGATTGCAACCCCTGCCTTTTTTTGTTTTCCATTTGCTTGGTAGATCTTCCTCCCTCCTTTTATTTTGAGCCTATGTGTGTCTCTGCACGTGAGATGG
GTTTCCTGAATACAGCACACTGATGGGTCTTGACTCTTTGTCCAATTTGCCAGTCTGTGTCTCTTAATTGGAGCATTTAATCCATTCACATTTAAAGTTAATATTGTTATGTGTGAATTTGATCCTGTCATTATGATGTTAGCTCGTTATTTTGCTCGTTAGTTGATGTAGTTTCTTCCTAGTCTCGATGGTCTTTACATTTTGGCATGATATTGCAGTGGCTGGTACCTGTTGTTCCTTTCCATGTTTAGCGCTTCCTTCAGGAGCTCTTTTAGGGCAGGCCTGGTGGTGACAAAATCTCTCAGCATTTGCTTGTCTGTAAAGTATTTTATTTCTCCTTCACTTATGAAGCTTAGTTTGGCTGGATATGAAATTCTCTGTTGAAAATTGTTTTCTTTAAGAATATTGAATATTGGCCCCCACTGCCTTCTGACTTGTAGGGTTTCTGCCGAGAGATCCGCTGTTAGTCTGATGGGCTTCCCTTTGAGGGTAACCCGACCTTTCTCTCTGGCTGCCCTTAACATTTTTTCCTTCATTTCAACTTTGGTGAATCTGACAAGTATGTGTCTTGGAGTTGCTCTTCTCGAGGAGTGTGGTGTTCTCTGTATTTCCTGAATCTGAACGTTGGCCTGCCTTGCTAGATTGGGGAAGTTCTCCTGGATAATATCTTGCAGAGTGTTTTCCAACTTGGTTTCATTCTCCCCATCACTTTCAGGTACACCAATCAGACTTAGATTTGGTCTTTTCACATAGTCCCATATTTCTTGGAGGCTTTTCTCATTTCTTTTTATTCTTTTTTCTCTAAACTTCCCTTCTCACTTCATTTCATTCATTTCATCTTCCATCGCTGATACCCTTTCTTCCAGTTGATTGCATCGGCTCCTGAGGCTTCTGCATTCTTCACGTAGTTCTCGAGCCTTGGTTTTCAGCTTCATCAGCTCCTTTAAGCACTTCTCTGTATTCGTTATTCTAGTTTTACATTCTTCTAAATTTTTTTCAAAGTTTTCAACTTCTTTGCCTTTGGTTTGAATATCCTCCCATAGCTCGGAGTAATTTGATCATCTGAAGCCTTCTTCTCTCAGCTCGTCAAAGTCATTCTCCATCCAGCTTTGTTCCGTTGGTGGTGAGGAACTGCGTTCCTTTGGAGGAGGAGAGGTGCTCTGCTTTTTAGAGTTTCCAGTTTTTCTGTTCTCTTTTTTCCCCATCTTTGTGGTTTTATCTACTTTTGGTCTTTGATGATGGTGATGTACAAATGGGTTTTTGATGTGGATGTCCTTTCTGTTAGTTTTCCTTCTACCAGACAGGACCCTCAGCTCCAGGTCTGTTCGAATACCCTGCCGTGTGAGGTGTCAGTGTGCCCCCGCTGGGGGGTGCCTCTCAGTTAGGCTGCTCGGGTGTCAGAGGTCAGGGACCCACTTGAGGAGGCAGTCTGCCCGTTCTCAGATCTCCAGCTGCATTCTGGGAGAACCACTGCTCTCTTCAAAGCTGTCAGACAGGGACATTTAAGTCTGCAGAGGTTACTGCTGTCTTTTTGTTTGTCTGTGCCCTGCCCCCAGAGGTGGAGCCTACAGAGGCAGGCAGGCCTCTTTGAGCTGTGATGGGTTCCACCCAGTTTGAGATTCTCAGCTGCTTTGTTTACCTAAGCAATCCTGGGCAATGGCGGGCGACGGTCCCACAACCTCGCTGCTGCCTTGCAGTTTAAACTCAGACTGTTGTGCTAGCAATCAGCGAGACTCCGTGGGTGTAGGACCCTCCAAGCCAGGTGCGGGATATAATCTCGTGGTGCACCGTTTTTTAAGCCGGTCGGAAAAGCGCAGTAATCGGGTGGGAGTGACCGGATTTTCCAGATGCCGTCTGTCACTCTTTCTTTGACTCGGAAAGGGAACTCCCTTACCCCTTGCGCTTCCCAAGTGAGGCAATGCGTCGCCCTGCTTCAGCTCGCGTATGGTGAGCACACCCACTGACCTGCACCCACTGTCTGGCACTCCCTAGTGAGATGA
ACCTGGTACCTCAGATGGAAATGCAGAAATCACCCGACTTCTGCGTCGCTGATGCTGGGAGCTGTAGACCGGAGCTGTTCCTATTCGGCCATCTTGGCTCCTCCGCCTGAATATTAT 1 1 6020 L1PA2 LINE/L1 C 301
22 16676838 HG002.mat.svim_asm.INS.3 T TAAGAAACTCACTCAAGGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCGGATCACGAGGTCAAGAGATCGAGACCATCCCGGCTAAAACGGTGAAACCCCGTCTCTACTAAAAATACAAAAAAATTAGCCGGGCGTAGTGGCGGGCGCCTGTAGTCCCAGCTACTCGGGAGGCTGAGGCAGGAGAATGGCGTGAACCCGGGAGGCGGAGCTTGCAGTGAGCCGAGATCCCGCCACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 1 1 312 AluYa5 SINE/Alu + 329
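Each row above appears to carry the same 12 fields as the column names printed earlier in the log (CHROM through RM_id). A rough sketch for confirming that every row has that many fields, assuming the file is tab-separated (the spacing in this paste does not preserve that) and using a hypothetical helper name, check_columns, that is not part of GraffiTE:

```shell
# Hypothetical check (not part of GraffiTE): read tab-separated rows on stdin
# and report every row whose field count differs from the expected number.
check_columns() {
    expected="$1"   # expected number of fields, e.g. 12
    awk -F '\t' -v n="$expected" '
        NF != n { print NR ": " NF " fields"; bad = 1 }   # report offending rows
        END     { exit (bad ? 1 : 0) }                    # non-zero status if any row was off
    '
}
```

It could be run as gzip -dc vcf_annotation.gz | check_columns 12; no output and a zero exit status would mean every row has 12 fields.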
After replacing the header, vcf_annotation.gz still does not open properly:
$ bcftools view vcf_annotation.gz.bac
Failed to read from vcf_annotation.gz.bac: unknown file type
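The "unknown file type" message is consistent with a file that is not a VCF at all, which matches the headerless table shown above. A minimal sketch for checking this without bcftools follows; the helper name inspect_vcf_like is hypothetical and not part of GraffiTE. It only looks at the gzip magic bytes and the first line, so it cannot tell plain gzip from bgzip.

```shell
# Hypothetical helper (not part of GraffiTE): report whether a file is
# gzip-compressed and whether its first line looks like a VCF header.
inspect_vcf_like() {
    f="$1"
    # gzip (and bgzip) streams start with the two magic bytes 1f 8b
    magic=$(head -c 2 "$f" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        compression="gzip"
        first_line=$(gzip -dc "$f" 2>/dev/null | head -n 1)
    else
        compression="plain"
        first_line=$(head -n 1 "$f")
    fi
    # A VCF file must begin with a "##fileformat=" meta line
    case "$first_line" in
        "##fileformat="*) header="vcf-header" ;;
        *)                header="no-vcf-header" ;;
    esac
    echo "$compression $header"
}
```

Calling inspect_vcf_like vcf_annotation.gz prints two words, for example "gzip no-vcf-header" for a compressed table that lacks the VCF meta lines.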
The TwP.txt file generated by this step of the repmask_vcf.sh script is empty (size zero):
# if --mammal is set, search for L1 5' inversion (Twin Priming and similar) and check whether SVA hits are within the VNTR only (non-retrotransposition polymorphism)
if [[ ${MAM} == "MAM" ]]
then
echo "Mammalian filters ON. Filtering..."
# FILTER 1: L1 Twin Priming and similar
rm TwP.txt &> /dev/null
# awk 1: find SVs IDs only matched by TwP, sort by SV ID
# join: join SV ID with its matched TEs from the OneCode output (sort by SV ID and reverse sort by strand to keep the order of "C,+")
# awk 2: write SV ID and coordinate of C L1 (odd line) and + L1 (even line) to one line
# awk 3: compare coordinate and annotate 5P_INV
awk '{
if ($6 == 2 && $11 == "C,+" && $10 ~ /LINE/) {
names = split($10, n, ",", seps);
ids = split($12, i, ",", seps);
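The script fragment above is cut off, so only its row condition can be tried in isolation. The sketch below applies that same condition (two hits, strands "C,+", a LINE class) to the annotation table and prints the SV ID. Printing column 3 and the function name select_twp_candidates are assumptions for illustration; the join and coordinate comparison of the original script are not reproduced.

```shell
# Hypothetical stand-alone version of the row filter only (not the full
# GraffiTE step): keep rows with exactly two hits, strands "C,+", and a
# LINE class, then print the SV ID from column 3 (qry_id).
select_twp_candidates() {
    awk '$6 == 2 && $11 == "C,+" && $10 ~ /LINE/ { print $3 }'
}
```

It reads the table on stdin, for example gzip -dc vcf_annotation.gz | select_twp_candidates. Empty output means no row satisfies the condition. Note that the four-argument split(..., seps) used in the original fragment is a gawk extension, so that script needs gawk rather than a strictly POSIX awk.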