bcftools merge incorrectly drops symbolic ALTs via vertical merge #2362

Open
ACEnglish opened this issue Jan 28, 2025 · 0 comments

I have found that bcftools merge does not consider the END position of symbolic variants, so two symbolic records with different END values can be combined into a single record, i.e. a vertical merge.
The documentation states that the command isn't intended for vertical merges, which I took to mean that it would never perform one, but in some cases it does.

Example

A.vcf

##fileformat=VCFv4.1
##contig=<ID=chr1,length=248956422>
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="SV length">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of structural variation">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample
chr1	147022730	SV1	N	<DEL>	.	PASS	SVLEN=-570334;END=147593064	GT	0/1

B.vcf

##fileformat=VCFv4.1
##contig=<ID=chr1,length=248956422>
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="SV length">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of structural variation">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Other
chr1	147022730	SV2	N	<DEL>	.	PASS	SVLEN=-990414;END=148013144	GT	1/1

bcftools merge --no-index -m none A.vcf B.vcf

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=chr1,length=248956422>
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="SV length">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of structural variation">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##bcftools_mergeVersion=1.21+htslib-1.21
##bcftools_mergeCommand=merge --no-index -m none A.vcf B.vcf; Date=Tue Jan 28 15:57:35 2025
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample	Other
chr1	147022730	SV1;SV2	N	<DEL>	.	PASS	SVLEN=-570334;END=147593064	GT	0/1	1/1
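The collapse above can be detected before merging. The following is a minimal sketch in plain Python, not part of bcftools: it groups records by CHROM, POS, REF and ALT and reports any group whose INFO/END values disagree. The helper names `parse_info` and `find_end_conflicts` are hypothetical, and the sketch assumes uncompressed, tab-delimited VCF text.

```python
from collections import defaultdict


def parse_info(info):
    """Parse a VCF INFO string into a dict (flags map to True)."""
    out = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def find_end_conflicts(vcf_texts):
    """Return sites whose records disagree on INFO/END across the inputs.

    Keys are (CHROM, POS, REF, ALT); values are the sorted distinct END
    strings seen at that site ("." when END is absent).
    """
    ends = defaultdict(set)
    for text in vcf_texts:
        for line in text.splitlines():
            if not line or line.startswith("#"):
                continue  # skip blank lines and header lines
            chrom, pos, _id, ref, alt, _qual, _filt, info = line.split("\t")[:8]
            end = parse_info(info).get("END", ".")
            ends[(chrom, int(pos), ref, alt)].add(str(end))
    return {site: sorted(vals) for site, vals in ends.items() if len(vals) > 1}


if __name__ == "__main__":
    a = "chr1\t147022730\tSV1\tN\t<DEL>\t.\tPASS\tSVLEN=-570334;END=147593064\tGT\t0/1"
    b = "chr1\t147022730\tSV2\tN\t<DEL>\t.\tPASS\tSVLEN=-990414;END=148013144\tGT\t1/1"
    for site, vals in find_end_conflicts([a, b]).items():
        print(site, vals)
```

For the two records from A.vcf and B.vcf, this reports the single site chr1:147022730 with both END values, which is the pair that the merge collapses into one record.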

A temporary workaround is to use -m id:

##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=chr1,length=248956422>
##INFO=<ID=SVLEN,Number=1,Type=Integer,Description="SV length">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of structural variation">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##bcftools_mergeVersion=1.21+htslib-1.21
##bcftools_mergeCommand=merge --no-index -m id A.vcf B.vcf; Date=Tue Jan 28 15:58:26 2025
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample	Other
chr1	147022730	SV1	N	<DEL>	.	PASS	SVLEN=-570334;END=147593064	GT	0/1	./.
chr1	147022730	SV2	N	<DEL>	.	PASS	SVLEN=-990414;END=148013144	GT	./.	1/1

However, assigning unique IDs to variants across files/experiments is non-trivial.
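One possible way to make the `-m id` workaround usable across files is to derive each ID from the variant's own coordinates, so that identical calls receive the same ID in every file and calls with different END values do not. The sketch below is only an illustration under stated assumptions: the key layout (CHROM, POS, ALT type, END, SVLEN) and the helper names `sv_key` and `assign_ids` are hypothetical, it overwrites any existing ID, and it assumes one symbolic ALT per record.

```python
def sv_key(chrom, pos, alt, info):
    """Build a deterministic ID from coordinates and INFO/END, INFO/SVLEN."""
    # Split "K=V" items into a dict; flag-only items map to an empty string.
    fields = dict(item.partition("=")[::2] for item in info.split(";") if item)
    end = fields.get("END", ".")
    svlen = fields.get("SVLEN", ".")
    alt_name = alt.strip("<>")  # "<DEL>" becomes "DEL"
    return f"{chrom}_{pos}_{alt_name}_{end}_{svlen}"


def assign_ids(vcf_text):
    """Return the VCF text with the ID column replaced by sv_key()."""
    out = []
    for line in vcf_text.splitlines():
        if line and not line.startswith("#"):
            cols = line.split("\t")
            cols[2] = sv_key(cols[0], cols[1], cols[4], cols[7])
            line = "\t".join(cols)
        out.append(line)  # header and blank lines pass through unchanged
    return "\n".join(out)


if __name__ == "__main__":
    rec = "chr1\t147022730\tSV1\tN\t<DEL>\t.\tPASS\tSVLEN=-570334;END=147593064\tGT\t0/1"
    print(assign_ids(rec))
```

With IDs of this form, the SV1 and SV2 records above would receive different IDs because their END and SVLEN values differ, while an identical call present in both files would receive the same ID and could still be merged by `-m id`. Exact-coordinate IDs do not help when callers report slightly different breakpoints for the same event, which is part of why the problem is non-trivial.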

Note that the vertical merge happens both with and without --no-index.

Original reporter: ACEnglish/truvari#256
