Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allel.read_vcf RuntimeError: VCF file is missing mandatory header line ("#CHROM...") #376

Open
nathanvranken opened this issue Feb 16, 2022 · 3 comments

Comments

@nathanvranken
Copy link

When trying to read some data from a vcf file of simulated data using msprime, I keep running into the same error.

Here is the head of the vcf file (20220112_msprime_length10000000.0_split100000.0_gene_flow100.0_chr2.vcf.gz):

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##source=tskit 0.2.3
##contig=<ID=2,length=10000000>
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  S.0     S.1     S.2     S.3     S.4     I.0     I.1     I.2     I.3     I.4     M.0     M.1     M.2     M.3     M.4     C.0     C.1     C.2     C.3     C.4
2       785     .       0       1       .       PASS    .       GT      0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     1|1     1|1     1|1     1|1     1|1
2       1002    .       0       1       .       PASS    .       GT      0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|0     0|1     0|0     0|0     0|0     0|0     0|0     0|0     0|0

When running the following code:
allel.read_vcf(vcf, fields=['calldata/GT'], samples=None, alt_number=1, region='2:0-10000', tabix='tabix')

I consistently get the following error:
RuntimeError: VCF file is missing mandatory header line ("#CHROM...")

Does anyone have an idea what might be causing this error?

@hardingnj
Copy link
Collaborator

Hi @nathanvranken ,

I'm able to copy and paste that into a text file, bgzip, index then load using allel.read_vcf without any problem.

Things to try:

@mckeowr1
Copy link

I ran into the same issue and was able to resolve it by re-zipping with bcftools view $vcf -output-type z --output-file $new . I did notice that if I did not create a new .tbi and .cbi index it gave me the same error. It'd be interesting to know when the index files are used.

@alimanfoo
Copy link
Contributor

Hi @nathanvranken, @mckeowr1, this may possibly be caused by having an index file (.tbi) which is older than the VCF file. In that case, tabix issues a warning to stderr, and then proceeds. However, currently scikit-allel sends stderr to stdout, so this warning gets parsed as if it was within the VCF file, and interrupts the usual parsing of headers. The offending line is here.

A workaround is to touch the index file so it's newer than the VCF.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants