Header error in variant_calls.snps.phased.vcf.gz #36

Open
AzizHN opened this issue Jun 23, 2023 · 6 comments
AzizHN commented Jun 23, 2023

Hello, I ran this command to detect variants in my ONT reads (mapped with minimap2):
NanoCaller --mode all --sequencing ont --haploid_genome --bam sorted_mapped_reads.bam --ref genes.fna

I got this as a result:

2023-06-23 12:27:16.562651: Starting NanoCaller.

NanoCaller command and arguments are saved in the following file: /home/aziz/mapping/SRR23337893/args

2023-06-23 12:27:16.947255: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
SNP Calling Progress: 100%|███████████████████████| 2/2 [00:00<00:00, 6.89it/s]

2023-06-23 12:27:18.763662: Combining SNP calls.

2023-06-23 12:27:18.764897: Compressing and indexing SNP calls.
Writing to /tmp/bcftools.dkVQT8
Merging 1 temporary files
Cleaning
Done

2023-06-23 12:27:18.824115: SNP calling completed. Time taken= 0.4034

Indel Calling Progress: 100%|█████████████████████| 2/2 [00:00<00:00, 3.99it/s]

2023-06-23 12:27:19.487620: Compressing and indexing indel calls.
Checking the headers and starting positions of 2 files
[E::bcf_hdr_read] Input is not detected as bcf or vcf format
Failed to parse header: /home/aziz/mapping/SRR23337893/variant_calls.snps.phased.vcf.gz

2023-06-23 12:27:20.501190: Indel calling completed. Time taken= 1.6770

2023-06-23 12:27:20.501373: Total Time Elapsed: 3.94 seconds

Everything seems to run fine, except for a problem with the header of the file variant_calls.snps.phased.vcf.gz:
2023-06-23 12:27:19.487620: Compressing and indexing indel calls.
Checking the headers and starting positions of 2 files
[E::bcf_hdr_read] Input is not detected as bcf or vcf format
Failed to parse header: /home/aziz/mapping/SRR23337893/variant_calls.snps.phased.vcf.gz

Can this error influence my results? Does anyone have an idea what causes it? Thanks in advance.

Collaborator

umahsn commented Jun 23, 2023

Hi,

Can you check whether there are any intermediate files in /home/aziz/mapping/SRR23337893/ under the intermediate_snp_files or intermediate_phase_files subfolders, and whether a variant_calls.snps.vcf.gz file was created? It seems very suspicious that SNP calling took only 0.4 s, so I am wondering if that step did not run correctly.
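A check like the one requested above can be scripted. The following is a minimal sketch, not part of NanoCaller: the function names `list_outputs` and `empty_files` are hypothetical, and the only assumption is that the output directory can be read from Python. It lists every file under the output directory with its size, so zero-byte files stand out.

```python
from pathlib import Path


def list_outputs(output_dir):
    """Return sorted (relative path, size in bytes) pairs for all files under output_dir."""
    root = Path(output_dir)
    return sorted(
        (str(p.relative_to(root)), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )


def empty_files(output_dir):
    """Return the relative paths of all zero-byte files under output_dir."""
    return [name for name, size in list_outputs(output_dir) if size == 0]
```

For the run in this thread, `list_outputs("/home/aziz/mapping/SRR23337893")` would cover the intermediate_snp_files and intermediate_phase_files subfolders as well as the top-level VCF files.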

Author

AzizHN commented Jun 26, 2023

Hello @umahsn, thank you for your reply.
Yes, I have several intermediate subfolders:
intermediate_indel_files, containing 2 files (variant_calls.6.indel.vcf and variant_calls.raw.indel.vcf)
intermediate_phase_files, containing 4 files (2 × refsequenceID.snps.phased.vcf.gz and 2 × refsequenceID.snps.phased.vcf.gz.tbi; I have 2 reference sequences in my FASTA reference file)
intermediate_snp_files, containing 2 files (combined.snps.vcf and variant_calls.3.snps.vcf)

And yes, a variant_calls.snps.vcf.gz file was created (514 bytes): a 7-line header and a 9-line variant table.

My input files are a BAM file (555.1 KB) and a FASTA reference file (3.6 KB).

Collaborator

umahsn commented Jun 27, 2023

Hi, I think there might be a problem with passing the filenames internally within NanoCaller for haploid genomes. Let me check this and get back to you.

Collaborator

umahsn commented Jun 27, 2023

Can you tell me if /home/aziz/mapping/SRR23337893/variant_calls.snps.phased.vcf.gz or refsequenceID.snps.phased.vcf.gz files are empty and if they have a header?
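One way to answer this without opening the files by hand is sketched below. It is an illustration only: the function name `inspect_vcf_gz` and the report keys are hypothetical, and it relies on the facts that bgzip output can be read with Python's standard `gzip` module and that a VCF file starts with a `##fileformat=` line. A zero-byte file, or one without that first line, would be consistent with the `[E::bcf_hdr_read]` message in the log.

```python
import gzip
import os


def inspect_vcf_gz(path):
    """Report size, header-line count, and record count of a gzip/bgzip VCF."""
    size = os.path.getsize(path)
    report = {
        "size_bytes": size,
        "header_lines": 0,
        "record_lines": 0,
        "has_fileformat": False,
    }
    if size == 0:
        # A zero-byte file has no gzip stream at all, so there is nothing to parse.
        return report
    with gzip.open(path, "rt") as handle:
        for index, line in enumerate(handle):
            if index == 0:
                # A valid VCF must begin with the ##fileformat line.
                report["has_fileformat"] = line.startswith("##fileformat=")
            if line.startswith("#"):
                report["header_lines"] += 1
            elif line.strip():
                report["record_lines"] += 1
    return report
```

Calling it on variant_calls.snps.phased.vcf.gz and on each refsequenceID.snps.phased.vcf.gz file gives the two facts asked for: whether the file is empty and whether it has a header.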

Author

AzizHN commented Jun 28, 2023

Hello @umahsn, thank you for your response.
The phased files are always empty!

Collaborator

umahsn commented Jul 11, 2023

Hi,

I checked the issue, and it turns out that the presence of the colon symbol ":" in the names of the reference sequences is causing the problem. NanoCaller uses Linux system commands to run WhatsHap for phasing and bcftools for VCF file manipulation. As a result, if a VCF file is named after a reference sequence that has a colon in its name, Linux is not able to resolve the path to the file correctly. Once I replace the colon with some other symbol in the reference and BAM files, it runs correctly.
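The renaming step for the reference can be sketched as follows. This is a hedged illustration, not code from NanoCaller: the function name `sanitize_fasta_names`, the output file name, and the choice of "_" as the replacement character are all assumptions. It rewrites only the sequence name (the text before the first whitespace on each ">" line) and stops if two names would become identical after the replacement.

```python
def sanitize_fasta_names(src_path, dst_path, bad=":", replacement="_"):
    """Copy a FASTA file, replacing `bad` in sequence names; return {old: new}."""
    renamed = {}
    seen = set()
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # The sequence name is the first whitespace-delimited token.
                parts = line[1:].rstrip("\r\n").split(None, 1)
                name = parts[0] if parts else ""
                new_name = name.replace(bad, replacement)
                if new_name in seen:
                    raise ValueError(
                        f"renaming {name!r} collides with existing name {new_name!r}"
                    )
                seen.add(new_name)
                if new_name != name:
                    renamed[name] = new_name
                description = " " + parts[1] if len(parts) > 1 else ""
                line = ">" + new_name + description + "\n"
            dst.write(line)
    return renamed
```

For example, `sanitize_fasta_names("genes.fna", "genes.renamed.fna")` returns the mapping of old to new names. The BAM file must use the same names, so the reads would need to be remapped against the renamed reference or the BAM header rewritten (for instance with `samtools reheader`), and any FASTA index would need to be rebuilt.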
