Genotyping dir is missing #55

Open
soyboy-hub opened this issue Jan 28, 2025 · 17 comments

@soyboy-hub

soyboy-hub commented Jan 28, 2025

Hello, I'm trying to genotype my lines with ONT long reads. I run the pipeline with these parameters:

TOOLS/nextflow run TOOLS/GraffiTE/main.nf \
  --longreads ARABIDOPSIS_DATA/WGS_reads.csv \
  --reads ARABIDOPSIS_DATA/WGS_reads.csv \
  --TE_library ARABIDOPSIS_DATA/telib.fasta \
  --reference ARABIDOPSIS_DATA/TAIR10_1.fna \
  --graph_method graphaligner \
  --out ARABIDOPSIS_DATA/GraffiTE_out \
  --cores 40 \
  -with-singularity graffite_latest.sif

WGS_reads.csv:
path,sample,type
ARABIDOPSIS_DATA/demux/barcode01.trim.fastq,sam1,ont
ARABIDOPSIS_DATA/demux/barcode02.trim.fastq,sam2,ont
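
Because --longreads and --reads point at the same CSV here, a malformed row or a wrong relative path would affect both discovery and genotyping. The following is a minimal pre-flight sketch, assuming the three-column header shown above and that paths are resolved relative to the directory where nextflow is launched. The name check_reads_csv is a hypothetical helper, not part of GraffiTE.

```shell
# Sketch: sanity-check a reads CSV before launching the pipeline.
# Assumption: the header is exactly "path,sample,type", as in the post above.
# check_reads_csv is a hypothetical helper, not a GraffiTE command.
check_reads_csv() (
    csv="$1"
    [ -r "$csv" ] || { echo "cannot read $csv" >&2; exit 1; }

    # Compare the first line (minus any Windows carriage return) to the header.
    header=$(head -n 1 "$csv" | tr -d '\r')
    if [ "$header" != "path,sample,type" ]; then
        echo "unexpected header: $header" >&2
        exit 1
    fi

    # Skip the header; the extra echo guarantees a final newline so the
    # last row is read even if the file does not end with one.
    { tail -n +2 "$csv"; echo; } | tr -d '\r' | (
        status=0
        while IFS=, read -r path sample type; do
            [ -z "$path$sample$type" ] && continue   # ignore blank lines
            if [ -z "$sample" ] || [ -z "$type" ]; then
                echo "incomplete row: $path" >&2
                status=1
            elif [ ! -s "$path" ]; then
                echo "missing or empty read file: $path" >&2
                status=1
            fi
        done
        exit "$status"
    )
)
```

Calling check_reads_csv ARABIDOPSIS_DATA/WGS_reads.csv from the launch directory returns a non-zero status and prints the offending rows if anything is off. It does not validate the FASTQ content itself.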

Everything seems to run fine, but the genotyping directory is missing. I only have the output of the 3rd step of the analysis and a file with unspecified polymorphisms.
Thank you for any suggestions!

@soyboy-hub
Author

UPD: I found the text of the error (I didn't see it earlier because of the screen session):

executor > local (11)
[a7/2e764f] map_longreads (1) | 2 of 2 ✔
[63/73f18d] sniffles_sample_call (2) | 2 of 2 ✔
[3f/63cd8d] sniffles_population_call (1) | 1 of 1 ✔
[91/ee8787] repeatmask_VCF (1) | 1 of 1 ✔
[5b/87a25f] tsd_prep (1) | 1 of 1 ✔
[65/132d67] tsd_search (1) | 2 of 2 ✔
[e8/87d8bf] tsd_report (1) | 1 of 1 ✔
[fa/07ab9c] make_graph (1) | 1 of 1, failed: 1 ✘
[- ] graph_align_reads -
[- ] vg_call -
[- ] merge_VCFs -
ERROR ~ Error executing process > 'make_graph (1)'

Caused by:
Process make_graph (1) terminated with an error exit status (139)

Command executed:

bcftools sort -Oz -o sorted.vcf.gz pangenome.vcf
tabix sorted.vcf.gz
mkdir index

  export TMPDIR=/home/kirillt
  vg construct -a  -r TAIR10_1.fna -v pangenome.vcf -m 1024 > index/index.v

vg snarls index/index.vg > index/index.pb

Command exit status:
139

Command output:
(empty)

Command error:
Writing to /tmp/bcftools.JblTEp
Merging 1 temporary files
Cleaning
Done
index file TAIR10_1.fna.fai not found, generating...
warning:[vg::Constructor] Lowercase characters found in NC_003070.9; coercing to uppercase.
warning:[vg::Constructor] Lowercase characters found in variant; coercing to uppercase:
NC_003070.9 270495 Sniffles2.INS.1M0 a aAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAG 60 PASS PRECISE;SVTYPE=INS;SVLEN=260;END=270495;SUPPORT=16;COVERAGE=25,25,25,24,24;STRAND=+-;AC=4;STDEV_LEN=4;STDEV_POS=66.973;SUPP_VEC=11;n_hits=2;fragmts=1,1;match_lengths=214,218;repeat_ids=AT2TE24415,AT2TE57390;matching_classes=DNA/ATDNAI27T9C,DNA/VANDAL17;RM_hit_strands=C,C;RM_hit_IDs=309,310;total_match_length=251;total_match_span=0.976654
warning:[vg::Constructor] Unsupported IUPAC ambiguity codes found in NC_003070.9; coercing to N.
warning:[vg::Constructor] Lowercase characters found in NC_003071.7; coercing to uppercase.
warning:[vg::Constructor] Unsupported IUPAC ambiguity codes found in NC_003071.7; coercing to N.
warning:[vg::Constructor] Lowercase characters found in NC_003074.8; coercing to uppercase.
warning:[vg::Constructor] Unsupported IUPAC ambiguity codes found in NC_003074.8; coercing to N.
warning:[vg::Constructor] Lowercase characters found in NC_003075.7; coercing to uppercase.
warning:[vg::Constructor] Unsupported IUPAC ambiguity codes found in NC_003075.7; coercing to N.
warning:[vg::Constructor] Lowercase characters found in NC_003076.8; coercing to uppercase.
.command.sh: line 9: 20 Segmentation fault vg snarls index/index.vg > index/index.pb

Work dir:
/home/kirillt/work/fa/07ab9cefae9a6393c0506ac126923d

Container:
/home/kirillt/./graffite_latest.sif

Tip: you can replicate the issue by changing to the process work dir and entering the command bash .command.run

-- Check '.nextflow.log' file for details
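
The exit status 139 in the log above follows the usual shell convention of 128 plus a signal number, and 139 - 128 = 11 is the segmentation fault signal, which matches the "Segmentation fault" line printed for vg snarls. A small sketch of that arithmetic follows; signal_from_status is a hypothetical helper name.

```shell
# Sketch: decode an exit status using the common shell convention that a
# status above 128 means "terminated by signal (status - 128)".
# signal_from_status is a hypothetical helper name.
signal_from_status() {
    if [ "$1" -gt 128 ]; then
        echo $(( $1 - 128 ))
    else
        echo 0    # exited on its own, not terminated by a signal
    fi
}

# Example (not run here): reproduce the failing task as the Nextflow tip
# suggests, then decode its status.
#   cd /home/kirillt/work/fa/07ab9cefae9a6393c0506ac126923d
#   bash .command.run; status=$?
#   kill -l "$(signal_from_status "$status")"   # prints the signal name
```

A program can also choose to exit with a value above 128 by itself, so the decoded number is a strong hint rather than proof. Here the shell message in the log confirms it.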

@cgroza
Owner

cgroza commented Jan 28, 2025 via email

@soyboy-hub
Author

> It seems vg could not index the snarls in the graph, possibly due to insufficient memory. Could you try increasing the memory using the --make_graph_memory parameter?
>
> Cristian Groza

Done, with the following parameters:

TOOLS/nextflow run TOOLS/GraffiTE/main.nf -with-singularity graffite_latest.sif \
  --longreads ARABIDOPSIS_DATA/WGS_At_paper_2025_11_samples/test.csv \
  --reads ARABIDOPSIS_DATA/WGS_At_paper_2025_11_samples/test.csv \
  --TE_library ARABIDOPSIS_DATA/WGS_At_paper_2025_11_samples/telib.fasta \
  --reference ARABIDOPSIS_DATA/TAIR10_1.fna \
  --graph_method graphaligner \
  --out ARABIDOPSIS_DATA/WGS_At_paper_2025_11_samples/GraffiTE_out \
  --cores 40 \
  --make_graph_memory 30

I still get the same error.

@soyboy-hub
Author

I have also changed the memory parameters directly in the Nextflow file, and I still get the same error.

@cgroza
Owner

cgroza commented Jan 28, 2025

30G might still not be enough for vg snarls.

Also, when passing the parameter, it might be worth specifying the unit, for example --make_graph_memory 60G.

@soyboy-hub
Author

> 30G might still not be enough for vg snarls.
>
> Also, when passing the parameter, it might be worth specifying the unit, for example --make_graph_memory 60G.

I get the same error with this additional parameter.

@cgroza
Owner

cgroza commented Jan 28, 2025

Could you tell us more about the machine and environment in which you are running?

@soyboy-hub
Author

> Could you tell us more about the machine and environment in which you are running?

My machine (a server) runs Ubuntu 20.04.6 LTS (GNU/Linux 5.3.0-18-generic x86_64) with Singularity version 3.8.3+231-g9dceb4240.

@soyboy-hub
Author

UPD: the graphaligner method fails as described above. The giraffe method does not run at all (it produces the error "Unsupported --graph_method. --graph_method must be pangenie, giraffe or graphaligner."), while the pangenie method works well on my machine.

@cgroza
Owner

cgroza commented Jan 28, 2025

That was indeed an error in our code. I just pushed a commit fixing that.
Could you also add how much memory you have?

All else failing, could you share your VCF?

Giraffe will also fail because it requires the same step that is currently failing for you.

@soyboy-hub
Author

> That was indeed an error in our code. I just pushed a commit fixing that. Could you also add how much memory you have?
>
> All else failing, could you share your VCF?
>
> Giraffe will also fail because it requires the same step that is currently failing for you.

I have 504G of memory. Which .vcf should I share? The one produced in the third step?

@cgroza
Owner

cgroza commented Jan 28, 2025

That is more than sufficient. Yes, pangenome.vcf would be perfect. I can try that step on my end, or look for a flaw in the output.

@soyboy-hub
Author

> That is more than sufficient. Yes, pangenome.vcf would be perfect. I can try that step on my end, or look for a flaw in the output.

Yes, here it is:
https://drive.google.com/file/d/1jmfPwdhhvBv2dugEeICTQAvS0BZHN0tz/view?usp=share_link

@cgroza
Owner

cgroza commented Jan 29, 2025

Hi

Thank you for sharing.

I ran your VCF containing 614 variants using

nextflow run ../GraffiTE/main.nf -profile standard -with-singularity ../graffite_latest.sif  --reference GCF_000001735.4_TAIR10.1_genomic.fna   --graph_method  graphaligner --graffite_vcf ~/pangenome.vcf --genotype true

GraffiTE successfully completed the make_graph step, including the vg snarls step without segfault.

Therefore, the problem could still be a resource allocation issue on your server?
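
When a step succeeds on one machine and fails on another, it helps to first confirm that both sides are working from the same input. The sketch below counts the records in an uncompressed VCF, so the result can be compared with the 614 variants mentioned above, and also counts records with lowercase bases in REF or ALT, which the vg warnings in the log point at. Both function names are hypothetical, and the check assumes a tab-separated, uncompressed file such as pangenome.vcf.

```shell
# Sketch: quick consistency checks on an uncompressed VCF.
# Both helper names are hypothetical; neither is part of GraffiTE or vg.

# Count non-header, non-empty lines (one per variant record).
count_vcf_records() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Count records whose REF (column 4) or ALT (column 5) contains
# lowercase bases, which vg construct reports and coerces to uppercase.
count_lowercase_alleles() {
    awk -F '\t' '!/^#/ && ($4 ~ /[acgtn]/ || $5 ~ /[acgtn]/) { n++ }
                 END { print n + 0 }' "$1"
}
```

For example, count_vcf_records pangenome.vcf can be run on the copy in the failing work directory. A different count from the shared file would suggest that the two runs are not using the same input. The lowercase count is informational only, since the log reports those records as warnings rather than errors.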

@cgroza
Owner

cgroza commented Jan 29, 2025

I am attaching the graph files and index for your own use:

https://drive.google.com/file/d/1KGZ8MaKwAFEGzvKzTiyRlOrgmjvOwNo2/view?usp=drive_link

@soyboy-hub
Author

> I am attaching the graph files and index for your own use:
>
> https://drive.google.com/file/d/1KGZ8MaKwAFEGzvKzTiyRlOrgmjvOwNo2/view?usp=drive_link

Thank you so much for the help! How can I check whether there are any allocation issues? We have many users on the server, but I can use all threads (for example).

@clemgoub
Collaborator

clemgoub commented Feb 5, 2025

I hope this will be helpful: when using the cluster profile, while Nextflow is submitting jobs, you can open a second terminal window and look at the job status (for example using squeue if you use Slurm). That is how I sometimes identify discrepancies between my expected request and the actual one.

Let us know how things go on your end,

Cheers,

Clément
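
The log earlier in this thread reports "executor > local", so in that run there is no scheduler queue to inspect. On a shared server with the local executor, one thing that can be checked directly is what the shell itself is allowed to use. The following is a rough sketch for a Linux host; mem_available_gib and report_limits are hypothetical helper names, and the idea that a per-user limit is involved is only an assumption to rule out.

```shell
# Sketch: report resources visible to the current shell on a Linux host.
# mem_available_gib and report_limits are hypothetical helper names.

# Read MemAvailable (reported in kB) from a meminfo-style file and print
# it in whole GiB. Defaults to /proc/meminfo when no file is given.
mem_available_gib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1048576 }' "${1:-/proc/meminfo}"
}

# Print the memory currently available and the per-process limits that
# child processes of this shell inherit ("unlimited" or a value in kB).
report_limits() {
    echo "available memory (GiB): $(mem_available_gib)"
    echo "virtual memory limit:   $(ulimit -v)"
    echo "stack size limit:       $(ulimit -s)"
}
```

Running report_limits in the same screen session that launches nextflow shows the limits the pipeline inherits. The same ulimit calls can be repeated inside the container, for example with singularity exec graffite_latest.sif sh -c 'ulimit -v; ulimit -s', to see whether the values differ there. A low limit would be worth raising with the server administrator; unlimited values would point away from this explanation.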
