Problem with grist output #278

marsfro · 2023-04-19T10:00:26Z

Hello everyone!
Could you please help me with the following problem?
I ran grist like this:
genome-grist run conf-tutorial.yml summarize_gather summarize_mapping

conf-tutorial.yml
samples:
- S1e8747dc_1
outdir: output_S1e8747dc/
sourmash_databases:
- gtdb-rs207.genomic-reps.dna.k31.zip

Grist found only 1 genome, and the mapping folder contains only 1 BAM file.
The sourmash output for this sample listed 21 genomes, and the genome found by grist is not among them.
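
To pin down the mismatch described above, the genomes listed in the gather CSV can be compared with the BAM files in the mapping folder. This is only a minimal sketch, not part of genome-grist: the helper names (`gather_idents`, `mapped_idents`, `compare`) are hypothetical, and it assumes that the gather CSV has a `name` column whose first whitespace-delimited token is the genome accession, and that BAM files are named `<sample>.x.<ident>.bam` as in the log.

```python
import csv
import gzip
from pathlib import Path


def gather_idents(gather_csv):
    """Return identifiers taken from the `name` column of a gather CSV."""
    opener = gzip.open if str(gather_csv).endswith(".gz") else open
    with opener(gather_csv, "rt", newline="") as fp:
        # Assumption: the identifier is the first whitespace-delimited
        # token of `name`, e.g. "GCF_014648495.1 Some organism".
        return {
            row["name"].split(" ")[0]
            for row in csv.DictReader(fp)
            if row.get("name")
        }


def mapped_idents(mapping_dir, sample):
    """Return identifiers parsed from <sample>.x.<ident>.bam file names."""
    prefix = f"{sample}.x."
    return {
        path.name[len(prefix):-len(".bam")]
        for path in Path(mapping_dir).glob(f"{prefix}*.bam")
    }


def compare(gather_csv, mapping_dir, sample):
    """Report identifiers that appear on only one side."""
    gathered = gather_idents(gather_csv)
    mapped = mapped_idents(mapping_dir, sample)
    return {
        "only_in_gather": sorted(gathered - mapped),
        "only_in_mapping": sorted(mapped - gathered),
    }
```

With the paths from this run, the call would be `compare("output_S1e8747dc/gather/S1e8747dc_1.gather.csv.gz", "output_S1e8747dc/mapping", "S1e8747dc_1")`. Both lists being empty would mean grist mapped exactly what its own gather step reported.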

What's wrong?
My log file:

Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job                                 count    min threads    max threads
--------------------------------  -------  -------------  -------------
copy_sample_genomes_to_output_wc        1              1              1
gather_reads_wc                         1              1              1
make_combined_info_csv_wc               1              1              1
make_gather_notebook_wc                 1              1              1
make_mapping_notebook_wc                1              1              1
set_kernel                              1              1              1
smash_trim_wc                           1              1              1
sourmash_gather_wc                      1              1              1
sourmash_prefetch_wc                    1              1              1
summarize_gather                        1              1              1
summarize_mapping                       1              1              1
summarize_samtools_depth_wc             2              1              1
total                                  13              1              1

Select jobs to execute...

[Fri Apr 14 00:36:17 2023]
rule smash_trim_wc:
input: output_S1e8747dc/trim/S1e8747dc_1.trim.fq.gz
output: output_S1e8747dc/sigs/S1e8747dc_1.trim.sig.zip
jobid: 3
reason: Missing output files: output_S1e8747dc/sigs/S1e8747dc_1.trim.sig.zip
wildcards: sample=S1e8747dc_1
resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/be8b6eadbe483a28b8a41700266a1d23_
[Fri Apr 14 03:00:37 2023]
Finished job 3.
1 of 13 steps (8%) done
Select jobs to execute...

[Fri Apr 14 03:00:37 2023]
Job 6:
Find all potentially relevant database matches for S1e8747dc_1

Reason: Missing output files: output_S1e8747dc/gather/S1e8747dc_1.prefetch.csv.gz; Input files updated by another job: output_S1e8747dc/sigs/S1e8747dc_1.trim.sig.zip

Activating conda environment: .snakemake/conda/be8b6eadbe483a28b8a41700266a1d23_
Touching output file output_S1e8747dc/gather/S1e8747dc_1.prefetch.csv.gz.
Touching output file output_S1e8747dc/gather/S1e8747dc_1.known.sig.zip.
Touching output file output_S1e8747dc/gather/S1e8747dc_1.unknown.sig.zip.
[Fri Apr 14 03:49:03 2023]
Finished job 6.
2 of 13 steps (15%) done
Select jobs to execute...

[Fri Apr 14 03:49:03 2023]
Job 2:
Run gather for S1e8747dc_1

Reason: Missing output files: output_S1e8747dc/gather/S1e8747dc_1.gather.csv.gz; Input files updated by another job: output_S1e8747dc/gather/S1e8747dc_1.prefetch.csv.gz, output_S1e8747dc/sigs/S1e8747dc_1.trim.sig.zip

Activating conda environment: .snakemake/conda/be8b6eadbe483a28b8a41700266a1d23_
[Fri Apr 14 03:49:21 2023]
Finished job 2.
3 of 13 steps (23%) done
Select jobs to execute...

[Fri Apr 14 03:49:21 2023]
localcheckpoint gather_reads_wc:
input: output_S1e8747dc/gather/S1e8747dc_1.gather.csv.gz
output: output_S1e8747dc/gather/.gather.S1e8747dc_1
jobid: 9
reason: Missing output files: output_S1e8747dc/gather/.gather.S1e8747dc_1; Input files updated by another job: output_S1e8747dc/gather/S1e8747dc_1.gather.csv.gz
wildcards: sample=S1e8747dc_1
resources: tmpdir=/tmp
Downstream jobs will be updated after completion.

Touching output file output_S1e8747dc/gather/.gather.S1e8747dc_1.
[Fri Apr 14 03:49:21 2023]
Finished job 9.
4 of 13 steps (31%) done
Select jobs to execute...

[Fri Apr 14 03:49:21 2023]
checkpoint copy_sample_genomes_to_output_wc:
input: genbank_cache/GCF_014648495.1_genomic.fna.gz, genbank_cache/GCF_014648495.1.info.csv
output: output_S1e8747dc/genomes/.genomes.S1e8747dc_1
jobid: 8
reason: Missing output files: output_S1e8747dc/genomes/.genomes.S1e8747dc_1
wildcards: sample=S1e8747dc_1
resources: tmpdir=/tmp
Downstream jobs will be updated after completion.

Touching output file output_S1e8747dc/genomes/.genomes.S1e8747dc_1.
[Fri Apr 14 03:49:21 2023]
Finished job 8.
5 of 23 steps (22%) done
Select jobs to execute...

[Fri Apr 14 03:49:21 2023]
rule minimap_wc:
input: output_S1e8747dc/genomes/GCF_014648495.1_genomic.fna.gz, output_S1e8747dc/trim/S1e8747dc_1.trim.fq.gz
output: output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.bam
jobid: 25
reason: Missing output files: output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.bam
wildcards: sample=S1e8747dc_1, ident=GCF_014648495.1
resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/052c7d4415d4fa072e20f9c6e1aa5026_
[Fri Apr 14 05:38:07 2023]
Finished job 25.
6 of 23 steps (26%) done
Select jobs to execute...

[Fri Apr 14 05:38:07 2023]
rule samtools_count_wc:
input: output_S1e8747dc/genomes/GCF_014648495.1_genomic.fna.gz, output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.bam
output: output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.count_mapped_reads.txt
jobid: 27
reason: Missing output files: output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.count_mapped_reads.txt; Input files updated by another job: output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.bam
wildcards: dir=mapping, sample=S1e8747dc_1, ident=GCF_014648495.1
resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/9b3a1923d8812e952bfc5c5b9669e4d4_
[Fri Apr 14 05:38:36 2023]
Finished job 27.
7 of 23 steps (30%) done
Select jobs to execute...

[Fri Apr 14 05:38:36 2023]
rule samtools_mpileup_wc:
input: output_S1e8747dc/genomes/GCF_014648495.1_genomic.fna.gz, output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.bam
output: output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.bcf, output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.vcf.gz, output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.vcf.gz.csi
jobid: 26
reason: Missing output files: output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.vcf.gz; Input files updated by another job: output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.bam
wildcards: dir=mapping, sample=S1e8747dc_1, ident=GCF_014648495.1
resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/9b3a1923d8812e952bfc5c5b9669e4d4_
[Fri Apr 14 06:13:50 2023]
Finished job 26.
8 of 23 steps (35%) done
Select jobs to execute...

[Fri Apr 14 06:13:50 2023]
rule bam_to_fastq_wc:
input: output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.bam
output: output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.mapped.fq.gz
jobid: 31
reason: Missing output files: output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.mapped.fq.gz; Input files updated by another job: output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.bam
wildcards: bam=S1e8747dc_1.x.GCF_014648495.1
resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/052c7d4415d4fa072e20f9c6e1aa5026_
[Fri Apr 14 06:24:29 2023]
Finished job 31.
9 of 23 steps (39%) done
Select jobs to execute...

[Fri Apr 14 06:24:29 2023]
rule bam_to_depth_wc:
input: output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.bam
output: output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.depth.txt
jobid: 24
reason: Missing output files: output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.depth.txt; Input files updated by another job: output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.bam
wildcards: dir=mapping, bam=S1e8747dc_1.x.GCF_014648495.1
resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/052c7d4415d4fa072e20f9c6e1aa5026_
[Fri Apr 14 06:25:29 2023]
Finished job 24.
10 of 23 steps (43%) done
Select jobs to execute...

[Fri Apr 14 06:25:29 2023]
rule extract_leftover_reads_wc:
input: output_S1e8747dc/gather/S1e8747dc_1.gather.csv.gz, output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.mapped.fq.gz
output: output_S1e8747dc/leftover/.leftover.S1e8747dc_1
jobid: 30
reason: Missing output files: output_S1e8747dc/leftover/.leftover.S1e8747dc_1; Input files updated by another job: output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.mapped.fq.gz
wildcards: sample=S1e8747dc_1
resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/be8b6eadbe483a28b8a41700266a1d23_
Touching output file output_S1e8747dc/leftover/.leftover.S1e8747dc_1.
[Fri Apr 14 07:04:28 2023]
Finished job 30.
11 of 23 steps (48%) done
Select jobs to execute...

[Fri Apr 14 07:04:28 2023]
rule summarize_samtools_depth_wc:
input: output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.depth.txt, output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.vcf.gz, output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.count_mapped_reads.txt
output: output_S1e8747dc/mapping/S1e8747dc_1.summary.csv
jobid: 13
reason: Missing output files: output_S1e8747dc/mapping/S1e8747dc_1.summary.csv; Input files updated by another job: output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.count_mapped_reads.txt, output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.vcf.gz, output_S1e8747dc/mapping/S1e8747dc_1.x.GCF_014648495.1.depth.txt
wildcards: dir=mapping, sample=S1e8747dc_1
resources: tmpdir=/tmp

[Fri Apr 14 07:04:31 2023]
Finished job 13.
12 of 23 steps (52%) done
Select jobs to execute...

[Fri Apr 14 07:04:31 2023]
rule map_leftover_reads_wc:
input: output_S1e8747dc/mapping/S1e8747dc_1.summary.csv, output_S1e8747dc/genomes/GCF_014648495.1_genomic.fna.gz, output_S1e8747dc/leftover/.leftover.S1e8747dc_1
output: output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.bam
jobid: 29
reason: Missing output files: output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.bam; Input files updated by another job: output_S1e8747dc/mapping/S1e8747dc_1.summary.csv, output_S1e8747dc/leftover/.leftover.S1e8747dc_1
wildcards: sample=S1e8747dc_1, ident=GCF_014648495.1
resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/052c7d4415d4fa072e20f9c6e1aa5026_
[Fri Apr 14 08:00:10 2023]
Finished job 29.
13 of 23 steps (57%) done
Select jobs to execute...

[Fri Apr 14 08:00:10 2023]
rule samtools_count_wc:
input: output_S1e8747dc/genomes/GCF_014648495.1_genomic.fna.gz, output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.bam
output: output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.count_mapped_reads.txt
jobid: 33
reason: Missing output files: output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.count_mapped_reads.txt; Input files updated by another job: output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.bam
wildcards: dir=leftover, sample=S1e8747dc_1, ident=GCF_014648495.1
resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/9b3a1923d8812e952bfc5c5b9669e4d4_
[Fri Apr 14 08:00:38 2023]
Finished job 33.
14 of 23 steps (61%) done
Select jobs to execute...

[Fri Apr 14 08:00:38 2023]
rule samtools_mpileup_wc:
input: output_S1e8747dc/genomes/GCF_014648495.1_genomic.fna.gz, output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.bam
output: output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.bcf, output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.vcf.gz, output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.vcf.gz.csi
jobid: 32
reason: Missing output files: output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.vcf.gz; Input files updated by another job: output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.bam
wildcards: dir=leftover, sample=S1e8747dc_1, ident=GCF_014648495.1
resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/9b3a1923d8812e952bfc5c5b9669e4d4_
[Fri Apr 14 08:35:46 2023]
Finished job 32.
15 of 23 steps (65%) done
Select jobs to execute...

[Fri Apr 14 08:35:46 2023]
rule bam_to_depth_wc:
input: output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.bam
output: output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.depth.txt
jobid: 28
reason: Missing output files: output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.depth.txt; Input files updated by another job: output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.bam
wildcards: dir=leftover, bam=S1e8747dc_1.x.GCF_014648495.1
resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/052c7d4415d4fa072e20f9c6e1aa5026_
[Fri Apr 14 08:36:47 2023]
Finished job 28.
16 of 23 steps (70%) done
Select jobs to execute...

[Fri Apr 14 08:36:47 2023]
rule summarize_samtools_depth_wc:
input: output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.depth.txt, output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.vcf.gz, output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.count_mapped_reads.txt
output: output_S1e8747dc/leftover/S1e8747dc_1.summary.csv
jobid: 14
reason: Missing output files: output_S1e8747dc/leftover/S1e8747dc_1.summary.csv; Input files updated by another job: output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.depth.txt, output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.vcf.gz, output_S1e8747dc/leftover/S1e8747dc_1.x.GCF_014648495.1.count_mapped_reads.txt
wildcards: dir=leftover, sample=S1e8747dc_1
resources: tmpdir=/tmp

[Fri Apr 14 08:36:49 2023]
Finished job 14.
17 of 23 steps (74%) done
Select jobs to execute...

[Fri Apr 14 08:36:49 2023]
rule make_combined_info_csv_wc:
input: output_S1e8747dc/genomes/GCF_014648495.1.info.csv
output: output_S1e8747dc/gather/S1e8747dc_1.genomes.info.csv
jobid: 7
reason: Missing output files: output_S1e8747dc/gather/S1e8747dc_1.genomes.info.csv
wildcards: sample=S1e8747dc_1
resources: tmpdir=/tmp

[Fri Apr 14 08:36:49 2023]
Finished job 7.
18 of 23 steps (78%) done
Select jobs to execute...

[Fri Apr 14 08:36:49 2023]
rule set_kernel:
input: /home/mfrolova/anaconda3/envs/grist/lib/python3.7/site-packages/genome_grist/conf/env/papermill.yml
output: output_S1e8747dc/.kernel.set
jobid: 10
reason: Missing output files: output_S1e8747dc/.kernel.set
resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/6ecac7d573969eb57b185be5d53a8113_
Touching output file output_S1e8747dc/.kernel.set.
[Fri Apr 14 08:36:50 2023]
Finished job 10.
19 of 23 steps (83%) done
Select jobs to execute...

[Fri Apr 14 08:36:50 2023]
rule make_gather_notebook_wc:
input: /home/mfrolova/anaconda3/envs/grist/lib/python3.7/site-packages/genome_grist/conf/../notebooks/report-gather.ipynb, output_S1e8747dc/gather/S1e8747dc_1.gather.csv.gz, output_S1e8747dc/gather/S1e8747dc_1.genomes.info.csv, output_S1e8747dc/.kernel.set
output: output_S1e8747dc/reports/report-gather-S1e8747dc_1.ipynb, output_S1e8747dc/reports/report-gather-S1e8747dc_1.html
jobid: 1
reason: Missing output files: output_S1e8747dc/reports/report-gather-S1e8747dc_1.html; Input files updated by another job: output_S1e8747dc/gather/S1e8747dc_1.gather.csv.gz, output_S1e8747dc/.kernel.set, output_S1e8747dc/gather/S1e8747dc_1.genomes.info.csv
wildcards: sample=S1e8747dc_1
resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/6ecac7d573969eb57b185be5d53a8113_
[Fri Apr 14 08:36:54 2023]
Finished job 1.
20 of 23 steps (87%) done
Select jobs to execute...

[Fri Apr 14 08:36:54 2023]
localrule summarize_gather:
input: output_S1e8747dc/reports/report-gather-S1e8747dc_1.html
jobid: 0
reason: Input files updated by another job: output_S1e8747dc/reports/report-gather-S1e8747dc_1.html
resources: tmpdir=/tmp

[Fri Apr 14 08:36:54 2023]
Finished job 0.
21 of 23 steps (91%) done
Select jobs to execute...

[Fri Apr 14 08:36:54 2023]
rule make_mapping_notebook_wc:
input: /home/mfrolova/anaconda3/envs/grist/lib/python3.7/site-packages/genome_grist/conf/../notebooks/report-mapping.ipynb, output_S1e8747dc/mapping/S1e8747dc_1.summary.csv, output_S1e8747dc/leftover/S1e8747dc_1.summary.csv, output_S1e8747dc/gather/S1e8747dc_1.gather.csv.gz, output_S1e8747dc/gather/S1e8747dc_1.genomes.info.csv, output_S1e8747dc/.kernel.set
output: output_S1e8747dc/reports/report-mapping-S1e8747dc_1.ipynb, output_S1e8747dc/reports/report-mapping-S1e8747dc_1.html
jobid: 12
reason: Missing output files: output_S1e8747dc/reports/report-mapping-S1e8747dc_1.html; Input files updated by another job: output_S1e8747dc/gather/S1e8747dc_1.gather.csv.gz, output_S1e8747dc/.kernel.set, output_S1e8747dc/mapping/S1e8747dc_1.summary.csv, output_S1e8747dc/gather/S1e8747dc_1.genomes.info.csv, output_S1e8747dc/leftover/S1e8747dc_1.summary.csv
wildcards: sample=S1e8747dc_1
resources: tmpdir=/tmp

Activating conda environment: .snakemake/conda/6ecac7d573969eb57b185be5d53a8113_
[Fri Apr 14 08:37:01 2023]
Finished job 12.
22 of 23 steps (96%) done
Select jobs to execute...

[Fri Apr 14 08:37:01 2023]
localrule summarize_mapping:
input: output_S1e8747dc/reports/report-mapping-S1e8747dc_1.html, output_S1e8747dc/reports/report-gather-S1e8747dc_1.html
jobid: 11
reason: Input files updated by another job: output_S1e8747dc/reports/report-mapping-S1e8747dc_1.html, output_S1e8747dc/reports/report-gather-S1e8747dc_1.html
resources: tmpdir=/tmp

[Fri Apr 14 08:37:01 2023]
Finished job 11.
23 of 23 steps (100%) done
Complete log: .snakemake/log/2023-04-14T003616.351359.snakemake.log
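
To confirm how many genomes the workflow actually processed, the `ident=` wildcards in a log like the one above can be collected. This is a rough sketch under the assumption that every per-genome job prints a `wildcards:` line containing `ident=<accession>`, as the log above does; the function name `idents_in_log` is hypothetical.

```python
import re

# Matches the value of an `ident=` wildcard up to the next comma or whitespace.
IDENT_RE = re.compile(r"\bident=([^\s,]+)")


def idents_in_log(log_text):
    """Return the sorted, de-duplicated identifiers seen in `ident=` wildcards."""
    return sorted(set(IDENT_RE.findall(log_text)))
```

It could be applied to the text of `.snakemake/log/2023-04-14T003616.351359.snakemake.log`; a single entry in the result would agree with the single BAM file in the mapping folder.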
