read2tree issue on mixed dataset #82

masudermann · 2024-07-15T22:37:07Z

Description of the bug

This is a read2tree-specific issue, but I wanted to share it in case we see similar problems with other datasets.

Command used and terminal output

# Main command
CHANGELOG.md  CODE_OF_CONDUCT.md  docs         LICENSE                modules.json  null                  pyproject.toml     seqtk_sample  test                    workflows
(nf-core) marthasudermann@pop-os:~/pathogensurveillance$ nextflow run main.nf -profile mixed,docker -resume
Nextflow 24.04.3 is available - Please consider updating your version to it
N E X T F L O W  ~  version 23.10.1
Launching `main.nf` [sick_leibniz] DSL2 - revision: cc83aa0c27


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/plantpathsurveil v1.0dev
------------------------------------------------------
Core Nextflow options
  runName                   : sick_leibniz
  containerEngine           : docker
  launchDir                 : /home/marthasudermann/pathogensurveillance
  workDir                   : /home/marthasudermann/pathogensurveillance/work
  projectDir                : /home/marthasudermann/pathogensurveillance
  userName                  : marthasudermann
  profile                   : mixed,docker
  configFiles               : /home/marthasudermann/pathogensurveillance/nextflow.config

Input/output options
  sample_data               : test/data/metadata/mixed.csv
  out_dir                   : test/output/mixed
  download_bakta_db         : true

Institutional config options
  config_profile_name       : Test profile of mixed (fungi, oomycete, bacteria, nematode) SRA files
  config_profile_description: Test profile of mixed (fungi, oomycete, bacteria, nematode) SRA files

Generic options
  trace_dir                 : null/pipeline_info

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/plantpathsurveil for your analysis please cite:

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/nf-core/plantpathsurveil/blob/master/CITATIONS.md
------------------------------------------------------
# Errors I am getting

ERROR ~ Error executing process > 'PATHOGENSURVEILLANCE:BUSCO_PHYLOGENY:READ2TREE (oomycete)'

Caused by:
  Process `PATHOGENSURVEILLANCE:BUSCO_PHYLOGENY:READ2TREE (oomycete)` terminated with an error exit status (2)

Command executed:

  # This creates the reference folder
  read2tree --standalone_path oomycete_busco_markers/ --dna_reference oomycete_dna_ref.fa --output_path oomycete_read2tree --reference
  
  # Add each paired end shortread sample
  for R1 in paired_1_01.fa; do
     	R2=$(echo $R1 | sed 's/^paired_1_/paired_2_/')
     	read2tree \
          \
         --threads 8 \
     	--standalone_path oomycete_busco_markers/ \
         --dna_reference oomycete_dna_ref.fa \
     	--output_path oomycete_read2tree \
     	--reads $R1 $R2
  done
  
  # Add each single end shortread sample
  for R1 in ; do
     	read2tree \
          \
         --threads 8 \
     	--standalone_path oomycete_busco_markers/ \
         --dna_reference oomycete_dna_ref.fa \
     	--output_path oomycete_read2tree \
     	--reads $R1
  done
  
  # Add each long read sample
  for R1 in ; do
     	read2tree \
          \
         --threads 8 \
     	--standalone_path oomycete_busco_markers/ \
         --dna_reference oomycete_dna_ref.fa \
     	--output_path oomycete_read2tree \
         --read_type long
     	--reads $R1
  done
  
  # Build tree
  read2tree \
      \
     --threads 8 \
  --standalone_path oomycete_busco_markers/ \
     --dna_reference oomycete_dna_ref.fa \
  --output_path oomycete_read2tree -\
  -merge_all_mappings \
  --tree
  
  cat <<-END_VERSIONS > versions.yml
  "PATHOGENSURVEILLANCE:BUSCO_PHYLOGENY:READ2TREE":
     	read2tree: $(echo $(read2tree --version))
  END_VERSIONS

Command exit status:
  2

Command output:
  --- Load OGs with min 0 species from oma oomycete_busco_markers - mode = marker_genes ---
  2024-07-15 21:29:26,401 - read2tree.OGSet - INFO - --- Load ogs and find their corresponding DNA seq from oomycete_dna_ref.fa ---
  2024-07-15 21:29:26,401 - read2tree.OGSet - INFO - Loading oomycete_dna_ref.fa into memory. This might take a while . . . 
  2024-07-15 21:29:26,435 - read2tree.OGSet - INFO - : Gathering of DNA seq for 249 OGs took 0.02776622772216797.
  --- Generating reference for mapping ---
  2024-07-15 21:29:26,436 - read2tree.ReferenceSet - INFO - : Extracted 4 reference species form 249 ogs took 0.0007266998291015625
  --- Alignment of 249 OGs ---
  2024-07-15 21:29:56,377 - read2tree.Aligner - INFO - : Alignment of 249 OGs took 29.93824291229248.
  --- Re-load ogs and find their corresponding DNA seq from output folder ---
  --- Generating reference for mapping from folder ---
  --- Mapping of reads to reference sequences ---
  2024-07-15 21:29:57,076 - read2tree.Mapper - INFO - paired_1_01: --- Mapping of reads to PERBE reference species ---
  2024-07-15 21:32:43,397 - read2tree.Mapper - INFO - paired_1_01: Mapped 221245 / 45873312 reads to PERBE_OGs.fa
  2024-07-15 21:32:43,453 - read2tree.Mapper - INFO - paired_1_01: Mapping to PERBE_OGs.fa references took 166.3768367767334.
  2024-07-15 21:32:45,102 - read2tree.Mapper - INFO - paired_1_01: --- Mapping of reads to PERMA reference species ---
  2024-07-15 21:35:31,387 - read2tree.Mapper - INFO - paired_1_01: Mapped 116880 / 45873312 reads to PERMA_OGs.fa
  2024-07-15 21:35:31,440 - read2tree.Mapper - INFO - paired_1_01: Mapping to PERMA_OGs.fa references took 166.33702945709229.
  2024-07-15 21:35:32,207 - read2tree.Mapper - INFO - paired_1_01: --- Mapping of reads to PERDE reference species ---
  2024-07-15 21:38:18,619 - read2tree.Mapper - INFO - paired_1_01: Mapped 128630 / 45873312 reads to PERDE_OGs.fa
  2024-07-15 21:38:18,688 - read2tree.Mapper - INFO - paired_1_01: Mapping to PERDE_OGs.fa references took 166.48007535934448.
  2024-07-15 21:38:19,859 - read2tree.Mapper - INFO - paired_1_01: --- Mapping of reads to PHYNI reference species ---
  2024-07-15 21:41:09,594 - read2tree.Mapper - INFO - paired_1_01: Mapped 124871 / 45873312 reads to PHYNI_OGs.fa
  2024-07-15 21:41:09,652 - read2tree.Mapper - INFO - paired_1_01: Mapping to PHYNI_OGs.fa references took 169.79250645637512.
  2024-07-15 21:41:10,498 - read2tree.Mapper - INFO - paired_1_01: Mapping to all references took 673.4247016906738.
  --- Add inferred mapped sequence back to OGs ---
  2024-07-15 21:41:10,668 - read2tree.OGSet - INFO - paired_1_01: Appending 222 reconstructed sequences to present OG took 0.0045795440673828125.
  --- Add inferred mapped sequence back to alignment ---
  2024-07-15 21:41:11,089 - read2tree.Aligner - INFO - paired_1_01: Appending 213 reconstructed sequences to present Alignments took 0.39875292778015137.

Command error:
  
  Mapping reads to species:  50%|█████     | 2/4 [05:35<05:34, 167.48s/ species]2024-07-15 21:35:32,207 - read2tree.Mapper - INFO - paired_1_01: --- Mapping of reads to PERDE reference species ---
  [E::idx_find_and_load] Could not retrieve index file for '/tmp/ngm_q_oc3inv/PERDE_OGs.fa.bam'
  2024-07-15 21:38:18,619 - read2tree.Mapper - INFO - paired_1_01: Mapped 128630 / 45873312 reads to PERDE_OGs.fa
  2024-07-15 21:38:18,688 - read2tree.Mapper - INFO - paired_1_01: Mapping to PERDE_OGs.fa references took 166.48007535934448.
  
  Mapping reads to species:  75%|███████▌  | 3/4 [08:22<02:47, 167.56s/ species]2024-07-15 21:38:19,859 - read2tree.Mapper - INFO - paired_1_01: --- Mapping of reads to PHYNI reference species ---
  [E::idx_find_and_load] Could not retrieve index file for '/tmp/ngm_q_oc3inv/PHYNI_OGs.fa.bam'
  2024-07-15 21:41:09,594 - read2tree.Mapper - INFO - paired_1_01: Mapped 124871 / 45873312 reads to PHYNI_OGs.fa
  2024-07-15 21:41:09,652 - read2tree.Mapper - INFO - paired_1_01: Mapping to PHYNI_OGs.fa references took 169.79250645637512.
  
  Mapping reads to species: 100%|██████████| 4/4 [11:13<00:00, 168.77s/ species]
  Mapping reads to species: 100%|██████████| 4/4 [11:13<00:00, 168.35s/ species]
  2024-07-15 21:41:10,498 - read2tree.Mapper - INFO - paired_1_01: Mapping to all references took 673.4247016906738.
  
  Adding mapped seq to alignments:   0%|          | 0/249 [00:00<?, ? alignments/s]
  Adding mapped seq to alignments: 100%|██████████| 249/249 [00:00<00:00, 1963123.49 alignments/s]
  
  Adding mapped seq to OG:   0%|          | 0/249 [00:00<?, ? OGs/s]
  Adding mapped seq to OG: 100%|██████████| 249/249 [00:00<00:00, 2300400.21 OGs/s]
  --- Add inferred mapped sequence back to OGs ---
  
  Adding mapped seq to OG:   0%|          | 0/249 [00:00<?, ? OGs/s]
  Adding mapped seq to OG: 100%|██████████| 249/249 [00:00<00:00, 77110.28 OGs/s]
  2024-07-15 21:41:10,668 - read2tree.OGSet - INFO - paired_1_01: Appending 222 reconstructed sequences to present OG took 0.0045795440673828125.
  --- Add inferred mapped sequence back to alignment ---
  
  Adding mapped seq to alignments:   0%|          | 0/249 [00:00<?, ? alignments/s]
  Adding mapped seq to alignments:  33%|███▎      | 81/249 [00:00<00:00, 807.58 alignments/s]
  Adding mapped seq to alignments:  65%|██████▌   | 162/249 [00:00<00:00, 698.92 alignments/s]
  Adding mapped seq to alignments:  94%|█████████▎| 233/249 [00:00<00:00, 581.05 alignments/s]
  Adding mapped seq to alignments: 100%|██████████| 249/249 [00:00<00:00, 624.97 alignments/s]
  2024-07-15 21:41:11,089 - read2tree.Aligner - INFO - paired_1_01: Appending 213 reconstructed sequences to present Alignments took 0.39875292778015137.
  usage: read2tree [-h] [--version] [--output_path OUTPUT_PATH]
                   --standalone_path STANDALONE_PATH [--reads READS [READS ...]]
                   [--read_type READ_TYPE] [--threads THREADS] [--split_reads]
                   [--split_len SPLIT_LEN] [--split_overlap SPLIT_OVERLAP]
                   [--split_min_read_len SPLIT_MIN_READ_LEN] [--sample_reads]
                   [--genome_len GENOME_LEN] [--coverage COVERAGE]
                   [--min_cons_coverage MIN_CONS_COVERAGE]
                   [--dna_reference DNA_REFERENCE] [--sc_threshold SC_THRESHOLD]
                   [--ngmlr_parameters NGMLR_PARAMETERS] [--check_mate_pairing]
                   [--debug] [--sequence_selection_mode SEQUENCE_SELECTION_MODE]
                   [-s SPECIES_NAME] [--tree] [--merge_all_mappings] [-r]
                   [--min_species MIN_SPECIES] [--single_mapping SINGLE_MAPPING]
                   [--ref_folder REF_FOLDER]
                   [--remove_species_mapping REMOVE_SPECIES_MAPPING]
                   [--remove_species_ogs REMOVE_SPECIES_OGS] [--keep_all_ogs]
                   [--ignore_species IGNORE_SPECIES]
  read2tree: error: The number of completed mappings (1) is too little to perform a merge.

Work dir:
  /home/marthasudermann/pathogensurveillance/work/78/53a5214c3b5dd116ec616895015311

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details
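The final error says the number of completed mappings (1) is too small to merge, and the command above shows only one paired-end sample (`paired_1_01.fa`) in the oomycete group, with the single-end and long-read loops empty. A possible guard, sketched below, would check the sample count before running the merge step. The function names and the threshold of 2 are assumptions made for illustration; the log only confirms that 1 mapping is rejected, and read2tree itself is not called here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the pipeline or of read2tree.

# Print how many read files (samples) were passed as arguments.
count_samples() {
    echo "$#"
}

# Succeed only when there are enough mappings to merge.
# ASSUMPTION: the minimum is 2; the error above only shows that 1 is too few.
can_merge_mappings() {
    [ "$1" -ge 2 ]
}

# Example with the single paired-end sample from the failing run.
n=$(count_samples paired_1_01.fa)
if can_merge_mappings "$n"; then
    # This is where the "Build tree" step (--merge_all_mappings --tree) would run.
    echo "merge: $n mappings available"
else
    echo "skip: only $n mapping(s); merge needs at least 2"
fi
```

With a check like this, a group that contains a single sample would skip the tree-building step instead of failing the whole run. Whether skipping is the right behavior for the pipeline is a design decision this sketch does not settle.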

Relevant files

No response

System information

No response

zachary-foster · 2024-11-07T18:43:51Z

We are no longer using read2tree.

masudermann added the "bug" (Something isn't working) label Jul 15, 2024

zachary-foster closed this as completed Nov 7, 2024
