
FASTP error: input files don't contain identical amount of reads #110

Closed
cahuparo opened this issue Nov 4, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@cahuparo
Contributor

cahuparo commented Nov 4, 2024

Description of the bug

fastp appears to fail when the two paired-end input files contain different numbers of reads:

ERROR ~ Error executing process > 'PATHOGENSURVEILLANCE:GENOME_ASSEMBLY:FASTP (GCA_001063175_1_ASM106317v1_genomic)'

Caused by:
  Process `PATHOGENSURVEILLANCE:GENOME_ASSEMBLY:FASTP (GCA_001063175_1_ASM106317v1_genomic)` terminated with an error exit status (255)


Command executed:

  [ ! -f  GCA_001063175_1_ASM106317v1_genomic_1.fastq.gz ] && ln -sf SRR1655712.fastq.gz GCA_001063175_1_ASM106317v1_genomic_1.fastq.gz
  [ ! -f  GCA_001063175_1_ASM106317v1_genomic_2.fastq.gz ] && ln -sf SRR1655712_1.fastq.gz GCA_001063175_1_ASM106317v1_genomic_2.fastq.gz
  fastp \
      --in1 GCA_001063175_1_ASM106317v1_genomic_1.fastq.gz \
      --in2 GCA_001063175_1_ASM106317v1_genomic_2.fastq.gz \
      --out1 GCA_001063175_1_ASM106317v1_genomic_1.fastp.fastq.gz \
      --out2 GCA_001063175_1_ASM106317v1_genomic_2.fastp.fastq.gz \
      --json GCA_001063175_1_ASM106317v1_genomic.fastp.json \
      --html GCA_001063175_1_ASM106317v1_genomic.fastp.html \
       \
       \
       \
      --thread 4 \
      --detect_adapter_for_pe \
       \
      2> >(tee GCA_001063175_1_ASM106317v1_genomic.fastp.log >&2)
 
  cat <<-END_VERSIONS > versions.yml
  "PATHOGENSURVEILLANCE:GENOME_ASSEMBLY:FASTP":
      fastp: $(fastp --version 2>&1 | sed -e "s/fastp //g")
  END_VERSIONS

Command exit status:
  255

Command output:
  (empty)

Command error:
  Detecting adapter sequence for read1...
  No adapter detected for read1
 
  Detecting adapter sequence for read2...
  No adapter detected for read2
 
 
  WARNNIG: different read numbers of the 206 pack
  Read1 pack size: 218
  Read2 pack size: 1000
 
  ERROR: input files don't contain identical amount of reads

Work dir:
  /nfs7/BPP/Chang_Lab/paradarc/ps_pipeline_validation/scripts/pathogensurveillance/work/7f/07a5910f230b21e809665e7b2033a4

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
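fastp's warning says that in one pack, read 1 supplied 218 reads while read 2 supplied 1000, which suggests the read 1 file ran out before the read 2 file. Here is a minimal, hypothetical re-creation of that kind of lockstep check in Python. It is not fastp's implementation. It walks both files in packs of 1000 records and reports the first pack whose sizes differ. The helper names `fastq_records`, `first_mismatched_pack`, and `check_pair` are assumptions for illustration, and the pack index is 0-based, so it may not match fastp's numbering.

```python
import gzip
from itertools import islice
from typing import Iterable, Iterator, Optional, Tuple


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, ...]]:
    """Yield 4-line FASTQ records (header, sequence, separator, quality)."""
    it = iter(lines)
    while True:
        record = tuple(line.rstrip("\n") for line in islice(it, 4))
        if not record:
            return  # clean end of input
        if len(record) != 4:
            raise ValueError(f"truncated FASTQ record: {record!r}")
        yield record


def first_mismatched_pack(reads1: Iterable, reads2: Iterable,
                          pack_size: int = 1000) -> Optional[Tuple[int, int, int]]:
    """Walk both inputs pack by pack; return (pack_index, n1, n2) for the
    first pack whose sizes differ, or None if both have the same length."""
    it1, it2 = iter(reads1), iter(reads2)
    pack = 0
    while True:
        n1 = sum(1 for _ in islice(it1, pack_size))
        n2 = sum(1 for _ in islice(it2, pack_size))
        if n1 != n2:
            return pack, n1, n2
        if n1 == 0:
            return None  # both inputs were exhausted together
        pack += 1


def check_pair(path1: str, path2: str, pack_size: int = 1000) -> None:
    """Raise ValueError if two gzipped FASTQ files hold different read counts."""
    with gzip.open(path1, "rt") as h1, gzip.open(path2, "rt") as h2:
        result = first_mismatched_pack(fastq_records(h1), fastq_records(h2), pack_size)
    if result is not None:
        pack, n1, n2 = result
        raise ValueError(f"pack {pack}: read1 has {n1} reads, read2 has {n2}")
```

Running `check_pair` on the two symlinked inputs in the work directory should raise for this sample, because the two files have different lengths.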

Is this a problem with the sample itself? Should we troubleshoot it, or should the user drop that sample?

Command used and terminal output

No response

Relevant files

No response

System information

No response

@cahuparo cahuparo added the bug Something isn't working label Nov 4, 2024
@zachary-foster
Contributor

Are we sure these are actually paired reads? Could we somehow be getting two single-end runs? I don't think it is normal to have unpaired reads, so an error might be appropriate. Can you take a look at the sample and see whether anything is obviously off, like very different file sizes?
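File sizes are only a rough signal. Counting reads shows directly whether two files could be mates. Here is a minimal sketch in Python, assuming standard four-line FASTQ records (no wrapped sequence lines). `count_records`, `count_reads`, and `summarize` are hypothetical helper names.

```python
import gzip
import os
import sys
from typing import Dict, Iterable, List


def count_records(lines: Iterable[str]) -> int:
    """Count FASTQ reads from an iterable of lines, assuming 4 lines per read."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4")
    return n_lines // 4


def count_reads(path: str) -> int:
    """Count reads in a FASTQ file, decompressing it if the name ends in .gz."""
    if path.endswith(".gz"):
        with gzip.open(path, "rt") as handle:
            return count_records(handle)
    with open(path) as handle:
        return count_records(handle)


def summarize(paths: List[str]) -> Dict[str, int]:
    """Map each file's base name to its read count."""
    return {os.path.basename(p): count_reads(p) for p in paths}


if __name__ == "__main__":
    # Example (script name is hypothetical): python count_reads.py reads/SRR1655712*.fastq.gz
    for name, n_reads in summarize(sys.argv[1:]).items():
        print(f"{name}\t{n_reads}")
```

For genuine mates, the `_1` and `_2` files should report the same count. A third file with a different count would point to reads that were split out separately.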

@cahuparo
Contributor Author

cahuparo commented Nov 5, 2024

I investigated further today. It looks like three FASTQ files were downloaded or produced for SRR1655712:

> ls -ltr /nfs7/BPP/Chang_Lab/paradarc/ps_pipeline_validation/scripts/pathogensurveillance/path_surveil_data/reads/SRR1655712*
-rw-r--r-- 1 paradarc bpp 16896478 Nov  1 15:03 /nfs7/BPP/Chang_Lab/paradarc/ps_pipeline_validation/scripts/pathogensurveillance/path_surveil_data/reads/SRR1655712.fastq.gz
-rw-r--r-- 1 paradarc bpp 78616787 Nov  1 15:03 /nfs7/BPP/Chang_Lab/paradarc/ps_pipeline_validation/scripts/pathogensurveillance/path_surveil_data/reads/SRR1655712_2.fastq.gz
-rw-r--r-- 1 paradarc bpp 76902111 Nov  1 15:03 /nfs7/BPP/Chang_Lab/paradarc/ps_pipeline_validation/scripts/pathogensurveillance/path_surveil_data/reads/SRR1655712_1.fastq.gz

As the FASTP command shows, the pipeline paired the unsuffixed file (SRR1655712.fastq.gz) with one of the paired-end files (SRR1655712_1.fastq.gz). We may need to purge the extra file or throw an error if more than two FASTQ files are downloaded for a run.
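Here is a minimal sketch of the "throw an error" option in Python. It assumes the downloaded files follow the `<run>_1` / `<run>_2` / `<run>` naming seen in the `ls` output. That naming matches what `fastq-dump --split-3` writes, with unmated reads going to the unsuffixed file, but whether that tool produced these files is an assumption. `validate_run_files` is a hypothetical name, not the pipeline's code.

```python
import re
from typing import Dict, List

# Assumed naming convention (as written by `fastq-dump --split-3`):
#   <run>_1.fastq.gz / <run>_2.fastq.gz  -> mates of paired reads
#   <run>.fastq.gz                       -> reads without a mate
_FASTQ_NAME = re.compile(r"^(?P<run>.+?)(?:_(?P<mate>[12]))?\.f(?:ast)?q(?:\.gz)?$")


def validate_run_files(names: List[str]) -> List[str]:
    """Return the files to use for one run, or raise if the layout is ambiguous.

    Accepts exactly one _1/_2 pair or exactly one single-end file; anything
    else (such as a pair plus an unpaired file) is treated as an error.
    """
    roles: Dict[str, str] = {}
    runs = set()
    for name in names:
        match = _FASTQ_NAME.match(name)
        if match is None:
            raise ValueError(f"unrecognized FASTQ file name: {name}")
        runs.add(match.group("run"))
        role = match.group("mate") or "unpaired"
        if role in roles:
            raise ValueError(f"more than one {role!r} file: {roles[role]}, {name}")
        roles[role] = name
    if len(runs) > 1:
        raise ValueError(f"files come from more than one run: {sorted(runs)}")
    if set(roles) == {"1", "2"}:
        return [roles["1"], roles["2"]]
    if set(roles) == {"unpaired"}:
        return [roles["unpaired"]]
    raise ValueError(f"expected one pair or one single-end file, got {sorted(names)}")
```

With the three SRR1655712 files, this raises instead of silently pairing the wrong files.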

zachary-foster pushed a commit to grunwaldlab/pathogensurveillance that referenced this issue Nov 7, 2024
@zachary-foster
Contributor

This should be fixed now. In this case the pipeline will pick the paired-end reads and ignore the other file.
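Here is a minimal sketch of the selection rule described in this comment, written in Python. It assumes the `_1`/`_2` naming shown in the `ls` output above. It is not the pipeline's actual fix (the commit itself is not shown here), and `select_reads` is a hypothetical name.

```python
import re
from typing import Dict, List

# Assumed naming: <run>_1 / <run>_2 mark mates; a file without that suffix
# holds reads that have no mate.
_MATE_SUFFIX = re.compile(r"_([12])\.f(?:ast)?q(?:\.gz)?$")


def select_reads(names: List[str]) -> List[str]:
    """Prefer a complete _1/_2 pair and ignore any unpaired file; otherwise
    fall back to a lone single-end file."""
    mates: Dict[str, str] = {}
    unpaired: List[str] = []
    for name in sorted(names):
        match = _MATE_SUFFIX.search(name)
        if match is None:
            unpaired.append(name)
        elif match.group(1) in mates:
            raise ValueError(
                f"two candidates for mate {match.group(1)}: {mates[match.group(1)]}, {name}"
            )
        else:
            mates[match.group(1)] = name
    if "1" in mates and "2" in mates:
        return [mates["1"], mates["2"]]  # unpaired leftovers are dropped
    if not mates and len(unpaired) == 1:
        return unpaired
    raise ValueError(f"cannot choose reads from: {sorted(names)}")
```

Dropping the unsuffixed file keeps the sample usable at the cost of discarding its reads. Raising an error instead would leave that decision to the user.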


2 participants