Unclear errors when run with incorrect endedness #190

Open
jeffkaufman opened this issue Feb 10, 2025 · 1 comment
Labels
bug Something isn't working priority_2

Comments

@jeffkaufman
Member

I accidentally created a sample sheet in single-end mode and then ran an Illumina dataset through it. I'm not sure what the correct behavior is here: I could see either failing with a clear error or running all the way through on just the forward reads I supplied. What I got instead was an error that didn't make much sense. On dev (124abcf) I got `gzip: null.gz: No such file or directory`, followed by:

  Executing jgi.BBDuk [in=stdin.fastq, ref=virus-genomes-masked.fasta.gz, out=[sample]_1_viral_bbduk_pass.fastq.gz, outm=[sample]_1_viral_bbduk_fail.fastq.gz, stats=[sample]_1_viral_bbduk.stats.txt, minkmerhits=1, k=24, interleaved=t, t=8, -Xmx16g]
  Version 39.01
  Set INTERLEAVED to true
  Set threads to 8
  0.015 seconds.
  Initial:
  Memory: max=16464m, total=16464m, free=15948m, used=516m
  Added 77992608 kmers; time:   57.584 seconds.
  Memory: max=16464m, total=16464m, free=12894m, used=3570m
  Input is being processed as paired
  java.lang.AssertionError:
  Error in stdin.fastq, line 1150, with these 4 lines:
  @[readid]
  [sequence]
  +
        at stream.FASTQ.quadToRead_slow(FASTQ.java:735)
        at stream.FASTQ.toReadList(FASTQ.java:642)
        at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:107)
        at stream.FastqReadInputStream.hasMore(FastqReadInputStream.java:73)
        at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:669)
        at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:658)
  Fusion Info:
      ami-id: ami-0ba0650e6cd15c2b4
      instance-id: i-0f55f284adc2d19ee
      instance-type: c4.8xlarge
      fusion_version: 2.4.8-11743c7
      clone_namespace: false
      kernel_version: 6.1
      disk_cache_size: 999Gb
      max_open_files: 1048576

More context: https://twist.com/a/197793/inbox/t/6738804/

@willbradshaw
Contributor

Copying my comment from Twist:

I don’t think this is actually a problem with incorrect endedness (the pipeline automatically infers endedness from the structure of the samplesheet). Rather, the main RUN pipeline just isn’t properly set up to run single-end data yet: Simon hasn’t implemented several parts of the workflow for single-end data, and parts of the pipeline assume paired-end input.

So I think we should put a roadblock early in the main RUN workflow: if it infers you’re running single-end data, it just stops and errors out.
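
To make the roadblock idea concrete, here is a rough Python sketch of that kind of early check. It is not the pipeline's actual code: the column name `fastq_2`, the function names, and the error message are all hypothetical, and it simply assumes that a paired-end samplesheet lists a reverse-read file for every sample while a single-end one does not.

```python
import csv
import io


class SingleEndNotSupportedError(ValueError):
    """Raised when a samplesheet appears to describe single-end data."""


def infer_endedness(samplesheet_text: str) -> str:
    """Return "paired" or "single" based on the samplesheet's structure.

    Assumption (hypothetical layout): paired-end rows have a non-empty
    `fastq_2` column; single-end sheets omit it or leave it blank.
    """
    rows = list(csv.DictReader(io.StringIO(samplesheet_text)))
    if not rows:
        raise ValueError("samplesheet has no sample rows")
    # True for each row that lists a reverse-read file.
    has_r2 = [bool((row.get("fastq_2") or "").strip()) for row in rows]
    if all(has_r2):
        return "paired"
    if not any(has_r2):
        return "single"
    raise ValueError("samplesheet mixes single-end and paired-end rows")


def require_paired_end(samplesheet_text: str) -> None:
    """Stop early with a readable message instead of failing downstream."""
    if infer_endedness(samplesheet_text) == "single":
        raise SingleEndNotSupportedError(
            "Samplesheet looks single-end (no reverse-read files listed); "
            "the RUN workflow currently supports paired-end data only."
        )
```

Calling something like `require_paired_end(...)` before any reads are processed would surface the problem at the start of the run, rather than as an `AssertionError` from BBDuk partway through.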

@willbradshaw willbradshaw added bug Something isn't working priority_2 labels Feb 10, 2025
@willbradshaw willbradshaw self-assigned this Feb 10, 2025