cellranger_count.py: capture stderr and stdout #341
Comments
Also, it appears that since This is important for retrying processes with escalated resources:
...since exit values of 1 will not trigger a retry.
Thanks for raising the issue. I agree that stderr/stdout and the exit code should be forwarded. I will follow up on this eventually, but I have very limited time for nf-core at the moment, so if you want to speed it up, a PR to modules would be appreciated :)
@grst do you know if I am using
I tried updating cellranger_count.py:
#!/usr/bin/env python3
"""
Automatically rename staged files for input into cellranger count.
Copyright (c) Gregor Sturm 2023 - MIT License
"""
from subprocess import run, CalledProcessError, SubprocessError
from pathlib import Path
from textwrap import dedent
import shlex
import re
import sys


def chunk_iter(seq, size):
    """
    Iterate over `seq` in chunks of `size`
    Args:
        seq: iterable, the sequence to chunk
        size: int, the size of the chunks
    Returns:
        generator: the chunks of `seq`
    """
    return (seq[pos : pos + size] for pos in range(0, len(seq), size))


def run_subprocess(command):
    """
    Run a subprocess command and return stdout and stderr as strings.
    Args:
        command: list of strings, the command to run
    Returns:
        tuple of strings: (stdout, stderr)
    """
    try:
        # Run the command and capture stdout and stderr as strings
        result = run(
            command,
            check=True,
            capture_output=True,
            text=True
        )
        return result.stdout, result.stderr
    except CalledProcessError as e:
        # Print the error message and exit with the return code of the subprocess
        print(f"Command '{e.cmd}' failed with return code {e.returncode}")
        print(f"#--- STDOUT ---#\\n{e.stdout}")
        print(f"#--- STDERR ---#\\n{e.stderr}")
        sys.exit(e.returncode)
    except SubprocessError as e:
        # Print the error message and exit with return code 1
        print(f"Subprocess error: {str(e)}")
        sys.exit(1)


# Set the sample ID to the pipeline meta.id
sample_id = "${meta.id}"
# Get fastqs, ordered by path. Files are staged into
# - "fastq_001/{original_name.fastq.gz}"
# - "fastq_002/{original_name.fastq.gz}"
# - ...
# Since we require fastq files in the input channel to be ordered such that a R1/R2 pair
# of files follows each other, ordering will get us a sequence of [R1, R2, R1, R2, ...]
fastqs = sorted(Path(".").glob("fastq_*/*"))
assert len(fastqs) % 2 == 0
# Target directory in which the renamed fastqs will be placed
fastq_all = Path("./fastq_all")
fastq_all.mkdir(exist_ok=True)
# Match R1 in the filename, but only if it is preceded and followed by a
# non-alphanumeric character:
# match "file_R1.fastq.gz", "file.R1_000.fastq.gz", etc. but
# do not match "SRR12345", "file_INFIXR12", etc.
filename_pattern = r'([^a-zA-Z0-9])R1([^a-zA-Z0-9])'
for i, (r1, r2) in enumerate(chunk_iter(fastqs, 2), start=1):
    # double escapes are required because nextflow processes this python 'template'
    if re.sub(filename_pattern, r'\\1R2\\2', r1.name) != r2.name:
        raise AssertionError(
            dedent(
                f"""\
                We expect R1 and R2 of the same sample to have the same filename except for R1/R2.
                This has been checked by replacing "R1" with "R2" in the first filename and comparing it to the second filename.
                If you believe this check shouldn't have failed on your filenames, please report an issue on GitHub!
                Files involved:
                - {r1}
                - {r2}
                """
            )
        )
    r1.rename(fastq_all / f"{sample_id}_S1_L{i:03d}_R1_001.fastq.gz")
    r2.rename(fastq_all / f"{sample_id}_S1_L{i:03d}_R2_001.fastq.gz")

# Run `cellranger count`
run_subprocess(
    [
        "cellranger", "count",
        "--id", "${prefix}",
        "--fastqs", str(fastq_all),
        "--transcriptome", "${reference.name}",
        "--localcores", "${task.cpus}",
        "--localmem", "${task.memory.toGiga()}",
        *shlex.split("""${args}""")
    ]
)

# Output `cellranger count` version information
proc_stdout, proc_stderr = run_subprocess(
    ["cellranger", "-V"]
)
version = proc_stdout.replace("cellranger cellranger-", "")

# Write the version information to a file
with open("versions.yml", "w") as f:
    f.write('"${task.process}":\\n')
    f.write(f' cellranger: "{version}"\\n')

...but the
No idea. But it shouldn't be hard to capture the exit code from the subprocess call and then do
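One way to read the suggestion above, as a minimal sketch and not the module's actual code: run the child without `check=True`, read its `returncode`, and pass that value to `sys.exit` so the parent script ends with the same status. The function names `run_and_get_exit_code` and `shell_exit_code` are hypothetical. The sketch also assumes you want the shell convention of `128 + N` when the child is killed by signal `N`, since `subprocess` reports that case as `-N`.

```python
import subprocess
import sys


def shell_exit_code(returncode):
    """Translate a subprocess return code into a shell-style exit status.

    subprocess reports "killed by signal N" as -N (on POSIX); shells
    conventionally report the same event as 128 + N.
    """
    return 128 - returncode if returncode < 0 else returncode


def run_and_get_exit_code(command):
    """Run `command` and return the exit status to forward (hypothetical helper).

    No output capturing is requested here, so the child inherits this
    process's stdout and stderr and writes to them directly.
    """
    result = subprocess.run(command, check=False)
    return shell_exit_code(result.returncode)
```

In the template, the final call could then look like `sys.exit(run_and_get_exit_code(["cellranger", "count", ...]))`, so that whatever status the child reported is what the workflow manager sees.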
Description of feature
`cellranger_count.py` currently just uses `subprocess.run` for running `cellranger count`, but it does not capture and write out the subprocess stdout and stderr, so all that is returned to the user during a failed job is:
It would be helpful if stderr and stdout were captured and returned. For example:
An alternative:
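The snippets originally posted at this point are not preserved in this text. As a rough illustration of the kind of behaviour being requested, and not the author's original example, the sketch below echoes the child's output line by line while also keeping a copy, and returns the exit code alongside it. The name `run_streaming` is hypothetical, and merging stderr into stdout is an assumption made here to keep the lines in order.

```python
import subprocess
import sys


def run_streaming(command):
    """Run `command`, echoing its combined stdout/stderr as it is produced.

    Returns a tuple (returncode, captured_output).
    """
    captured = []
    with subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout to preserve ordering
        text=True,
    ) as proc:
        for line in proc.stdout:
            sys.stdout.write(line)  # visible immediately in the task log
            captured.append(line)
    # Leaving the `with` block waits for the child, so returncode is set here.
    return proc.returncode, "".join(captured)
```

Compared with `capture_output=True`, this keeps long-running jobs from appearing silent until they finish, at the cost of no longer being able to tell stdout and stderr apart.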