Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: CLIN-3411 only publish main outputs #48

Merged
merged 1 commit into from
Dec 11, 2024

Conversation

LysianeBouchard
Copy link
Contributor

@LysianeBouchard LysianeBouchard commented Dec 10, 2024

Before this PR, the output of all processes was automatically published to the output directory. While convenient for testing and debugging, this approach wastes resources in production.

This PR addresses the issue by publishing outputs only at key steps (after normalization, VEP, and Exomiser) by default. You can still publish all process outputs by setting publish_all=true. When we use the test profile, this will be the case, i.e. publish_all will be set to true.

The naming logic for the different process output folders is preserved.

Local Tests

Check integrity of schema file
nf-core schema docs
nf-core schema lint

Run the pipeline locally with the test profile

rm -rf results
nextflow run main.nf -profile test,docker

We should see the output from all processes in the results folder.
Save the list of files: ls -R results >new
Then re-run on main branch and ensure the list of files is identical with diff command. The only differences should be for the files having timestamps in their names in the pipeline info folder.

Remove publish_all=True from test profile

  • Comment the line setting publish_all to true in test config and rerun locally with the test profile:
rm -rf results
nextflow run main.nf -profile test,docker

This time, we should see only the subfolders pipeline_info, splitmultiallelics, ensemblvep and exomiser.
Inspect each folder and double check that the list of files is as expected. Pay attention to .tbi files in ensemblvep folder.

Check that nextflow prioritize the publish_all option at command line correctly
scenario1: false in test config, but true at command line: we should see all output
scenario2: true in test config, but false at command line: we should see only main outputs

Test in juno

  • test in juno, with and without publish_all output set to true

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • Reference Data Documentation in docs/reference_data.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

It is possible to publish the output of all processes as before by setting the parameter `publish_all=true`.

The naming logic for the different process output folders is preserved.
@LysianeBouchard LysianeBouchard force-pushed the feat/CLIN-3411-only-publish-main-outputs branch from cd58d34 to 9fc82cc Compare December 10, 2024 16:37
@LysianeBouchard LysianeBouchard marked this pull request as ready for review December 10, 2024 17:18
Copy link
Contributor

@dmorais dmorais left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature prevento all outputs to be published, preserving only the key outputs. This saves space on disk and discards unecessary intermediate files.
It also sets the publish_all to true in the test profile, which is ideal for debuggin.
FInally publish_all can be set to true for true runs via config.

@LysianeBouchard LysianeBouchard merged commit a04b33a into main Dec 11, 2024
3 checks passed
@LysianeBouchard LysianeBouchard deleted the feat/CLIN-3411-only-publish-main-outputs branch December 11, 2024 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants