feat: CLIN-3411 only publish main outputs #48
Merged
Before this PR, the output of all processes was automatically published to the output directory. While convenient for testing and debugging, this approach wastes resources in production.
This PR addresses the issue by publishing outputs only at key steps (after normalization, VEP, and Exomiser) by default. You can still publish all process outputs by setting publish_all=true. The test profile sets publish_all to true, so test runs continue to publish the output of every process.
The naming logic for the different process output folders is preserved.
Local Tests
Check integrity of schema file
nf-core schema docs
nf-core schema lint
Run the pipeline locally with the test profile
We should see the output from all processes in the results folder.
Save the list of files:
ls -R results >new
Then re-run on the main branch, save the file list the same way, and use the diff command to confirm that the two lists are identical. The only differences should be the files in the pipeline info folder that have timestamps in their names.
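The comparison above can be sketched as a small shell helper that masks the timestamps before running diff, so that only real differences show up. This is a minimal sketch, not part of the PR: the function names are hypothetical, and the timestamp pattern (YYYY-MM-DD_HH-MM-SS) is an assumption about how the pipeline info files are named, so adjust it if the actual names differ.

```shell
# Mask date-time stamps in a listing file.
# Assumption: stamps look like 2024-01-01_12-00-00; adapt the pattern if needed.
normalize_listing() {
    sed -E 's/[0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{2}-[0-9]{2}-[0-9]{2}/TIMESTAMP/g' "$1"
}

# Compare two `ls -R` listings after masking timestamps.
# Returns 0 when they match, non-zero (and prints the diff) otherwise.
compare_listings() {
    old_tmp=$(mktemp)
    new_tmp=$(mktemp)
    normalize_listing "$1" > "$old_tmp"
    normalize_listing "$2" > "$new_tmp"
    status=0
    diff "$old_tmp" "$new_tmp" || status=$?
    rm -f "$old_tmp" "$new_tmp"
    return "$status"
}
```

Assuming the listing from the main branch was saved as `old` (a hypothetical file name) and the one from this branch as `new`, the check would be `compare_listings old new`.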
Remove publish_all=True from test profile
This time, we should see only the subfolders pipeline_info, splitmultiallelics, ensemblvep and exomiser.
Inspect each folder and double check that the list of files is as expected. Pay attention to .tbi files in the ensemblvep folder.
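The subfolder check can also be automated. The sketch below is only an illustration: the function names are hypothetical, and it assumes that the four subfolders named above are the complete expected set at the top level of the results directory. It does not inspect the files inside each folder, so the manual check of the .tbi files is still needed.

```shell
# Print the names of the top-level subfolders of a directory, sorted.
list_subfolders() {
    for d in "$1"/*/; do
        basename "$d"
    done | LC_ALL=C sort
}

# Succeed only when the subfolders are exactly the four main output folders.
# Assumption: no other top-level folder is expected when publish_all is false.
check_main_outputs_only() {
    expected=$(printf '%s\n' ensemblvep exomiser pipeline_info splitmultiallelics)
    actual=$(list_subfolders "$1")
    [ "$actual" = "$expected" ]
}
```

For example, `check_main_outputs_only results && echo "only main outputs published"` prints the message only when no extra process folder is present.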
Check that Nextflow correctly gives priority to the publish_all option set at the command line
Scenario 1: false in the test config, but true at the command line: we should see all outputs
Scenario 2: true in the test config, but false at the command line: we should see only the main outputs
Test in juno
PR checklist
nf-core lint
nextflow run . -profile test,docker --outdir <OUTDIR>
nextflow run . -profile debug,test,docker --outdir <OUTDIR>
docs/usage.md is updated.
docs/output.md is updated.
docs/reference_data.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).