Annotate and rank SNVs per family #502

Open: wants to merge 5 commits into dev

fellen31 (Collaborator) commented Nov 12, 2024

This PR:

  • Annotates and ranks SNVs per family instead of per project, because genmod does not support compound scoring with multiple families in the same VCF. Users who still want annotated variants per sample can run each sample as a separate family.
  • Changes the output structure, and its documentation, so that all variant outputs are organized per sample and per family.
  • Moves samplesheet validation into functions and removes the use of ifEmpty, because the ifEmpty error messages were shown whenever the pipeline hit an error, even when that error was unrelated.
  • Removes support for automatically creating an echtvar database with SNVs and INDELs, since this requires all variants to be combined into one VCF. This might be added back in the future.
  • Removes the contains_affected logic from the SNV-calling workflow, since this check was previously moved to before pipeline start and is now done in one of the functions described in point 3.
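Points 1, 3 and 5 boil down to two ideas: validate the samplesheet up front with checks that each report only their own problem, and group samples by family before ranking, since genmod only scores compounds within a family. The pipeline does this in Nextflow/Groovy. The following is only a minimal Python sketch of the idea, not the pipeline's code. It assumes a simplified samplesheet with hypothetical sample, family_id and phenotype fields (PED-style phenotype codes), and the function names are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleRow:
    """Hypothetical samplesheet row; field names are illustrative only."""
    sample: str
    family_id: str
    phenotype: int  # PED convention: 0 = missing, 1 = unaffected, 2 = affected


def validate_samplesheet(rows):
    """Return a list of error messages (empty if valid).

    Each check appends only its own message, so one problem does not
    cause unrelated error messages to be reported.
    """
    errors = []
    seen = set()
    for row in rows:
        if row.sample in seen:
            errors.append(f"Duplicate sample: {row.sample}")
        seen.add(row.sample)
        if row.phenotype not in (0, 1, 2):
            errors.append(f"Invalid phenotype {row.phenotype} for sample {row.sample}")
    return errors


def family_contains_affected(rows):
    """Map each family ID to True if at least one member is affected."""
    status = {}
    for row in rows:
        status[row.family_id] = status.get(row.family_id, False) or row.phenotype == 2
    return status


def group_by_family(rows):
    """Group sample names by family, preserving samplesheet order."""
    families = defaultdict(list)
    for row in rows:
        families[row.family_id].append(row.sample)
    return dict(families)
```

Running each sample as its own family amounts to giving every row a unique family_id. In that case group_by_family returns one single-sample group per sample, so annotation and ranking still happen per sample.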

Closes #501 and #276.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

fellen31 force-pushed the variants-per-family branch 8 times, most recently from 5c783b3 to 2e36008, November 12, 2024 16:23
fellen31 linked an issue Nov 12, 2024 that may be closed by this pull request
fellen31 force-pushed the variants-per-family branch 2 times, most recently from 279ecc2 to 2dbe94d, November 12, 2024 17:17
fellen31 marked this pull request as ready for review November 12, 2024 17:40
fellen31 requested a review from a team as a code owner November 12, 2024 17:40
Lucpen left a comment

Just a tiny typo and one question: why not make a switch for this?

Removed support for automatically creating an echtvar database with SNVs and INDELs, since this requires all variants to be combined into one VCF. Might add this back in the future.

subworkflows/local/short_variant_calling/main.nf (outdated, resolved)
fellen31 (Collaborator, Author) commented Nov 15, 2024

Just a tiny typo and one question: why not make a switch for this? Removed support for automatically creating an echtvar database with SNVs and INDELs, since this requires all variants to be combined into one VCF. Might add this back in the future.

My initial idea was that it would be nice to be able to create a reference database or panel of normals while running the pipeline. For SNVs and INDELs, yes, it would be fairly easy to make a switch, but we would need to add a module that merges the VCFs before creating the database (and then it should really be a workflow).

But creating SV, CNV, STR and methylation databases would need different merging strategies. It would perhaps be nice, but I need to think about it some more. It might work for 100 samples, but you would probably never want to start 1000+ samples at once. At that point, creating the databases within the pipeline becomes unnecessary, because you would need to combine samples from multiple pipeline runs anyway.

In short, I'm removing the functionality because it's easier at the moment. It's not something we need in production, and I'm not sure it's something we want in the pipeline in the future.

Successfully merging this pull request may close these issues.

Genmod only annotates compounds in the first family
Run RANK_VARIANTS per family
2 participants