Add genome/reference definition section to configuration #512

Open
tedil opened this issue May 30, 2024 · 0 comments

tedil commented May 30, 2024

Currently, each step or tool defines its own reference naming convention.
This can lead to incompatibilities between steps, or to accidentally selecting the wrong reference.
Handling references in one unified way could help prevent that.

However, some tools rely on specific reference names. We could therefore add support for aliases, something along the lines of:

reference:
  name: GRCh38
  aliases:
    - ensembl: GRCh38
    - ucsc: hg38
    - panel_of_normals.purecn: hg38
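
A minimal sketch of how a step could look up its reference name from such a section, assuming the YAML has already been loaded into plain Python dicts and lists. The function name `resolve_reference_name` and the fallback to the canonical `name` are assumptions for illustration, not part of the proposal:

```python
# Hypothetical in-memory form of the `reference` section above,
# i.e. what a YAML loader would return for it.
REFERENCE = {
    "name": "GRCh38",
    "aliases": [
        {"ensembl": "GRCh38"},
        {"ucsc": "hg38"},
        {"panel_of_normals.purecn": "hg38"},
    ],
}


def resolve_reference_name(reference: dict, key: str) -> str:
    """Return the alias registered for `key`, or the canonical name if none exists.

    `key` can be a naming scheme ("ucsc") or a step/tool identifier
    ("panel_of_normals.purecn"), mirroring the alias keys in the config.
    """
    for entry in reference.get("aliases", []):
        # Each list item is a single-key mapping, e.g. {"ucsc": "hg38"}.
        if key in entry:
            return entry[key]
    # Assumption: steps without an explicit alias use the canonical name.
    return reference["name"]
```

With this, a step asking for `panel_of_normals.purecn` would get `hg38`, while a step without an alias entry would fall back to `GRCh38`.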

In principle, we could also specify the source and release directly, so that the reference sequence can be downloaded (and cached) automatically, e.g. for Ensembl:

  reference:
    […]
    source:
      ensembl:
        species: homo_sapiens
        release: 111
        build: GRCh38

or for a local file:

  reference:
    […]
    source:
      local:
        path: /path/to/ref.fasta
        species: homo_sapiens
        release: 111
        build: GRCh38
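
A rough sketch of how the `source` section could be mapped to a FASTA location, assuming exactly one source kind is given per reference. The function name `reference_fasta_path`, the `cache_dir` parameter, and the cache directory layout are hypothetical; the actual download step is left out:

```python
from pathlib import Path


def reference_fasta_path(source: dict, cache_dir: str = "cache") -> Path:
    """Map a `source` section to the path where the reference FASTA is expected.

    Assumption: `source` holds exactly one kind, either `local` or `ensembl`.
    """
    if len(source) != 1:
        raise ValueError(f"expected exactly one source kind, got: {sorted(source)}")
    kind, spec = next(iter(source.items()))
    if kind == "local":
        # A local reference is used in place, nothing is downloaded.
        return Path(spec["path"])
    if kind == "ensembl":
        # Hypothetical cache layout: <cache_dir>/ensembl/<species>/<release>/<build>.fasta
        return (
            Path(cache_dir)
            / "ensembl"
            / spec["species"]
            / str(spec["release"])
            / f"{spec['build']}.fasta"
        )
    raise ValueError(f"unsupported source kind: {kind!r}")
```

Keying the cache path on species, release and build would let two configurations that name the same Ensembl release share one downloaded file.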