Sustainability & Reproducibility

Stephan Reichl edited this page Aug 25, 2024 · 6 revisions

To ensure sustainable development, implicit documentation, and reproducibility, each {module} has to fulfill the following requirements:

  • GitHub repository for development and version control
    • descriptive name (i.e., what it does and its purpose, e.g., dea_limma), with words separated by underscores '_'
    • README according to the provided template
    • repository structure according to Snakemake's recommendation
    • releases (i.e., versions) according to the semantic versioning scheme
    • Workflow rulegraph in workflow/dags/rulegraph.svg
      snakemake --rulegraph --forceall | dot -Tsvg > workflow/dags/rulegraph.svg
    • GitHub page displaying the README
    • LICENSE file (recommendation: MIT)
    • CITATION.cff file
    • (optional, but recommended) add example data and configurations for other users as a starting point
    • (optional, but recommended) provide resources and/or external data sources (e.g., reference data) as links, or via Zenodo or Git Large File Storage
  • Zenodo repository to ensure compatibility, citability, and long-term archiving
    • via automated GitHub hook
    • every GitHub release will trigger the creation of a new release in the Zenodo repository, and thereby a new DOI
    • the Zenodo repository will be annotated using the provided information in the CITATION.cff file in your GitHub repository
    • there is one permanent DOI that can be used to reference/cite all releases/versions of a given repository. We recommend citing this DOI together with the release version, e.g., in publications
    • add the dynamic DOI badge to the GitHub repository
    • add the permanent project DOI to the README in the introduction, in the methods, at the bottom (Zenodo link), and to the CITATION.cff.
  • Snakemake Workflow Catalog entry
    • increase visibility by fulfilling the requirements for Standardized Usage
    • every GitHub release will trigger the entry to be updated
  • Snakemake Report for implicit documentation and collaboration
    • following the specified Report structure to enhance reproducibility (via export of used software and configuration) and ensure module compatibility
  • Result directory
    • following the specified Result structure to enhance reproducibility (via export of used software and configuration) and ensure module compatibility
  • Software Management with conda for reproducibility
    • specify the version of every entry in your conda environment specification files (workflow/envs/*.yaml)
  • (COMING SOON) Containerization with Docker/Singularity for OS-level virtualization
    • this final virtualization frontier is to be explored and implemented across all MR.PARETO modules
    • automated containerization has been supported since Snakemake 6.0.0 (released 2021-02-26)
  • Finally, add the {module} to the summary table with all modules in this repository's README under Modules.
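
The conda requirement above (a version for every entry in workflow/envs/*.yaml) can be checked mechanically. The following is a minimal sketch, not part of MR.PARETO: the function name `unpinned_dependencies`, the example environment, and the definition of "pinned" (name=version without wildcards or range operators) are assumptions made for illustration. It parses the file line by line instead of using a YAML library, so it only covers the common flat layout of an environment file.

```python
import re

# Assumption: an entry counts as pinned if it has the form name=version
# (optionally name=version=build), with no wildcard or range operator.
PINNED = re.compile(r"^[A-Za-z0-9_.:\-]+={1,2}\d[A-Za-z0-9_.]*(=[A-Za-z0-9_.]+)?$")


def unpinned_dependencies(env_yaml_text):
    """Return the conda dependency entries that lack an exact version pin.

    Simplified line-based parsing: only the direct "- item" entries under
    "dependencies:" are checked; nested lists (e.g. under "pip:") are skipped.
    """
    unpinned = []
    in_deps = False
    item_indent = None
    for raw in env_yaml_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments
        if not line.strip():
            continue
        stripped = line.lstrip()
        indent = len(line) - len(stripped)
        if not stripped.startswith("- "):
            # A top-level key either opens or closes the dependencies section.
            if indent == 0:
                in_deps = stripped == "dependencies:"
                item_indent = None
            continue
        if not in_deps:
            continue
        if item_indent is None:
            item_indent = indent
        if indent != item_indent:
            continue  # nested list, not checked in this sketch
        entry = stripped[2:].strip()
        if entry.endswith(":"):
            continue  # e.g. "- pip:" opens a nested section
        if not PINNED.match(entry):
            unpinned.append(entry)
    return unpinned


# Hypothetical environment file content, for illustration only.
example_env = """\
name: example_env
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.11.7
  - pandas=2.1.4
  - r-base
  - numpy>=1.26
"""

print(unpinned_dependencies(example_env))  # ['r-base', 'numpy>=1.26']
```

An empty result means every checked entry carries an exact version; anything returned should be pinned before a release.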

Checklist

  • GitHub repository with README, LICENSE, CITATION.cff, Snakemake Workflow Catalog entry, conda YAML specifications with exact versions
  • Zenodo repository via GitHub webhook
  • GitHub release to trigger Zenodo DOI generation
  • Add the general and version-specific DOIs to the GitHub README, CITATION.cff, and MR.PARETO Modules
  • Final GitHub release with minor version bump including generated DOI
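
The file-based items of this checklist can be verified before a release with a small script. This is a minimal sketch under stated assumptions: the helper `missing_requirements` is hypothetical, and the file names "README.md" and "LICENSE" are assumed spellings (the requirements above only name a README and a LICENSE file). The Zenodo, DOI, and Snakemake Workflow Catalog items cannot be checked this way and still need a manual review.

```python
from pathlib import Path

# Paths follow the requirements listed above; "README.md" and "LICENSE"
# are assumed file names, adjust them to the module's actual layout.
REQUIRED_FILES = [
    "README.md",
    "LICENSE",
    "CITATION.cff",
    "workflow/dags/rulegraph.svg",
]


def missing_requirements(repo_root):
    """Return the required paths that are absent from a module repository."""
    root = Path(repo_root)
    missing = [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
    # At least one conda environment specification is expected.
    envs = root / "workflow" / "envs"
    if not (envs.is_dir() and any(envs.glob("*.yaml"))):
        missing.append("workflow/envs/*.yaml")
    return missing


if __name__ == "__main__":
    # Check the current working directory, e.g. the root of a module checkout.
    for path in missing_requirements("."):
        print(f"missing: {path}")
```

Run from the repository root, the script prints one line per missing item and nothing when all file-based requirements are present.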