updated implementing a stage (#83)
DSchreyer authored Jan 16, 2025
1 parent b2389cb commit 98f0c4f
Showing 1 changed file with 65 additions and 1 deletion.
66 changes: 65 additions & 1 deletion docs/getting_started.md
@@ -19,7 +19,8 @@ It creates the root directory of your project with all the necessary configurati
We consider a _stage_ an individual step in your analysis, usually a script with defined inputs and outputs.
Stages can be organized in _folders_ with arbitrary structures. `dso create` initializes folders and stages
from predefined templates. We recommend naming stages with a numeric prefix, e.g. `01_` to declare the
order of scripts, but this is not a requirement. Currently, two stage templates are available: one runs the
analysis from a Quarto document, the other from a Bash script.

```bash
cd test_project
```

@@ -97,8 +98,71 @@ dso compile-config
### Overwriting Parameters

When multiple `params.in.yaml` files (at the project, folder, or stage level) define the same configuration key, the value from the more specific level (e.g., stage) takes precedence over the value from the broader level (e.g., project). This lets you set defaults once for the whole project and override them only where a folder or stage needs something different.
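
The precedence rule can be sketched in Python as follows. This is a minimal illustration, not dso's implementation: the function `merge_params`, the example keys and values, and the assumption that nested mappings are merged key by key are all hypothetical.

```python
from copy import deepcopy


def merge_params(broad: dict, specific: dict) -> dict:
    """Return a new dict in which values from `specific` override `broad`."""
    merged = deepcopy(broad)
    for key, value in specific.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Assumption: nested mappings are merged recursively, key by key
            merged[key] = merge_params(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical contents of params.in.yaml at each level
project = {"thresholds": {"fc": 2, "p_value": 0.05}, "samplesheet": "data/samples.csv"}
folder = {"thresholds": {"p_value": 0.01}}
stage = {"thresholds": {"fc": 3}}

# Apply the levels from broadest to most specific
compiled = {}
for level in (project, folder, stage):
    compiled = merge_params(compiled, level)

print(compiled)
```

In this sketch the stage-level `fc` and the folder-level `p_value` replace the project defaults, while `samplesheet` is inherited unchanged from the project level.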

## Implementing a stage

A stage is a single step in your analysis that usually generates output data from input data. The input data can also be the output of previous stages. To create a stage, use the `dso create stage` command and select either the _bash_ or the _quarto_ template as a starting point.

The essential files of a stage are:

* `dvc.yaml`: The DVC configuration file that defines your data pipeline, its dependencies, and its outputs.
* `params.yaml`: Auto-generated configuration file, produced by `dso compile-config`. Changes belong in `params.in.yaml`.
* `params.in.yaml`: Editable configuration file containing stage-specific settings.
* `src/<stage_name>.qmd` (optional): A Quarto document containing the analysis code for this stage.

### dvc.yaml

The `dvc.yaml` file declares the parameters, inputs, and outputs of your stage, as well as the commands it executes.

#### Configuring the `dvc.yaml`

Values stored in the `params.yaml` of a stage can be referenced directly in the `dvc.yaml` with the `${...}` syntax:

```yaml
stages:
  01_preprocessing:
    # Parameters used in this stage, defined in params.yaml
    params:
      - dso
      - thresholds
    # Dependencies required for this stage; paths can be taken from params.yaml with ${...}
    deps:
      - src/01_preprocessing.qmd
      - ${file_with_abs_path}
      - ${samplesheet}
    # Outputs generated by this stage
    outs:
      - output
      - report/01_preprocessing.html
```
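
To illustrate how the `${...}` placeholders above relate to `params.yaml`, the following Python sketch substitutes values from a dictionary that stands in for the compiled parameters. It is not DVC's templating code: the function `resolve` and the example paths are made up for illustration, and only simple (optionally dotted) keys are handled.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(template: str, params: dict) -> str:
    """Replace each ${key} or ${nested.key} with its value from `params`."""

    def lookup(match: re.Match) -> str:
        value = params
        # Walk dotted keys such as "thresholds.fc" through nested dicts
        for part in match.group(1).strip().split("."):
            value = value[part]
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


# Hypothetical values, standing in for a stage's compiled params.yaml
params = {
    "samplesheet": "data/samplesheet.csv",
    "file_with_abs_path": "/data/reference/genome.fa",
}

deps = ["src/01_preprocessing.qmd", "${file_with_abs_path}", "${samplesheet}"]
resolved = [resolve(dep, params) for dep in deps]
print(resolved)
```

Entries without a placeholder pass through unchanged, and a placeholder naming a key that is missing from `params` raises a `KeyError` in this sketch.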

### Quarto Stage

By default, a Quarto stage includes the following `cmd` entry in its `dvc.yaml` file:

```yaml
    # Command to render the Quarto script and move the HTML report to the report folder
    cmd:
      - dso exec quarto .
```

### Bash Stage

By default, a Bash stage does not include a separate script file. Instead, the Bash code is embedded directly in the `dvc.yaml` file:

```yaml
    cmd:
      - |
        bash -euo pipefail << EOF
        # add bash code here
        EOF
```

### R

### Python