Commit

Doc: Lysiane's comments
Changed wording in CONTRIBUTING.md, updated usage.md, removed the fasta
samplesheet
FelixAntoineLeSieur committed Jul 11, 2024
1 parent b73876a commit 55ededd
Showing 3 changed files with 6 additions and 45 deletions.
8 changes: 4 additions & 4 deletions .github/CONTRIBUTING.md
@@ -20,7 +20,7 @@ If you'd like to write some code for ferlab/postprocessing, the standard workflo
2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [ferlab/postprocessing repository](https://github.com/ferlab/postprocessing) to your GitHub account
3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions)
4. Use `nf-core schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10).
-5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged
+5. Submit a Pull Request against the main branch and wait for the code to be reviewed and merged

If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/).

@@ -51,13 +51,13 @@ Each `nf-core` pipeline should be set up with a minimal set of test-data.
If there are any failures then the automated tests fail.
These tests are run both with the latest available version of `Nextflow` and also the minimum required version that is stated in the pipeline code.

-## Patch
+## Hotfix

:warning: Only in the unlikely and regretful event of a release happening with a bug.

-- On your own fork, make a new branch `patch` based on `upstream/master`.
+- On your own fork, make a new branch `fix` based on `origin/main`.
- Fix the bug, and bump version (X.Y.Z+1).
-- A PR should be made on `master` from patch to directly this particular bug.
+- A PR should be made on `main` from `fix` to directly address this particular bug.
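
The version bump in the hotfix steps above (X.Y.Z+1) amounts to incrementing the last component of the version. A minimal Python sketch, assuming a plain three-part `X.Y.Z` version string with no suffix; `bump_patch` is a hypothetical helper for illustration, not part of the pipeline or of nf-core tools:

```python
def bump_patch(version):
    """Return the version string with its patch component incremented.

    Assumes exactly three dot-separated integers (X.Y.Z); anything else
    raises ValueError.
    """
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected X.Y.Z, got {version!r}")
    major, minor, patch = (int(part) for part in parts)
    return f"{major}.{minor}.{patch + 1}"


# Illustrative values only.
print(bump_patch("1.2.3"))
```

The components are compared as integers, so `1.0.9` becomes `1.0.10` rather than rolling over into the minor version.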

## Pipeline contribution conventions

3 changes: 0 additions & 3 deletions assets/samplesheet.csv

This file was deleted.

40 changes: 2 additions & 38 deletions docs/usage.md
@@ -8,54 +8,18 @@

## Samplesheet input

-You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a tab-separated file with the first column being the family ID. There are 2 possible format, V1 and V2 (See README for details). V1 includes only the family ID and the sample files as columns, while V2 has the sequencingType as a second column in addition the the familyID and sample files.
+You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use the `--input` parameter to specify its location. The samplesheet must be a tab-separated file (`.tsv`) whose first column is the family ID and whose second column is the sequencing type, either Whole Genome Sequencing (WGS) or Whole Exome Sequencing (WES). Use the remaining columns to supply the paths to the gVCF files for that family ID.

```bash
--input '[path to samplesheet file]'
```
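
The samplesheet layout described above can be checked before launching the pipeline. The following is a minimal Python sketch, not the pipeline's own validation. It assumes the file has no header row, that the sequencing type is written as the literal `WGS` or `WES`, and that every column from the third onward is a gVCF path; `parse_samplesheet`, the family IDs, and the file paths are all hypothetical.

```python
import csv
import io

# Assumption: the sequencing type column holds one of these literals.
VALID_SEQUENCING_TYPES = {"WGS", "WES"}


def parse_samplesheet(text):
    """Parse tab-separated samplesheet text into a list of family records.

    Assumed layout (no header row): familyID, sequencingType, then one or
    more gVCF paths. Raises ValueError on the first malformed line.
    """
    families = []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for line_number, row in enumerate(reader, start=1):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # skip blank lines
        if len(cells) < 3:
            raise ValueError(
                f"line {line_number}: expected at least 3 columns, got {len(cells)}"
            )
        family_id, sequencing_type, *gvcf_paths = cells
        if not family_id:
            raise ValueError(f"line {line_number}: empty family ID")
        if sequencing_type not in VALID_SEQUENCING_TYPES:
            raise ValueError(
                f"line {line_number}: unknown sequencing type {sequencing_type!r}"
            )
        gvcf_paths = [path for path in gvcf_paths if path]
        if not gvcf_paths:
            raise ValueError(f"line {line_number}: no gVCF paths given")
        families.append(
            {
                "family_id": family_id,
                "sequencing_type": sequencing_type,
                "gvcf_paths": gvcf_paths,
            }
        )
    return families


# Hypothetical two-family samplesheet, written inline for illustration.
example = (
    "FAM001\tWGS\t/data/s1.g.vcf.gz\t/data/s2.g.vcf.gz\n"
    "FAM002\tWES\t/data/s3.g.vcf.gz\n"
)
for family in parse_samplesheet(example):
    print(family["family_id"], family["sequencing_type"], len(family["gvcf_paths"]))
```

In practice the text would come from reading the file passed to `--input`. Failing on the first malformed line keeps the error message tied to a specific line number.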

-### Multiple runs of the same sample
-
-The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes:
-
-```csv title="samplesheet.csv"
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
-CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz
-CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz
-```
-
-### Full samplesheet
-
-The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire, however, there is a strict requirement for the first 3 columns to match those defined in the table below.
-
-A final samplesheet file consisting of both single- and paired-end data may look something like the one below. This is for 6 samples, where `TREATMENT_REP3` has been sequenced twice.
-
-```csv title="samplesheet.csv"
-sample,fastq_1,fastq_2
-CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
-CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz
-CONTROL_REP3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz
-TREATMENT_REP1,AEG588A4_S4_L003_R1_001.fastq.gz,
-TREATMENT_REP2,AEG588A5_S5_L003_R1_001.fastq.gz,
-TREATMENT_REP3,AEG588A6_S6_L003_R1_001.fastq.gz,
-TREATMENT_REP3,AEG588A6_S6_L004_R1_001.fastq.gz,
-```
-
-| Column | Description |
-| --------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `sample` | Custom sample name. This entry will be identical for multiple sequencing libraries/runs from the same sample. Spaces in sample names are automatically converted to underscores (`_`). |
-| `fastq_1` | Full path to FastQ file for Illumina short reads 1. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
-| `fastq_2` | Full path to FastQ file for Illumina short reads 2. File has to be gzipped and have the extension ".fastq.gz" or ".fq.gz". |
-
-An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline.
-
## Running the pipeline

The typical command for running the pipeline is as follows:

```bash
-nextflow run ferlab/postprocessing --input ./samplesheet.csv --outdir ./results --genome GRCh37 -profile docker
+nextflow run ferlab/postprocessing --input ./samplesheet.tsv --outdir ./results --genome GRCh37 -profile docker
```

This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.