diff --git a/README.md b/README.md
index d5ea09d9e..db2787a73 100644
--- a/README.md
+++ b/README.md
@@ -85,13 +85,13 @@ Additional functionality contained by the pipeline currently includes:
 5. Start running your own ancient DNA analysis!
- nextflow run nf-core/eager -profile --reads'*_R{1,2}.fastq.gz' --fasta '.fasta'
+ nextflow run nf-core/eager -profile --reads '*_R{1,2}.fastq.gz' --fasta '.fasta'
 6. Once your run has completed successfully, clean up the intermediate files.
  nextflow clean -f -k
-NB. You can see an overview of the run in the MultiQC report located at `/MultiQC/multiqc_report.html`
+NB. You can see an overview of the run in the MultiQC report located at `./results/MultiQC/multiqc_report.html`
 Modifications to the default pipeline are easily made using various options as described in the documentation.
diff --git a/docs/output.md b/docs/output.md
index 979ec2297..f888aa8f6 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -25,24 +25,24 @@ The output of EAGER2 consists of two main components: output files (e.g. BAM or
 The directory structure of EAGER2 is as follows
 ```bash
-/
+results/
 ├── MultiQC/
 ├── /
 ├── /
 ├── /
 ├── pipeline_info/
-├── reference_genome/
-└── work/
+└── reference_genome/
+work/
 ```
-* The parent directory `` directory contains the (cleaned-up) output from a particular software module. This is the second most important set of directories. This contains output files such as FASTQ, BAM, statistics, and/or plot files of a specific module (see the [Output Files](#output-files) section for more detail). The latter two are only needed when you need finer detail about that particular module.
+* A `` directory contains the (cleaned-up) output from a particular software module. This is the second most important set of directories. This contains output files such as FASTQ, BAM, statistics, and/or plot files of a specific module (see the [Output Files](#output-files) section for more detail). The latter two are only needed when you need finer detail about that particular module.
 ### Secondary Output Directories
@@ -50,7 +50,10 @@ These are less important directories which are used less often, normally in the
 * `pipeline_info` contains back-end reporting of the pipeline itself such as run times and computational statistics. You rarely need this information other than for curiosity or when bug-reporting.
 * `reference_genome` contains either text files describing the location of specified reference genomes, and if not already supplied when running the pipeline, auxilary indexing files. This is often useful when re-running other samples using the same reference genome, but is otherwise often not otherwise important.
-* The `work` directory contains all the `nextflow` processing directories. This is where `nextflow` actually does all the work, but in an efficient programatic procedure that is not intuitive to human-readers. Due to this, the directory is often not important to a user as all the useful output files are linked to the module directories (see above). Otherwise, this directory maybe useful when a bug-reporting.
+
+* The `work` directory contains all the `nextflow` processing directories. This is where `nextflow` actually does all the work, but in an efficient programmatic procedure that is not intuitive to human-readers. Due to this, the directory is often not important to a user as all the useful output files are linked to the module directories (see above). Otherwise, this directory may be useful when bug-reporting.
+
+> :warning: Note that `work/` will be created wherever you are running the `nextflow run` command from, unless you specify the location with `-w`, i.e. it will not by default be in `outdir`!
 ## MultiQC Report
@@ -82,7 +85,7 @@ The default columns are as follows:
 * **%GC** This is from Post-AdapterRemoval FastQC. Represents the average GC of all preprocessed reads in your adapter trimmed (paired end) merged FASTQ file.
 * **Reads Mapped** This is from Samtools. This is the raw number of preprocessed reads mapped to your reference genome _prior_ map quality filtering and deduplication.
 * **Reads Mapped** This is from Samtools. This is the raw number of preprocessed reads mapped to your reference genome _after_ map quality filtering and deduplication (note the column name does not distinguish itself from prior-map quality filtering, but the post-filter column is always second)
-* **Endogenous DNA (%)** This is from the endorS.py tool. It displays a percentage of mapped reads over total reads that went into mapped (i.e. the percentage DNA content of the library that matches the reference).
+* **Endogenous DNA (%)** This is from the endorS.py tool. It displays a percentage of mapped reads over total reads that went into mapped (i.e. the percentage DNA content of the library that matches the reference). Assuming a perfect ancient sample with no modern contamination, this would be the amount of true ancient DNA in the sample. However, this value _most likely_ includes contamination and will not entirely be the true 'endogenous' content.
 * **Endogenous DNA Post (%)** This is from the endorS.py tool. It displays a percentage of mapped reads _after_ BAM filtering (e.g. for mapping quality) over total reads that went into mapped (i.e. the percentage DNA content of the library that matches the reference). This column will only be displayed if BAM filtering is turned on and is based on the original mapping for total reads, and mapped reads as calculated from the post-filtering BAM.
 * **Duplication Rate** This is from DeDup. This is the percentage of overall number of mapped reads that were an exact duplicate of another read. The number of reads removed by DeDup can be calculating this number by mapped reads (if no map quality filtering was applied!)
 * **Coverage** This is from Qualimap. This is the median number of times a base on your reference genome was covered by a read (i.e. depth coverage).. This average includes bases with 0 reads covering that position.
diff --git a/main.nf b/main.nf
index cd933aa4e..8384bca76 100644
--- a/main.nf
+++ b/main.nf
@@ -33,7 +33,7 @@ def helpMessage() {
       --genome Name of iGenomes reference (required if not fasta reference).
     Output options:
-      --outdir The output directory where the results will be saved.
+      --outdir The output directory where the results will be saved. Default: ${params.outdir}
       -w The directory where intermediate files will be stored. Recommended: '/work/'
     BAM Input:
@@ -400,7 +400,7 @@ if (params.run_multivcfanalyzer) {
   }
 }
-// MALT sanity checking
+// Metagenomic sanity checking
 if (params.run_metagenomic_screening) {
   if ( !params.bam_discard_unmapped ) {
     exit 1, "Metagenomic classification can only run on unmapped reads. Please supply --bam_discard_unmapped and --bam_unmapped_type 'fastq'"
@@ -418,7 +418,7 @@ if (params.run_metagenomic_screening) {
     exit 1, "Metagenomic classification requires a path to a database directory. Please specify one with --database '/path/to/database/'."
   }
-  if (params.malt_mode != 'BlastN' && params.malt_mode != 'BlastP' && params.malt_mode != 'BlastX') {
+  if (params.metagenomic_tool == 'malt' && params.malt_mode != 'BlastN' && params.malt_mode != 'BlastP' && params.malt_mode != 'BlastX') {
     exit 1, "Unknown MALT mode specified. Options: 'BlastN', 'BlastP', 'BlastX'. You gave '${params.malt_mode}'!"
   }
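
The **Endogenous DNA (%)** and **Endogenous DNA Post (%)** descriptions in the `docs/output.md` hunk both come down to mapped reads divided by the total reads that went into mapping. A minimal shell sketch of that arithmetic is given here as an illustration only: the function name `endogenous_pct` and the read counts are hypothetical, and nothing here is taken from endorS.py or the pipeline itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of endorS.py or nf-core/eager):
# percentage of mapped reads over total reads, as described for the
# "Endogenous DNA (%)" MultiQC column.
endogenous_pct() {
  local mapped="$1" total="$2"
  # Refuse a zero or negative denominator instead of dividing by it.
  if [ "$total" -le 0 ]; then
    echo "total reads must be greater than zero" >&2
    return 1
  fi
  awk -v m="$mapped" -v t="$total" 'BEGIN { printf "%.2f\n", m * 100 / t }'
}

# Made-up counts: 1,000,000 reads went into mapping, 250,000 mapped,
# and 200,000 remain mapped after BAM filtering.
endogenous_pct 250000 1000000   # pre-filter column
endogenous_pct 200000 1000000   # post-filter column, same total-reads denominator
```

With these made-up counts the two calls print `25.00` and `20.00`. The post-filter call reuses the original total, matching the documented note that the post-filter column is based on the original mapping for total reads.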