Is it possible to give some demo examples? #1

Dx-wmc · 2024-10-19T04:17:57Z

Hi, can you provide some examples for demonstration? The current introduction is a bit confusing to me.

lfenske-93 · 2024-10-21T08:10:44Z

Hi, what kind of examples would you like to see?

In general, this workflow is not necessarily intended to be reproduced. It was used to process the data set from this paper:
https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001421

And we are currently expanding it to process further data sets within the All The Bacteria project:
https://allthebacteria.readthedocs.io/en/latest/

The already processed data from the workflow can currently be found in our web repository, which makes it easy to browse and download the data:
https://bakrep.computational.bio/

If you are interested in seeing how the workflow is run or what the input data must look like, I can go into this in more detail.

Greetings,
Linda

Dx-wmc · 2024-10-21T14:04:00Z

Thank you for your patient reply. I would like to see a brief example of how the Nextflow script is run, including the input metadata and the corresponding results. This would be very helpful for configuring and using the workflow.

lfenske-93 · 2024-10-22T11:59:53Z

Okay sure, I'll try to give a short example.

The Nextflow script used can be found here: nextflow/661k.nf

The command to process the required data for the project was as follows:

 nextflow run .nextflow/661k.nf -c ./bakrep/nextflow/nextflow.config -profile cluster --samples /shared/new-run/metadata.tsv \
 --setupdir /mnt/scratch/ --data assemblies/ --results results/ -with-conda
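
To make the structure of the command above easier to see, here is a rough Python sketch that assembles the same invocation as an argument list. It is not part of the repository; the function name `build_command` and its keyword defaults are hypothetical, and all paths are copied verbatim from the command above. Options with a single dash (`-c`, `-profile`, `-with-conda`) are handled by Nextflow itself, while options with a double dash (`--samples`, `--setupdir`, `--data`, `--results`) are passed to the workflow as parameters.

```python
import shlex


def build_command(samples, setupdir, data, results,
                  script=".nextflow/661k.nf",
                  config="./bakrep/nextflow/nextflow.config",
                  profile="cluster"):
    """Assemble the nextflow invocation as an argument list.

    Hypothetical helper for illustration; defaults are copied from the
    command shown in this thread.
    """
    return [
        "nextflow", "run", script,
        # Single-dash options: consumed by Nextflow itself.
        "-c", config,
        "-profile", profile,
        # Double-dash options: workflow parameters.
        "--samples", samples,
        "--setupdir", setupdir,
        "--data", data,
        "--results", results,
        "-with-conda",
    ]


cmd = build_command("/shared/new-run/metadata.tsv", "/mnt/scratch/",
                    "assemblies/", "results/")
# Print the command as a single shell-safe line for inspection.
print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` on a machine where Nextflow and the databases are available.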

An example of what metadata.tsv looks like can be found in the repository: metadata_ena_661K_filtered_head51.tsv
The setupdir parameter must point to the directory containing the databases used by the different tools. Default paths are stored in nextflow/config.nf.
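
If you want to look at the structure of the example metadata file before adapting your own, a small sketch like the following may help. It assumes only that the file is tab-separated and that its first line is a header row; it makes no assumption about the column names. The function name `preview_tsv` is hypothetical.

```python
import csv


def preview_tsv(path, n_rows=3):
    """Return the header and the first n_rows data rows of a tab-separated file.

    Assumption: the first line of the file is a header row.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        # zip stops after n_rows rows without reading the rest of the file.
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows
```

For example, `header, rows = preview_tsv("metadata_ena_661K_filtered_head51.tsv")` would return the column names and the first three samples of the example file.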

The input data for the workflow consisted of the assembly FASTA files available at the following link:
http://ftp.ebi.ac.uk/pub/databases/ENA2018-bacteria-661k/Assemblies/

For each processed assembly file, the following result files will be generated:

Assembly-statistics: sample.assemblyscan.json
CheckM2 quality control: sample.checkm2.json
Bakta annotation: sample.bakta.json, sample.bakta.ffn, sample.bakta.faa, sample.bakta.gbff.gz, sample.bakta.gff3
Taxonomic classification: sample.gtdbtk.json
Multilocus sequence typing: sample.mlst.json
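
To check whether a run produced every file listed above for a given sample, something like the following sketch could be used. The file suffixes are taken from the list above, with "sample" replaced by the sample ID. The function name `missing_results` is hypothetical, and because the directory layout under the results folder is not described here, the sketch searches it recursively rather than assuming a particular structure.

```python
from pathlib import Path

# Suffixes from the result file list above.
RESULT_SUFFIXES = [
    "assemblyscan.json",   # assembly statistics
    "checkm2.json",        # CheckM2 quality control
    "bakta.json", "bakta.ffn", "bakta.faa", "bakta.gbff.gz", "bakta.gff3",
    "gtdbtk.json",         # taxonomic classification
    "mlst.json",           # multilocus sequence typing
]


def missing_results(results_dir, sample_id):
    """Return the expected result file names not found for one sample.

    Searches results_dir recursively, since the layout is an unknown here.
    """
    base = Path(results_dir)
    expected = [f"{sample_id}.{suffix}" for suffix in RESULT_SUFFIXES]
    return [name for name in expected if not any(base.rglob(name))]
```

An empty list from `missing_results("results/", sample_id)` means all nine files were found for that sample.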

I am currently working on an updated version of the workflow to process the latest data from the All The Bacteria project.
If you are interested in the project as a whole, you can find current updates and information here:
https://allthebacteria.readthedocs.io/en/latest/faq.html
