Is it possible to give some demo examples? #1

Open
Dx-wmc opened this issue Oct 19, 2024 · 3 comments
Comments

@Dx-wmc

Dx-wmc commented Oct 19, 2024

Hi, could you provide some demonstration examples? The current introduction is a bit confusing to me.

@lfenske-93
Contributor

Hi, what kind of examples would you like to see?

In general, this workflow is not necessarily intended to be reproduced by others. It was used to process the data set for this paper:
https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001421

And we are currently expanding it to process further data sets within the All The Bacteria project:
https://allthebacteria.readthedocs.io/en/latest/

The data already processed by the workflow can currently be found in our web repository, where it is easy to browse and download:
https://bakrep.computational.bio/

If you are interested in how the workflow is run or what the input data structure must look like, I can go into this in more detail.

Greetings,
Linda

@Dx-wmc
Author

Dx-wmc commented Oct 21, 2024

Thank you for your patient reply. I would like to see a brief example of a Nextflow run command, including the input metadata and the corresponding results. This would be very helpful for configuring and using the workflow.

@lfenske-93
Contributor

Okay sure, I'll try to give a short example.

The Nextflow script used can be found here: nextflow/661k.nf

The command to process the required data for the project was as follows:

 nextflow run .nextflow/661k.nf -c ./bakrep/nextflow/nextflow.config -profile cluster --samples /shared/new-run/metadata.tsv \
 --setupdir /mnt/scratch/ --data assemblies/ --results results/ -with-conda

An example of what the metadata.tsv looks like can be found in the repository: metadata_ena_661K_filtered_head51.tsv
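
To get a feel for the input before launching a run, the metadata file can be inspected with a few lines of Python. This is only a sketch: it assumes the file is tab-separated (as the .tsv extension suggests) and makes no assumption about column names, which should be taken from the example file in the repository. `preview_tsv` is a hypothetical helper and not part of the workflow.

```python
import csv
from pathlib import Path


def preview_tsv(path, n_rows=5):
    """Return the first n_rows rows of a tab-separated file as lists of strings.

    Hypothetical helper: assumes a tab-separated file and nothing about
    its columns or whether the first row is a header.
    """
    rows = []
    with Path(path).open(newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(rows) >= n_rows:
                break
            rows.append(row)
    return rows


if __name__ == "__main__":
    # File name taken from the thread; adjust the path to your checkout.
    for row in preview_tsv("metadata_ena_661K_filtered_head51.tsv"):
        print(row)
```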
Use the --setupdir parameter to provide the path to the databases required by the individual tools. Default paths are stored in nextflow/config.nf.
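
If you want to swap in your own paths, the invocation above can be assembled programmatically. The following is a minimal sketch, not part of the workflow: `build_nextflow_command` is a hypothetical helper, the flags are the ones from the command in this comment, and the example values are copied from it and need to be adapted to your setup.

```python
import shlex


def build_nextflow_command(script, config, samples, setupdir, data, results,
                           profile="cluster"):
    """Assemble the nextflow call from this thread as an argument list.

    Hypothetical helper: only the flags shown in the thread are used.
    """
    return [
        "nextflow", "run", script,
        "-c", config,            # Nextflow configuration file
        "-profile", profile,     # execution profile, "cluster" in the thread
        "--samples", samples,    # metadata TSV describing the samples
        "--setupdir", setupdir,  # directory holding the tool databases
        "--data", data,          # directory with the assembly FASTA files
        "--results", results,    # output directory
        "-with-conda",
    ]


if __name__ == "__main__":
    # Placeholder values copied from the thread; adjust to your environment.
    cmd = build_nextflow_command(
        script=".nextflow/661k.nf",
        config="./bakrep/nextflow/nextflow.config",
        samples="/shared/new-run/metadata.tsv",
        setupdir="/mnt/scratch/",
        data="assemblies/",
        results="results/",
    )
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the paths point to real files.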

The input data for the workflow consisted of the assembly FASTA files available at the following link:
http://ftp.ebi.ac.uk/pub/databases/ENA2018-bacteria-661k/Assemblies/

For each processed assembly file, the following result files will be generated:

Assembly-statistics: sample.assemblyscan.json
CheckM2 quality control: sample.checkm2.json
Bakta annotation: sample.bakta.json, sample.bakta.ffn, sample.bakta.faa, sample.bakta.gbff.gz, sample.bakta.gff3
Taxonomic classification: sample.gtdbtk.json
Multilocus sequence typing: sample.mlst.json
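
To check whether a run produced everything listed above, a small script can compare the expected file names against the results directory. This is a sketch under two assumptions that the thread does not confirm: that "sample" in the file names stands for the sample ID, and that all files for a sample sit directly in one directory. `expected_files` and `missing_files` are hypothetical helpers.

```python
from pathlib import Path

# Suffixes taken from the list above; "sample" is assumed to be a
# placeholder for the actual sample ID.
RESULT_SUFFIXES = [
    "assemblyscan.json",  # assembly statistics
    "checkm2.json",       # CheckM2 quality control
    "bakta.json",         # Bakta annotation
    "bakta.ffn",
    "bakta.faa",
    "bakta.gbff.gz",
    "bakta.gff3",
    "gtdbtk.json",        # taxonomic classification
    "mlst.json",          # multilocus sequence typing
]


def expected_files(sample_id):
    """Return the result file names expected for one sample."""
    return [f"{sample_id}.{suffix}" for suffix in RESULT_SUFFIXES]


def missing_files(results_dir, sample_id):
    """Return the expected files not found directly in results_dir.

    Assumes a flat directory layout, which may differ from the real one.
    """
    root = Path(results_dir)
    return [name for name in expected_files(sample_id)
            if not (root / name).is_file()]
```

For example, `missing_files("results/", "my_sample")` returns an empty list only if all nine files exist for the hypothetical sample ID `my_sample`.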

At the moment I am working on an updated version of the workflow to process the latest data from the All the Bacteria project.
If you are interested in the project as a whole, you can find the current updates and information here:
https://allthebacteria.readthedocs.io/en/latest/faq.html
