Commit

Update docs #11, #5
cb-Hades committed Jul 11, 2024
1 parent d65f8b9 commit 192aad0
Showing 2 changed files with 21 additions and 7 deletions.
14 changes: 14 additions & 0 deletions docs/source/hqtb/hqtb-config.rst
@@ -4,6 +4,20 @@
The configuration file, with its underlying defaults, is shown below.

.. code-block:: yaml
# Configuration file for the SPECIMEN HQTB pipeline
# Meaning of the default parameters:
# The value __USER__ indicates parameters required to be specified by the user
# The value USER indicates parameters required only in specific cases
# Meta info:
# model: USER
# organism: USER
# date: USER
# author: USER
# Input for the pipeline
# ----------------------
# Information about the genome to be used to generate the new model
subject:
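The placeholder convention described in the comments of the configuration file can be checked before starting a run. The following is a minimal sketch, not part of SPECIMEN itself: it assumes the YAML file has already been loaded into a nested dictionary, and the function name ``find_placeholders`` as well as all keys in the example other than ``subject`` are hypothetical.

```python
from typing import Any, Iterator

# Marker used in the default configuration for values the user must provide.
PLACEHOLDER = "__USER__"


def find_placeholders(config: dict[str, Any], prefix: str = "") -> Iterator[str]:
    """Yield the dotted key path of every entry still set to the placeholder."""
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Descend into nested sections such as 'subject'.
            yield from find_placeholders(value, path)
        elif value == PLACEHOLDER:
            yield path


# Hypothetical, already-parsed configuration; only 'subject' appears in the
# real file shown above, the remaining keys are made up for illustration.
example_config = {
    "subject": {"annotated_genome": "__USER__", "full_sequence": "genome.fna"},
    "out_path": "./results/",
}

missing = list(find_placeholders(example_config))
print(missing)
```

Any path reported this way still needs a value from the user before the pipeline can run.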
14 changes: 7 additions & 7 deletions docs/source/hqtb/run-pipeline.rst
@@ -71,8 +71,7 @@ If you are just starting a new project and do not have all the data ready to go,
| The function above creates the following directory structure for your project.
| The 'contains' column lists what is supposed to be inside the corresponding folder.
The tags manual/semi/automated report how these files are added to the folder (automated = by the setup function, manual = by the user).
``TODO``: What does semi mean?
The tags manual/semi/automated report how these files are added to the folder (automated = by the setup function; semi = multiple steps necessary, some performed by the program and some by the user; manual = by the user).
The tags required/optional report whether this input is necessary to run the pipeline or if it is an optional input.
.. table::
@@ -103,8 +102,9 @@ If you are just starting a new project and do not have all the data ready to go,

.. note::

Regarding the annotated_genomes folder, the program currently only supports the file types ``GBFF`` and ``FAA`` + ``FNA``.
``TODO``: Which files in 'contains' does this apply to?
Regarding the annotated_genomes folder, the program currently only supports
the file types ``GBFF`` and ``FAA`` + ``FNA`` (from the NCBI and PROKKA annotation pipelines, respectively)
as genome annotation formats.

Further details for collecting the data:

@@ -118,16 +118,16 @@ Further details for collecting the data:
- One way to build a DIAMOND reference database is to download a set of reference sequences from the NCBI database, e.g. in the **FAA** format.
- Use the function :code:`specimen.util.util.create_DIAMOND_db_from_folder('/User/path/input/directory', '/User/Path/for/output/', name = 'database', extention = 'faa')` to create a DIAMOND database
- To speed up the mapping, create an additional mapping file from, e.g., the ``GBFF`` files from NCBI using :code:`specimen.util.util.create_NCBIinfo_mapping('/User/path/input/directory', '/User/Path/for/output/', extention = 'gbff')`
- To ensure correct mapping to KEGG, an additional information file can be created by constructing a CSV file with the following columns: 'NCBI genome', 'organism', 'locus_tag' (start) and 'KEGG.organism'
``TODO``: What is meant by start here?
- To ensure correct mapping to KEGG, an additional information file can be created by constructing a CSV file with the following columns: 'NCBI genome', 'organism', 'locus_tag' (only the part up to the separator '_', i.e. the part that is the same for all locus tags) and 'KEGG.organism'

- The information for the first three columns can be taken from the previous two steps.
- For the last column, the user needs to check whether the genomes have been entered into KEGG and have an organism identifier.
- This file is purely optional for running the pipeline but potentially leads to better results.
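
The optional information file for the KEGG mapping can be assembled with Python's standard ``csv`` module. This is a minimal sketch under stated assumptions: the column names are taken from the description above, while the helper names ``locus_tag_prefix`` and ``write_info_csv`` and all example values (accessions, organism names, KEGG identifiers) are hypothetical placeholders, not real database entries.

```python
import csv
import io

# Column names as listed in the description above.
COLUMNS = ["NCBI genome", "organism", "locus_tag", "KEGG.organism"]


def locus_tag_prefix(locus_tag: str) -> str:
    """Return the part of a locus tag before the first '_' separator."""
    return locus_tag.split("_", 1)[0]


def write_info_csv(handle, rows) -> None:
    """Write the header and one line per reference genome to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical example rows; replace them with your own reference genomes.
rows = [
    {
        "NCBI genome": "GCF_000000000.1",
        "organism": "Example organism A",
        "locus_tag": locus_tag_prefix("ABC_00010"),
        "KEGG.organism": "exa",
    },
    {
        "NCBI genome": "GCF_000000001.1",
        "organism": "Example organism B",
        "locus_tag": locus_tag_prefix("XYZ_01234"),
        "KEGG.organism": "exb",
    },
]

# Written to an in-memory buffer here; when writing to a real file, open it
# with open(path, "w", newline="") as recommended for the csv module.
buffer = io.StringIO()
write_info_csv(buffer, rows)
print(buffer.getvalue())
```

The prefix helper keeps only the part of the locus tag shared by all genes of a genome, which is what the 'locus_tag' column expects.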

- medium:

The media, either for analysis or gap filling can be entered into the pipeline via a config file (each). ``TODO``: Does a new file really have to be created for each medium?
The media, either for analysis or gap filling, can be entered into the pipeline via a config file.
The same media file can be used for both steps, or a separate file can be provided for each step.
The config files are from the `refineGEMs <https://github.com/draeger-lab/refinegems/tree/dev-2>`__ :footcite:p:`bauerle2023genome` toolbox and access its built-in medium database.
Additionally, the config files allow for manual adjustment / external input.
