Update docs #5
cb-Hades committed Aug 27, 2024
1 parent fc76ad3 commit 1434e21
Showing 15 changed files with 67 additions and 62 deletions.
10 changes: 5 additions & 5 deletions docs/source/cmpb/about-pipeline.rst
@@ -1,18 +1,18 @@
About ``CMPB``
==============

The CarveMe + ModelPolisher based (CMPB) pipeline curates a model using refineGEMs and ModelPolisher.
The CarveMe + ModelPolisher based (CMPB) workflow curates a model using refineGEMs and ModelPolisher.
The starting point is either a set of input files for CarveMe (future update) or an already built model.

This pipeline aims at minimising the user's workload by concatenating steps that could be done individually with the
This workflow aims at minimising the user's workload by concatenating steps that could be done individually with the
integrated tools.

.. _cmpb-overview:

Overview of the ``CMPB`` Pipeline
Overview of the ``CMPB`` Workflow
---------------------------------

The following image shows an overrview of the steps of the pipeline:
The following image shows an overview of the steps of the workflow:

.. _cmpb_workflow:

@@ -26,7 +26,7 @@ The following steps are executed in the workflow:
.. hint::
All steps can also be performed individually.

Many of the steps of the pipeline can be fine tuned and turned off/on.
Many of the steps of the workflow can be fine-tuned and turned off/on.
Check the :doc:`configuration file <cmpb-config>` for a full list of all parameters.

- Step 0: Possible inputs
16 changes: 8 additions & 8 deletions docs/source/cmpb/run-pipeline.rst
@@ -1,21 +1,21 @@
Run the ``CMPB`` Pipeline
Run the ``CMPB`` Workflow
=========================

This page explains how to run the complete ``CMPB`` (CarveMe + ModelPolisher based) pipeline
This page explains how to run the complete ``CMPB`` (CarveMe + ModelPolisher based) workflow
and how to collect the necessary data.

For more information about the steps of the pipeline,
For more information about the steps of the workflow,
see :ref:`cmpb-overview`.

``CMPB``: Quickstart
--------------------

.. warning::

Currently, the pipeline can only be run with an already generated model as input.
Currently, the workflow can only be run with an already generated model as input.
The CarveMe connection will be added in a future update.

The pipeline can either be run directly from the command line or its functions can be called from inside a Python script.
The workflow can either be run directly from the command line or its functions can be called from inside a Python script.
The input in both cases is a :doc:`configuration file <cmpb-config>` that contains all information needed (data file paths and parameters) to run it.

The configuration can be downloaded using the command line:
@@ -34,9 +34,9 @@ To download the configuration file using Python, use:
specimen.setup.download_config(filename='./my_basic_config.yaml', type='cmpb')
After downloading the configuration file, open it with an editor and change the parameters as needed.
Missing entries will be reported when starting the pipeline.
Missing entries will be reported when starting the workflow.

To run the pipeline using the configuration file, use
To run the workflow using the configuration file, use

.. code-block:: bash
:class: copyable
@@ -55,7 +55,7 @@ from inside a Python script or Jupyter Notebook with "config.yaml" being the pat
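
As an illustrative sketch of the Python route through this quickstart: ``specimen.setup.download_config`` is taken from this page, while the ``specimen.cmpb.run`` entry point and its argument name are assumptions that may differ from the actual API.

.. code-block:: python

    from specimen.setup import download_config

    # Download the basic CMPB configuration (call shown earlier on this page).
    download_config(filename='./my_basic_config.yaml', type='cmpb')

    # ...edit my_basic_config.yaml (model path, parameters)...

    # Hypothetical run call - module and function name are assumptions.
    from specimen.cmpb import run
    run(config='./my_basic_config.yaml')
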
``CMPB``: Collecting Data
-------------------------

The pipeline has two obligatory parameters:
The workflow has two obligatory parameters:

- Path to a model

2 changes: 1 addition & 1 deletion docs/source/dev-notes.rst
@@ -16,7 +16,7 @@ The version in this release has as of yet only been tested on prokaryote genomes.

.. note::

The pipeline has yet to be tested on different species and specifically Eukarya, if it works without problems on those as well.
The workflow has yet to be tested on different species, specifically Eukarya, to see whether it works without problems on those as well.

``dev``: ONGOING
^^^^^^^^^^^^^^^^
2 changes: 1 addition & 1 deletion docs/source/help.rst
@@ -33,7 +33,7 @@ Which types of organism work with ``SPECIMEN``?

``SPECIMEN`` was originally written for prokaryota. However, adapting ``SPECIMEN`` to work with
other organism types is something we hope to achieve with a future update.
Currently, the pipelines have yet to be tested on types of organism other than prokaryota.
Currently, the workflows have yet to be tested on types of organism other than prokaryota.

Which namespaces can I use?
^^^^^^^^^^^^^^^^^^^^^^^^^^^
14 changes: 7 additions & 7 deletions docs/source/hqtb/about-pipeline.rst
@@ -1,23 +1,23 @@
About HQTB
==========

The high-quality template based (``HTQB``) pipeline curates the model starting from the annotated
The high-quality template based (``HQTB``) workflow curates the model starting from the annotated
genome of your strain of interest and an already curated, ideally very high-quality template model
of a closely related strain (species) and additional database information.

This type of pipeline aims to build upon already existing knowledge to speed up model curation
This type of workflow aims to build upon already existing knowledge to speed up model curation
and minimize the need to perform steps again that have already been done in a similar context.

.. _overview-hqtb:

Overview of the ``HQTB`` Pipeline
Overview of the ``HQTB`` Workflow
---------------------------------

The following image shows an overrview of the steps of the pipeline:
The following image shows an overview of the steps of the workflow:

.. image:: ../images/hqtb_pipeline-overview.png

The pipeline consists of five main steps:
The workflow consists of five main steps:

.. toctree::
:maxdepth: 3
@@ -56,12 +56,12 @@ The wrapper function allows the curation of multiple models sequentially using t
boundary parameters.

.. hint::
Many of the steps of the pipeline can be fine tuned and turned off/on.
Many of the steps of the workflow can be fine-tuned and turned off/on.
Check the :doc:`configuration file <hqtb-config>` for a full list of all parameters.

.. note::

All steps of the pipeline can be run separatly via the command line or
All steps of the workflow can be run separately via the command line or
the Python integration (see :doc:`run-pipeline`).

All accessible functions are listed in the :ref:`Contents of SPECIMEN` section.
22 changes: 11 additions & 11 deletions docs/source/hqtb/run-pipeline.rst
@@ -1,16 +1,16 @@
Run the ``HQTB`` Pipeline
Run the ``HQTB`` Workflow
=========================

This page explains how to run the complete ``HTQB`` (high-quality template based) pipeline
This page explains how to run the complete ``HQTB`` (high-quality template based) workflow
and how to collect the necessary data.

For more information about the steps of the pipeline,
For more information about the steps of the workflow,
see :ref:`overview-hqtb`.

``HQTB``: Quickstart
--------------------

The pipeline can either be run directly from the command line or its functions can be called from inside a Python script.
The workflow can either be run directly from the command line or its functions can be called from inside a Python script.
The input in both cases is a configuration file that contains all information needed (data file paths and parameters) to run it.

The `configuration <hqtb-config.html>`__ can be downloaded using the command line:
@@ -34,9 +34,9 @@ To download the configuration file using Python, use:
As with the command line access, the type can be changed to ``hqtb-advanced``.
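
For orientation, below is a hedged Python sketch of the download call, mirroring the CMPB variant shown earlier on this page; the basic ``'hqtb'`` type string is an assumption, whereas ``'hqtb-advanced'`` is taken from the sentence above.

.. code-block:: python

    from specimen.setup import download_config

    # Basic HQTB configuration file (the 'hqtb' type string is an assumption).
    download_config(filename='./my_basic_config.yaml', type='hqtb')

    # Extended parameter set, as mentioned above.
    download_config(filename='./my_advanced_config.yaml', type='hqtb-advanced')
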

After downloading the configuration file, open it with an editor and change the parameters as needed.
Missing entries will be reported when starting the pipeline.
Missing entries will be reported when starting the workflow.

To run the pipeline using the configuration file, use
To run the workflow using the configuration file, use

.. code-block:: bash
:class: copyable
@@ -54,7 +54,7 @@ from inside a Python script or Jupyter Notebook with "config.yaml" being the pat

.. note::

Additionally, the pipeline can be run with a wrapper to susequently build multiple models for different genomes using the same parameters.
Additionally, the workflow can be run with a wrapper to subsequently build multiple models for different genomes using the same parameters.
The wrapper can be accessed using :code:`specimen hqtb run wrapper "config.yaml"` or :code:`specimen.workflow.wrapper_pipeline(config_file='/User/path/to/config.yaml', parent_dir="./")`.
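
For convenience, the Python form of the wrapper call quoted in the note is repeated below as a minimal sketch; the function and its keyword arguments are taken verbatim from the note, only the path values are placeholders.

.. code-block:: python

    import specimen.workflow

    # Builds multiple models for different genomes with the same parameters,
    # all defined in a single configuration file (call quoted in the note above).
    specimen.workflow.wrapper_pipeline(config_file='/User/path/to/config.yaml',
                                       parent_dir='./')
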


@@ -72,7 +72,7 @@ If you are just starting a new project and do not have all the data ready to go,
| The function above creates the following directory structure for your project.
| The 'contains' column lists what is supposed to be inside the corresponding folder.
The tags manual/semi/automated report how these files are added to the folder (automated = by the setup function; semi = multiple steps necessary, some by the program, some by the user; manual = by the user).
The tags required/optional report whether this input is necessary to run the pipeline or if it is an optional input.
The tags required/optional report whether this input is necessary to run the workflow or if it is an optional input.
.. table::
:align: center
@@ -122,12 +122,12 @@ Further details for collecting the data:

- The information for the first three columns can be taken from the previous two steps.
- For the last column, the user needs to check whether the genomes have been entered into KEGG and have an organism identifier.
- This file is purely optional for running the pipeline but potentially leads to better results.
- This file is purely optional for running the workflow but potentially leads to better results.

- medium:

The media, either for analysis or gap filling can be entered into the pipeline via a config file.
The same media file can be used for both or one file for each step can be entered into the pipeline.
The media, either for analysis or gap filling, can be entered into the workflow via a config file.
The same media file can be used for both, or a separate file for each step can be entered into the workflow.
The config files are from the `refineGEMs <https://github.com/draeger-lab/refinegems/tree/dev-2>`__ :footcite:p:`bauerle2023genome` toolbox and access its in-built medium database.
Additionally, the config files allow for manual adjustment / external input.

2 changes: 1 addition & 1 deletion docs/source/hqtb/step-desc/analysis.rst
@@ -1,7 +1,7 @@
Step 5: Analyse the Model
=========================

The fifth and final step of the pipeline is a short analysis of the curated model,
The fifth and final step of the workflow is a short analysis of the curated model,
generating a set of tables and graphics to nicely display the model content. Most steps,
excluding the statistical model analysis, are optional and can be skipped.

4 changes: 2 additions & 2 deletions docs/source/hqtb/step-desc/bidirect_blast.rst
@@ -1,12 +1,12 @@
Step 1: Bidirectional BLAST
===========================

The first step of the pipeline is to perform a bidirectional BLAST using `DIAMOND <https://github.com/bbuchfink/diamond>`__
The first step of the workflow is to perform a bidirectional BLAST using `DIAMOND <https://github.com/bbuchfink/diamond>`__
on the input and template genomes. The aim is to identify genes that are found
in both genomes. The bidirectional BLAST ensures a high certainty value. The idea
is based on the workflow described by :footcite:t:`norsigian2020workflow`.
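
To make the bidirectional idea concrete, the sketch below shows one generic way to derive reciprocal best hits from two DIAMOND tabular result files; it only illustrates the concept, is not the code used by this step, and assumes DIAMOND's default (BLAST ``outfmt 6``) column layout.

.. code-block:: python

    import pandas as pd

    # Default DIAMOND/BLAST tabular (outfmt 6) column names.
    COLS = ['qseqid', 'sseqid', 'pident', 'length', 'mismatch', 'gapopen',
            'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore']

    def best_hits(tsv_path: str) -> dict:
        """Best-scoring subject per query, judged by bitscore."""
        hits = pd.read_csv(tsv_path, sep='\t', names=COLS)
        hits = hits.sort_values('bitscore', ascending=False)
        return hits.drop_duplicates('qseqid').set_index('qseqid')['sseqid'].to_dict()

    def reciprocal_best_hits(fwd_tsv: str, rev_tsv: str) -> list:
        """Pairs of genes that are each other's best hit in both directions."""
        fwd = best_hits(fwd_tsv)   # input genome vs. template genome
        rev = best_hits(rev_tsv)   # template genome vs. input genome
        return [(query, subject) for query, subject in fwd.items()
                if rev.get(subject) == query]
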

Below is an overview of this step of the pipeline:
Below is an overview of this step of the workflow:

.. image:: ../../images/modules/1_bbh.png

4 changes: 2 additions & 2 deletions docs/source/hqtb/step-desc/gen_draft.rst
@@ -1,12 +1,12 @@
Step 2: Generate a Draft Model
==============================

Based on the results of step 1, step 2 of the pipeline generates a draft model based on the
Based on the results of step 1, step 2 of the workflow generates a draft model based on the
template model.

.. note::

When running this step outside the context of the pipeline, the input files
When running this step outside the context of the workflow, the input files
need to be related to each other: the identifiers of the genes
for the matches of the bidirectional BLAST results and the genes in the template model
must either be the same or be adjusted using the given parameters as needed.
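
As a quick, hedged sanity check of that requirement, the snippet below compares gene identifiers from a bidirectional BLAST result table against those of the template model; the file names and the ``'template_gene'`` column are placeholders, not the actual output format of step 1.

.. code-block:: python

    import pandas as pd
    from cobra.io import read_sbml_model

    # Placeholder paths and column name - adapt to your own files.
    bbh = pd.read_csv('bidirectional_blast_hits.tsv', sep='\t')
    template = read_sbml_model('template_model.xml')

    model_genes = {gene.id for gene in template.genes}
    hit_genes = set(bbh['template_gene'])

    missing = hit_genes - model_genes
    print(f'{len(missing)} BLAST gene identifiers are missing from the template model')
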
2 changes: 1 addition & 1 deletion docs/source/hqtb/step-desc/refine-parts/annot.rst
@@ -5,7 +5,7 @@ The third part of the refinement aims to improve the annotations of the model.

.. image:: ../../../images/modules/3_3_annotation.png

As seen in the picture above, the pipeline currently covers:
As seen in the picture above, the workflow currently covers:

- SBOterm annotations
- KEGG pathway annotations for reactions
2 changes: 1 addition & 1 deletion docs/source/hqtb/step-desc/refine-parts/cleanup.rst
@@ -17,6 +17,6 @@ Except for the *complete Bio/MetaCyc annotations* all steps are optional.

The gap filling is currently only available in the COBRApy variant.

This part of the pipeline is still a working process,
This part of the workflow is still a work in progress,
stay tuned for future updates.

4 changes: 2 additions & 2 deletions docs/source/hqtb/step-desc/validation.rst
@@ -1,9 +1,9 @@
Step 4: Model Validation
========================

After the previous step, the final model of the pipeline has been generated.
After the previous step, the final model of the workflow has been generated.
To ensure the model is functional and a valid SBML model, the fourth step
of the pipeline performs a validation of the created model.
of the workflow performs a validation of the created model.
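
As an illustration of what such a check can look like outside the workflow, below is a small sketch using COBRApy's SBML validator; it is a generic example and not necessarily one of the validators integrated in this step.

.. code-block:: python

    from cobra.io import validate_sbml_model

    # Parses the model and returns a report of errors and warnings,
    # grouped by severity (SBML and COBRA categories).
    model, report = validate_sbml_model('final_model.xml')

    for level, messages in report.items():
        print(level, len(messages))
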

Currently implemented are the following validators (more will be added in future updates):

6 changes: 3 additions & 3 deletions docs/source/index.rst
@@ -9,12 +9,12 @@ automated curation of high-quality, ideally strain-specific, genome-scale metabo

These workflows are mainly based on the `refineGEMs <https://github.com/draeger-lab/refinegems/tree/dev-2>`__ :footcite:p:`bauerle2023genome` toolbox.

Additionally, most of the functions and steps of the different pipelines in ``SPECIMEN`` can be used separatly.
Additionally, most of the functions and steps of the different workflows in ``SPECIMEN`` can be used separately.

Available Pipelines
Available Workflows
-------------------

Currently, ``SPECIMEN`` includes the following pipelines (for a summary refer to :ref:`Overview of the pipelines`):
Currently, ``SPECIMEN`` includes the following workflows (for a summary refer to :ref:`overview of the workflows`):

.. image:: images/buttons/cmpb.png
:height: 0px
35 changes: 20 additions & 15 deletions docs/source/overview-pipes.rst
@@ -1,32 +1,33 @@
Overview of the Pipelines
Overview of the Workflows
=========================

The following pipelines are currently available:
The following workflows are currently available:

- ``CMPB``: CarveMe + ModelPolisher based
- ``HQTB``: High-quality template based

More information about these differnt types of pipelines can be found below.
More information about these different types of workflows can be found below.

.. hint::

Which type of pipeline to use is heavily dependent on your organism, available information for your organism
Which type of workflow to use is heavily dependent on your
organism, available information for your organism
and the data you wish to use as input.

CarveMe + ModelPolisher based (``CMPB``) pipeline
CarveMe + ModelPolisher based (``CMPB``) workflow
-------------------------------------------------

The ``CMPB`` pipeline is based on generating a draft model using `CarveMe <https://github.com/cdanielmachado/carveme>`__ and subsequently extending and polishing
The ``CMPB`` workflow is based on generating a draft model using `CarveMe <https://github.com/cdanielmachado/carveme>`__ and subsequently extending and polishing
the model using various tools.

This pipeline can be run on an annotated genome or an already generated CarveMe model and requires very little additional
This workflow can be run on an annotated genome or an already generated CarveMe model and requires very little additional
information to be run on its base settings.
However, additional information can be added from
e.g. KEGG or BioCyc to perform an automated gapfilling using `refineGEMs <https://github.com/draeger-lab/refinegems/tree/dev-2>`__ :footcite:p:`bauerle2023genome`.
e.g. KEGG or BioCyc to perform an automated gap filling using `refineGEMs <https://github.com/draeger-lab/refinegems/tree/dev-2>`__ :footcite:p:`bauerle2023genome`.

.. note::

Currently, the ModelPolisher connection is still under construction, but the pipeline
Currently, the ModelPolisher connection is still under construction, but the workflow
can already be run.

.. toctree::
@@ -37,15 +38,19 @@ e.g. KEGG or BioCyc to perform an automated gapfilling using `refineGEMs <https:
Run CMPB <cmpb/run-pipeline>
CMPB Configuration <cmpb/cmpb-config>

High-quality template based (``HQTB``) pipeline
High-quality template based (``HQTB``) workflow
-----------------------------------------------

The ``HQTB`` pipeline curates a new model from an annotated genome based on a high-quality template model
.. warning::
Due to changes in ``refineGEMs``, this workflow is under heavy
development and may not work as expected.

The ``HQTB`` workflow curates a new model from an annotated genome based on a high-quality template model
(plus corresponding annotated genome) and additional database information.

This pipeline aims to profit from already performed (manual) curation of an already existing model,
This workflow aims to profit from already performed (manual) curation of an already existing model,
to carry this knowledge into the new model. The closer the template is to the original, the more knowledge
can potentially be carried over. Therefore, this pipeline is more useful, if the user already has a model of
can potentially be carried over. Therefore, this workflow is more useful if the user already has a model of
a similar organism to the one for which the new model should be curated.

.. toctree::
@@ -56,10 +61,10 @@ a similar organism compared to the one for which the new model should be curated
Run HQTB <hqtb/run-pipeline>
HQTB Configuration <hqtb/hqtb-config>

More ideas for pipelines
More ideas for workflows
------------------------

Below are some ideas for pipelines, to be implemented in future update(s):
Below are some ideas for workflows, to be implemented in future update(s):

.. toctree::
PGAB <pipeline_idea>
4 changes: 2 additions & 2 deletions docs/source/pipeline_idea.rst
@@ -1,11 +1,11 @@
PGAB: From genome sequence to draft model
=========================================

PGAB stands for PGAP-based pipeline.
PGAB stands for PGAP-based workflow.

.. note::

This pipeline is still in the idea stage and will be object to a future update.
This workflow is still in the idea stage and will be the subject of a future update.

Generating a model for an organism where no information on genes and proteins is obtainable via any database
causes the problem that the model will not contain valid database identifiers for any GeneProduct. To resolve this issue, the
