Update docs #5
cb-Hades committed Aug 27, 2024
1 parent fc76ad3 commit 1434e21
Showing 15 changed files with 67 additions and 62 deletions.
10 changes: 5 additions & 5 deletions docs/source/cmpb/about-pipeline.rst
@@ -1,18 +1,18 @@
About ``CMPB``
==============

The CarveMe + ModelPolisher based (CMPB) pipeline curates a model using refineGEMs and ModelPolisher.
The CarveMe + ModelPolisher based (CMPB) workflow curates a model using refineGEMs and ModelPolisher.
The starting point is either a set of input files for CarveMe (future update) or an already built model.

This pipeline aims at minimising the user's workload by concatenating steps that could be done individually with the
This workflow aims at minimising the user's workload by concatenating steps that could be done individually with the
integrated tools.

.. _cmpb-overview:

Overview of the ``CMPB`` Pipeline
Overview of the ``CMPB`` Workflow
---------------------------------

The following image shows an overrview of the steps of the pipeline:
The following image shows an overview of the steps of the workflow:

.. _cmpb_workflow:

@@ -26,7 +26,7 @@ The following steps are executed in the workflow:
.. hint::
All steps can also be performed individually.

Many of the steps of the pipeline can be fine tuned and turned off/on.
Many of the steps of the workflow can be fine-tuned and turned off/on.
Check the :doc:`configuration file <cmpb-config>` for a full list of all parameters.

- Step 0: Possible inputs
16 changes: 8 additions & 8 deletions docs/source/cmpb/run-pipeline.rst
@@ -1,21 +1,21 @@
Run the ``CMPB`` Pipeline
Run the ``CMPB`` Workflow
=========================

This page explains how to run the complete ``CMPB`` (CarveMe + ModelPolisher based) pipeline
This page explains how to run the complete ``CMPB`` (CarveMe + ModelPolisher based) workflow
and how to collect the necessary data.

For more information about the steps of the pipeline,
For more information about the steps of the workflow,
see :ref:`cmpb-overview`.

``CMPB``: Quickstart
--------------------

.. warning::

Currently, the pipeline can only be run with an already generated model as input.
Currently, the workflow can only be run with an already generated model as input.
The CarveMe connection will be added in a future update.

The pipeline can either be run directly from the command line or its functions can be called from inside a Python script.
The workflow can either be run directly from the command line or its functions can be called from inside a Python script.
The input in both cases is a :doc:`configuration file <cmpb-config>` that contains all information needed (data file paths and parameters) to run it.

The configuration can be downloaded using the command line:
@@ -34,9 +34,9 @@ To download the configuration file using Python, use:
specimen.setup.download_config(filename='./my_basic_config.yaml', type='cmpb')
After downloading the configuration file, open it with an editor and change the parameters as needed.
Missing entries will be reported when starting the pipeline.
Missing entries will be reported when starting the workflow.

To run the pipeline using the configuration file, use
To run the workflow using the configuration file, use

.. code-block:: bash
:class: copyable
@@ -55,7 +55,7 @@ from inside a Python script or Jupyter Notebook with "config.yaml" being the pat
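
As an illustrative sketch of the Python route through this quickstart: ``specimen.setup.download_config`` is taken from this page, while the ``specimen.cmpb.run`` entry point and its argument name are assumptions that may differ from the actual API.

.. code-block:: python

    from specimen.setup import download_config

    # Download the basic CMPB configuration (call shown earlier on this page).
    download_config(filename='./my_basic_config.yaml', type='cmpb')

    # ...edit my_basic_config.yaml (model path, parameters)...

    # Hypothetical run call - module and function name are assumptions.
    from specimen.cmpb import run
    run(config='./my_basic_config.yaml')
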
``CMPB``: Collecting Data
-------------------------

The pipeline has two obligatory parameters:
The workflow has two obligatory parameters:

- Path to a model

2 changes: 1 addition & 1 deletion docs/source/dev-notes.rst
@@ -16,7 +16,7 @@ The version in this release has as of yet only been tested on prokaryote genomes.

.. note::

The pipeline has yet to be tested on different species and specifically Eukarya, if it works without problems on those as well.
The workflow has yet to be tested on different species, specifically Eukarya, to see whether it works without problems on those as well.

``dev``: ONGOING
^^^^^^^^^^^^^^^^
2 changes: 1 addition & 1 deletion docs/source/help.rst
@@ -33,7 +33,7 @@ Which types of organism work with ``SPECIMEN``?

``SPECIMEN`` was originally written for prokaryota. However, adapting ``SPECIMEN`` to work with
other organism types is something we hope to achieve with a future update.
Currently, the pipelines have yet to be tested on types of organism other than prokaryota.
Currently, the workflows have yet to be tested on types of organism other than prokaryota.

Which namespaces can I use?
^^^^^^^^^^^^^^^^^^^^^^^^^^^
14 changes: 7 additions & 7 deletions docs/source/hqtb/about-pipeline.rst
@@ -1,23 +1,23 @@
About HQTB
==========

The high-quality template based (``HTQB``) pipeline curates the model starting from the annotated
The high-quality template based (``HQTB``) workflow curates the model starting from the annotated
genome of your strain of interest and an already curated, ideally very high-quality template model
of a closely related strain (species) and additional database information.

This type of pipeline aims to build upon already existing knowledge to speed up model curation
This type of workflow aims to build upon already existing knowledge to speed up model curation
and minimize the need to perform steps again that have already been done in a similar context.

.. _overview-hqtb:

Overview of the ``HQTB`` Pipeline
Overview of the ``HQTB`` Workflow
---------------------------------

The following image shows an overrview of the steps of the pipeline:
The following image shows an overview of the steps of the workflow:

.. image:: ../images/hqtb_pipeline-overview.png

The pipeline consists of five main steps:
The workflow consists of five main steps:

.. toctree::
:maxdepth: 3
@@ -56,12 +56,12 @@ The wrapper function allows the curation of multiple models sequentially using t
boundary parameters.

.. hint::
Many of the steps of the pipeline can be fine tuned and turned off/on.
Many of the steps of the workflow can be fine-tuned and turned off/on.
Check the :doc:`configuration file <hqtb-config>` for a full list of all parameters.

.. note::

All steps of the pipeline can be run separatly via the command line or
All steps of the workflow can be run separately via the command line or
the Python integration (see :doc:`run-pipeline`).

All accessible functions are listed in the :ref:`Contents of SPECIMEN` section.
22 changes: 11 additions & 11 deletions docs/source/hqtb/run-pipeline.rst
@@ -1,16 +1,16 @@
Run the ``HQTB`` Pipeline
Run the ``HQTB`` Workflow
=========================

This page explains how to run the complete ``HTQB`` (high-quality template based) pipeline
This page explains how to run the complete ``HQTB`` (high-quality template based) workflow
and how to collect the necessary data.

For more information about the steps of the pipeline,
For more information about the steps of the workflow,
see :ref:`overview-hqtb`.

``HQTB``: Quickstart
--------------------

The pipeline can either be run directly from the command line or its functions can be called from inside a Python script.
The workflow can either be run directly from the command line or its functions can be called from inside a Python script.
The input in both cases is a configuration file that contains all information needed (data file paths and parameters) to run it.

The `configuration <hqtb-config.html>`__ can be downloaded using the command line:
@@ -34,9 +34,9 @@ To download the configuration file using Python, use:
As with the command line access, the type can be changed to ``hqtb-advanced``.
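
For orientation, below is a hedged Python sketch of the download call, mirroring the CMPB variant shown earlier on this page; the basic ``'hqtb'`` type string is an assumption, whereas ``'hqtb-advanced'`` is taken from the sentence above.

.. code-block:: python

    from specimen.setup import download_config

    # Basic HQTB configuration file (the 'hqtb' type string is an assumption).
    download_config(filename='./my_basic_config.yaml', type='hqtb')

    # Extended parameter set, as mentioned above.
    download_config(filename='./my_advanced_config.yaml', type='hqtb-advanced')
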

After downloading the configuration file, open it with an editor and change the parameters as needed.
Missing entries will be reported when starting the pipeline.
Missing entries will be reported when starting the workflow.

To run the pipeline using the configuration file, use
To run the workflow using the configuration file, use

.. code-block:: bash
:class: copyable
@@ -54,7 +54,7 @@ from inside a Python script or Jupyter Notebook with "config.yaml" being the pat

.. note::

Additionally, the pipeline can be run with a wrapper to susequently build multiple models for different genomes using the same parameters.
Additionally, the workflow can be run with a wrapper to subsequently build multiple models for different genomes using the same parameters.
The wrapper can be accessed using :code:`specimen hqtb run wrapper "config.yaml"` or :code:`specimen.workflow.wrapper_pipeline(config_file='/User/path/to/config.yaml', parent_dir="./")`.
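
For convenience, the Python form of the wrapper call quoted in the note is repeated below as a minimal sketch; the function and its keyword arguments are taken verbatim from the note, only the path values are placeholders.

.. code-block:: python

    import specimen.workflow

    # Builds multiple models for different genomes with the same parameters,
    # all defined in a single configuration file (call quoted in the note above).
    specimen.workflow.wrapper_pipeline(config_file='/User/path/to/config.yaml',
                                       parent_dir='./')
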


@@ -72,7 +72,7 @@ If you are just starting a new project and do not have all the data ready to go,
| The function above creates the following directory structure for your project.
| The 'contains' column lists what is supposed to be inside the corresponding folder.
The tags manual/semi/automated report how these files are added to the folder (automated = by the setup function; semi = multiple steps necessary, some by the program, some by the user; manual = by the user).
The tags required/optional report whether this input is necessary to run the pipeline or if it is an optional input.
The tags required/optional report whether this input is necessary to run the workflow or if it is an optional input.
.. table::
:align: center
@@ -122,12 +122,12 @@ Further details for collecting the data:

- The information for the first three columns can be taken from the previous two steps.
- For the last column, the user needs to check whether the genomes have been entered into KEGG and have an organism identifier.
- This file is purely optional for running the pipeline but potentially leads to better results.
- This file is purely optional for running the workflow but potentially leads to better results.

- medium:

The media, either for analysis or gap filling can be entered into the pipeline via a config file.
The same media file can be used for both or one file for each step can be entered into the pipeline.
The media, either for analysis or gap filling, can be entered into the workflow via a config file.
The same media file can be used for both, or a separate file for each step can be entered into the workflow.
The config files are from the `refineGEMs <https://github.com/draeger-lab/refinegems/tree/dev-2>`__ :footcite:p:`bauerle2023genome` toolbox and access its in-built medium database.
Additionally, the config files allow for manual adjustment / external input.

2 changes: 1 addition & 1 deletion docs/source/hqtb/step-desc/analysis.rst
@@ -1,7 +1,7 @@
Step 5: Analyse the Model
=========================

The fifth and final step of the pipeline is a short analysis of the curated model,
The fifth and final step of the workflow is a short analysis of the curated model,
generating a set of tables and graphics to nicely display the model content. Most steps,
excluding the statistical model analysis, are optional and can be skipped.

4 changes: 2 additions & 2 deletions docs/source/hqtb/step-desc/bidirect_blast.rst
@@ -1,12 +1,12 @@
Step 1: Bidirectional BLAST
===========================

The first step of the pipeline is to perform a bidirectional BLAST using `DIAMOND <https://github.com/bbuchfink/diamond>`__
The first step of the workflow is to perform a bidirectional BLAST using `DIAMOND <https://github.com/bbuchfink/diamond>`__
on the input and template genomes. The aim is to identify genes that are found
in both genomes. The bidirectional BLAST ensures a high certainty value. The idea
is based on the workflow described by :footcite:t:`norsigian2020workflow`.
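
To make the bidirectional idea concrete, the sketch below shows one generic way to derive reciprocal best hits from two DIAMOND tabular result files; it only illustrates the concept, is not the code used by this step, and assumes DIAMOND's default (BLAST ``outfmt 6``) column layout.

.. code-block:: python

    import pandas as pd

    # Default DIAMOND/BLAST tabular (outfmt 6) column names.
    COLS = ['qseqid', 'sseqid', 'pident', 'length', 'mismatch', 'gapopen',
            'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore']

    def best_hits(tsv_path: str) -> dict:
        """Best-scoring subject per query, judged by bitscore."""
        hits = pd.read_csv(tsv_path, sep='\t', names=COLS)
        hits = hits.sort_values('bitscore', ascending=False)
        return hits.drop_duplicates('qseqid').set_index('qseqid')['sseqid'].to_dict()

    def reciprocal_best_hits(fwd_tsv: str, rev_tsv: str) -> list:
        """Pairs of genes that are each other's best hit in both directions."""
        fwd = best_hits(fwd_tsv)   # input genome vs. template genome
        rev = best_hits(rev_tsv)   # template genome vs. input genome
        return [(query, subject) for query, subject in fwd.items()
                if rev.get(subject) == query]
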

Below is an overview of this step of the pipeline:
Below is an overview of this step of the workflow:

.. image:: ../../images/modules/1_bbh.png

4 changes: 2 additions & 2 deletions docs/source/hqtb/step-desc/gen_draft.rst
@@ -1,12 +1,12 @@
Step 2: Generate a Draft Model
==============================

Based on the results of step 1, step 2 of the pipeline generates a draft model based on the
Based on the results of step 1, step 2 of the workflow generates a draft model based on the
template model.

.. note::

When running this step outside the context of the pipeline, the input files
When running this step outside the context of the workflow, the input files
need to be related to each other: the identifiers of the genes
for the matches of the bidirectional BLAST results and the genes in the template model
must either be the same or be adjusted using the given parameters as needed.
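
As a quick, hedged sanity check of that requirement, the snippet below compares gene identifiers from a bidirectional BLAST result table against those of the template model; the file names and the ``'template_gene'`` column are placeholders, not the actual output format of step 1.

.. code-block:: python

    import pandas as pd
    from cobra.io import read_sbml_model

    # Placeholder paths and column name - adapt to your own files.
    bbh = pd.read_csv('bidirectional_blast_hits.tsv', sep='\t')
    template = read_sbml_model('template_model.xml')

    model_genes = {gene.id for gene in template.genes}
    hit_genes = set(bbh['template_gene'])

    missing = hit_genes - model_genes
    print(f'{len(missing)} BLAST gene identifiers are missing from the template model')
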
2 changes: 1 addition & 1 deletion docs/source/hqtb/step-desc/refine-parts/annot.rst
@@ -5,7 +5,7 @@ The third part of the refinement aims to improve the annotations of the model.

.. image:: ../../../images/modules/3_3_annotation.png

As seen in the picture above, the pipeline currently covers:
As seen in the picture above, the workflow currently covers:

- SBOterm annotations
- KEGG pathway annotations for reactions
2 changes: 1 addition & 1 deletion docs/source/hqtb/step-desc/refine-parts/cleanup.rst
@@ -17,6 +17,6 @@ Except for the *complete Bio/MetaCyc annotations* all steps are optional.

The gap filling is currently only available in the COBRApy variant.

This part of the pipeline is still a working process,
This part of the workflow is still a work in progress,
stay tuned for future updates.

4 changes: 2 additions & 2 deletions docs/source/hqtb/step-desc/validation.rst
@@ -1,9 +1,9 @@
Step 4: Model Validation
========================

After the previous step, the final model of the pipeline has been generated.
After the previous step, the final model of the workflow has been generated.
To ensure the model is functional and a valid SBML model, the fourth step
of the pipeline performs a validation of the created model.
of the workflow performs a validation of the created model.
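
As an illustration of what such a check can look like outside the workflow, below is a small sketch using COBRApy's SBML validator; it is a generic example and not necessarily one of the validators integrated in this step.

.. code-block:: python

    from cobra.io import validate_sbml_model

    # Parses the model and returns a report of errors and warnings,
    # grouped by severity (SBML and COBRA categories).
    model, report = validate_sbml_model('final_model.xml')

    for level, messages in report.items():
        print(level, len(messages))
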

Currently implemented are the following validators (more will be added in future updates):

6 changes: 3 additions & 3 deletions docs/source/index.rst
@@ -9,12 +9,12 @@ automated curation of high-quality, ideally strain-specific, genome-scale metabo

These workflows are mainly based on the `refineGEMs <https://github.com/draeger-lab/refinegems/tree/dev-2>`__ :footcite:p:`bauerle2023genome` toolbox.

Additionally, most of the functions and steps of the different pipelines in ``SPECIMEN`` can be used separatly.
Additionally, most of the functions and steps of the different workflows in ``SPECIMEN`` can be used separately.

Available Pipelines
Available Workflows
-------------------

Currently, ``SPECIMEN`` includes the following pipelines (for a summary refer to :ref:`Overview of the pipelines`):
Currently, ``SPECIMEN`` includes the following workflows (for a summary refer to :ref:`overview of the workflows`):

.. image:: images/buttons/cmpb.png
:height: 0px
35 changes: 20 additions & 15 deletions docs/source/overview-pipes.rst
@@ -1,32 +1,33 @@
Overview of the Pipelines
Overview of the Workflows
=========================

The following pipelines are currently available:
The following workflows are currently available:

- ``CMPB``: CarveMe + ModelPolisher based
- ``HQTB``: High-quality template based

More information about these differnt types of pipelines can be found below.
More information about these different types of workflows can be found below.

.. hint::

Which type of pipeline to use is heavily dependent on your organism, available information for your organism
Which type of workflow to use is heavily dependent on your
organism, available information for your organism
and the data you wish to use as input.

CarveMe + ModelPolisher based (``CMPB``) pipeline
CarveMe + ModelPolisher based (``CMPB``) workflow
-------------------------------------------------

The ``CMPB`` pipeline is based on generating a draft model using `CarveMe <https://github.com/cdanielmachado/carveme>`__ and subsequently extending and polishing
The ``CMPB`` workflow is based on generating a draft model using `CarveMe <https://github.com/cdanielmachado/carveme>`__ and subsequently extending and polishing
the model using various tools.

This pipeline can be run on an annotated genome or an already generated CarveMe model and requires very little additional
This workflow can be run on an annotated genome or an already generated CarveMe model and requires very little additional
information to be run on its base settings.
However, additional information can be added from
e.g. KEGG or BioCyc to perform an automated gapfilling using `refineGEMs <https://github.com/draeger-lab/refinegems/tree/dev-2>`__ :footcite:p:`bauerle2023genome`.
e.g. KEGG or BioCyc to perform an automated gap filling using `refineGEMs <https://github.com/draeger-lab/refinegems/tree/dev-2>`__ :footcite:p:`bauerle2023genome`.

.. note::

Currently, the ModelPolisher connection is still under construction, but the pipeline
Currently, the ModelPolisher connection is still under construction, but the workflow
can already be run.

.. toctree::
@@ -37,15 +38,19 @@ e.g. KEGG or BioCyc to perform an automated gapfilling using `refineGEMs <https:
Run CMPB <cmpb/run-pipeline>
CMPB Configuration <cmpb/cmpb-config>

High-quality template based (``HQTB``) pipeline
High-quality template based (``HQTB``) workflow
-----------------------------------------------

The ``HQTB`` pipeline curates a new model from an annotated genome based on a high-quality template model
.. warning::
Due to changes in ``refineGEMs``, this workflow is under heavy
development and may not work as expected.

The ``HQTB`` workflow curates a new model from an annotated genome based on a high-quality template model
(plus corresponding annotated genome) and additional database information.

This pipeline aims to profit from already performed (manual) curation of an already existing model,
This workflow aims to profit from already performed (manual) curation of an already existing model,
to carry this knowledge into the new model. The closer the template is to the original, the more knowledge
can potentially be carried over. Therefore, this pipeline is more useful, if the user already has a model of
can potentially be carried over. Therefore, this workflow is more useful if the user already has a model of
a similar organism to the one for which the new model should be curated.

.. toctree::
@@ -56,10 +61,10 @@ a similar organism compared to the one for which the new model should be curated
Run HQTB <hqtb/run-pipeline>
HQTB Configuration <hqtb/hqtb-config>

More ideas for pipelines
More ideas for workflows
------------------------

Below are some ideas for pipelines, to be implemented in future update(s):
Below are some ideas for workflows, to be implemented in future update(s):

.. toctree::
PGAB <pipeline_idea>
4 changes: 2 additions & 2 deletions docs/source/pipeline_idea.rst
@@ -1,11 +1,11 @@
PGAB: From genome sequence to draft model
=========================================

PGAB stands for PGAP-based pipeline.
PGAB stands for PGAP-based workflow.

.. note::

This pipeline is still in the idea stage and will be object to a future update.
This workflow is still in the idea stage and will be the subject of a future update.

Generating a model for an organism where no information on genes and proteins is obtainable via any database
causes the problem that the model will not contain valid database identifiers for any GeneProduct. To resolve this issue, the
