Merge branch 'refinegems-integration' of https://github.com/draeger-l…

…ab/SPECIMEN into refinegems-integration
draeger-lab · Jul 10, 2024 · 798f40a
2 parents 75084e4 + 1c9b87e
commit 798f40a
Showing 1 changed file with 31 additions and 26 deletions.
diff --git a/docs/source/cmpb/run-pipeline.rst b/docs/source/cmpb/run-pipeline.rst
@@ -1,8 +1,8 @@
-Run the CMPB Pipeline
-=====================
+Run the ``CMPB`` Pipeline
+=========================
 
-This page explains how to run the complete CMPB (CarveMe + ModelPolisher based) pipeline 
-and how to collected the neccessary data.
+This page explains how to run the complete ``CMPB`` (CarveMe + ModelPolisher based) pipeline 
+and how to collect the necessary data.
 
 For more information about the steps of the pipeline, 
 see :ref:`Overview of the CMPB Pipeline`.
@@ -21,12 +21,14 @@ The input in both cases is a configuration file that contains all information ne
 The configuration can be downloaded using the command line:
 
 .. code-block:: bash
+    :class: copyable
 
     specimen setup config -t cmpb
 
 To download the configuration file using Python, use:
 
 .. code-block:: python
+    :class: copyable
 
     import specimen
     specimen.setup.download_config(filename='./my_basic_config.yaml', type='cmpb')
@@ -37,47 +39,50 @@ Missing entries will be reported when starting the pipeline.
 To run the pipeline using the configuration file, use
 
 .. code-block:: bash
+    :class: copyable
 
     specimen cmpb run "config.yaml"
 
 on the command line or
 
 .. code-block:: python
+    :class: copyable
 
     specimen.cmpb.workflow.run(config_file='config.yaml')
 
 from inside a Python script or Jupyter Notebook with "config.yaml" being the path to your configuration file.
 
-CMPB: Collecting Data
----------------------
+``CMPB``: Collecting Data
+-------------------------
 
-The pipeline has only two absolutely required parameters:
+The pipeline has two obligatory parameters:
 
-- the path to the annotated genome file (if a model is given, should be the file used to create it)
-- a media configuration (from refineGEMs) for testing the model's growth
+- The path to the annotated genome file (if a model is given, this should be the file used to create it)
+- A media configuration (from refineGEMs) for testing the model's growth
 
-Further data can be added as available and/or needed (all are totally optional):
+Further data can be added as available and/or needed (all are completely optional):
 
-- the generated draft model e.g. using CarveMe
-- The reference sequence GFF file (for gap analysis required, optional for CarveMe polishing)
-- if available, the KEGG organism ID (for gap analysis required, optional for CarveMe polishing)
-- the protein FASTA of your input genome (required for lab\_strain=True, otherwise optional)
-- additional files for gapfilling
+- The generated draft model e.g. using CarveMe
+- The reference sequence GFF file (required for gap analysis via KEGG, optional for CarveMe polishing)
+- If available, the KEGG organism ID (required for gap analysis via KEGG, optional for CarveMe polishing)
+- The protein FASTA of your input genome (required for lab\_strain=True, otherwise optional)
+- Additional files for filling gaps: 
 
-    - for KEGG see bullet points above 
-    - for BioCyc, three txt files with downloaded smart tables and a protein fasta with:
+    - For KEGG see bullet points above 
+    - For BioCyc, three txt files from downloaded BioCyc SmartTables and a protein FASTA with:
 
          - 'Accession-2' and 'Reaction of gene' columns
-         - all reaction relevant information (\*)
-         - all metabolite relevant information (+)
-         - protein FASTA used as input for CarveMe
+         - All reaction-relevant information [#]_
+         - All metabolite-relevant information [#]_
+         - Protein FASTA used as input for CarveMe
 
-    - optionally, a manually curated EXCEL sheet with information to be (potentially) added to the model
+    - Optionally, a manually curated Excel sheet with information to be (potentially) added to the model
 
-- to enable adjusting the biomass objective function using BOFdat, the following information is required
+- To enable adjusting the biomass objective function using BOFdat, the following information is required:
 
-    - path to a file containing the full genome sequenece of your organism
-    - the DNA weight fraction of your organism (experimentally determined or retrieved using literature research)
-    - the enzmy/ion weight fraction of your organism (experimentally determined or retrieved using literature research)
-
+    - Path to a file containing the full genome sequence of your organism
+    - The DNA weight fraction of your organism (experimentally determined or retrieved using literature research)
+    - The enzyme/ion weight fraction of your organism (experimentally determined or retrieved using literature research)
 
+.. [#] 'Reaction' 'Reactants of reaction' 'Products of reaction' 'EC-Number' 'KEGG Reaction' 'MetaNetX' 'Reaction-Direction' 'Spontaneous?'
+.. [#] 'Compound' 'Object ID' 'Chemical Formula' 'InChI-Key' 'ChEBI'
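
Before pointing the pipeline at downloaded BioCyc SmartTables, it can help to check that each file carries the columns listed above. The following is a minimal sketch, not part of SPECIMEN: the helper name ``missing_columns`` is hypothetical, and it assumes the tables were exported as tab-delimited text with a single header row. Only the column names are taken from this page.

```python
import csv
import io

# Column names copied from the lists and footnotes on this page.
GENE_COLUMNS = ['Accession-2', 'Reaction of gene']
REACTION_COLUMNS = [
    'Reaction', 'Reactants of reaction', 'Products of reaction', 'EC-Number',
    'KEGG Reaction', 'MetaNetX', 'Reaction-Direction', 'Spontaneous?',
]
METABOLITE_COLUMNS = ['Compound', 'Object ID', 'Chemical Formula', 'InChI-Key', 'ChEBI']


def missing_columns(handle, required, delimiter='\t'):
    """Return the required column names that are absent from the header row.

    Hypothetical helper; assumes a delimited text table whose first row
    holds the column names (an assumption about the export format).
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Example with an in-memory table that lacks two metabolite columns.
sample = io.StringIO("Compound\tObject ID\tChEBI\n")
print(missing_columns(sample, METABOLITE_COLUMNS))
```

For a real file, open it with ``open(path, newline='')`` and pass the handle together with the matching column list. An empty result means all listed columns were found in the header.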