Merge branch 'refinegems-integration' of https://github.com/draeger-lab/SPECIMEN into refinegems-integration
cb-Hades committed Jul 10, 2024
2 parents 75084e4 + 1c9b87e commit 798f40a
Showing 1 changed file with 31 additions and 26 deletions.
57 changes: 31 additions & 26 deletions docs/source/cmpb/run-pipeline.rst
@@ -1,8 +1,8 @@
Run the CMPB Pipeline
=====================
Run the ``CMPB`` Pipeline
=========================

This page explains how to run the complete CMPB (CarveMe + ModelPolisher based) pipeline
and how to collected the neccessary data.
This page explains how to run the complete ``CMPB`` (CarveMe + ModelPolisher based) pipeline
and how to collect the neccessary data.

For more information about the steps of the pipeline,
see :ref:`Overview of the CMPB Pipeline`.
@@ -21,12 +21,14 @@ The input in both cases is a configuration file that contains all information ne
The configuration can be downloaded using the command line:

.. code-block:: bash
   :class: copyable

   specimen setup config -t cmpb

To download the configuration file using Python, use:

.. code-block:: python
   :class: copyable

   import specimen
   specimen.setup.download_config(filename='./my_basic_config.yaml', type='cmpb')

@@ -37,47 +39,50 @@ Missing entries will be reported when starting the pipeline.
To run the pipeline using the configuration file, use

.. code-block:: bash
   :class: copyable

   specimen cmpb run "config.yaml"

on the command line or

.. code-block:: python
   :class: copyable

   specimen.cmpb.workflow.run(config_file='config.yaml')

from inside a Python script or Jupyter Notebook with "config.yaml" being the path to your configuration file.

CMPB: Collecting Data
---------------------
``CMPB``: Collecting Data
-------------------------

The pipeline has only two absolutely required parameters:
The pipeline has two obligatory parameters:

- the path to the annotated genome file (if a model is given, should be the file used to create it)
- a media configuration (from refineGEMs) for testing the model's growth
- The path to the annotated genome file (if a model is given, should be the file used to create it)
- A media configuration (from refineGEMs) for testing the model's growth

Further data can be added as available and/or needed (all are totally optional):
Further data can be added as available and/or needed (all are completely optional):

- the generated draft model e.g. using CarveMe
- The reference sequence GFF file (for gap analysis required, optional for CarveMe polishing)
- if available, the KEGG organism ID (for gap analysis required, optional for CarveMe polishing)
- the protein FASTA of your input genome (required for lab\_strain=True, otherwise optional)
- additional files for gapfilling
- The generated draft model e.g. using CarveMe
- The reference sequence GFF file (for gap analysis via KEGG required, optional for CarveMe polishing)
- If available, the KEGG organism ID (for gap analysis via KEGG required, optional for CarveMe polishing)
- The protein FASTA of your input genome (required for lab\_strain=True, otherwise optional)
- Additional files for filling gaps:

- for KEGG see bullet points above
- for BioCyc, three txt files with downloaded smart tables and a protein fasta with:
- For KEGG see bullet points above
- For BioCyc, three txt files from downloaded BioCyc SmartTables and a protein FASTA with:

- 'Accession-2' and 'Reaction of gene' columns
- all reaction relevant information (\*)
- all metabolite relevant information (+)
- protein FASTA used as input for CarveMe
- All reaction relevant information [#]_
- All metabolite relevant information [#]_
- Protein FASTA used as input for CarveMe

- optionally, a manually curated EXCEL sheet with information to be (potentially) added to the model
- Optionally, a manually curated EXCEL sheet with information to be (potentially) added to the model

- to enable adjusting the biomass objective function using BOFdat, the following information is required
- To enable adjusting the biomass objective function using BOFdat, the following information is required

- path to a file containing the full genome sequenece of your organism
- the DNA weight fraction of your organism (experimentally determined or retrieved using literature research)
- the enzmy/ion weight fraction of your organism (experimentally determined or retrieved using literature research)

- Path to a file containing the full genome sequenece of your organism
- The DNA weight fraction of your organism (experimentally determined or retrieved using literature research)
- The enzyme/ion weight fraction of your organism (experimentally determined or retrieved using literature research)
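
The requirements listed above can be sketched as a small pre-flight check on a configuration dictionary. This is only an illustration: every key name used here (``annotated_genome``, ``media_config``, ``gff``, ``kegg_organism_id``, ``protein_fasta``, ``lab_strain``, ``kegg_gap_analysis``) is a hypothetical placeholder and is not taken from the actual SPECIMEN configuration file.

```python
def missing_inputs(config: dict) -> list[str]:
    """Return the inputs that the rules above require but that are unset.

    All key names are hypothetical placeholders, not SPECIMEN config keys.
    """
    # The two obligatory parameters: annotated genome and media configuration.
    required = ["annotated_genome", "media_config"]
    # GFF file and KEGG organism ID are required for gap analysis via KEGG.
    if config.get("kegg_gap_analysis"):
        required += ["gff", "kegg_organism_id"]
    # The protein FASTA is required when lab_strain is True.
    if config.get("lab_strain"):
        required.append("protein_fasta")
    # Report every required key that is absent or empty.
    return [key for key in required if not config.get(key)]


example = {
    "annotated_genome": "genome.gbff",
    "media_config": "media.yaml",
    "lab_strain": True,
}
print(missing_inputs(example))  # prints ['protein_fasta']
```

The pipeline itself reports missing entries when it starts, so a check like this would only be a convenience for catching gaps earlier.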

.. [#] 'Reaction' 'Reactants of reaction' 'Products of reaction' 'EC-Number' 'KEGG Reaction' 'MetaNetX' 'Reaction-Direction' 'Spontaneous?'
.. [#] 'Compound' 'Object ID' 'Chemical Formula' 'InChI-Key' 'ChEBI'
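
The column names in the two footnotes can be checked against a downloaded table before it is handed to the pipeline. The sketch below assumes that the SmartTable was exported as a tab-separated text file with a single header row; that assumption, the helper name ``missing_columns`` and the sample data are illustrative and not part of SPECIMEN.

```python
import csv
import io

# Column names as listed in the footnotes above.
REACTION_COLUMNS = [
    "Reaction", "Reactants of reaction", "Products of reaction", "EC-Number",
    "KEGG Reaction", "MetaNetX", "Reaction-Direction", "Spontaneous?",
]
METABOLITE_COLUMNS = [
    "Compound", "Object ID", "Chemical Formula", "InChI-Key", "ChEBI",
]


def missing_columns(handle, expected):
    """Return the expected column names that are absent from the header row.

    Hypothetical helper; assumes a tab-separated file with one header line.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [column for column in expected if column not in present]


# Made-up sample table that lacks the 'InChI-Key' column.
sample = io.StringIO(
    "Compound\tObject ID\tChemical Formula\tChEBI\n"
    "water\tWATER\tH2O\t15377\n"
)
print(missing_columns(sample, METABOLITE_COLUMNS))  # prints ['InChI-Key']
```

For a file on disk, the same function can be called with ``open(path, newline="")`` in place of the ``io.StringIO`` object.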
