Showing 33 changed files with 331 additions and 7 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
PGAB: From genome sequence to draft model | ||
========================================= | ||
|
||
PGAB: PGAP-based pipeline | ||
|
||
.. note:: | ||
|
||
This pipeline is still in the idea stage and will be subject to a future update. | ||
|
||
When a model is generated for an organism whose genes and proteins are not listed in any database, | ||
the model will not contain valid database identifiers for any GeneProduct. To resolve this issue, the | ||
workflow in :numref:`workflow` can be used. | ||
|
||
1. First, annotate the genome with NCBI's Prokaryotic Genome Annotation Pipeline (PGAP), with the flag for taxonomy checking enabled, to obtain a FASTA file in the same format as used by NCBI. | ||
2. Then use DIAMOND with the ``nr`` database from NCBI and the obtained annotated FASTA file as input. Restrict the search to your organism's taxon, if known. | ||
3. Check if any protein in the annotated FASTA file still has no database identifier. | ||
|
||
| -> YES: Rerun DIAMOND without the taxonomy check and without the restriction to the organism's taxon. | ||
| | ||
| -> NO: Continue with step 4. | ||
4. Add the DIAMOND result to the annotated FASTA file. | ||
5. Run e.g. ``CarveMe`` to obtain a draft model. | ||
6. Check whether the model contains any GeneProducts without NCBI Protein or RefSeq identifiers. | ||
|
||
| -> YES: | ||
| i. Use individual BLAST searches for the remaining GeneProducts. | ||
| ii. Add the results to the annotated and refined FASTA file. | ||
| iii. Create a new draft model with the same program, using the newly refined FASTA file. | ||
| | ||
| -> NO: The draft model is done. | ||
.. _workflow: | ||
|
||
.. figure:: images/genome2draft.svg | ||
:alt: Workflow from genome sequence to a draft model | ||
|
||
Workflow from genome sequence to a draft model |
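
The check in step 3 and the merge in step 4 could be sketched in Python as follows. This is only an illustration, not part of the pipeline itself: it assumes that DIAMOND was run with its 12-column tabular output (query ID in column 1, subject ID in column 2, bit score in column 12), and all function names, protein IDs, and accessions below are made-up placeholders.

```python
import csv
import io


def fasta_ids(handle):
    """Return the sequence IDs (first token of each '>' header) in file order."""
    ids = []
    for line in handle:
        if line.startswith(">"):
            header = line[1:].strip()
            if header:
                ids.append(header.split()[0])
    return ids


def best_hits(handle):
    """Map each query ID to the subject ID of its highest-scoring hit.

    Assumption: tab-separated rows with 12 columns, where column 1 is the
    query ID, column 2 the subject ID and column 12 the bit score.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def proteins_without_hit(ids, hits):
    """Return the IDs from the FASTA file that have no hit at all (step 3)."""
    return [protein_id for protein_id in ids if protein_id not in hits]


# Hypothetical example data; the IDs and accessions are placeholders.
example_fasta = io.StringIO(
    ">prot_1 hypothetical protein\nMKT\n"
    ">prot_2 transporter\nMAA\n"
    ">prot_3\nMGG\n"
)
example_tsv = io.StringIO(
    "prot_1\tWP_000000001.1\t98.5\t100\t1\t0\t1\t100\t1\t100\t1e-50\t200.0\n"
    "prot_1\tWP_000000002.1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-40\t150.0\n"
    "prot_3\tWP_000000003.1\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-45\t180.0\n"
)

ids = fasta_ids(example_fasta)
hits = best_hits(example_tsv)
missing = proteins_without_hit(ids, hits)

if missing:
    # Corresponds to the YES branch of step 3: rerun DIAMOND unrestricted.
    print("Proteins still without a database identifier:", missing)
else:
    # Corresponds to the NO branch: continue with step 4.
    print("All proteins have a hit; continue with step 4.")
```

With real data, the two ``io.StringIO`` objects would be replaced by open file handles for the annotated FASTA file and the DIAMOND result. Keeping only the best hit per protein is one possible choice for step 4; the workflow does not prescribe how multiple hits are resolved.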