forked from aweimann/traitar
Merge pull request #1 from aweimann/master
v1.01
Showing 28 changed files with 209 additions and 79 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,124 @@ | ||
Traitar – the microbial trait analyzer | ||
====================================== | ||
|
||
Traitar is software for characterizing microbial samples from | ||
nucleotide or protein sequences. It can accurately phenotype `67 diverse | ||
traits <https://github.com/hzi-bifo/traitar/blob/master/traits.tsv>`__. | ||
Please take a look at the `GitHub repository <https://github.com/hzi-bifo/traitar/>`__ for further information. | ||
|
||
Table of Contents | ||
~~~~~~~~~~~~~~~~~ | ||
|
||
| `Installation <#installation>`__ | ||
| `Basic usage <#basic-usage>`__ | ||
| `Results <#results>`__ | ||
Installation | ||
============ | ||
|
||
Please see `INSTALL.md <https://github.com/hzi-bifo/traitar/blob/master/INSTALL.md>`__ for installation instructions. | ||
|
||
Basic usage | ||
=========== | ||
|
||
``traitar phenotype <in dir> <sample file> from_nucleotides <out_dir>`` | ||
|
||
will trigger the standard `workflow <https://raw.githubusercontent.com/hzi-bifo/traitar/master/workflow.png>`__ of Traitar, which is to predict open | ||
reading frames with Prodigal, annotate the coding sequences provided as | ||
nucleotide FASTAs in ``<in dir>`` for all samples in ``<sample file>`` with Pfam families using | ||
HMMER and finally predict phenotypes from the models for the 67 traits. | ||
|
||
The sample file has one column for the sample file names and one for the | ||
names as specified by the user. You can also specify a grouping of the | ||
samples in the third column, which will be shown in the generated plots. | ||
The template looks as follows. The header row is mandatory; please | ||
also take a look at the sample file for the packaged example data: | ||
sample\_file\_name{tab}sample\_name{tab}category | ||
sample1\_file\_name{tab}sample1\_name[{tab}sample\_category1] | ||
sample2\_file\_name{tab}sample2\_name[{tab}sample\_category2] | ||
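The template above can be produced programmatically. The following is a minimal sketch, assuming the three column names shown in the template; the function name ``write_sample_file`` and the example file names, sample names and categories are hypothetical and not part of Traitar.

```python
import csv
import io

# Column names taken from the template above.
HEADER = ["sample_file_name", "sample_name", "category"]


def write_sample_file(handle, samples, with_category=True):
    """Write a tab-separated sample table to an open text handle.

    `samples` is a list of (file_name, sample_name, category) tuples.
    If `with_category` is False, only the first two columns are written.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    columns = HEADER if with_category else HEADER[:2]
    writer.writerow(columns)  # the header row is mandatory
    for row in samples:
        writer.writerow(row[: len(columns)])


# Hypothetical samples, written to an in-memory buffer for illustration.
buffer = io.StringIO()
write_sample_file(
    buffer,
    [
        ("sample1.fna", "sample1", "group_a"),
        ("sample2.fna", "sample2", "group_b"),
    ],
)
sample_table = buffer.getvalue()
print(sample_table)
```

To write a real file, pass a handle opened with ``open(path, "w", newline="")`` instead of the in-memory buffer.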
|
||
``traitar phenotype <in dir> <sample file> from_genes <out_dir>`` | ||
|
||
assumes that gene prediction has already been performed externally. In | ||
this case, the analysis starts with the Pfam annotation. If the output | ||
directory already exists, Traitar will offer to recompute or resume the | ||
individual analysis steps. This option is only available if the process | ||
is run interactively. | ||
|
||
Parallel usage | ||
~~~~~~~~~~~~~~ | ||
|
||
Traitar can benefit from parallel execution. The ``-c`` parameter sets | ||
the number of processes used, e.g. ``-c 2`` for two processes: | ||
|
||
``traitar phenotype <in dir> <sample file> from_nucleotides <out_dir> -c 2`` | ||
|
||
This requires GNU parallel to be installed (see the installation instructions). | ||
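When Traitar is called from a script, the command line can be assembled as an argument list. This is a minimal sketch that uses only the sub-command, modes and ``-c`` option described above; the helper name ``build_phenotype_command`` and the example paths are hypothetical.

```python
import shlex

# The two modes described in this README.
MODES = ("from_nucleotides", "from_genes")


def build_phenotype_command(in_dir, sample_file, mode, out_dir, processes=None):
    """Return the argument list for a `traitar phenotype` call."""
    if mode not in MODES:
        raise ValueError("mode must be one of %s" % (MODES,))
    cmd = ["traitar", "phenotype", in_dir, sample_file, mode, out_dir]
    if processes is not None:
        # -c sets the number of processes.
        cmd += ["-c", str(processes)]
    return cmd


# Hypothetical input directory, sample file and output directory.
cmd = build_phenotype_command(
    "genomes", "samples.txt", "from_nucleotides", "traitar_out", processes=2
)
print(shlex.join(cmd))
```

The list can then be passed to ``subprocess.run(cmd, check=True)``. Note that the offer to recompute or resume steps for an existing output directory is only available when the process is run interactively, as stated above.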
|
||
Run Traitar with packaged sample data | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
``traitar phenotype <traitar_dir>/data/sample_data <traitar_dir>/data/sample_data/samples.txt from_genes <out_dir> -c 2`` | ||
will trigger phenotyping of *Listeria grayi DSM\_20601* and *Listeria | ||
ivanovii WSLC3009*. The computation should finish within 5 minutes. You can | ||
find ``<traitar_dir>`` by running | ||
|
||
:: | ||
|
||
python | ||
>>> import traitar | ||
>>> traitar.__path__ | ||
|
||
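As an alternative to the interactive session, a package directory can be looked up from a script. This is a sketch using the standard library's ``importlib.util.find_spec``; the helper name ``package_dir`` is hypothetical, and the example uses the standard-library package ``json`` so that it runs without Traitar installed. Replace ``"json"`` with ``"traitar"`` to obtain ``<traitar_dir>``.

```python
import importlib.util
from pathlib import Path


def package_dir(package_name):
    """Return the directory that contains the package's __init__.py."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        raise ImportError("package %r not found" % (package_name,))
    # For a regular package, spec.origin points to its __init__.py.
    return Path(spec.origin).parent


print(package_dir("json"))
```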
Results | ||
======= | ||
|
||
Traitar provides the gene prediction results in | ||
``<out_dir>/gene_prediction``, the Pfam annotation in | ||
``<out_dir>/pfam_annotation`` and the phenotype prediction | ||
in ``<out_dir>/phenotype_prediction``. | ||
|
||
Heatmaps | ||
~~~~~~~~ | ||
|
||
The phenotype prediction is summarized in heatmaps: individually for the | ||
phyletic pattern classifier in ``heatmap_phypat.png``, for the | ||
phylogeny-aware classifier in ``heatmap_phypat_pgl.png`` and for both | ||
classifiers `combined <https://github.com/aweimann/traitar/blob/master/traitar/data/sample_data/traitar_out/phenotype_prediction/heatmap_combined.png?raw=true>`__ | ||
in ``heatmap_combined.png``. The heatmaps also provide hierarchical | ||
clustering dendrograms for the phenotypes and the samples. | ||
|
||
Phenotype prediction - Tables and flat files | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
These heatmaps are based on tab-separated text files, e.g. | ||
``predictions_majority-votes_combined.txt``. A negative prediction is | ||
encoded as 0, a prediction made only by the pure phyletic classifier as | ||
1, one made only by the phylogeny-aware classifier as 2 and a prediction | ||
supported by both algorithms as 3. | ||
``predictions_flat_majority-votes_combined.txt`` provides a flat version | ||
of this table with one prediction per row. The expert user might also | ||
want to access the individual results for each algorithm in the | ||
respective subfolders ``phypat`` and ``phypat+PGL``. | ||
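The 0 to 3 encoding described above can be decoded with a few lines of Python. This is a minimal sketch, assuming a tab-separated table whose header row lists the phenotypes after a leading (possibly empty) sample column; the function name, the labels and the toy table are hypothetical and do not reproduce actual Traitar output.

```python
import csv
import io

# Meaning of the codes in the combined table, as described above.
CODE_MEANING = {
    0: "negative",
    1: "phypat only",
    2: "phypat+PGL only",
    3: "both classifiers",
}


def decode_predictions(handle):
    """Return (sample, phenotype, meaning) tuples, one per table cell."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    phenotypes = header[1:]  # assumption: first header cell labels the samples
    decoded = []
    for row in reader:
        sample, cells = row[0], row[1:]
        for phenotype, cell in zip(phenotypes, cells):
            # int(float(...)) accepts both "3" and "3.0".
            decoded.append((sample, phenotype, CODE_MEANING[int(float(cell))]))
    return decoded


# Toy table for illustration only.
toy_table = "\tCatalase\tMotile\nsample1\t3\t0\nsample2\t1\t2\n"
decoded = decode_predictions(io.StringIO(toy_table))
for entry in decoded:
    print(entry)
```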
|
||
Feature tracks | ||
~~~~~~~~~~~~~~ | ||
|
||
If Traitar is run with *from\_nucleotides*, it will generate a link between the | ||
Prodigal gene prediction and the predicted phenotypes in | ||
``phypat/feat_gffs`` and ``phypat+PGL/feat_gffs`` (no example in the | ||
sample data). The user can visualize the gene predictions and the phenotype-specific | ||
Pfam annotation tracks via these GFF files. | ||
|
||
Feature tracks with *from\_genes* option (experimental feature) | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
If the *from\_genes* option is set, the user may specify gene GFF files | ||
via an additional column called gene\_gff in the sample file. As gene | ||
ids are not consistent across gene GFFs from different sources, e.g. IMG, | ||
RefSeq or Prodigal, the user needs to specify the origin of the gene GFF | ||
file via the ``-g`` / ``--gene_gff_type`` parameter. There is currently no | ||
guarantee that this works. Using samples\_gene\_gff.txt as the | ||
sample file in the above example will generate phenotype-specific Pfam | ||
tracks for the two genomes. | ||
|
||
``traitar phenotype . samples_gene_gff.txt from_genes traitar_out -g refseq`` |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,9 +12,13 @@ | |
verstr = mo.group(1) | ||
else: | ||
raise RuntimeError("Unable to find version string in %s." % (VERSIONFILE,)) | ||
|
||
long_description = open('README.rst', 'r').read() | ||
|
||
setup(name='traitar', | ||
version = verstr, | ||
description='traitar - The microbial trait analyzer', | ||
long_description = long_description, | ||
url = 'http://github.com/aweimann/traitar', | ||
author='Aaron Weimann', | ||
author_email='[email protected]', | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
-__version__ = "1.0.0" | ||
+__version__ = "1.0.1" |
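The setup.py hunk above reads this version string through a regular-expression match (``mo.group(1)``), but the pattern itself lies outside the hunk. The following is a sketch of that common technique under an assumed pattern; ``VERSION_PATTERN`` and ``parse_version`` are hypothetical names, and only the error message is taken from the hunk.

```python
import re

# Assumed pattern: a line of the form __version__ = "x.y.z".
VERSION_PATTERN = re.compile(r"^__version__\s*=\s*['\"]([^'\"]*)['\"]", re.M)


def parse_version(text, source="<string>"):
    """Extract the version string from the text of a version file."""
    mo = VERSION_PATTERN.search(text)
    if mo:
        return mo.group(1)
    raise RuntimeError("Unable to find version string in %s." % (source,))


print(parse_version('__version__ = "1.0.1"\n'))
```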
Binary file modified (+337 Bytes): traitar/data/sample_data/traitar_out/phenotype_prediction/heatmap_combined.pdf
Binary file modified (+336 Bytes): traitar/data/sample_data/traitar_out/phenotype_prediction/heatmap_phypat.pdf
Binary file modified (+336 Bytes): traitar/data/sample_data/traitar_out/phenotype_prediction/heatmap_phypat_pgl.pdf
2 changes: 1 addition & 1 deletion
...sample_data/traitar_out/phenotype_prediction/phypat+PGL/predictions_conservative-vote.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
Salicin Catalase Gelatin hydrolysis Coccus Lysine decarboxylase Motile Coccus - pairs or chains predominate Maltose Growth on ordinary blood agar Colistin-Polymyxin susceptible Melibiose Spore formation Yellow pigment DNase Nitrate to nitrite Gram positive Anaerobe Bile-susceptible Glucose oxidizer Gram negative Ornithine decarboxylase L-Arabinose Casein hydrolysis Gas from glucose Lactose Tartrate utilization Raffinose Cellobiose L-Rhamnose Bacillus or coccobacillus Mucate utilization Indole D-Xylose Starch hydrolysis Growth on MacConkey agar Citrate Urea hydrolysis Glycerol Voges Proskauer Pyrrolidonyl-beta-naphthylamide Lipase D-Mannitol Trehalose Nitrite to gas Arginine dihydrolase Acetate utilization Malonate myo-Inositol Methyl red ONPG (beta galactosidase) D-Mannose Growth in 6.5% NaCl Growth at 42 degrees C Glucose fermenter Aerobe Coccus - clusters or groups predominate Capnophilic Oxidase Alkaline phosphatase Beta hemolysis Growth in KCN Hydrogen sulfide Facultative Esculin hydrolysis Sucrose D-Sorbitol Coagulase production | ||
Salicin Catalase Gelatin hydrolysis Coccus Lysine decarboxylase Motile Coccus - pairs or chains predominate Maltose Growth on ordinary blood agar Colistin-Polymyxin susceptible Melibiose Spore formation Yellow pigment DNase Nitrate to nitrite Gram positive Anaerobe Bile-susceptible Glucose oxidizer Gram negative Ornithine decarboxylase L-Arabinose Casein hydrolysis Gas from glucose Lactose Tartrate utilization Raffinose Cellobiose L-Rhamnose Bacillus or coccobacillus Mucate utilization Indole D-Xylose Starch hydrolysis Growth on MacConkey agar Citrate Urea hydrolysis Glycerol Voges Proskauer Pyrrolidonyl-beta-naphthylamide Lipase D-Mannitol Trehalose Nitrite to gas Arginine dihydrolase Acetate utilization Malonate myo-Inositol Methyl red ONPG (beta galactosidase) D-Mannose Growth in 6.5% NaCl Growth at 42°C Glucose fermenter Aerobe Coccus - clusters or groups predominate Capnophilic Oxidase Alkaline phosphatase Beta hemolysis Growth in KCN Hydrogen sulfide Facultative Esculin hydrolysis Sucrose D-Sorbitol Coagulase production | ||
Listeria_grayi_DSM_20601 1 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 1 1 0 0 1 | ||
Listeria_ivanovii_WSLC3009 1 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 1 0 1 0 0 1 1 1 0 0 0 0 1 0 0 0 1 1 0 0 1 |
2 changes: 1 addition & 1 deletion
...ata/sample_data/traitar_out/phenotype_prediction/phypat+PGL/predictions_majority-vote.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
Salicin Catalase Gelatin hydrolysis Coccus Lysine decarboxylase Motile Coccus - pairs or chains predominate Maltose Growth on ordinary blood agar Colistin-Polymyxin susceptible Melibiose Spore formation Yellow pigment DNase Nitrate to nitrite Gram positive Anaerobe Bile-susceptible Glucose oxidizer Gram negative Ornithine decarboxylase L-Arabinose Casein hydrolysis Gas from glucose Lactose Tartrate utilization Raffinose Cellobiose L-Rhamnose Bacillus or coccobacillus Mucate utilization Indole D-Xylose Starch hydrolysis Growth on MacConkey agar Citrate Urea hydrolysis Glycerol Voges Proskauer Pyrrolidonyl-beta-naphthylamide Lipase D-Mannitol Trehalose Nitrite to gas Arginine dihydrolase Acetate utilization Malonate myo-Inositol Methyl red ONPG (beta galactosidase) D-Mannose Growth in 6.5% NaCl Growth at 42 degrees C Glucose fermenter Aerobe Coccus - clusters or groups predominate Capnophilic Oxidase Alkaline phosphatase Beta hemolysis Growth in KCN Hydrogen sulfide Facultative Esculin hydrolysis Sucrose D-Sorbitol Coagulase production | ||
Salicin Catalase Gelatin hydrolysis Coccus Lysine decarboxylase Motile Coccus - pairs or chains predominate Maltose Growth on ordinary blood agar Colistin-Polymyxin susceptible Melibiose Spore formation Yellow pigment DNase Nitrate to nitrite Gram positive Anaerobe Bile-susceptible Glucose oxidizer Gram negative Ornithine decarboxylase L-Arabinose Casein hydrolysis Gas from glucose Lactose Tartrate utilization Raffinose Cellobiose L-Rhamnose Bacillus or coccobacillus Mucate utilization Indole D-Xylose Starch hydrolysis Growth on MacConkey agar Citrate Urea hydrolysis Glycerol Voges Proskauer Pyrrolidonyl-beta-naphthylamide Lipase D-Mannitol Trehalose Nitrite to gas Arginine dihydrolase Acetate utilization Malonate myo-Inositol Methyl red ONPG (beta galactosidase) D-Mannose Growth in 6.5% NaCl Growth at 42°C Glucose fermenter Aerobe Coccus - clusters or groups predominate Capnophilic Oxidase Alkaline phosphatase Beta hemolysis Growth in KCN Hydrogen sulfide Facultative Esculin hydrolysis Sucrose D-Sorbitol Coagulase production | ||
Listeria_grayi_DSM_20601 1 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 0 1 0 0 0 1 0 1 0 1 0 0 0 1 1 1 1 0 0 0 0 1 0 0 0 1 1 1 0 1 | ||
Listeria_ivanovii_WSLC3009 1 1 0 0 0 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 1 1 0 1 0 0 1 0 0 0 1 0 0 0 1 0 1 0 1 1 1 1 0 0 0 0 1 0 0 0 1 1 0 1 1 |
2 changes: 1 addition & 1 deletion
...data/traitar_out/phenotype_prediction/phypat+PGL/predictions_majority-vote_mean-score.txt
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,3 @@ | ||
Salicin Catalase Gelatin hydrolysis Coccus Lysine decarboxylase Motile Coccus - pairs or chains predominate Maltose Growth on ordinary blood agar Colistin-Polymyxin susceptible Melibiose Spore formation Yellow pigment DNase Nitrate to nitrite Gram positive Anaerobe Bile-susceptible Glucose oxidizer Gram negative Ornithine decarboxylase L-Arabinose Casein hydrolysis Gas from glucose Lactose Tartrate utilization Raffinose Cellobiose L-Rhamnose Bacillus or coccobacillus Mucate utilization Indole D-Xylose Starch hydrolysis Growth on MacConkey agar Citrate Urea hydrolysis Glycerol Voges Proskauer Pyrrolidonyl-beta-naphthylamide Lipase D-Mannitol Trehalose Nitrite to gas Arginine dihydrolase Acetate utilization Malonate myo-Inositol Methyl red ONPG (beta galactosidase) D-Mannose Growth in 6.5% NaCl Growth at 42 degrees C Glucose fermenter Aerobe Coccus - clusters or groups predominate Capnophilic Oxidase Alkaline phosphatase Beta hemolysis Growth in KCN Hydrogen sulfide Facultative Esculin hydrolysis Sucrose D-Sorbitol Coagulase production | ||
Salicin Catalase Gelatin hydrolysis Coccus Lysine decarboxylase Motile Coccus - pairs or chains predominate Maltose Growth on ordinary blood agar Colistin-Polymyxin susceptible Melibiose Spore formation Yellow pigment DNase Nitrate to nitrite Gram positive Anaerobe Bile-susceptible Glucose oxidizer Gram negative Ornithine decarboxylase L-Arabinose Casein hydrolysis Gas from glucose Lactose Tartrate utilization Raffinose Cellobiose L-Rhamnose Bacillus or coccobacillus Mucate utilization Indole D-Xylose Starch hydrolysis Growth on MacConkey agar Citrate Urea hydrolysis Glycerol Voges Proskauer Pyrrolidonyl-beta-naphthylamide Lipase D-Mannitol Trehalose Nitrite to gas Arginine dihydrolase Acetate utilization Malonate myo-Inositol Methyl red ONPG (beta galactosidase) D-Mannose Growth in 6.5% NaCl Growth at 42°C Glucose fermenter Aerobe Coccus - clusters or groups predominate Capnophilic Oxidase Alkaline phosphatase Beta hemolysis Growth in KCN Hydrogen sulfide Facultative Esculin hydrolysis Sucrose D-Sorbitol Coagulase production | ||
Listeria_grayi_DSM_20601 0.525 1.170 1.137 1.234 1.454 1.002 0.268 0.440 0.780 0.177 0.178 0.751 0.453 0.275 0.969 0.383 0.971 0.290 1.389 0.792 1.306 1.827 0.060 0.723 | ||
Listeria_ivanovii_WSLC3009 0.887 1.218 1.181 0.908 1.803 1.054 0.810 2.162 0.074 0.593 0.142 1.042 0.980 0.693 0.843 0.213 1.139 1.714 0.792 0.987 1.941 2.021 0.085 0.820 |