Skip to content

Releases: iquasere/KEGGCharter

Ready for bioconda

28 Jul 14:08
Compare
Choose a tag to compare

Added LICENSE
Linted the YAML

Some things for bioconda

28 Jul 10:26
Compare
Choose a tag to compare

Composed the meta.yaml
Removed some last references to reCOGnizer
Removed the solve EC numbers shenanigan - should deal with it in MOSCA
Several TODOS still to go!

KEGGCharter

08 Jun 16:23
Compare
Choose a tag to compare

A tool for representing genomic potential and transcriptomic expression into KEGG pathways

Features

KEGGCharter is a user-friendly implementation of KEGG API and Pathway functionalities. It allows for:

  • Conversion of KEGG IDs to KEGG Orthologs (KO) and of KO to EC numbers
  • Representation of the metabolic potential of the main taxa in KEGG metabolic maps (up to the top 10, each distinguished by its own colour)
  • Representation of differential expression between samples in KEGG metabolic maps (the collective sum of each function will be represented)

Installation

To install KEGGCharter, simply clone this repository and run install.bash! It requires Conda previousy installed!

git clone https://github.com/iquasere/KEGGCharter.git
sudo KEGGCharter/install.bash

Usage

reCOGnizer needs an input file, but that is all it needs!

usage: kegg_charter.py [-h] [-f FILE] [-o OUTPUT] [--tsv] [-mm METABOLIC_MAPS]
                       [-mgc METAGENOMIC_COLUMNS]
                       [-mtc METATRANSCRIPTOMIC_COLUMNS] [-tc TAXA_COLUMN]
                       [-tls TAXA_LIST] [-not NUMBER_OF_TAXA]
                       [-koc KOS_COLUMN] [-v] [-utc]
                       [-tl {SPECIES,GENUS,FAMILY,ORDER,CLASS,PHYLUM,SUPERKINGDOM}]
                       [--show-available-maps]

reCOGnizer - a tool for domain based annotation with the COG database

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  TSV or EXCEL table with information to chart
  -o OUTPUT, --output OUTPUT
                        Output directory
  --tsv                 Results will be outputed in TSV format (and not
                        EXCEL).
  -mm METABOLIC_MAPS, --metabolic-maps METABOLIC_MAPS
                        IDs of metabolic maps to output
  -mgc METAGENOMIC_COLUMNS, --metagenomic-columns METAGENOMIC_COLUMNS
                        Names of columns with metagenomic quantification
  -mtc METATRANSCRIPTOMIC_COLUMNS, --metatranscriptomic-columns METATRANSCRIPTOMIC_COLUMNS
                        Names of columns with metatranscriptomics
                        quantification
  -tc TAXA_COLUMN, --taxa-column TAXA_COLUMN
                        Column with the taxa designations to represent with
                        KEGGChart
  -tls TAXA_LIST, --taxa-list TAXA_LIST
                        List of taxa to represent in genomic potential charts
                        (comma separated)
  -not NUMBER_OF_TAXA, --number-of-taxa NUMBER_OF_TAXA
                        Number of taxa to represent in genomic potential
                        charts (comma separated)
  -koc KOS_COLUMN, --kos-column KOS_COLUMN
                        "If input file has a column "KO (KEGG Charter)",
                        setting this option will make KEGG Charter use those
                        KOs instead (THIS ARGUMENT OVERRIDES KEGG IDS COLUMNS,
                        USING KOS DIRECTLY INSTEAD!)
  -v, --version         show program's version number and exit

UniProt arguments:
  -utc, --uniprot-taxonomic-columns
                        If columns have UniProt names, KEGGCharter will search
                        for UniProt designations (e.g. Taxonomic
                        lineage(GENUS))
  -tl {SPECIES,GENUS,FAMILY,ORDER,CLASS,PHYLUM,SUPERKINGDOM}, --taxonomic-level {SPECIES,GENUS,FAMILY,ORDER,CLASS,PHYLUM,SUPERKINGDOM}
                        The taxonomic level to represent

Special functions:
  --show-available-maps
                        Outputs KEGG maps IDs and descriptions to the console
                        (so you may pick the ones you want!)

To run KEGGCharter, an input file must be supplied - see "Example" section - and the columns with genomic and/or transcriptomic information as well. Output directory is not mandatory, but may help find results.

python kegg_charter.py -f input_file.xlsx -o output_folder -mgc mg_column1,mg_column2 -mtc mt_column1,mt_column2 ...

Example

An example input file is available when downloading the GitHub repository. Inserting the KEGG IDs and genomic and transcriptomic quantifications in this file will allow to use KEGGCharter with no errors... in principle.

Outputs

KEGGCharter produces a table from the inputed data with two new columns - KO (KEGG Charter) and EC number (KEGG Charter) - containing the results of conversion of KEGG IDs to KOs and KOs to EC numbers, respectively. This file is saved as KEGGCharter_results in the output directory.
KEGGCharter then represents this information in KEGG metabolic maps. If information is available as result of (meta)genomics analysis, KEGGCharter will localize the boxes whose functions are present in the organisms' genomes, mapping their genomic potential. If (meta)transcriptomics data is available, KEGGCharter will consider the sample as a whole, measuring gene expression and performing a multi-sample comparison for each function in the metabolic maps.

  • maps with genomic information are identified with the prefix "potential_" from genomic potential
  • maps with transcriptomic information are identified with the prefix "differential_" from differential expression