ivartb edited this page Apr 26, 2023 · 25 revisions

Welcome to the MetaFX wiki page!

MetaFX (METAgenomic Feature eXtraction) is an open-source library for feature extraction from whole-genome metagenome sequencing data and classification of groups of samples.

MetaFX introduces a feature extraction algorithm designed specifically for metagenomic short-read data. It can process hundreds of samples of 1-10 Gb each. The distinctive property of the suggested approach is the construction of meaningful features, which can not only be used to train a classification model but can also be further annotated and biologically interpreted.

[Figure: pipeline]

Installation

To run MetaFX, clone the repository with all binaries and add them to PATH.

git clone https://github.com/ivartb/metafx_new
cd metafx_new
export PATH="$PWD/bin:$PATH"
Requirements:
  • JRE 1.8 or higher
  • Python 3
  • Python libraries listed in the requirements.txt file. They can be installed using pip:
python -m pip install --upgrade pip
pip install -r requirements.txt
  • If you want to use the metafx metaspades pipeline, you will also need the SPAdes software. Please follow the SPAdes installation instructions. This pipeline is not recommended for first-time use.

The scripts have been tested on Ubuntu 18.04 LTS and Ubuntu 20.04 LTS, and should generally work on other Linux distributions and macOS.
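Before the first run, it can help to confirm that the required executables are visible on PATH. The following is a minimal sketch, not part of MetaFX: the function names are hypothetical, and the script only checks that java, python3, metafx and (optionally) spades.py can be found. It does not verify that the JRE is version 1.8 or higher.

```python
import shutil
import sys


def find_tools(names):
    """Map each executable name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


def python_is_3():
    """Return True when the running interpreter is Python 3 or newer."""
    return sys.version_info.major >= 3


# Executables named in the requirements above; spades.py is only needed
# for the metafx metaspades pipeline.
report = find_tools(["java", "python3", "metafx", "spades.py"])
for name, path in report.items():
    print(f"{name}: {path if path else 'NOT FOUND'}")
print("Python 3 interpreter:", python_is_3())
```

If metafx is reported as not found, the bin directory of the cloned repository is probably missing from PATH in the current shell.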

Running instructions

To run MetaFX, use the following syntax:

metafx <pipeline> [<Launch options>] [<Input parameters>]

To view the list of supported pipelines, run metafx -h or metafx --help.

To view help for the launch options and input parameters of a selected pipeline, run metafx <pipeline> -h or metafx <pipeline> --help.

Running MetaFX creates a working directory (./workDir/ by default). All intermediate files and final results are saved there.
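When MetaFX is driven from a Python script, the command syntax above maps naturally onto an argument list. The sketch below is only an illustration: metafx_command and show_help are hypothetical helper names, and the only flag used is -h, which is documented above. No other options are assumed.

```python
import shutil
import subprocess


def metafx_command(pipeline=None, *args):
    """Build the argument list for `metafx <pipeline> [options]`."""
    cmd = ["metafx"]
    if pipeline is not None:
        cmd.append(pipeline)
    cmd.extend(args)
    return cmd


def show_help(pipeline=None):
    """Run `metafx -h` or `metafx <pipeline> -h`.

    Returns the exit code, or None when metafx is not on PATH.
    """
    if shutil.which("metafx") is None:
        return None
    return subprocess.run(metafx_command(pipeline, "-h")).returncode
```

For example, show_help() prints the list of pipelines when metafx is installed, and show_help("pca") prints the help for the pca pipeline.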

MetaFX modules

MetaFX is a toolbox of modules divided into three groups:

[Figure: idea]

Unsupervised feature extraction pipelines

These pipelines extract features from a metagenomic dataset without any prior knowledge about the samples or their relations. The algorithms perform (pseudo-)assembly of each sample separately and construct a de Bruijn graph common to all samples. Graph components are then extracted as features, and a feature table is constructed.
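The general idea of turning graph components into a feature table can be illustrated on toy data. The sketch below is not the MetaFX implementation: it skips assembly, ignores reverse complements and k-mer coverage, and uses hypothetical function names and an arbitrary k = 4. It joins k-mers that overlap by k-1 characters, takes connected components of the resulting graph, and reports for each sample the fraction of each component's k-mers that the sample contains.

```python
from collections import defaultdict


def kmers(read, k):
    """Return the set of all substrings of length k in a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def graph_components(all_kmers):
    """Connected components of a de Bruijn-style graph over k-mers.

    Two k-mers are linked when the (k-1)-suffix of one equals the
    (k-1)-prefix of the other; edge direction is ignored.
    """
    by_prefix = defaultdict(list)
    for km in all_kmers:
        by_prefix[km[:-1]].append(km)
    adjacent = defaultdict(set)
    for km in all_kmers:
        for nxt in by_prefix.get(km[1:], []):
            adjacent[km].add(nxt)
            adjacent[nxt].add(km)
    seen, components = set(), []
    for start in sorted(all_kmers):  # sorted for a reproducible order
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacent[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def feature_table(samples, k):
    """Map sample name -> list of per-component k-mer coverage fractions."""
    per_sample = {}
    for name, reads in samples.items():
        per_sample[name] = set().union(*(kmers(r, k) for r in reads))
    components = graph_components(set().union(*per_sample.values()))
    return {
        name: [len(kms & comp) / len(comp) for comp in components]
        for name, kms in per_sample.items()
    }


# Toy reads invented for illustration only.
samples = {"s1": ["ACGTAC"], "s2": ["ACGTAC", "TTGGCC"], "s3": ["TTGGCC"]}
table = feature_table(samples, k=4)
print(table)
```

Here the two reads share no overlapping k-mers, so the graph has two components, and each row of the table shows which components a sample covers.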

metafx metafast

TODO

metafx metaspades

TODO

Supervised feature extraction pipelines

These pipelines extract group-relevant features based on sample metadata such as diagnosis, treatment, or biochemical results. The dataset is split into groups of samples according to the provided metadata, and group-specific features are constructed from de Bruijn graphs. The resulting features are combined into a feature table.
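To make the notion of group-specific features concrete, here is a toy sketch under strong simplifying assumptions. It is not how any MetaFX pipeline is implemented: it works on raw k-mers rather than graph components, uses hypothetical function names and invented reads, and treats a k-mer as group-specific when it occurs in one group and in no other.

```python
from collections import defaultdict


def kmers(read, k):
    """Return the set of all substrings of length k in a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def group_specific_kmers(samples, labels, k):
    """Map each group label to the k-mers seen only in that group."""
    group_kmers = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            group_kmers[labels[name]] |= kmers(read, k)
    specific = {}
    for group, kms in group_kmers.items():
        others = set()
        for other_group, other_kms in group_kmers.items():
            if other_group != group:
                others |= other_kms
        specific[group] = kms - others
    return specific


# Toy reads and metadata labels invented for illustration only.
samples = {"p1": ["ACGTAC"], "p2": ["ACGTTT"], "h1": ["ACGTGG"]}
labels = {"p1": "patient", "p2": "patient", "h1": "healthy"}
specific = group_specific_kmers(samples, labels, k=4)
print(specific)
```

The k-mer ACGT occurs in every sample, so it is dropped from both groups, while the remaining k-mers distinguish the two groups.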

metafx unique

TODO

metafx stats

TODO

metafx colored

TODO

Methods for classification and interpretation

These pipelines analyse the results of feature extraction. They implement methods for visualising sample similarity and for training machine learning models. Classification models can be trained to predict sample properties from the extracted features and to efficiently process new samples from the same environment.
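As a rough illustration of training on a feature table and then classifying a new sample, the sketch below uses a nearest-centroid classifier written in plain Python. This is a deliberately simple stand-in, not the model used by MetaFX; the function names, feature values and labels are all invented for the example.

```python
from collections import defaultdict


def fit_centroids(features, labels):
    """Average the feature vectors of each class into one centroid."""
    sums, counts = {}, defaultdict(int)
    for name, vector in features.items():
        label = labels[name]
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, vector):
    """Return the label whose centroid is closest in squared distance."""
    def squared_distance(label):
        return sum((c - v) ** 2 for c, v in zip(centroids[label], vector))
    return min(centroids, key=squared_distance)


# Toy feature table (rows: samples, columns: features), invented values.
train = {
    "s1": [1.0, 0.0],
    "s2": [0.9, 0.1],
    "s3": [0.0, 1.0],
    "s4": [0.2, 0.8],
}
train_labels = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}

model = fit_centroids(train, train_labels)
print(predict(model, [0.8, 0.3]))
print(predict(model, [0.1, 0.7]))
```

The same two-step shape (fit on labelled samples, then predict for new ones) is what the fit and predict module names suggest, although their actual options and models are not described on this page.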

metafx pca

TODO

metafx fit

TODO

metafx predict

TODO

metafx fit_predict

TODO

metafx cv

TODO

metafx calc_features

TODO

Examples

A minimal example is provided in the README page.

A more sophisticated example is presented as a Jupyter notebook with links to input and output data, all commands, and commentary.