Installation

Arkadiy-Garber edited this page Apr 23, 2020 · 23 revisions

This program can be installed by cloning the master git repository and then running ./setup.sh, which relies on Conda. Alternatively, if Conda is not installed on your system, you can use the manual installation, which requires only a few extra commands. However, if you go with the setup_noconda.sh script, please make sure that all the dependencies are installed before running the program.

Quick Installation using Conda

Download FeGenie and enter its directory:

git clone https://github.com/Arkadiy-Garber/FeGenie.git

cd FeGenie

Run the setup.sh script and activate the FeGenie-conda environment:

./setup.sh

source activate fegenie

Test run:

./test_run.sh

View options:

FeGenie.py -h

Regular Installation

Dependencies

  • Python (version 3.6 or higher)
  • Diamond (version 0.9.22.123) (only necessary if you are doing the cross-validation against a reference database)
  • BLAST (version 2.7.1+)
  • HMMER (version 3.2.1)
  • Prodigal (version 2.6.3)
  • R (version 3.5.1)
  • Rscript
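Before running setup_noconda.sh, it can help to confirm that the tools listed above are actually on your PATH. The following is a minimal sketch, not part of FeGenie: check_deps is a hypothetical helper, and the executable names (python3, diamond, blastp, hmmsearch, prodigal, R, Rscript) are assumptions about how these tools are usually installed. It does not check version numbers.

```shell
# check_deps TOOL...: report which of the given executables are on PATH.
# Hypothetical helper for illustration; returns non-zero if any is missing.
check_deps() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names below are assumptions; adjust them to your installation.
check_deps python3 diamond blastp hmmsearch prodigal R Rscript \
    || echo "Install the missing tools before running setup_noconda.sh"
```

Diamond is only needed for the optional cross-validation, so a "missing: diamond" line can be ignored if you do not plan to use that feature.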

Download FeGenie and enter its directory:

git clone https://github.com/Arkadiy-Garber/FeGenie.git

cd FeGenie

Put the FeGenie master directory into your path:

export PATH=$PATH:$(pwd)
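The export command above only affects the current terminal session. If you want the setting to persist, one option is to append the same line to your shell startup file. The sketch below assumes bash and a startup file at ~/.bashrc; add_to_path_rc is a hypothetical helper, not something FeGenie provides. It adds the line only once, even if called repeatedly.

```shell
# add_to_path_rc DIR RCFILE: append an export line for DIR to RCFILE
# unless that exact line is already present. Hypothetical helper.
add_to_path_rc() {
    dir=$1
    rc=$2
    line="export PATH=\$PATH:$dir"
    touch "$rc"
    # -x matches whole lines, -F treats the pattern as a fixed string
    grep -qxF -- "$line" "$rc" || printf '%s\n' "$line" >> "$rc"
}

# Example call, to be run from inside the FeGenie directory:
# add_to_path_rc "$(pwd)" "$HOME/.bashrc"
```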

Run the easy-start setup script:

./setup_noconda.sh

Test run:

./test_run.sh

View options:

FeGenie.py -h

Installing R packages using Rscript

Copy and paste the following commands into your terminal window. Some of them may require superuser permissions, so if you are running FeGenie on a server where you do not have such permissions, FeGenie may not be able to generate R plots automatically for you.

Rscript -e 'install.packages("ggplot2", repos = "http://cran.us.r-project.org")'

Rscript -e 'install.packages("reshape", repos = "http://cran.us.r-project.org")'

Rscript -e 'install.packages("reshape2", repos = "http://cran.us.r-project.org")'

Rscript -e 'install.packages("tidyverse", repos = "http://cran.us.r-project.org")'

Rscript -e 'install.packages("argparse", repos = "http://cran.us.r-project.org")'

Rscript -e 'install.packages("ggdendro", repos = "http://cran.us.r-project.org")'

Rscript -e 'install.packages("ggpubr", repos = "http://cran.us.r-project.org")'

Rscript -e 'install.packages("grid", repos = "http://cran.us.r-project.org")'

Rscript -e 'install.packages("pvclust", repos = "http://cran.us.r-project.org")'
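After running the commands above, you may want to confirm which packages R can actually find, particularly on a server where some installs may have failed for lack of permissions. This is a rough sketch rather than part of FeGenie: r_pkg_status is a hypothetical helper, and it assumes Rscript is on your PATH (if it is not, every package is reported as missing).

```shell
# r_pkg_status PKG...: report whether each R package can be loaded.
# Hypothetical helper; relies on R's requireNamespace() to test availability.
r_pkg_status() {
    for pkg in "$@"; do
        if Rscript -e "quit(save = 'no', status = if (requireNamespace('$pkg', quietly = TRUE)) 0 else 1)" >/dev/null 2>&1; then
            echo "ok:      $pkg"
        else
            echo "missing: $pkg"
        fi
    done
}

# Package names taken from the install commands above.
r_pkg_status ggplot2 reshape reshape2 tidyverse argparse ggdendro ggpubr grid pvclust
```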

Obtaining NCBI's nr database for cross-validation (optional)

Part of this program includes an optional cross-validation of the identified putative iron genes against a protein database of your choice. We recommend validating against NCBI's nr database, as it is one of the largest sequence repositories, if not the largest, and thus offers the highest chance of providing a homolog. This option is exercised by providing the script with the location of a protein database, which should be a single FASTA file. The latest release of the nr database can be downloaded by running:

    wget ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz

Make sure you have enough disk space at the download location and a reliable network connection; otherwise, the download may take a very long time!
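One way to check free space before starting a large download is sketched below. has_free_space is a hypothetical helper, and the 500 GB threshold is only a placeholder, not the actual size of the nr database; check the current file size on the NCBI server and adjust accordingly. The suggested wget -c flag lets an interrupted download resume instead of starting over.

```shell
# has_free_space DIR MIN_GB: succeed if DIR has at least MIN_GB gigabytes free.
# Hypothetical helper; uses POSIX df output (-P) in 1024-byte blocks (-k).
has_free_space() {
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# 500 is a placeholder threshold, not the real size of nr.
if has_free_space . 500; then
    echo "Enough space here; download with: wget -c ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz"
else
    echo "Less than 500 GB free in this directory; choose another location"
fi
```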

Two things regarding this optional cross-validation. First, this step greatly increases the computational load and takes about 100 times longer to complete than running the program without cross-validation. So what would have been a 5-minute analysis of a dozen genomes may take 10 hours, and if you are analyzing large metagenome assemblies, it may take several days to complete. However, identifying the closest homolog in NCBI to each of your identified iron genes may be incredibly informative, especially because our HMM library isn't perfect and false positives are a possibility (as they are with most annotation tools). Second, the part of the algorithm dedicated to the cross-validation step is largely untested. So by exercising this optional parameter you are, in effect, acting as a beta tester for our program. Feel free to open issues on GitHub, or yell at me via email, if there are any snafus with the program or its output when the nr database is provided.