Installation
This program can be installed by cloning the master git directory and then running ./setup.sh. Alternatively, if Conda is not installed on your system, you can use the manual installation, which requires only a few extra commands. However, if you go with the setup_noconda.sh script, please make sure to have all the dependencies installed prior to running the program.
Download FeGenie and enter its directory:
git clone https://github.com/Arkadiy-Garber/FeGenie.git
cd FeGenie
Run the setup.sh script and activate the FeGenie-conda environment:
./setup.sh
source activate fegenie
Test run:
./test_run.sh
View options:
FeGenie.py -h
Dependencies (install these yourself if you use the manual installation):
- Python (version 3.6 or higher)
- Diamond (version 0.9.22.123) (only necessary if you are doing the cross-validation against a reference database)
- BLAST (version 2.7.1+)
- HMMER (version 3.2.1)
- Prodigal (version 2.6.3)
- R (version 3.5.1)
- Rscript
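Before running setup_noconda.sh, it can help to confirm that the tools listed above are actually on your PATH. The following is a minimal sketch, not part of FeGenie: the helper name missing_tools is hypothetical, and the executable names are assumptions (blastp standing in for BLAST, hmmsearch for HMMER, python3 for Python). It only checks that each tool is present; it does not verify versions.

```shell
#!/bin/sh
# Hypothetical helper (not part of FeGenie): print the name of every
# given executable that cannot be found on PATH, one per line.
# Prints nothing if all of them are found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# The executable names below are assumptions about how each dependency
# is installed on your system; adjust them if yours differ.
missing=$(missing_tools python3 diamond blastp hmmsearch prodigal R Rscript)

if [ -n "$missing" ]; then
    echo "Missing from PATH:" $missing
else
    echo "All listed tools were found on PATH."
fi
```

Diamond is only needed for the optional cross-validation, so a report that it is missing may be acceptable if you do not plan to use that option.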
Download FeGenie and enter its directory:
git clone https://github.com/Arkadiy-Garber/FeGenie.git
cd FeGenie
Put the FeGenie master directory into your path:
export PATH=$PATH:$(pwd)
Run the easy-start setup script:
./setup_noconda.sh
Test run:
./test_run.sh
View options:
FeGenie.py -h
Copy and paste the following commands into your terminal window. Some of these commands may require superuser permissions, so if you are running FeGenie on a server where you do not have such permissions, FeGenie may not be able to generate R plots automatically for you:
Rscript -e 'install.packages("ggplot2", repos = "http://cran.us.r-project.org")'
Rscript -e 'install.packages("reshape", repos = "http://cran.us.r-project.org")'
Rscript -e 'install.packages("reshape2", repos = "http://cran.us.r-project.org")'
Rscript -e 'install.packages("tidyverse", repos = "http://cran.us.r-project.org")'
Rscript -e 'install.packages("argparse", repos = "http://cran.us.r-project.org")'
Rscript -e 'install.packages("ggdendro", repos = "http://cran.us.r-project.org")'
Rscript -e 'install.packages("ggpubr", repos = "http://cran.us.r-project.org")'
Rscript -e 'install.packages("grid", repos = "http://cran.us.r-project.org")'
Rscript -e 'install.packages("pvclust", repos = "http://cran.us.r-project.org")'
This program includes an optional cross-validation of the identified putative iron genes against a protein database of your choice. We recommend validating against NCBI's nr database, as it is one of the largest sequence repositories, if not the largest, and thus offers the highest chance of providing a homolog. To use this option, simply provide the script with the location of a protein database, which should be a single FASTA file. The latest release of the nr database can be downloaded by running:
wget ftp://ftp.ncbi.nih.gov/blast/db/FASTA/nr.gz
Make sure you have enough disk space where you will be downloading it, and that your network connection is reliable; otherwise, the download may take a very long time!
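One way to check free disk space before starting the download is sketched below. This is an illustration rather than part of FeGenie: the helper name has_free_kb and the REQUIRED_KB variable are hypothetical, and the default value of REQUIRED_KB is only a placeholder, not the real size of nr.gz. Look up the current file size on the NCBI server and set the variable accordingly.

```shell
#!/bin/sh
# Hypothetical helper (not part of FeGenie): succeed if the filesystem
# that holds DIR has at least REQUIRED_KB kilobytes available.
has_free_kb() {
    dir=$1
    required_kb=$2
    # df -P prints one line per filesystem in a portable layout;
    # with -k, column 4 is the available space in 1 KB blocks.
    available_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$available_kb" -ge "$required_kb" ]
}

# Placeholder value only: replace it with the actual size of nr.gz
# (plus room for the decompressed file if you plan to unpack it).
REQUIRED_KB=${REQUIRED_KB:-1}

if has_free_kb . "$REQUIRED_KB"; then
    echo "Enough free space in $(pwd)"
else
    echo "Not enough free space in $(pwd)"
fi
```

With REQUIRED_KB set to a realistic value, the wget command above can be run only when the check succeeds.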
Two things regarding this optional cross-validation. First, this step greatly increases the computational load and takes about 100 times longer to complete than running the program without cross-validation. What would have been a 5-minute analysis of a dozen genomes may take 10 hours, and if you are analyzing large metagenome assemblies, it may take several days to complete. However, identifying the closest homolog in NCBI to your identified iron genes may be incredibly informative, especially because our HMM library isn't perfect and false positives are a possibility (as they are with most annotation tools). Second, the part of the algorithm dedicated to the cross-validation step is largely untested, so by using this optional parameter you are, in effect, acting as a beta tester for our program. Feel free to start issues on GitHub, or yell at me via email, if there are any snafus with the program or its output when the nr database is provided.