This repository contains a pipeline that annotates features of predicted genes. It takes assembled genomes in FASTA format, and predicted genes or proteins in FASTA and GFF format, as input. The pipeline runs the following tools:
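- PHASTER (prophage regions)
- eggNOG-mapper, in HMMER or DIAMOND mode (functional annotation)
- VFDB and Victors databases (virulence factors)
- PILER-CR (CRISPR arrays)
- RGI (antibiotic resistance genes)
- LipoP 1.0 (lipoproteins)
- SignalP (signal peptides)
- DOOR, searched with BLAST (operons)
- Rfam (non-coding RNAs)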
The master script is ./final/pipeline.sh.
Usage: ./pipeline.sh [options]
-d diamond            Run all tools, with eggNOG-mapper in DIAMOND mode
-d hmmer -t gproNOG   Run all tools, with eggNOG-mapper in HMMER mode against the eggNOG database given by -t (here gproNOG)
-f                    Enable PHASTER to screen for prophage regions (slow)
-p                    Path to the query genomes (required for PHASTER)
-r                    Path to the ncRNA .gff files
Example usage (change to the final directory first):
./pipeline.sh -d diamond -f -p /Path/to/query/genomes/ -r /Path/to/ncRNA/gff_files
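The option rules above (PHASTER needs -p; HMMER mode takes an eggNOG database via -t) can be captured in a small helper that assembles the command line before launching it. This is a minimal Python sketch: `build_pipeline_command` is a hypothetical helper that is not part of the repository, and treating -t as mandatory in HMMER mode is an assumption based on the documented example.

```python
from typing import List, Optional


def build_pipeline_command(
    mode: str,
    database: Optional[str] = None,
    phaster: bool = False,
    genomes: Optional[str] = None,
    ncrna_gff: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for ./pipeline.sh from the documented options.

    Hypothetical helper, not part of the repository.
    """
    if mode not in ("diamond", "hmmer"):
        raise ValueError("mode must be 'diamond' or 'hmmer'")
    cmd = ["./pipeline.sh", "-d", mode]
    if mode == "hmmer":
        # Assumption: HMMER mode needs an eggNOG database name passed with -t
        if not database:
            raise ValueError("hmmer mode needs a database for -t, e.g. 'gproNOG'")
        cmd += ["-t", database]
    if phaster:
        # The README states that -p is required when PHASTER (-f) is enabled
        if not genomes:
            raise ValueError("-f (PHASTER) requires -p <path to query genomes>")
        cmd += ["-f", "-p", genomes]
    elif genomes:
        cmd += ["-p", genomes]
    if ncrna_gff:
        cmd += ["-r", ncrna_gff]
    return cmd


if __name__ == "__main__":
    # Rebuilds the example usage line shown above
    print(" ".join(build_pipeline_command(
        "diamond",
        phaster=True,
        genomes="/Path/to/query/genomes/",
        ncrna_gff="/Path/to/ncRNA/gff_files",
    )))
```

The returned list can be passed to `subprocess.run(cmd, cwd="final", check=True)` so that the script runs from the final directory. Validating the flag combinations up front avoids starting a long run (PHASTER is slow) only to have it fail on a missing path.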
Inputs: Collected automatically. The pipeline takes the nucleotide sequence and translated protein files produced by gene prediction, and the genome files produced by genome assembly.
Outputs:
- GFF files for each tool, split by SRR run accession
- A merged GFF annotation file combining all tools
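The README does not describe how the per-tool GFF files are combined into the merged annotation. One plausible approach, shown here as a hypothetical Python sketch rather than the pipeline's actual method, is to collect every feature line, record the producing tool in the GFF source column when a converter left it empty ('.'), and sort by sequence ID and then start coordinate. The names `parse_gff_lines` and `merge_gff` are illustrative only.

```python
from typing import Dict, Iterable, List


def parse_gff_lines(lines: Iterable[str]) -> List[List[str]]:
    """Return the 9 tab-separated columns of each feature line, skipping comments."""
    features = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and directives such as ##gff-version
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines rather than guessing their layout
        features.append(cols)
    return features


def merge_gff(per_tool: Dict[str, Iterable[str]]) -> str:
    """Merge per-tool GFF records into one GFF3 text sorted by seqid, then start, then end."""
    merged = []
    for tool, lines in per_tool.items():
        for cols in parse_gff_lines(lines):
            # Column 2 (source) records which tool produced the feature;
            # fill it in with the tool name if it was left as '.'
            if cols[1] == ".":
                cols[1] = tool
            merged.append(cols)
    merged.sort(key=lambda c: (c[0], int(c[3]), int(c[4])))
    body = "\n".join("\t".join(c) for c in merged)
    return "##gff-version 3\n" + body + ("\n" if body else "")
```

Each value in `per_tool` can be an open file handle or a list of lines, so per-tool outputs can be passed in directly. Sorting on (seqid, start, end) keeps features from different tools interleaved by position, which is what most genome browsers expect.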
Repository layout:
- scripts (additional scripts)
  - phaster2gff.sh
  - run_phaster.sh
  - pilercr2gff.pl
  - pilercr.sh
  - run_operon.pl
  - genbankonly.sh
  - prokka.sh
  - rgi2gff.py
  - Clustering_graph.R
  - changecol_operons.pl
  - changecol_vf.pl
  - converttogff_virfact.py
  - converttogff_operons.py
  - virulenceFactors.sh
- README.md
- final_results
- temp
- test
- final
  - scripts (scripts that run the tools; called by pipeline.sh)
    - append_headers.pl
    - run_operon.pl
    - run_eggnog.sh
    - sort_by_length.pl~
    - reput.pl~
    - reput.pl
    - sort_by_length.pl
    - virulencemapper.pl
    - virulenceFactors.sh
    - mapper_proteins.pl~
    - mapper_proteins.pl
    - run_eggnog.sh~
    - run_rgi.py~
    - rgi2gff.py
    - cluster_meta.pl
    - eggnog_to_gff.pl~
    - eggnog_to_gff.pl
    - converttogff_operons.py
    - changecol_operons.pl
    - database
      - finalGenes.fas
      - finalGenes.fas.nhr
      - finalGenes.fas.nin
      - finalGenes.fas.nsq
    - run_rgi.py
    - changecol_vf.pl
    - converttogff_virfact.py
  - clusters (files created when pipeline.sh starts)
    - assembled100_proteins_nucl.uc
    - assembled100_nucl.fna
    - assembled97_nucl.uc
    - assembled97_nucl.fna
    - assembled100_proteins.uc
    - assembled100_proteins.faa
    - assembled97_proteins.uc
    - assembled97_proteins.faa
  - pipeline.sh (master script; run this to perform the annotation)
  - run_rgi2.py~
  - run_rgi2.py
  - blastp
  - pipeline.sh~
  - tools (dependencies for the scripts)
    - usearch
    - rgi
    - blastdb_aliastool
    - blastdbcheck
    - blastdbcmd
    - blast_formatter
    - blastn
    - blastp
    - blastx
    - convert2blastmask
    - deltablast
    - dustmasker
    - legacy_blast.pl
    - makeblastdb
    - makembindex
    - makeprofiledb
    - psiblast
    - rpsblast
    - rpstblastn
    - segmasker
    - tblastn
    - tblastx
    - update_blastdb.pl
    - windowmasker
  - tmp
    - placeholderfile.txt
  - tool_gff (output of each individual tool after mapping)
    - rgi
    - phaster
    - VFDB
    - DOOR
    - rfam
    - Pilercr
    - lipoP
    - eggNOG
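The files in final/clusters are USEARCH cluster (.uc) files. The names suggest clustering at 100% and 97% identity, but that is an inference from the file names only. The Python sketch below, which is not one of the repository's scripts, reads a .uc file and maps each centroid to its members using the standard tab-separated .uc layout: field 1 is the record type (S for a centroid, H for a hit, C for a cluster summary, N for no hit), field 9 is the query label and field 10 is the target label. The function name `clusters_from_uc` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def clusters_from_uc(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each centroid label to its member labels (centroid listed first)."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        record_type, query, target = fields[0], fields[8], fields[9]
        if record_type == "S":
            # Seed record: this sequence is the centroid of a new cluster
            clusters[query].insert(0, query)
        elif record_type == "H":
            # Hit record: query joined the cluster whose centroid is target
            clusters[target].append(query)
        # 'C' (cluster summary) and 'N' (no hit) records are ignored here
    return dict(clusters)
```

Reading the S and H records, rather than the C summaries, gives the member lists directly; the cluster size in a C record is then just the length of the corresponding list.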
Dependencies:
- Biopython
- DIAMOND (bundled with eggNOG-mapper; make sure that diamond is called only from the bin folder of eggnog-mapper and not from any other installation)
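Because another DIAMOND installation earlier on PATH can shadow the copy bundled with eggNOG-mapper, a quick pre-flight check may help. This is a hedged Python sketch: `diamond_is_eggnog_copy` and `biopython_installed` are hypothetical helpers, and the eggNOG-mapper install directory is something you supply. It only checks which `diamond` executable would be found first on PATH; it does not inspect how the pipeline scripts themselves invoke it.

```python
import importlib.util
import os
import shutil
from typing import Optional


def diamond_is_eggnog_copy(eggnog_dir: str, path: Optional[str] = None) -> bool:
    """Return True if the first 'diamond' found on PATH lives in <eggnog_dir>/bin.

    eggnog_dir is the eggNOG-mapper install directory (supplied by the user).
    path overrides the PATH string to search; None means the current PATH.
    """
    found = shutil.which("diamond", path=path)
    if found is None:
        return False  # no diamond executable on PATH at all
    expected_bin = os.path.realpath(os.path.join(eggnog_dir, "bin"))
    # Resolve symlinks on both sides before comparing directories
    return os.path.dirname(os.path.realpath(found)) == expected_bin


def biopython_installed() -> bool:
    """Biopython is imported under the package name 'Bio'."""
    return importlib.util.find_spec("Bio") is not None
```

If `diamond_is_eggnog_copy` returns False, either prepend eggNOG-mapper's bin folder to PATH or remove the other installation before running pipeline.sh.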