List of scripts in our GitHub repo:
- door2.pl -- script used to run door2
- extractSequences.py -- script which extracts sequences from a FASTA file based on the headers listed in a second file
- functionalAnnotationPipeline.sh -- the final pipeline
- getFastaHeaders.sh -- script which extracts all headers from a FASTA file and stores them in a new file
- outputParser.py -- script which parses the output of all tools, uClust and the original GFFs to create new GFFs with annotations
- parseUclustOutput.py -- script which reads in the .uc file generated from uClust and creates an index file and a sizes file
- pilerCr.sh -- script used to run pilerCR (not included in final pipeline)
- reformatFasta.py -- script which changes the gene names in the FASTA file, reformats the file so that each sequence is on a single line, and prepends the SRR ID to the gene name
- reformatGff.py -- script which changes column 1 of the GFF to the gene name and prepends the SRR ID to the gene name
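
As an illustration of the header-based extraction described for extractSequences.py, here is a minimal sketch. It is not the script from the repo: the function names `read_fasta` and `extract_sequences` are hypothetical, and it assumes headers are matched exactly (full header line, with or without the leading `>`).

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs; multi-line sequences are joined into one string."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_sequences(fasta_lines, wanted_headers):
    """Return the records whose header appears in wanted_headers (exact match, assumed)."""
    wanted = {h.strip().lstrip(">") for h in wanted_headers if h.strip()}
    return [(h, s) for h, s in read_fasta(fasta_lines) if h in wanted]


# Hypothetical in-memory input; in practice both would be read from files.
fasta = [">gene1 desc", "ATG", "CCC", ">gene2", "GGG", ">gene3", "TTT"]
headers = ["gene1 desc", ">gene3"]

for header, seq in extract_sequences(fasta, headers):
    print(f">{header}")
    print(seq)
```

Each selected sequence is written on a single line, matching the one-line-per-sequence layout that reformatFasta.py is described as producing.
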
Other files in our GitHub repo are:
- kleb_all.opr
- kleb_gid.txt
- kop_final.table
- protein_fasta_protein_homolog_model.fasta
- VFDB_setB_nt.fas
These are the default databases used for door2, CARD and VFDB.