Create conda environment and install packages:
conda create -n genomedl
conda activate genomedl
mamba install ncbi-genome-download ispcr
Bash scripts were run on UGE using the qsub command:
- Download all 769 RefSeq files, rename them to readable filenames, and decompress them (gunzip).
qsub fetch_genomes.sh
- Create a new folder in the same directory as the from_NCBI directory.
mkdir primers
- Create the file primers.tab with the primer name, forward primer, and reverse primer, then place it in the primers directory. The columns must be separated by tabs, not spaces.
printf 'LeptoPrimer\tGAGTAACACGTGGGTAATCTTCCT\tTTTACCCCACCAACTAGCTAATC\n' > primers.tab
- Search the genomes for the primers.
qsub find_primers.sh
- Concatenate forward and reverse primers using the concat_files function in the Python script.
- Run mafft on the concatenated files.
qsub mafft.sh
- Analyze genome and save to file:
python3 analyze_lepto_genome.py
- Check lepto_in_silico.xlsx for finished analysis.
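The primers.tab step above depends on the columns being separated by real tab characters, which is easy to get wrong when typing the line by hand. The following is a minimal sketch of building and checking such a line in Python. The function names format_primer_line and check_primers_text are hypothetical helpers written for this illustration and are not part of the scripts in this repository.

```python
# Hypothetical helpers for illustration only; not part of the repository scripts.

# IUPAC nucleotide codes; any other character in a primer is treated as an error.
IUPAC_BASES = set("ACGTRYSWKMBDHVN")


def format_primer_line(name, forward, reverse):
    """Return one tab-separated primers.tab line (without a trailing newline)."""
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("primer name must be non-empty and contain no whitespace")
    for label, seq in (("forward", forward), ("reverse", reverse)):
        if not seq:
            raise ValueError(f"{label} primer is empty")
        bad = set(seq.upper()) - IUPAC_BASES
        if bad:
            raise ValueError(f"{label} primer has invalid characters: {sorted(bad)}")
    return "\t".join([name, forward, reverse])


def check_primers_text(text):
    """Return (line_number, problem) pairs for lines without exactly 3 tab-separated fields."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            problems.append(
                (number, f"expected 3 tab-separated fields, found {len(fields)}")
            )
    return problems
```

To use it, write the string returned by format_primer_line (plus a newline) to primers.tab, or pass the contents of an existing primers.tab to check_primers_text. A line typed with spaces instead of tabs is reported as having a single field.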
The analyze_lepto_genome.py script was written when speed was the priority. Afterwards, comments were added, some of the data was displayed, and a few errors were fixed in analyze_lepto_genomes.ipynb.
The 'qsub' command submits scripts to the HPC scheduler (UGE). Replace it with 'bash' to run the scripts locally.
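The choice between qsub and bash can also be made automatically. The sketch below picks qsub when it is available on the PATH and falls back to bash otherwise. The functions build_command and run_step are hypothetical names introduced for this example and do not exist in the repository.

```python
import shutil
import subprocess


def build_command(script, use_scheduler=None):
    """Return the argument list for running `script` with qsub or bash.

    With use_scheduler=None the launcher is auto-detected:
    qsub is used only if it is found on PATH.
    """
    if use_scheduler is None:
        use_scheduler = shutil.which("qsub") is not None
    launcher = "qsub" if use_scheduler else "bash"
    return [launcher, script]


def run_step(script, use_scheduler=None):
    """Run one pipeline script and raise CalledProcessError if the launcher fails."""
    subprocess.run(build_command(script, use_scheduler), check=True)
```

For example, run_step("mafft.sh", use_scheduler=False) runs the script locally with bash. Note that qsub returns as soon as the job is submitted, so when running on the scheduler each step still has to finish before the next one is started.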