An evolutionary biology analysis suite bridging sequence data to phylogenetic trees.
Bridge is a genetic analysis toolkit built using Python. It currently has support for:
- Acquiring biological sequences from different databases
- Running BLAST searches
- Filtering downloaded sequence results by taxonomy
- Filtering BLAST results by taxonomy
- Aligning sequences
- Generating phylogenetic trees
Currently, Bridge can search GenBank and Ensembl for biological sequences. From these sources, one can download:
- Gene sequences
- Transcript sequences
- Protein sequences
- Clone the repository
git clone https://github.com/ashenafee/Bridge.git
- Navigate to the repository
cd Bridge
- Make the setup script executable
chmod +x ./setup
- Run the setup script
./setup
- Activate the virtual environment
source "venv/bin/activate"
- Launch the program
python bridge.py --help
To download sequences, specify the database to download from (-gb for GenBank or -es for Ensembl) and provide the gene name(s) and species name(s) you'd like to download. Multiple genes can be given as a comma-separated list.
python bridge.py -gb -g GENE -s SPECIES -o OUTPUT
For example, say we want to download the following genes for the species Homo sapiens:
- TRPA1
- RHO
- TP53
We can do so by running the following command:
python bridge.py -gb -g "TRPA1,RHO,TP53" -s "Homo sapiens" -o "sequences.fasta"
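The -g value above is a comma-separated list, and each gene has to be paired with the species when GenBank is searched. The sketch below shows one way to do that. It assumes, but does not confirm, that Bridge builds NCBI Entrez search terms this way. The helper names parse_list and genbank_query are hypothetical. The [Gene Name] and [Organism] field tags are standard NCBI search syntax.

```python
def parse_list(arg: str) -> list[str]:
    """Split a comma-separated CLI value such as "TRPA1,RHO,TP53"."""
    return [item.strip() for item in arg.split(",") if item.strip()]


def genbank_query(gene: str, species: str) -> str:
    """Hypothetical helper: build an Entrez search term for one gene in one organism."""
    # [Gene Name] and [Organism] are NCBI search field tags; quoting the
    # species keeps a two-word name such as "Homo sapiens" together as a phrase.
    return f'{gene}[Gene Name] AND "{species}"[Organism]'


genes = parse_list("TRPA1,RHO,TP53")
queries = [genbank_query(gene, "Homo sapiens") for gene in genes]
for query in queries:
    print(query)
```

With Biopython, you could pass a term like this to Entrez.esearch(db="nucleotide", term=query) after setting Entrez.email. You could then hand the returned IDs to Entrez.efetch with rettype="fasta". Bridge's own implementation may differ.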
To download from Ensembl instead, use the -es flag:
python bridge.py -es -g GENE -s SPECIES -o OUTPUT
For example, to download the same genes (TRPA1, RHO, TP53) for the species Homo sapiens from Ensembl, run the following command:
python bridge.py -es -g "TRPA1,RHO,TP53" -s "Homo sapiens" -o "sequences.fasta"
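Ensembl exposes its data through a public REST API. One possible flow is to resolve a gene symbol to a stable ID and then request the sequence. This is an assumption about how such a download could work, not a description of Bridge's code. The sketch only builds the request URLs, so it runs offline. The helper names are hypothetical. The /lookup/symbol and /sequence/id endpoints and the genomic, cdna, cds and protein sequence types come from the Ensembl REST API.

```python
from urllib.parse import quote, urlencode

ENSEMBL_REST = "https://rest.ensembl.org"  # public Ensembl REST server
SEQUENCE_TYPES = {"genomic", "cdna", "cds", "protein"}


def species_slug(species: str) -> str:
    """Ensembl names species in lowercase with underscores, e.g. homo_sapiens."""
    return species.strip().lower().replace(" ", "_")


def lookup_url(gene: str, species: str) -> str:
    """Hypothetical helper: URL resolving a gene symbol to an Ensembl stable ID (JSON)."""
    path = f"/lookup/symbol/{species_slug(species)}/{quote(gene)}"
    return f"{ENSEMBL_REST}{path}?{urlencode({'content-type': 'application/json'})}"


def sequence_url(stable_id: str, seq_type: str = "genomic") -> str:
    """Hypothetical helper: URL returning the sequence for a stable ID as FASTA."""
    if seq_type not in SEQUENCE_TYPES:
        raise ValueError(f"unsupported sequence type: {seq_type!r}")
    params = {"type": seq_type, "content-type": "text/x-fasta"}
    return f"{ENSEMBL_REST}/sequence/id/{quote(stable_id)}?{urlencode(params)}"


print(lookup_url("TRPA1", "Homo sapiens"))
```

You could fetch these URLs with urllib.request.urlopen. The lookup response is JSON whose "id" field holds the stable ID. Requesting cdna or protein sequences for a gene ID (rather than a transcript ID) may require Ensembl's multiple_sequences=1 option.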
To run BLAST searches, specify the file containing the query sequences you'd like to search with.
python bridge.py -b -f FILE -o OUTPUT
For example, say we have a file named sequences.fasta and want to run BLAST on it. We can do so by running the following command:
python bridge.py -b -f "sequences.fasta" -o "blast_results.txt"
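BLAST reads its queries from a FASTA file, where each record is a ">" header line followed by one or more sequence lines. The minimal reader below is a hypothetical helper for sanity-checking such a file before a search. It is not part of Bridge. In practice, Biopython's SeqIO.parse(handle, "fasta") does the same job more robustly.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record file contents, used only for illustration.
example = io.StringIO(">seq1 example\nACGT\nACGT\n>seq2\nMKVL\n")
records = list(read_fasta(example))
print(records)
```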
To pass custom BLAST parameters, add -bp followed by a comma-separated list of key=value pairs:
python bridge.py -b -f FILE -bp PARAMETERS -o OUTPUT
For example, say we have a file named sequences.fasta and want to run BLAST on it with the following parameters:
- evalue = 0.001
- word_size = 11
- gapopen = 11
- max_target_seqs = 5
We can do so by running the following command:
python bridge.py -b -f "sequences.fasta" -bp "evalue=0.001,word_size=11,gapopen=11,max_target_seqs=5" -o "blast_results.txt"
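The -bp string above is a list of key=value pairs. The sketch below shows how such a string could be translated into BLAST+ command-line options, which use a single dash (for example -evalue 0.001). It assumes, without confirmation, that Bridge calls a local blastn binary. The helper names parse_blast_params and to_blast_args are hypothetical.

```python
def parse_blast_params(spec: str) -> dict[str, str]:
    """Hypothetical helper: turn "evalue=0.001,word_size=11" into a dict."""
    params = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        params[key.strip()] = value.strip()
    return params


def to_blast_args(params: dict[str, str]) -> list[str]:
    """Hypothetical helper: BLAST+ options are single-dash flags followed by a value."""
    args = []
    for key, value in params.items():
        args += [f"-{key}", value]
    return args


params = parse_blast_params("evalue=0.001,word_size=11,gapopen=11,max_target_seqs=5")
# Assumes a local blastn; the list could be passed to subprocess.run(command, check=True).
command = ["blastn", "-query", "sequences.fasta", "-out", "blast_results.txt",
           *to_blast_args(params)]
print(" ".join(command))
```

Building the command as a list rather than a single string avoids shell quoting issues when it is handed to subprocess.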
Alternatively, provide a gene and species instead of a file:
python bridge.py -b -g GENE -s SPECIES -o OUTPUT
For example, say we want to run BLAST using RHO in the species Homo sapiens. We can do so by running the following command:
python bridge.py -b -g "RHO" -s "Homo sapiens" -o "blast_results.txt"
To filter results, specify the file containing the results you'd like to filter and the taxon you'd like to filter by (for example, Mammalia). To filter BLAST results, add the -bf flag:
python bridge.py -ft TAXONOMY -f FILE -o OUTPUT -bf
For example, say we have a file named blast_results.txt and want to keep only results from Mammalia. We can do so by running the following command:
python bridge.py -ft "Mammalia" -f "blast_results.txt" -o "filtered.txt" -bf
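Conceptually, filtering by taxonomy keeps only the entries whose lineage includes the requested taxon. The sketch below assumes each hit already carries its lineage, for example from an NCBI Taxonomy lookup. The Hit record, the filter_by_taxon helper, and the example identifiers are all hypothetical and not taken from Bridge.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record: one BLAST hit plus its lineage, root to species."""
    accession: str
    lineage: list[str]


def filter_by_taxon(hits: list[Hit], taxon: str) -> list[Hit]:
    """Keep hits whose lineage contains the taxon (case-insensitive match)."""
    wanted = taxon.casefold()
    return [hit for hit in hits if any(name.casefold() == wanted for name in hit.lineage)]


# Made-up identifiers with abbreviated lineages, for illustration only.
hits = [
    Hit("hit_1", ["Eukaryota", "Chordata", "Mammalia", "Primates", "Homo sapiens"]),
    Hit("hit_2", ["Eukaryota", "Chordata", "Aves", "Galliformes", "Gallus gallus"]),
    Hit("hit_3", ["Eukaryota", "Chordata", "Mammalia", "Rodentia", "Mus musculus"]),
]
mammals = filter_by_taxon(hits, "Mammalia")
print([hit.accession for hit in mammals])
```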
To filter a summary of downloaded sequences instead, omit the -bf flag:
python bridge.py -ft TAXONOMY -f SUMMARY_FILE -o OUTPUT
For example, say we have a file named summary.txt and want to keep only results from Mammalia. We can do so by running the following command:
python bridge.py -ft "Mammalia" -f "summary.txt" -o "filtered.txt"
Note: This only works for sequences downloaded from GenBank.
To align sequences, specify the file containing the sequences you'd like to align.
python bridge.py -a "muscle" -t RANK -f FILE -o OUTPUT
For example, say we have a file named sequences.fasta and want to align it using MUSCLE, summarizing the results by order. We can do so by running the following command:
python bridge.py -a "muscle" -t "order" -f "sequences.fasta" -o "aligned.fasta"
This creates an alignment file named aligned.fasta, which is then used to generate a tree. The distribution of sequences in the tree is summarized at the rank given by -t (e.g., 20% of the species in the alignment are Primates).
When running the setup script, I get virtualenv command not found. How do I fix this?
It's likely that virtualenv is not bundled with your installation of Python. Run the command below, then re-run the setup script to fix the issue:
pip install virtualenv
- Cannot search GenBank for large numbers of sequences (~500) at once
- Cannot accurately specify how many BLAST results to return
- Installing MUSCLE dynamically is difficult on macOS due to chip architecture differences
- Add filtering capabilities for BLAST results
- Allow the user to search for a gene and species to run a BLAST search on
- Add support for aligning sequences
- Add support for phylogenetic tree generation
- Add automatic BLAST download if not found in PATH
- Create a basic GUI to make the program more user-friendly
Contributions are welcome! If you run into a problem, please open an issue on the issues page. If you'd like to contribute to the project, please fork the repository and submit a pull request.
- Ashenafee Mandefro - Whole project - ashenafee
This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.