Species identification from DNA sequences using Random Forest

Certain fragments of DNA (for example, mitochondrial, nuclear, and plastid sequences) have been defined as barcodes: markers used to identify and classify species. 🐝
These markers can be understood as sequences of four letters (A, C, G, and T) that vary between individuals of the same species and between species.
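To make the "sequences of four letters" idea concrete, here is a minimal sketch of one common way to turn such a sequence into a fixed-length numeric vector that a classifier such as a Random Forest could consume. Counting k-mers is an assumption chosen for illustration, not necessarily the encoding used in the notebook; the function name `kmer_counts` and the example sequence are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k=2):
    """Count every possible k-mer over A, C, G, T, returned in a fixed order.

    Hypothetical helper for illustration; k-mers containing other
    characters (for example N) are simply not counted.
    """
    sequence = sequence.upper()
    # Slide a window of width k along the sequence and tally each substring.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Enumerate all 4**k possible k-mers so every sequence maps to the same columns.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[kmer] for kmer in kmers]


# Made-up example sequence, not taken from the Drosophila data.
features = kmer_counts("ACGTAC", k=2)
print(len(features), features)
```

Because the k-mer order is fixed, vectors from different sequences are directly comparable, which is what a tabular classifier needs.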

In some cases these sequences are protein-coding and can therefore be translated automatically into amino acid sequences (alphabetic sequences of approximately 21 letters) that store structural information or physicochemical properties. 🧬
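The translation step can be sketched as follows. This is an illustrative example only: it assumes the standard genetic code and reading frame 0, whereas mitochondrial barcodes generally follow a different codon table, so treat the table here as a placeholder. The names `CODON_TABLE` and `translate` are hypothetical and do not come from the notebook.

```python
from itertools import product

# Standard genetic code (an assumption), with codons enumerated in T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon, starting at the first base.

    Codons with unexpected characters become "X"; a trailing partial
    codon is ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# Made-up example sequence: ATG GCC TAA
protein = translate("ATGGCCTAA")
print(protein)
```

The resulting alphabet has 20 amino acid letters plus the stop symbol, which matches the "approximately 21 letters" mentioned above.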

You can use Drosophila_test and Drosophila_train for custom training. They contain data on many types of fruit flies.

For Google Colab, these files must be in your Google Drive in Google Sheets format, not XLSX. This is only necessary if you are running on Google Colab.
