A lyrics scraper and classifier with a CLI.
In this project, I've built a lyrics scraper that automatically scrapes all song texts of a given artist from their lyrics.com artist page. The data is then written to a .json file. The .json files can then be used to create a Multinomial Naive Bayes model using spaCy's industrial-strength natural language processor. The user can then enter a line of text of their choice, and the model will predict which of the artists it was trained on would most likely have sung that line, even if the artist never did.
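To illustrate the classification idea, here is a minimal from-scratch sketch of a Multinomial Naive Bayes classifier that uses only the Python standard library. It is not the project's actual code (which relies on spaCy); the function names `tokenize`, `train` and `predict`, the artist labels and the toy lines are all invented for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(lyrics_by_artist):
    """Count words and lines per artist (the sufficient statistics of the model)."""
    word_counts = defaultdict(Counter)
    line_counts = Counter()
    for artist, lines in lyrics_by_artist.items():
        for line in lines:
            word_counts[artist].update(tokenize(line))
            line_counts[artist] += 1
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, line_counts, vocabulary


def predict(model, text):
    """Return the artist with the highest posterior log-probability for the text."""
    word_counts, line_counts, vocabulary = model
    total_lines = sum(line_counts.values())
    scores = {}
    for artist in line_counts:
        # Log prior: share of training lines that belong to this artist.
        score = math.log(line_counts[artist] / total_lines)
        total_words = sum(word_counts[artist].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training carry no information
            # Laplace (add-one) smoothing avoids zero probabilities.
            smoothed = word_counts[artist][word] + 1
            score += math.log(smoothed / (total_words + len(vocabulary)))
        scores[artist] = score
    return max(scores, key=scores.get)


# Invented toy data, not real lyrics.
toy_lyrics = {
    "Artist A": ["the river sings of summer rain", "summer fields and river stones"],
    "Artist B": ["city lights and neon nights", "neon signs over the city streets"],
}
model = train(toy_lyrics)
print(predict(model, "rain on the river"))  # prints "Artist A"
```

The model always picks one of the artists it was trained on, which is why it can attribute a line to an artist who never sang it.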
All modules have their own command-line interface for easy use. All credit for the song texts belongs to lyrics.com for their amazing work. Please use common sense while scraping and don't DDoS them.
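As a rough idea of what such a command-line interface could look like, the following sketch uses Python's standard `argparse` module with two positional arguments, a URL and a filename. The argument names, help texts and placeholder URL are assumptions for illustration and may differ from the actual scripts.

```python
import argparse


def build_parser():
    """Build a parser with two positional arguments (names are hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="lyrics_scraper.py",
        description="Scrape all lyrics of one artist from lyrics.com.",
    )
    parser.add_argument("url", help="artist page URL on lyrics.com")
    parser.add_argument("filename", help="output name, saved as /data/FILENAME.json")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
# The URL below is a placeholder, not a real artist page.
args = build_parser().parse_args(
    ["https://www.lyrics.com/artist/PLACEHOLDER", "example"]
)
print(args.url, args.filename)
```

With a parser like this, `-h` prints the usage text and the help strings automatically.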
To get a local copy up and running, follow these simple steps.
I'd advise you to create a dedicated virtual environment for this project. I'm using Anaconda.
- Clone the repo
git clone https://github.com/dariustorabian/lyrics-classifier.git
- Install dependencies with the requirements.txt
conda create --name <NameOfEnvironment> --file requirements.txt
- Run lyrics_scraper.py in the command line with an artist page URL from lyrics.com and a filename as arguments. The lyrics of this artist will then be scraped and saved under /data/FILENAME.json. Duplicates are skipped automatically. Repeat this step for as many artists as you'd like to use. For help, run
python lyrics_scraper.py -h
- Run model_creater.py in the command line. You will be asked to input the .json files containing the song texts that were scraped in the previous step, along with the corresponding artist names. The Multinomial Naive Bayes model will then be created and saved locally.
- Run lyrics_classifier.py in the command line. It will automatically load the model created in the previous step. You will be asked to input a string of text and will get predictions on which of the artists in your model most likely sang that line. Feel free to use my model.p, which is trained on The Kooks, Mumford & Sons and Eminem.
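The scraping step above writes one .json file per artist and skips duplicates. A minimal sketch of that idea, assuming duplicates are detected by normalized song title and assuming a simple list-of-objects JSON layout (both are guesses, as are the helper names `save_songs` and `load_lyrics`), might look like this:

```python
import json
import tempfile
from pathlib import Path


def save_songs(songs, path):
    """Write (title, lyrics) pairs to a JSON file, skipping repeated titles."""
    unique = {}
    for title, text in songs:
        key = title.strip().lower()  # normalize so "Song One" == "song one "
        if key in unique:
            continue  # duplicate title: keep the first occurrence only
        unique[key] = {"title": title.strip(), "lyrics": text}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(list(unique.values()), indent=2), encoding="utf-8")
    return len(unique)


def load_lyrics(path):
    """Read the lyrics back as a list of strings, ready for model training."""
    songs = json.loads(path.read_text(encoding="utf-8"))
    return [song["lyrics"] for song in songs]


# Invented sample data, not real lyrics; the third entry repeats the first title.
scraped = [
    ("Song One", "first made-up line"),
    ("Song Two", "second made-up line"),
    ("song one ", "first made-up line"),
]

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "data" / "example.json"
    saved = save_songs(scraped, target)
    lyrics = load_lyrics(target)

print(saved, lyrics)
```

Keeping one file per artist makes it easy to pair each file with an artist name when the model is created.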
Currently, no new features are planned. This could change, though, so feel free to check back again.
You can also always take a look at the open issues for a list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch
git checkout -b feature/AmazingFeature
- Commit your Changes
git commit -m 'Add some AmazingFeature'
- Push to the Branch
git push origin feature/AmazingFeature
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Darius Torabian
- Feel free to contact me via mail.
- Here's my LinkedIn profile.
- My Twitter handle is @darius_torabian.
- This is my website.
Project Link: https://github.com/dariustorabian/lyrics-classifier