Hip-Hop is my favorite music genre of all time, and Drake is an artist I've listened to for years. Many text mining analyses have been performed on rap lyrics, but I haven't seen many that dig deep into one specific artist's discography (in this case, Drake's). In this project, I apply various Natural Language Processing (NLP) techniques to analyze Drake's lyrics.
I have practiced a lot with classical Machine Learning and with Deep Learning applied to images (Computer Vision), but I don't have nearly as much experience working with text data. This project is my introduction to the world of Natural Language Processing, and text analysis in general.
In this project, I:
- Scraped info and lyrics for 500 songs from genius.com
- Performed data wrangling and exploratory data analysis with Matplotlib and Seaborn
- Applied various NLP techniques: word embedding, bag-of-words, tokenization with NLTK, NER with spaCy
- Performed topic modeling with LDA, dimensionality reduction with t-SNE, and interactive topic visualization with pyLDAvis
Drake is one of the biggest, if not the biggest, rap artists in the world. Thanks to his popularity, his work is expansive and well-documented, with five official studio albums and six mixtapes. To obtain the lyrics of all of Drake's songs, I scraped the data from Genius using the wonderful Genius API.
Despite having such an easy-to-use API assisting the scraping process, cleaning the data was no easy task. Out of over 500 songs on Genius, around 300 tracks were duplicates, live versions, diss tracks, or the like, and they were all filtered out of the dataset. The lyrics themselves were not clean either: there was a lot of noise, redundant characters, and typos.
To see the code for the whole data preprocessing process, check this notebook (NBViewer link).
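As a rough illustration of the kind of cleaning described above, here is a minimal sketch using only the standard library. The function names `clean_lyrics` and `is_alternate_version`, the keyword list, and the sample string are all assumptions made for this example; the actual steps used in the project are the ones in the notebook.

```python
import re

def clean_lyrics(raw: str) -> str:
    """Normalize raw lyrics into lowercase words separated by single spaces."""
    text = re.sub(r"\[[^\]]*\]", " ", raw)        # drop section headers such as [Chorus]
    text = text.lower()
    text = re.sub(r"['’]", "", text)              # "don't" -> "dont"
    text = re.sub(r"[^a-z0-9\s]", " ", text)      # replace remaining punctuation with spaces
    return re.sub(r"\s+", " ", text).strip()      # collapse runs of whitespace

def is_alternate_version(title: str) -> bool:
    """Heuristic (assumed keywords): flag titles tagged as live/remix/demo in brackets."""
    pattern = r"[(\[][^)\]]*\b(?:live|remix|demo)\b[^)\]]*[)\]]"
    return re.search(pattern, title, flags=re.IGNORECASE) is not None

# Hypothetical raw text, not taken from the dataset.
raw = "[Intro]\nYeah, it's Drake!\nDon't stop, don't stop..."
print(clean_lyrics(raw))  # -> yeah its drake dont stop dont stop
```

Matching the keyword only inside parentheses or brackets is a deliberate choice here, so that a regular title that merely starts with the word "Live" is not filtered out by mistake.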
First, let's load data from the csv file:
# load the data
import pandas as pd

data = pd.read_csv('lyrics.csv')
data.head(10)
| | name | album | year | lyrics |
---|---|---|---|---|
0 | Right to Left | Born Successful | 2009 | blue green jewels with the supreme fuel and l... |
1 | Forever (Born Successful) | Born Successful | 2009 | it may not mean nothing to yall but understan... |
2 | The Winner | Born Successful | 2009 | i m performing tonight you know that shit gon... |
3 | I Do This | Born Successful | 2009 | uh shits all good the deal got signed and my ... |
4 | Fallen | Born Successful | 2009 | yeah its drake kc we was just walking just sm... |
5 | Do It Now | Born Successful | 2009 | uh yeah alright uh well alright yeah well alr... |
6 | The Search | Born Successful | 2009 | they say we killin em all all all all hip hop... |
7 | Juice | Born Successful | 2009 | boi 1da drizzy yall dont really like me i can... |
8 | Man of the Year | Comeback Season | 2007 | damn i done walked in here looking like the m... |
9 | Give Ya | Comeback Season | 2007 | check look and i aint tryna get to know nobod... |
data.tail(10)
| | name | album | year | lyrics |
---|---|---|---|---|
204 | Pop Style | Views | 2016 | this sound like some forty three oh one shit ... |
205 | Grammys | Views | 2016 | yeah yeah yeah yeah jheeze yeah right look lo... |
206 | Redemption | Views | 2016 | yeah i get it i get it yeah why would i say a... |
207 | Too Good | Views | 2016 | oh yeah yeah yeah oh yeah yeah yeah yeah look... |
208 | Controlla | Views | 2016 | right my yiy just changed you just buzzed the... |
209 | Views [Trailer] | Views | 2016 | the 6 raptors diamond key new ride old ride ba... |
210 | Summers Over Interlude | Views | 2016 | ooh baby yeah days in the sun and nights in t... |
211 | Views | Views | 2016 | question is will i ever leave you the answer ... |
212 | With You | Views | 2016 | its about us right now girl where you going i... |
213 | One Dance | Views | 2016 | baby i like your style grips on your waist fr... |
Drake has a large, rich discography. According to this dataset, over the 13 years of his music career, he has released 18 albums/mixtapes in total.
In the streaming era, artists tend to put as many tracks as possible on one album to boost their streaming numbers. This is true for Drake as well: his more recent albums (More Life, Scorpion, Views) all have more songs than his older ones, with the exception of 2009, when he released three tapes in one year.
| | name | album | year | lyrics | tokens | Word Counts | Unique Word Counts |
---|---|---|---|---|---|---|---|
106 | Make Things Right | Room for Improvement | 2006 | look if you a girl with the aspirations of be... | [look, if, you, a, girl, with, the, aspiration... | 456 | 223 |
109 | Intro (Room For Improvement) | Room for Improvement | 2006 | yo whats going on this is drake and ima let y... | [yo, whats, going, on, this, is, drake, and, i... | 184 | 92 |
108 | Try Harder | Room for Improvement | 2006 | sometimes i feel like lohan and hilary duff a... | [sometimes, i, feel, like, lohan, and, hilary,... | 395 | 198 |
107 | Thrill Is Gone | Room for Improvement | 2006 | loves lost loves gone love lost love is gone l... | [loves, lost, loves, gone, love, lost, love, i... | 579 | 260 |
105 | Video Girl | Room for Improvement | 2006 | uh yea get in my slick rick mode na mean im a... | [uh, yea, get, in, my, slick, rick, mode, na, ... | 782 | 338 |
104 | All This Love | Room for Improvement | 2006 | southern smoke this another one from your boy ... | [southern, smoke, this, another, one, from, yo... | 458 | 172 |
103 | Pianist Hands | Room for Improvement | 2006 | thank you ms graham for coming today you look ... | [thank, you, ms, graham, for, coming, today, y... | 161 | 101 |
110 | Drakes Voice Mail Box #2 | Room for Improvement | 2006 | what up this kim damn i ve been trying to get... | [what, up, this, kim, damn, i, ve, been, tryin... | 43 | 28 |
102 | Drakes Voice Mail Box #1 | Room for Improvement | 2006 | the man drake puts it the fuck down he s doin... | [the, man, drake, puts, it, the, fuck, down, h... | 162 | 85 |
100 | Do What You Do | Room for Improvement | 2006 | stance on lean leg up on the wall my niggas c... | [stance, on, lean, leg, up, on, the, wall, my,... | 829 | 262 |
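The `tokens`, `Word Counts`, and `Unique Word Counts` columns in the table above can be derived from the cleaned lyrics in a few lines of pandas. The sketch below is only an approximation: it assumes plain whitespace tokenization (the project used NLTK), and the two lyric strings are short excerpts of the truncated previews shown earlier, not full songs.

```python
import pandas as pd

# Tiny stand-in DataFrame; the lyrics are truncated excerpts, not complete songs.
data = pd.DataFrame({
    "name": ["Thrill Is Gone", "Do It Now"],
    "lyrics": [
        "loves lost loves gone love lost love is gone",
        "uh yeah alright uh well alright yeah well",
    ],
})

# Assumption: whitespace tokenization is good enough for already-cleaned lyrics.
data["tokens"] = data["lyrics"].str.split()

# Total number of words per song.
data["Word Counts"] = data["tokens"].apply(len)

# Number of distinct words per song.
data["Unique Word Counts"] = data["tokens"].apply(lambda toks: len(set(toks)))

print(data[["name", "Word Counts", "Unique Word Counts"]])
```

The ratio of unique words to total words is a common rough proxy for how repetitive a song is, which is the sense in which "lyrical" is used in this section.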
From the graph above, we can see:
- "Take Care", Drake's 2011-2012 album that is widely regarded as his best, is actually one of his less lyrical ones.
- Lyrically, Drake was at his weakest in 2016-2017, when he broke through commercially with the mega hit "Hotline Bling", which helped him top charts around the world and become the biggest artist across all genres at the time. It's a dilemma for most Hip-Hop artists: you can make catchy, commercially friendly songs that top the charts and bring you money and fame, but at the expense of your artistry.
- However, Drake is making a comeback. Since 2018, both of his lyrical statistics (word count and unique word count) have been climbing sharply. Drake is currently at his lyrical peak and shows no signs of stopping.
Named entity recognition (NER) is often the first step in information extraction. It seeks to locate named entities in text and classify them into pre-defined categories such as names of persons, organizations, locations, time expressions, quantities, monetary values, and percentages.
In this section, I built a named entity recognizer using spaCy. Here's a snippet of how spaCy works on a given text:
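As a small stand-in that runs without downloading a pretrained model, the sketch below uses a blank English pipeline with spaCy's rule-based `entity_ruler` component. The patterns and the sample sentence are hypothetical, chosen only to show how entities and their labels are read off a processed `Doc`. A pretrained pipeline loaded with `spacy.load` would predict entities statistically instead of matching hand-written rules.

```python
import spacy

# Blank English pipeline: tokenizer only, no pretrained model required.
nlp = spacy.blank("en")

# Rule-based entity component; these patterns are illustrative assumptions.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": [{"LOWER": "drake"}]},
    {"label": "GPE", "pattern": [{"LOWER": "toronto"}]},
    {"label": "GPE", "pattern": [{"LOWER": "houston"}]},
])

# Hypothetical lowercase text, similar in style to the cleaned lyrics.
doc = nlp("drake flew from toronto to houston")

# Each entity span exposes its text and its label.
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)
```

Matching on the `LOWER` token attribute matters here because the cleaned lyrics are all lowercase, which removes the capitalization cues that statistical NER models often rely on.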
Here we go, the Machine Learning part of the project.
Topic modeling is a type of statistical modeling for discovering the abstract “topics” that occur in a collection of documents. Latent Dirichlet Allocation (LDA) is one example of a topic model and is used to assign the text in a document to particular topics. It builds a topics-per-document model and a words-per-topic model, both modeled as Dirichlet distributions.
In this section, I first put all song lyrics into a list. Then, using Scikit-Learn's CountVectorizer, I create a bag-of-words corpus representing all the lyrics. Lastly, I fit an LDA model on that corpus and build an interactive, web-based topic visualization with pyLDAvis. Sample picture of the pyLDAvis interactive plot: