Data collected from the real world is often dirty and messy, which is why it is important to develop skills for handling and cleaning such data.
This project was conducted to wrangle and analyze a dataset from the Twitter account @dog_rates, also known as WeRateDogs. The project was completed as part of Udacity's Data Analyst Nanodegree program.
We followed the data wrangling process of gathering, assessing, and cleaning. We gathered three datasets using three different methods, then assessed the data and identified 9 quality issues and 2 tidiness issues. Finally, we resolved these issues using the define, code, and test framework. After cleaning, we created a master dataset and analyzed it to uncover insights from the data.
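To make the define, code, and test framework concrete, here is a minimal sketch on toy data. The tables, column names (`tweet_id`, `timestamp`, `name`, `favorite_count`), and the two issues shown are hypothetical stand-ins rather than the project's actual findings, and joining the tables with `merge` is an assumption about how a master dataset might be built.

```python
import pandas as pd

# Hypothetical stand-ins for two gathered tables (column names are assumptions).
archive = pd.DataFrame({
    "tweet_id": [1, 2, 3],
    "timestamp": ["2017-01-01 10:00:00", "2017-01-02 11:30:00", "2017-01-03 09:15:00"],
    "name": ["Rex", "None", "Luna"],
})
counts = pd.DataFrame({
    "tweet_id": [1, 2, 3],
    "favorite_count": [100, 250, 75],
})


def clean_archive(df):
    """Apply the 'code' step for two illustrative quality issues."""
    # Define: `timestamp` is stored as text, and missing dog names
    # are recorded as the literal string "None".
    cleaned = df.copy()  # always clean a copy, never the original
    # Code: parse the timestamps into datetimes.
    cleaned["timestamp"] = pd.to_datetime(cleaned["timestamp"])
    # Code: replace the placeholder string with a real missing value.
    cleaned["name"] = cleaned["name"].mask(cleaned["name"] == "None")
    return cleaned


archive_clean = clean_archive(archive)

# Test: confirm each fix programmatically.
assert pd.api.types.is_datetime64_any_dtype(archive_clean["timestamp"])
assert not archive_clean["name"].isin(["None"]).any()

# Combine the cleaned tables into one master dataset (assumed to be a join on tweet_id).
master = archive_clean.merge(counts, on="tweet_id", how="inner")
```

Writing the define step as a comment next to the code, and the test step as assertions, keeps each fix traceable to the issue that motivated it.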
To complete this project, we used Anaconda, Python and several of its packages and libraries (NumPy, pandas, Matplotlib, Seaborn, Requests, Tweepy, and json), Jupyter Notebook, Sublime Text, and Microsoft Word.