TO DO LIST:
- [ ] COLLECT MORE TWEETS FOR FURTHER ANALYSIS
- REFACTOR THE PREPROCESSING STEP
- WORK ON TOPIC MODELING
- EMBED IT IN A REAL-TIME PROCESS WITH A DATABASE (check Apache Airflow)
- Learn about topic modeling and write the first draft of that blog post
- Refactor the Twitter collector to gather geographical data for the blog
- [ ] Revise the blog post and get it into a publishable state
- Find a way to run the streaming code at midnight and collect the tweets for a single date
- Ask a question on SO about how to retrieve data for the country
- Other cleaning steps:
  - Remove one- or two-word strings and single characters
  - Remove "Kinshasa"
- [ ] For each word in a topic, plot its distribution
- Read Aspitel's project and check how she implemented it
- Create the Django or Flask project and plan the deployment
- [ ] Write unit tests for the project
- [ ] Refactor the project and make it maintainable
- [ ] Get geocoded locations for all cities in the DRC
- Fix the issue with retrieving tweets by date
- Use Apache Airflow to run cron jobs and data retrieval tasks at a specific time of day
- [ ] Deploy everything to DO
- [ ] Improve the processing by excluding Congolese names from stemming
- Add a job that tweets the word count every day after computing it
- [ ] Create a job that goes through every tweet and collects all of its replies
- [ ] Get all the data for the year 2020 and save it to a raw JSON file without cleaning
- [ ] Save the data to a JSON file without cleaning
- [ ] Investigate the usage of https://dagshub.com/
- Move the project from Airflow to Prefect or another workflow manager
- [ ] Get all the tweets from my timeline
- [ ] Add a script to initialize the database migrations
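The cleaning items in this list (drop one- or two-word strings and single characters, drop "kinshasa", keep Congolese names out of stemming) could be sketched as one extra pass over each tweet. This is a minimal sketch under assumptions: "one- or two-word strings" is read as tweets with fewer than three tokens, "characters" as single-character tokens, and `PROTECTED_NAMES` holds a hypothetical example entry. The `stem` argument stands in for whatever stemmer the project already uses.

```python
import re

# Matches runs of letters/digits/underscores; Unicode-aware by default in Python 3.
TOKEN_RE = re.compile(r"\w+")

# Words to drop entirely; "kinshasa" comes from the to-do list, extend as needed.
EXTRA_STOPWORDS = {"kinshasa"}

# Hypothetical example entry: Congolese names that must bypass the stemmer.
PROTECTED_NAMES = {"tshisekedi"}


def clean_tweet(text, stem=lambda word: word, min_words=3, min_chars=2):
    """Tokenize a tweet and apply the extra cleaning steps.

    Returns None when the tweet has fewer than `min_words` tokens
    (counted before filtering), otherwise the list of kept tokens.
    """
    tokens = TOKEN_RE.findall(text.lower())
    if len(tokens) < min_words:
        return None  # drop one- or two-word tweets
    cleaned = []
    for token in tokens:
        if len(token) < min_chars or token in EXTRA_STOPWORDS:
            continue  # drop single characters and unwanted words
        # Protected names are kept verbatim; everything else goes through the stemmer.
        cleaned.append(token if token in PROTECTED_NAMES else stem(token))
    return cleaned
```

For example, `clean_tweet("Pluie a Kinshasa ce soir")` returns `["pluie", "ce", "soir"]`. If the project uses NLTK, `stem` could be something like `SnowballStemmer("french").stem`; keeping the name check outside the stemmer means the protected list can change without touching the stemming code.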
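For the items about running the collector at midnight and retrieving tweets for a single date, one way to make "one date" precise is to compute the previous calendar day as a half-open `[start, end)` window. This is a sketch, not the project's current code: the function names are hypothetical, and the output format assumes a search API that takes `start_time`/`end_time` as `YYYY-MM-DDTHH:MM:SSZ` strings (as the Twitter API v2 search endpoints do).

```python
from datetime import datetime, time, timedelta, timezone

# Kinshasa uses West Africa Time (UTC+1, no daylight saving); the eastern DRC is on UTC+2.
KINSHASA_TZ = timezone(timedelta(hours=1))


def previous_day_window(run_time, tz=timezone.utc):
    """Return the (start, end) of the calendar day before `run_time` in `tz`.

    `run_time` must be timezone-aware. Both bounds are returned in UTC;
    `end` is exclusive, so consecutive daily runs never overlap.
    """
    local_day = run_time.astimezone(tz).date() - timedelta(days=1)
    start = datetime.combine(local_day, time.min, tzinfo=tz)
    end = start + timedelta(days=1)
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


def to_rfc3339(dt):
    """Format a datetime in UTC as YYYY-MM-DDTHH:MM:SSZ."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

A run at midnight in Kinshasa on 2 March 2021 (`datetime(2021, 3, 1, 23, 0, tzinfo=timezone.utc)`) with `tz=KINSHASA_TZ` yields `2021-02-28T23:00:00Z` to `2021-03-01T23:00:00Z`, i.e. the whole of 1 March local time. Deriving the window from the run time rather than "now minus 24 hours" keeps the result stable even if the scheduler (cron, Airflow or Prefect, e.g. with the cron expression `0 0 * * *`) starts the job a few minutes late.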
Bounding box of the whole country: [11.94,-13.64,30.54,5.19]
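These four numbers read naturally as a bounding box in the order [west longitude, south latitude, east longitude, north latitude], i.e. 11.94°E to 30.54°E and 13.64°S to 5.19°N, which roughly matches the DRC's extent. Assuming that ordering (it is an interpretation, not stated in the note), a small helper can test whether a tweet's coordinates fall inside the box. The class and method names are hypothetical. The longitude-first, south-west-corner-first string is the format the Twitter API v1.1 streaming `locations` filter expects, in case the collector uses that endpoint.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box in degrees of longitude/latitude."""
    west: float   # minimum longitude
    south: float  # minimum latitude
    east: float   # maximum longitude
    north: float  # maximum latitude

    def contains(self, lon: float, lat: float) -> bool:
        """True if the point lies inside or on the edge of the box."""
        return self.west <= lon <= self.east and self.south <= lat <= self.north

    def as_locations_param(self) -> str:
        """Comma-separated "lon,lat,lon,lat" string, south-west corner first."""
        return ",".join(str(v) for v in self)


# Assumed ordering of the note's values: [west, south, east, north].
DRC_BBOX = BBox(11.94, -13.64, 30.54, 5.19)
```

Because the box is a rectangle, it also covers parts of neighbouring countries: Brazzaville (15.28°E, 4.27°S), just across the river from Kinshasa, passes `DRC_BBOX.contains(15.28, -4.27)`. Tweets matched by the box alone would still need a country check, for example on the place's country code when the tweet carries one.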
I finally found time to get back to this project after it had been down for about six months. It took me around a day to set up a new server and get the project running again. The GitHub Actions workflows are working again, but I have learned a lot since the last time I worked on this project.
I would like to improve it by adding new tools.
Here is the next roadmap. Before adding new features to the project, I would like to:
- Replace Airflow with Prefect as the workflow manager.
- Move container management from Docker to Kubernetes.
- Then add more features and improve the modelling side of the project.