# Big Data Processing Pipeline

The pipeline was implemented using Python, the Twitter API, Kafka, MongoDB, and Tableau. Refer to the report for further implementation details:
View Report

## Architecture

Overview:

  • The Twitter API is used to collect tweets for processing
  • Kafka ingests the collected data and connects the other components of the pipeline
  • MongoDB stores the collected tweets for later analysis
  • Tableau produces visualizations from the stored data
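
The Kafka-to-MongoDB leg of this pipeline can be sketched as below. This is a minimal illustration, not the project's actual code: the field names, topic name (`tweets`), and collection name (`covid_tweets`) are assumptions, and the exact tweet schema depends on the Twitter API version and query used (see the report for specifics).

```python
import json

def tweet_to_document(raw: str) -> dict:
    """Flatten a raw tweet JSON payload (as consumed from Kafka)
    into a document shape suitable for MongoDB.

    Field names here are illustrative assumptions, not the
    project's confirmed schema.
    """
    tweet = json.loads(raw)
    return {
        "_id": tweet["id"],            # tweet ID doubles as the document key
        "text": tweet["text"],
        "hashtags": [h["text"].lower()
                     for h in tweet.get("entities", {}).get("hashtags", [])],
        "created_at": tweet.get("created_at"),
        # country code is only present for geotagged tweets
        "place": (tweet.get("place") or {}).get("country_code"),
    }

# Sketch of the consumer loop (requires kafka-python and pymongo,
# plus running Kafka and MongoDB instances):
#
# from kafka import KafkaConsumer
# from pymongo import MongoClient
#
# consumer = KafkaConsumer("tweets", bootstrap_servers="localhost:9092")
# collection = MongoClient()["twitter"]["covid_tweets"]
# for message in consumer:
#     doc = tweet_to_document(message.value.decode("utf-8"))
#     collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
```

Upserting on the tweet ID makes the consumer idempotent, so replaying a Kafka partition does not create duplicate documents.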


Results:

Examining the visualizations, we see a relative concentration of tweets containing the COVID hashtag in the Americas, Europe, and Southern Asia. This lines up with expectations for regions that combine high Twitter adoption with many COVID-19 cases, though further work is needed to validate this conclusion.