# Big Data Processing Pipeline

The pipeline was implemented using Python, the Twitter API, Kafka, MongoDB, and Tableau. Refer to the report for further implementation details:
View Report

## Architecture

Overview:

  • The Twitter API is used to collect tweets for processing
  • Kafka ingests the collected data and connects the other components of the pipeline
  • MongoDB stores the collected tweets for later analysis
  • Tableau produces visualizations from the stored data
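
The Kafka-to-MongoDB leg of this pipeline can be sketched as below. This is a minimal illustration, not the project's actual code: the field names, topic name (`tweets`), and collection name (`covid_tweets`) are assumptions, and the exact tweet schema depends on the Twitter API version and query used (see the report for specifics).

```python
import json

def tweet_to_document(raw: str) -> dict:
    """Flatten a raw tweet JSON payload (as consumed from Kafka)
    into a document shape suitable for MongoDB.

    Field names here are illustrative assumptions, not the
    project's confirmed schema.
    """
    tweet = json.loads(raw)
    return {
        "_id": tweet["id"],            # tweet ID doubles as the document key
        "text": tweet["text"],
        "hashtags": [h["text"].lower()
                     for h in tweet.get("entities", {}).get("hashtags", [])],
        "created_at": tweet.get("created_at"),
        # country code is only present for geotagged tweets
        "place": (tweet.get("place") or {}).get("country_code"),
    }

# Sketch of the consumer loop (requires kafka-python and pymongo,
# plus running Kafka and MongoDB instances):
#
# from kafka import KafkaConsumer
# from pymongo import MongoClient
#
# consumer = KafkaConsumer("tweets", bootstrap_servers="localhost:9092")
# collection = MongoClient()["twitter"]["covid_tweets"]
# for message in consumer:
#     doc = tweet_to_document(message.value.decode("utf-8"))
#     collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
```

Upserting on the tweet ID makes the consumer idempotent, so replaying a Kafka partition does not create duplicate documents.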


Results:

Examining the visualizations, we see a relative concentration of tweets containing the COVID hashtag in the Americas, Europe, and Southern Asia. This lines up with expectations for regions that combine high Twitter adoption with many COVID-19 cases, though further work is needed to validate this conclusion.