Interactive analytics for Reddit | Real-Time data analysis.
The purpose of this project was to get metrics about Reddit posts in real time using technologies such as Angular, Apache Kafka, Spring KStream, Apache Spark, Spring Kafka, and Spring WebFlux with Reactor.
It contains five main components:
- `client`, the front-end app, built with Angular.
- A web service (`Reddit-producer`) that calls the Reddit API to fetch posts, then sends them to a Kafka topic.
- Two consumers (`Kafka-Stream-Consumer` and `Spark-Consumer`) that act as stream processors. They read the data from Kafka as a stream and process it in real time, producing metrics and statistics that are written back to a Kafka topic (`metrics`) to be consumed later.
- `Spring-Kafka-Reactive-Backend`, a service that subscribes to the Reddit metrics topic and sends incoming metrics to the front end over a WebSocket.
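
To make the consumers' role more concrete, here is a minimal plain-Java sketch of the kind of running aggregation a stream processor might perform on incoming posts. It is only an illustration: the `Post` fields, the `Metrics` class, and the two metrics shown (posts per subreddit, average score) are assumptions, not the project's actual schema, and the real consumers use Kafka Streams and Spark rather than an in-memory list.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: class names, fields, and metrics are assumptions
// made for illustration, not taken from the project.
public class MetricsSketch {

    // Minimal stand-in for a Reddit post read from the Kafka topic.
    static final class Post {
        final String subreddit;
        final int score;

        Post(String subreddit, int score) {
            this.subreddit = subreddit;
            this.score = score;
        }
    }

    // Running metrics, updated one post at a time as records arrive.
    static final class Metrics {
        private final Map<String, Integer> postsPerSubreddit = new TreeMap<>();
        private long totalScore = 0;
        private long postCount = 0;

        // Fold a single post into the running totals.
        void accept(Post post) {
            postsPerSubreddit.merge(post.subreddit, 1, Integer::sum);
            totalScore += post.score;
            postCount++;
        }

        Map<String, Integer> postsPerSubreddit() {
            return postsPerSubreddit;
        }

        // Average score over all posts seen so far (0.0 if none).
        double averageScore() {
            return postCount == 0 ? 0.0 : (double) totalScore / postCount;
        }
    }

    public static void main(String[] args) {
        // An in-memory list stands in for the stream of posts from Kafka.
        List<Post> batch = List.of(
                new Post("java", 10),
                new Post("scala", 4),
                new Post("java", 1));

        Metrics metrics = new Metrics();
        batch.forEach(metrics::accept);

        System.out.println(metrics.postsPerSubreddit());
        System.out.println(metrics.averageScore());
    }
}
```

In the actual pipeline, each updated metrics snapshot would be serialized and published to the `metrics` topic, where `Spring-Kafka-Reactive-Backend` picks it up.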