This project was part of the coursework for Distributed Database Systems (CSE 512). The first two phases involved setting up a Hadoop Distributed File System (HDFS) and Spark clusters, then running extensive tests while monitoring resource utilization across the clusters with Ganglia.
The uploaded source code is from the third phase, which performed hotspot analysis on our distributed system. The problem was adapted from the ACM SIGSPATIAL Cup 2016; however, our input included only the data from January.
A detailed report is also included. For instructions on running and testing the code, see the installation_run_instructions file.
The following flowchart describes the algorithm used.
We were tasked with calculating a hotspot score and reporting the top 50 hotspots ranked by that score.
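The text above does not name the score. The ACM SIGSPATIAL Cup 2016 problem this phase was adapted from ranked cells of a space-time cube by the Getis-Ord Gi* statistic, so the sketch below assumes that statistic with binary weights over each cell's 3x3x3 neighborhood. It is a single-machine illustration in Python, not the project's Spark code; the function names `getis_ord_scores` and `top_hotspots` and the input layout are hypothetical.

```python
import math
from itertools import product


def getis_ord_scores(counts, shape):
    """Score every non-empty cell with the Getis-Ord Gi* statistic.

    counts: dict mapping (x, y, t) cell indices to an event count.
    shape:  (nx, ny, nt) size of the full cube; cells missing from
            `counts` are treated as zero.
    Assumption: binary weights, neighbors = the 3x3x3 block around a cell
    (including the cell itself), clipped at the cube boundary.
    """
    nx, ny, nt = shape
    n = nx * ny * nt
    mean = sum(counts.values()) / n
    variance = sum(v * v for v in counts.values()) / n - mean * mean
    std = math.sqrt(max(variance, 0.0))
    offsets = list(product((-1, 0, 1), repeat=3))

    scores = {}
    for (x, y, t) in counts:
        w_sum = 0    # number of in-bounds neighbors (sum of binary weights)
        wx_sum = 0   # sum of neighbor counts
        for dx, dy, dt in offsets:
            cx, cy, ct = x + dx, y + dy, t + dt
            if 0 <= cx < nx and 0 <= cy < ny and 0 <= ct < nt:
                w_sum += 1
                wx_sum += counts.get((cx, cy, ct), 0)
        # With binary weights, sum of squared weights equals w_sum.
        denom = std * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1)) if n > 1 else 0.0
        scores[(x, y, t)] = (wx_sum - mean * w_sum) / denom if denom else 0.0
    return scores


def top_hotspots(counts, shape, k=50):
    """Return the k highest-scoring (cell, score) pairs, best first."""
    scores = getis_ord_scores(counts, shape)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Only non-empty cells are scored here to keep the sketch short; a distributed version would typically compute the neighbor sums with a join or grouped aggregation rather than dictionary lookups.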
Do not use unless you have obtained permission.
Copyright 2017
@authors
- Vraj Delhivala mailto: [email protected]
- Ankit Nadig mailto: [email protected]
- Anshuman Bora mailto: [email protected]
- Saumya Parikh mailto: [email protected]