- Loukia Pavlana (el18711)
- Andreas Hadjisavvas (el18701)
This project was developed for the course Advanced Topics in Databases. It processes large-scale datasets of New York City taxi trips with Apache Spark and HDFS, running complex queries and data transformations to analyze various aspects of the ride data.
- Analyzed ride data to identify the routes with the highest tips, peak hours, and fare patterns.
- Used Apache Spark's DataFrame/SQL API and RDD API for data transformations and query execution.
- Leveraged distributed computing for efficient processing of large datasets.
- Implemented query optimizations for performance improvements.
- Apache Spark
- HDFS
- Python
- Course: Προχωρημένα Θέματα Βάσεων Δεδομένων (Advanced Topics in Databases)
- Institution: National Technical University of Athens