Data Analytics Pipeline for Historical Taxi Data

Industry today relies heavily on data analytics to make predictions, and many successful business models benefit heavily from machine learning. Popular taxi services such as Uber and Lyft show their users a predicted fare before the customer is matched with a driver. We try to provide a similar solution using the open dataset published by the NYC Taxi and Limousine Commission (NYC-TLC). The intention is to process large volumes of data in streams from NYC-TLC's public data repository, perform feature engineering in parallel, and deploy a prediction engine on top of it.
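As a rough illustration of the feature engineering step, the sketch below derives a few features from a single trip record. The field names (`pickup_datetime`, `pickup_latitude`, and so on) and the choice of features (great-circle distance, hour of day, weekday) are assumptions made for this example, not a description of the features the project actually uses.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def extract_features(record):
    """Turn one raw trip record (a dict) into model features.

    The keys read from `record` are hypothetical column names.
    """
    pickup = datetime.strptime(record["pickup_datetime"], "%Y-%m-%d %H:%M:%S")
    return {
        "distance_km": haversine_km(
            record["pickup_latitude"], record["pickup_longitude"],
            record["dropoff_latitude"], record["dropoff_longitude"],
        ),
        "hour": pickup.hour,          # 0-23
        "weekday": pickup.weekday(),  # Monday is 0
    }
```

Because `extract_features` depends only on its input record, it could in principle be applied to each record independently, for example with `rdd.map(extract_features)` on a Spark RDD of dicts, which is what makes this kind of step easy to parallelize.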

In this project we implemented a data analytics pipeline that processes over 100 million records of NYC-TLC historical data from a public S3 repository and predicts taxi fares. We parallelized data preprocessing on AWS EMR using PySpark and Pandas and trained machine learning models on the preprocessed data. We also implemented a Flask web application as a serving layer, giving users an interface to query the trained models.
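To make the serving layer concrete, here is a minimal sketch of a Flask endpoint that returns a fare prediction. The `/predict` route, the `distance_km` input field, and the linear `predict_fare` function with its coefficients are all placeholders invented for this example; the project's application would load its own trained models instead.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for a trained model: a base fare plus a per-km rate.
# These coefficients are illustrative, not values from the project.
BASE_FARE = 2.50
RATE_PER_KM = 1.60


def predict_fare(distance_km):
    """Return a placeholder fare estimate for a trip of the given length."""
    return round(BASE_FARE + RATE_PER_KM * distance_km, 2)


@app.route("/predict", methods=["POST"])
def predict():
    """Read a JSON body such as {"distance_km": 10} and return a fare."""
    payload = request.get_json(silent=True) or {}
    try:
        distance_km = float(payload["distance_km"])
    except (KeyError, TypeError, ValueError):
        return jsonify(error="distance_km must be a number"), 400
    if distance_km < 0:
        return jsonify(error="distance_km must be non-negative"), 400
    return jsonify(fare=predict_fare(distance_km))
```

If this were saved as `serve.py` (a hypothetical file name), it could be started locally with `flask --app serve run` and queried by sending a POST request with a JSON body to `/predict`.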

Repository contents:
doc
models
notebook
preprocessor
ui
.gitignore
Data_Analytics_Pipeline_For_Historical_Taxi_Data.pdf
Data_Preprocessing.ipynb
Model_Fitting-Final.ipynb
Model_Fitting.ipynb
README.md
architecture.jpg


parthnagori/Data-Analytics-Pipeline-For-Historical-Taxi-Data
