Intro and Use case Reminder

The project is New York City taxi trip duration prediction.
The goal is to use the available data to train a simple machine learning model that predicts trip duration from inputs that would be available in a production environment.

The ultimate goal for this use case could be to predict trip durations in real time (a Google Maps or Waze style itinerary), but for simplicity this module assumes batch prediction: the data to score is stored in a file that is fed to the trained model.
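To make the batch setting concrete, here is a minimal sketch of what "score a file in one pass" could look like. It is not the course's implementation: the CSV format is a stand-in chosen so the example needs only the standard library (the course data is Parquet), and `batch_predict`, `toy_model` and the output column name are hypothetical names introduced for illustration.

```python
import csv
from pathlib import Path
from typing import Callable


def batch_predict(input_path: Path, output_path: Path,
                  predict: Callable[[float], float]) -> int:
    """Score every row of an input file and write the results to a new file.

    Returns the number of rows scored.
    """
    with input_path.open(newline="") as src, output_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(
            dst, fieldnames=["trip_distance", "predicted_duration_min"]
        )
        writer.writeheader()
        n_rows = 0
        for row in reader:
            distance = float(row["trip_distance"])
            writer.writerow({
                "trip_distance": distance,
                "predicted_duration_min": round(predict(distance), 2),
            })
            n_rows += 1
    return n_rows


def toy_model(distance_miles: float) -> float:
    """Hypothetical stand-in for a trained model: 5 min base + 3 min per mile."""
    return 5.0 + 3.0 * distance_miles
```

Calling `batch_predict(Path("to_score.csv"), Path("predictions.csv"), toy_model)` would read the whole input file and write one prediction per row. A real-time service would instead score one request at a time, which is the part this module leaves out.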

The machine learning phase consists mainly of the following steps:

  • data processing
  • model training
  • model evaluation
  • prediction

The data for this module can be downloaded from the TLC Trip Record Data page. To complete this module, you will need three samples of data: the Yellow Taxi trip records for January, February and March 2021.

Disclaimer: The data volumes used in this module are far too small to produce efficient models or interpretable performance figures. We use volumes that fit on a local machine and allow pipelines to be built and run quickly; model performance and interpretability are not the main focus of this course.

Data location: Please create a "00-data" folder in the course root directory and put the downloaded files inside.
If the names differ, rename your files to "yellow_tripdata_2021-01.parquet", "yellow_tripdata_2021-02.parquet" and "yellow_tripdata_2021-03.parquet".
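Before running anything, it can help to confirm that the files are where the course expects them. The following is a small sketch, assuming it is run from the course root directory; `missing_data_files` is a hypothetical helper name, not something provided by the course.

```python
from pathlib import Path

# File names expected by the module: January, February and March 2021.
EXPECTED_FILES = [
    f"yellow_tripdata_2021-{month:02d}.parquet" for month in (1, 2, 3)
]


def missing_data_files(course_root: Path) -> list[str]:
    """Return the expected file names that are absent from <course_root>/00-data."""
    data_dir = course_root / "00-data"
    return [name for name in EXPECTED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files(Path.cwd())
    if missing:
        print("Missing from 00-data:", ", ".join(missing))
    else:
        print("All three data files are in place.")
```

An empty result means the folder layout matches the instructions above. It does not validate the contents of the files.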

Notebook execution

A notebook implementing the machine learning steps to predict taxi trip duration can be found in the course's GitHub repository, in the introduction lesson.

Since machine learning itself is not the main focus of the course, we will simply run the notebook in a local JupyterLab container.

  1. First, create the JupyterLab image and network by running
     make prepare-mlops-crashcourse
  2. Then, create the local JupyterLab container by running
     make launch-mlops-crashcourse

When the JupyterLab UI asks for a token, enter 'MLOPS'.

  3. Finally, open lessons/01-intro/practice-intro-supinfo.ipynb, run it, and work through the modeling implementation.