In this module we'll learn how to deploy a model to production: we'll create a REST API that serves our model's predictions, dockerize that API, and finally use Locust to run some load tests.
We want a REST API running that can predict the trip duration of an NYC taxi ride given the pickup location ID, the drop-off location ID, and the number of passengers.
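To make the target concrete, here is a minimal sketch of what such a service could look like. The `InputData` field names match the payload used later in this lesson, but the model filename and the exact structure of `web_service/main.py` are illustrative assumptions, not the course's actual code.

```python
# Hypothetical sketch of the prediction service; the real web_service/main.py may differ.
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI(title="NYC taxi trip duration predictor")

# Assumed model path: the init step copies a model to web_service/local_models/
model = joblib.load("local_models/pipeline__v0.0.1.joblib")


class InputData(BaseModel):
    PULocationID: int
    DOLocationID: int
    passenger_count: int


@app.get("/")
def health() -> dict:
    # Used by the curl health check later in this lesson
    return {"status": "ok"}


@app.post("/predict")
def predict(payload: InputData) -> dict:
    # Assumption: the pipeline accepts a list of dict-like records
    # (e.g. a DictVectorizer followed by a regressor)
    duration = model.predict([payload.dict()])[0]
    return {"trip_duration_prediction": float(duration)}
```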
Once the infra is running (`make prepare-mlops-crashcourse` and `make launch-mlops-crashcourse`):

- Go inside the jupyter container (starting in the `/app` directory):
  - either by using VSCode's *Remote Explorer* extension (you can download it here) and attaching to the running container
  - or simply by opening Jupyter Lab at http://localhost:10000 in your browser
- Go to `lessons/05-model-deployment`:

  ```bash
  cd lessons/05-model-deployment/
  ```
- Initialize the course:

  ```bash
  make init_course_model_deployment
  ```

  Details on the init: it will do the following:
  - install the dependencies for this lesson
  - pull the data from the internet
  - build a local model that is saved in `web_service/local_models/`
  - copy this model to the shared volume
  - push a model to the running MLflow server and register it as production (see the sketch after this list)
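To make that last step concrete, here is a minimal sketch of pushing a model to MLflow and promoting it to production with the MLflow client. The tracking URI, registered model name, and toy pipeline are illustrative assumptions; the init script's actual code may differ.

```python
# Hypothetical sketch of "push a model and register it as production";
# the course's init script may do this differently.
import mlflow
from mlflow.tracking import MlflowClient
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

mlflow.set_tracking_uri("http://localhost:5000")  # assumed MLflow server address

# Toy pipeline standing in for the course's real trip-duration model
pipeline = make_pipeline(DictVectorizer(), LinearRegression())
pipeline.fit(
    [{"PULocationID": 264, "DOLocationID": 264, "passenger_count": 2},
     {"PULocationID": 43, "DOLocationID": 151, "passenger_count": 1}],
    [10.0, 15.0],
)

# Log the pipeline and register it under an assumed model name
with mlflow.start_run():
    mlflow.sklearn.log_model(
        pipeline, artifact_path="model", registered_model_name="trip_duration"
    )

# Promote the newly created version to the Production stage
client = MlflowClient()
version = client.get_latest_versions("trip_duration", stages=["None"])[0].version
client.transition_model_version_stage(
    name="trip_duration", version=version, stage="Production"
)
```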
- Go to the `web_service` directory, where a local model has already been pushed to `web_service/local_models/`:

  ```bash
  cd web_service
  ```
- You can run the API locally with the following command:

  ```bash
  uvicorn main:app --reload --host 0.0.0.0 --port 8000
  ```

  This starts the API on http://localhost:8000; since this port is forwarded to the host's port 8000, you can interact with the API from your host machine.
- You can go to http://localhost:8000/docs to see the API's documentation (where you can also test the API).
- You can also use `curl` to test the API's health:

  ```bash
  curl http://localhost:8000
  ```
- You can send requests to the running API using Python (run this inside the container, in another terminal):

  ```bash
  cd /app/lessons/05-model-deployment/ && python bin/post_payload.py
  ```
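If you're curious what such a script does, here is a minimal sketch, assuming it simply POSTs a JSON payload with `requests`; the actual `bin/post_payload.py` may differ.

```python
# Hypothetical sketch of a payload-posting script; see bin/post_payload.py for the real one.
import requests

HOST = "http://localhost:8000"  # switch to the dockerized API host later in the lesson

payload = {"PULocationID": 264, "DOLocationID": 264, "passenger_count": 2}

# POST the payload to the /predict endpoint and print the prediction
response = requests.post(f"{HOST}/predict", json=payload, timeout=10)
response.raise_for_status()
print(response.json())
```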
- You can also send a request to the API using `curl`:

  ```bash
  curl -X POST 'http://localhost:8000/predict' \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '{"PULocationID": 264, "DOLocationID": 264, "passenger_count": 2}'
  ```
- (optional) Create a payload file: create a file `test_payload.json` with the following content:

  ```json
  {"PULocationID": 264, "DOLocationID": 264, "passenger_count": 2}
  ```

  - or create it using `cat`:

    ```bash
    cat << EOF > test_payload.json
    {"PULocationID": 264, "DOLocationID": 264, "passenger_count": 2}
    EOF
    ```

  - then you can send a request using the payload file:

    ```bash
    curl -X POST 'http://localhost:8000/predict' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d @test_payload.json
    ```
Now that you understand how to run the API locally, you can create a docker image for it. Since we're working with code that is inside the running container, you'll need to run the following commands from the host machine (not inside the container):
- Build the docker image (you should be inside the `05-model-deployment` directory):
  - first you must move the model `pipeline__v0.0.1.joblib`, which was copied to the shared volume during the init of the course, from `infra/mlflow_server/local/` to `web_service/local_models/`:

    ```bash
    mv ../../infra/mlflow_server/local/pipeline__v0.0.1.joblib web_service/local_models/
    ```

  - then you can build the docker image:

    ```bash
    docker build -t prediction_server -f Dockerfile.app .
    ```
- Run the docker image:

  ```bash
  docker run -itd --rm --name prediction_server -p 8001:8001 --network mlops-crashcourse-supinfo prediction_server
  ```
- Check that the container is running:

  ```bash
  docker ps
  ```
- Check that the API is running:
  - go to http://localhost:8001/docs to see the API's documentation (where you can also test the API)
- You can send requests to the running dockerized API using Python (run this inside the container, in another terminal):
  - change the host in `/app/lessons/05-model-deployment/bin/post_payload.py` to `http://prediction_server:8001`
  - then, just like previously, you can run:

    ```bash
    cd /app/lessons/05-model-deployment/ && python bin/post_payload.py
    ```
- You can stop the prediction server from the host machine:

  ```bash
  docker stop prediction_server
  ```
- You can remove the prediction server image from the host machine:

  ```bash
  docker image rm prediction_server
  ```
Run Locust inside the jupyter container (you should be inside the `05-model-deployment` directory):

- Run Locust targeting the running API (a sketch of a possible locustfile follows this list):

  ```bash
  locust -f ./locust/locustfile.py --host=http://localhost:8000
  ```
- Go to http://localhost:8089 in your browser to see the Locust dashboard.
  - You can define the target number of users (peak concurrency), the spawn rate (users started per second), and how long the test should run (you might need to expand the *Advanced options* item).
- Start the swarm.
- You can see the results of the test on the dashboard, and save them to a file by clicking on the *Download Data* tab and then *Download Report*.
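For reference, here is a minimal sketch of what a locustfile for this API could look like; the class and task names are illustrative assumptions, and the course's `locust/locustfile.py` may be organized differently.

```python
# Hypothetical locustfile sketch; see locust/locustfile.py for the course's version.
from locust import HttpUser, between, task


class PredictionUser(HttpUser):
    # Each simulated user waits 1-3 seconds between requests
    wait_time = between(1, 3)

    @task
    def predict(self) -> None:
        # The --host flag on the command line provides the base URL
        self.client.post(
            "/predict",
            json={"PULocationID": 264, "DOLocationID": 264, "passenger_count": 2},
        )
```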
Once you're done with this simple version, it's your turn to build a more complex version of the API that pulls the model from the running MLflow server.
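As a starting point, here is a minimal sketch of how an API could load the production model from an MLflow model registry; the tracking URI and registered model name are assumptions you'll need to adapt to this course's setup.

```python
# Hypothetical sketch: load the Production model from the MLflow model registry.
import mlflow
import pandas as pd

mlflow.set_tracking_uri("http://mlflow:5000")  # assumed address of the MLflow server

# "models:/<name>/<stage>" resolves to the latest version in that stage
model = mlflow.pyfunc.load_model("models:/trip_duration/Production")

# The expected input format depends on how the model was logged;
# a one-row DataFrame is a common choice for pyfunc models.
features = pd.DataFrame(
    [{"PULocationID": 264, "DOLocationID": 264, "passenger_count": 2}]
)
print(model.predict(features))
```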
Note: you can find the solution to this lab in `solution.zip`.