In this module, we will walk through the steps required to deploy a Ray service that uses our previously fine-tuned model, fetching its serving script from the pre-signed URL generated earlier.
Objective: To provide the Ray service with the location from which it should fetch the serving script.
- Navigate to the directory containing Ray manifests:
cd ray_serve_manifests
- Update the working_dir with the previously generated pre-signed URL and save it:
serveConfigV2: |
  applications:
    - name: falcon_finetuned_financial_data
      import_path: serve_script.deployment_finetuned
      route_prefix: /falcon_finetuned_financial
      runtime_env:
        working_dir: "PASTE YOUR PREVIOUSLY GENERATED PRESIGNED URL HERE"
- Apply the updated YAML manifest to deploy the Ray service:
kubectl apply -f 00_ray_serve_fine_tuned.yaml
After applying the manifest, you can verify the status of the RayService, and explore its corresponding resources:
- Check the RayService:
kubectl get rayservices -n ray-svc-finetuned
- Check the Pods:
kubectl get pods -n ray-svc-finetuned
- Check the Node Provisioning:
kubectl get nodes -l provisioner=gpu-serve
- Check NVIDIA GPU Operator:
kubectl get pods -n gpu-operator
- Wait for Resources to Initialize:
kubectl get pods -n ray-svc-finetuned -w
It can take a while for the pods to initialize because of the dependencies (the GPU Operator becoming ready and the Ray images being pulled).
To ensure that the service exposing our fine-tuned model is active and running, execute the following command:
kubectl get svc -n ray-svc-finetuned
Expected Output:
You should see an output similar to the following:
NAME                                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                                    AGE
ray-svc-finetuned-head-svc                    ClusterIP   172.20.214.232   <none>        10001/TCP,8265/TCP,52365/TCP,6379/TCP,8080/TCP,8000/TCP   36m
ray-svc-finetuned-raycluster-kmfjg-head-svc   ClusterIP   172.20.37.172    <none>        10001/TCP,8265/TCP,52365/TCP,6379/TCP,8080/TCP,8000/TCP   52m
ray-svc-finetuned-serve-svc                   ClusterIP   172.20.91.55     <none>        8000/TCP
The service named ray-svc-finetuned-serve-svc should be visible and listening on port 8000/TCP.
For the sake of this demonstration, we are not using a LoadBalancer or Ingress to expose the service. Instead, we'll use the port-forward command to route traffic to the service.
Note: Open a new terminal for this step.
kubectl port-forward svc/ray-svc-finetuned-serve-svc 8000:8000 -n ray-svc-finetuned
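With the port-forward running, you can optionally send a quick test request to the route prefix defined in the manifest before moving on to the sample app. The JSON payload below is an assumption; adjust it to whatever request shape your serving script expects.

# test_endpoint.py (hypothetical) - sanity-check the forwarded Ray Serve endpoint.
import requests

URL = "http://127.0.0.1:8000/falcon_finetuned_financial"
payload = {"prompt": "Why do I need an emergency fund if I already have investments?"}  # assumed shape

response = requests.post(URL, json=payload, timeout=120)
print(response.status_code)
print(response.text)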
We have prepared a sample chatbot application that will interact with our deployed model.
- Navigate to the sample application directory:
cd ../sample_app
- Install Python dependencies:
pip install -r requirements.txt
- Define the model endpoint as an environment variable:
export MODEL_ENDPOINT="/falcon_finetuned_financial"
With the environment variable set, you can now run the chatbot application.
python app.py
Open your web browser and navigate to http://127.0.0.1:7860 to start interacting with your deployed model.
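If you are curious how the sample app ties these pieces together, here is a minimal sketch of a Gradio chat interface that reads MODEL_ENDPOINT and forwards prompts to the port-forwarded service. It is an illustration only; the actual app.py, its payload format, and its response handling may differ.

# app.py (hypothetical sketch) - a minimal Gradio UI that forwards prompts to the
# Ray Serve endpoint; the payload shape and response handling are assumptions.
import os

import gradio as gr
import requests

# MODEL_ENDPOINT holds the route prefix, e.g. "/falcon_finetuned_financial".
ENDPOINT = "http://127.0.0.1:8000" + os.environ.get("MODEL_ENDPOINT", "/")

def ask(prompt: str) -> str:
    response = requests.post(ENDPOINT, json={"prompt": prompt}, timeout=120)
    response.raise_for_status()
    return response.text

demo = gr.Interface(fn=ask, inputs="text", outputs="text", title="Falcon fine-tuned chatbot")

if __name__ == "__main__":
    demo.launch(server_name="127.0.0.1", server_port=7860)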
Let's ask the same question that we asked the non-fine-tuned model:
Question: Why do I need an emergency fund if I already have investments?