Add documentation for modelcars (#337)
* Add documentation for modelcars, introduced in 0.12 as an experimental feature
* Added some references to this feature

Signed-off-by: Roland Huß <[email protected]>

# Serving models with OCI images

KServe's traditional approach for model initialization involves fetching models from sources like [S3 buckets](../s3/s3.md) or [URIs](../uri/uri.md). This process is adequate for small models but becomes a bottleneck for larger ones, such as those used for large language models, significantly delaying startup times in auto-scaling scenarios.
"Modelcars" is a KServe feature designed to address these challenges. It streamlines model fetching using OCI images, offering several advantages: | ||
|
||
- **Reduced Startup Times:** By avoiding repetitive downloads of large models, startup delays are significantly minimized. | ||
- **Lower Disk Space Usage:** The feature decreases the need for duplicated local storage, conserving disk space. | ||
- **Enhanced Performance:** Modelcars allows for advanced techniques like pre-fetching images and lazy-loading, improving efficiency. | ||
- **Compatibility and Integration:** It seamlessly integrates with existing KServe infrastructure, ensuring ease of adoption. | ||
|
||
Modelcars represents a step forward in efficient model serving, particularly beneficial for handling large models and dynamic serving environments. | ||
|
||
## Enabling Modelcars | ||
|
||

Modelcars is an experimental feature in KServe and is not enabled by default.
To take advantage of this new model serving method, it needs to be activated in the KServe configuration.
Follow the steps below to enable Modelcars in your environment.

!!! note
    Modelcars are currently in an experimental phase. Enable this feature in a test environment first to ensure it meets your requirements before using it in a production setting.

Modelcars can be enabled by modifying the `storageInitializer` configuration in the `inferenceservice-config` ConfigMap.
This can be done manually with `kubectl edit`, or by executing the script below with the current namespace set to the namespace where the `kserve-controller-manager` is installed (this depends on how KServe was installed).

```bash
# Script to enable Modelcars
# Fetch the current storageInitializer configuration
config=$(kubectl get configmap inferenceservice-config -n kserve -o jsonpath='{.data.storageInitializer}')
# Enable modelcars and set the UID for the containers to run (required for minikube)
newValue=$(echo $config | jq -c '. + {"enableModelcar": true, "uidModelcar": 1010}')

# Create a temporary directory for the patch file
tmpdir=$(mktemp -d)
cat <<EOT > $tmpdir/patch.txt
[{
  "op": "replace",
  "path": "/data/storageInitializer",
  "value": '$newValue'
}]
EOT

# Apply the patch to the ConfigMap
kubectl patch configmap -n kserve inferenceservice-config --type=json --patch-file=$tmpdir/patch.txt

# Restart the KServe controller to apply changes
kubectl delete pod -n kserve -l control-plane=kserve-controller-manager
```
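
To double-check that the new settings are in place before moving on, you can re-read the `storageInitializer` entry and make sure the controller Pod has come back up (assuming, as in the script above, that KServe is installed in the `kserve` namespace):

```bash
# Show the updated storageInitializer configuration; enableModelcar should now be true
kubectl get configmap inferenceservice-config -n kserve \
  -o jsonpath='{.data.storageInitializer}' | jq .

# Wait for the restarted controller to become Ready again
kubectl get pods -n kserve -l control-plane=kserve-controller-manager
```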

## Prepare an OCI Image with Model Data

To utilize Modelcars for serving models, you need to prepare an OCI (Open Container Initiative) image containing your model data. This process involves creating a Dockerfile and building an OCI image that houses your model in a specific directory. Below are the steps and an example to guide you through this process.

1. **Create a Dockerfile:**
    Start by creating a Dockerfile that uses a base image containing the necessary commands like `ln` (for creating symbolic links) and `sleep` (for keeping the container running). The Dockerfile should also include steps to create a directory `/models` for your model data and copy the data into this directory.
    Here's an example Dockerfile where the `data/` directory contains your model data. This data will later be made available at `/mnt/models` by the runtime:
    ```dockerfile
    FROM busybox
    RUN mkdir /models && chmod 775 /models
    COPY data/ /models/
    ```
2. **Build and Push the Image to a Registry:** Once your Dockerfile is ready, use either docker or podman to build and push the image to a container registry like Docker Hub or quay.io (a podman variant and an optional sanity check follow right after this list):
    ```bash
    docker build -t myuser/mymodel:1.0 .
    docker push myuser/mymodel:1.0
    ```
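
As mentioned above, podman works the same way; the optional `ls` at the end is a quick sanity check that the model data really ended up in `/models` inside the image. The image name is the placeholder used above, and you may need to prefix it with your registry (e.g. `quay.io/`) depending on your setup:

```bash
# Same build-and-push flow with podman (prefix the tag with your registry if required)
podman build -t myuser/mymodel:1.0 .
podman push myuser/mymodel:1.0

# Optional sanity check: list the model files baked into the image
podman run --rm myuser/mymodel:1.0 ls -l /models
```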

By completing these steps, you'll have an OCI image ready with your model data, which can then be used with the Modelcars feature in KServe for efficient model serving.

## Using Modelcars

With Modelcars enabled and your OCI image containing the model data prepared, integrating this setup into your `InferenceService` is straightforward. The key step involves specifying the `storageUri` with the `oci://` schema in your `InferenceService` configuration to point to your OCI image.

Here’s an example of how an `InferenceService` configuration would look when using the Modelcars feature:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-inference-service
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: oci://myuser/mymodel:1.0
```

In order to fully leverage the local caching capabilities of OCI images in the Modelcars setup, it is crucial to use a specific tag for your model image rather than relying on the default `latest` tag.
For instance, in the provided example, the tag `1.0` is used.
This approach ensures that the modelcar image is pulled with an `IfNotPresent` policy, allowing for efficient use of the local cache.
On the other hand, using the `latest` tag, or omitting a tag altogether, defaults to an `Always` pull policy.
This means the image would be re-downloaded every time a Pod restarts or scales up, negating the benefits of local caching and potentially leading to increased startup times.
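
If you are unsure which policy ended up on a running Pod, you can list every container of the predictor Pod together with its image and pull policy. The label selector below assumes the `InferenceService` is named `my-inference-service` as in the example above; adjust it to your own service name:

```bash
# Show name, image and imagePullPolicy for each container of the predictor Pod
pod=$(kubectl get pods -l serving.kserve.io/inferenceservice=my-inference-service -o name)
kubectl get $pod -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.image}{"\t"}{.imagePullPolicy}{"\n"}{end}'
```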

## Example

Let's see how modelcars work by deploying the [getting started example](../../../../get_started/first_isvc/) using an OCI image, and check how the startup differs from one that uses a storage-initializer init-container.

Assuming you have set up a namespace `kserve-test` that is KServe-enabled, create an `InferenceService` that uses an `oci://` storage URL:

``` shell
kubectl apply -n kserve-test -f - <<EOF
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "sklearn-iris-oci"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: "oci://rhuss/kserving-example-sklearn:1.0"
EOF
```

After the `InferenceService` has been deployed successfully, you can follow the [steps of the getting started example](../../../../get_started/first_isvc/) to verify the installation.
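
For example, the prediction request from the getting started guide should work unchanged against this service, with only the model name switched to `sklearn-iris-oci`. The `INGRESS_HOST`, `INGRESS_PORT`, and `SERVICE_HOSTNAME` variables are determined exactly as described there:

``` shell
# Two iris samples, as used in the getting started example
cat <<EOF > iris-input.json
{ "instances": [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]] }
EOF

curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" \
  "http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/sklearn-iris-oci:predict" \
  -d @./iris-input.json
```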

Finally, let's take a brief look under the covers to see how this feature works.
Let's first check the runtime pod:

``` shell
kubectl get pods
```

!!! success "Sample Output"

    ```{ .shell .no-copy }
    NAME                                                     READY   STATUS    RESTARTS      AGE
    sklearn-iris-oci-predictor-00001-deployment-58fc6564d7   3/3     Running   1 (39m ago)   40m
    ```

As you can see, the Pod now has one additional container. This container runs the modelcar image and executes an `ln -s /proc/$$/root/models /mnt/` command to create a symbolic link on a shared empty volume that is mounted at `/mnt` in both the modelcar container and the serving runtime container. The trick here is the symbolic link through the proc filesystem, which is shared among all containers. This is possible on Kubernetes for the containers of a Pod if the field `.spec.shareProcessNamespace` is set to `true`, which KServe does for every `storageUri` that uses the `oci://` schema.

Let's jump into the runtime container and examine the mounted `/mnt` filesystem:

``` shell
# InferenceService Pod
pod=$(kubectl get pods -l serving.kserve.io/inferenceservice=sklearn-iris-oci -o name)
# Verify that shareProcessNamespace is enabled
kubectl get $pod -o jsonpath="{.spec.shareProcessNamespace}"
# Jump into the pod and check the model location
kubectl exec -it $pod -c kserve-container -- bash
```

!!! success "Sample in-container session"
    ```{ .shell .no-copy }
    sklearn-iris-oci-predictor:/$ cd /mnt
    sklearn-iris-oci-predictor:/mnt$ ls -l
    total 0
    lrwxrwxrwx    1 1010     root            20 Jan 27 10:35 models -> /proc/38/root/models
    sklearn-iris-oci-predictor:/mnt$ cd /mnt/models
    sklearn-iris-oci-predictor:/mnt/models$ ls -l
    total 8
    -rw-r--r--    1 root     root          5408 Jan 26 15:58 model.joblib
    ```

As you can see, the runtime can directly access the data coming from the modelcar image, without first copying it over into another volume.

## Configuration

Fine-tuning the behavior of Modelcars in KServe is possible through global configuration settings for inference services. These settings are located in the `inferenceservice-config` ConfigMap, which resides in the `kserve` namespace or the namespace where the KServe controller operates. This ConfigMap includes various subconfigurations, with the Modelcars configuration located under the `storageInitializer` entry.

To view the current configuration, use the following command:

```bash
kubectl get cm -n kserve inferenceservice-config -o jsonpath="{.data.storageInitializer}"
```

!!! success "Sample Output"

    ```{ .json .no-copy }
    {
      "image": "kserve/storage-initializer:v0.11.2",
      "memoryRequest": "100Mi",
      "memoryLimit": "1Gi",
      "cpuRequest": "100m",
      "cpuLimit": "1",
      "enableDirectPvcVolumeMount": false,
      "enableModelcar": true,
      "uidModelcar": 1010
    }
    ```

The output is a JSON string representing the configuration. For Modelcars, several keys are available for customization:

| Key              | Description                                                                                                                                     | Example |
|------------------|-------------------------------------------------------------------------------------------------------------------------------------------------|---------|
| `enableModelcar` | Enables direct access to an OCI container image using a source URL with an `oci://` schema.                                                      | `true`  |
| `cpuModelcar`    | CPU request and limit for the modelcar container.                                                                                                 | `10m`   |
| `memoryModelcar` | Memory request and limit for the modelcar container.                                                                                              | `15Mi`  |
| `uidModelcar`    | UID under which the modelcar process and the main container run. Set to `0` for root if needed. If not set, the UID of the containers is used.    | `1042`  |
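
For instance, to give the modelcar sidecar explicit resource settings, you can extend the `storageInitializer` entry with `cpuModelcar` and `memoryModelcar`, following the same pattern used above to enable the feature (the values below are just the illustrative ones from the table):

```bash
# Add modelcar resource settings to the storageInitializer configuration
config=$(kubectl get configmap inferenceservice-config -n kserve -o jsonpath='{.data.storageInitializer}')
newValue=$(echo "$config" | jq -c '. + {"cpuModelcar": "10m", "memoryModelcar": "15Mi"}')

# Merge the updated value back into the ConfigMap (jq -R re-quotes it as a JSON string)
kubectl patch configmap -n kserve inferenceservice-config --type=merge \
  -p "{\"data\": {\"storageInitializer\": $(echo "$newValue" | jq -R .)}}"

# Restart the KServe controller so the new defaults are picked up
kubectl delete pod -n kserve -l control-plane=kserve-controller-manager
```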

## References

* [Modelcar Design document](https://docs.google.com/document/d/1Bs4fnP8rhPMaoPoLSYVvuRq-z9vkGPQ0rKbmfH4I7js/edit#heading=h.xw1gqgyqs5b)
* [Original GitHub issue](https://github.com/kserve/kserve/issues/3043) (also discusses some alternative solutions)
* [12-minute demo](https://www.youtube.com/watch?v=KzWH8v6CcR0)
* [Code walkthrough](https://www.youtube.com/watch?v=axegGpQ6nHs) showing the implementation of Modelcars in KServe (for background information)