This document is a work in progress that discusses the data plane requirements for machine learning inference.
It aims to:
- Provide a set of schemas for machine learning inference, covering the predictor and associated components such as model explainers, outlier detectors and skew detectors.
- Provide a set of proposals for components to advertise the schemas they support.
Various components are useful for machine learning inference. These include:
- The core predictor
- Model explainer
- Outlier detector
- Concept drift (skew) detector
Schemas will be needed for each. The aim is to provide a set of schemas for the core predictive model, along with associated schemas for the most common tasks that help data scientists, users and devops teams monitor, understand and manage the lifecycle of the running model.
Data planes for request/response to machine learning models are well defined in the ecosystem. Existing examples are:
At present, we don't define a new data plane for model input/output but allow models to publish the input/output schema they respect. In future, we may provide an additional standard data plane schema for KFServing, independent of the backing model runtimes.
TODO
TODO
TODO
An open question is whether to define a combined schema that aggregates the responses of the various components, or to assume that only the model response is returned synchronously while other components (model explanation etc.) send their responses asynchronously to some metrics/logging channel.
The control plane will allow components to be switched on and off, so a synchronous response could provide some subset of all data payloads, e.g. prediction and explanation. In Protocol Buffers representation, a combined payload could look like:
message KFServing {
  KFPrediction prediction = 1;
  KFExplanation explanation = 2;
  KFOutlier outlier = 3;
  KFSkew skew = 4;
}
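
As a rough illustration of how the control plane's on/off switches might shape a synchronous response, the Python sketch below mirrors the combined message with optional fields. The names `CombinedResponse`, `build_response` and the `enabled` set are hypothetical and are not part of any KFServing API; the payload bodies are plain dictionaries because the per-component schemas are still undefined.

```python
from dataclasses import dataclass, asdict
from typing import Any, Dict, Optional, Set


@dataclass
class CombinedResponse:
    """Hypothetical mirror of the combined payload; every field is optional."""
    prediction: Optional[Dict[str, Any]] = None
    explanation: Optional[Dict[str, Any]] = None
    outlier: Optional[Dict[str, Any]] = None
    skew: Optional[Dict[str, Any]] = None


def build_response(outputs: Dict[str, Dict[str, Any]], enabled: Set[str]) -> Dict[str, Any]:
    """Keep only the payloads of components that are switched on."""
    selected = {name: body for name, body in outputs.items() if name in enabled}
    response = CombinedResponse(**selected)
    # Drop unset fields so the serialized payload carries only enabled components.
    return {key: value for key, value in asdict(response).items() if value is not None}
```

With only the predictor and explainer enabled, an outlier payload produced by a running detector would simply be left out of the synchronous response under this sketch.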
- There should be a unified prediction id so responses from varied components can be tied together for monitoring and auditing.
It is unclear whether we should impose any other metadata.
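
To make the role of a unified prediction id concrete, here is a minimal Python sketch of joining records that different components emit, possibly asynchronously, for the same request. The use of UUID4, the `(prediction_id, component, payload)` record shape and both function names are assumptions made for illustration only.

```python
import uuid
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Event = Tuple[str, str, Dict[str, Any]]  # (prediction_id, component, payload)


def new_prediction_id() -> str:
    """Generate an id once per request; UUID4 is an assumption, not a mandated format."""
    return str(uuid.uuid4())


def group_by_prediction_id(events: List[Event]) -> Dict[str, Dict[str, Dict[str, Any]]]:
    """Join records emitted by different components under their shared prediction id."""
    joined: Dict[str, Dict[str, Dict[str, Any]]] = defaultdict(dict)
    for prediction_id, component, payload in events:
        joined[prediction_id][component] = payload
    return dict(joined)
```

A monitoring or auditing consumer could then look up one id and see the prediction next to its explanation or outlier score, regardless of the channel each arrived on.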
Components should be able to advertise what schemas they respect to allow the control plane to do static validation. Static validation will be important if we allow pipelines of components in future.
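
One way static validation over advertised schemas might work for a pipeline is sketched below in Python. The `Advertisement` record, the `validate_pipeline` helper and the idea of identifying a schema by a plain string are all hypothetical; the actual advertisement mechanism is still to be decided.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Advertisement:
    """Hypothetical record of the schemas a component says it respects."""
    name: str
    input_schema: str
    output_schema: str


def validate_pipeline(components: List[Advertisement]) -> List[str]:
    """Return mismatches between each component's output and the next one's input."""
    errors = []
    for upstream, downstream in zip(components, components[1:]):
        if upstream.output_schema != downstream.input_schema:
            errors.append(
                f"{upstream.name} emits {upstream.output_schema} "
                f"but {downstream.name} expects {downstream.input_schema}"
            )
    return errors
```

An empty list means every adjacent pair agrees, so the control plane could reject a misconfigured pipeline before any request is served.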
Knative has a proposal in the context of Knative eventing.