
HF pipeline based inference #111

Open
kamalkraj opened this issue Jul 5, 2022 · 9 comments
Labels
enhancement New feature or request

Comments

@kamalkraj
Contributor

Hi @pommedeterresautee,

Please let me know your thoughts on converting the text classification also to HF pipeline, similar to Token classification and QA pipeline. I can work on this feature.

Thanks

@pommedeterresautee
Member

Hi @kamalkraj,

Thank you for your proposition.

In token classification and QA there is a mechanism to transform the scores output by the model into something a bit more actionable (extracting spans, etc.).
It seems to me that classification is simpler: we can just reuse the scores directly.
What do you think would be the reason for switching to a pipeline-based model?

Kind regards,
Michaël

NB: I'm probably missing something obvious, as I have no experience with pipelines.

@pommedeterresautee pommedeterresautee added the enhancement New feature or request label Jul 5, 2022
@kamalkraj
Contributor Author

Hi @pommedeterresautee,

If the model outputs raw scores directly, the client using the model also needs to maintain the index-to-label mapping.
Whenever we change the model on the server, we also need to ensure the label-to-index mapping between client and server stays in sync.
If the server/model output is the exact label with a score, integration will be much easier and less error-prone.
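The sync problem above can be sketched in plain Python; `id2label` here is a hypothetical stand-in for the mapping a model config normally carries, and is exactly the piece the client would have to keep in sync if the server only returned raw scores:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical mapping the client must maintain when the server
# returns only raw scores; a pipeline would resolve this server-side.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

def classify(logits):
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

print(classify([-1.2, 3.4]))  # → label 'POSITIVE'
```

If `id2label` on the client drifts from the model actually deployed, the client silently returns wrong labels, which is the failure mode a label-emitting server output avoids.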

Thanks

@kamalkraj
Contributor Author

kamalkraj commented Jul 5, 2022

Currently, this lib only supports single-sentence classification. We could also add support for models trained on sentence-pair data such as https://huggingface.co/datasets/snli
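As a rough illustration, a sentence-pair (SNLI-style) input differs from single-sentence classification mainly in how the input is assembled before tokenization. The `[SEP]` joining and the label set below are assumptions about a typical NLI model, not this library's API:

```python
# Typical three-way NLI label set (an assumption about the model config).
NLI_LABELS = {0: "entailment", 1: "neutral", 2: "contradiction"}

def make_pair_input(premise: str, hypothesis: str, sep: str = "[SEP]") -> str:
    # Single-sentence classifiers take one string; NLI models take the
    # premise and hypothesis joined as one sequence with a separator.
    return f"{premise} {sep} {hypothesis}"

print(make_pair_input("A man is eating.", "Someone is having food."))
# → "A man is eating. [SEP] Someone is having food."
```

In practice the tokenizer handles the pair encoding itself (two text arguments), but the server API would still need to accept two fields instead of one.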

@pommedeterresautee
Member

Hi @kamalkraj, to keep you updated: we are thinking about writing our own CUDA kernels and running them from PyTorch directly (without any ONNX / TRT in between), and we hope to reach decent performance (at least close to ONNX Runtime's). If this works (which is not guaranteed at all), we would no longer need to convert models from one framework to another.

Btw, what do you think of such an approach (if it works and is totally transparent to the final user, like `pip install XXX` and then `optimize(model)`)? Would it be an issue for your use cases not to have an ONNX/TRT plan artefact? How do you balance ease of use and performance?

@kamalkraj
Contributor Author

Hi @pommedeterresautee,

By your own CUDA kernels, do you mean something like DeepSpeed?

@pommedeterresautee
Member

pommedeterresautee commented Jul 18, 2022

Yes, but much simpler to use (a user who wants to should even be able to compose their own fused kernels without knowing CUDA) and, if possible, less monolithic (not layer-wide). Also more vanilla PyTorch (basically some fused kernels, with the original code replaced via FX).
Still... the same spirit as DeepSpeed inference and TorchDynamo: stay in Python during the move to production.

@kamalkraj
Contributor Author

Okay.
One more question: after optimization, how is the model integrated? Using Triton Inference Server, or direct integration into the program?

@pommedeterresautee
Member

What do you mean by direct integration?
For inference, whatever the user wants: maybe Triton through BLS, TorchServe, or Ray Serve (never tested).
The point is to be as light and invisible as possible.

@kamalkraj
Contributor Author

Thank you for the clarification.
But I don't know why somebody would write custom CUDA kernels to achieve performance similar to a model exported by torch.onnx.export. I know ONNX export has limitations, but most of the time it works fine.
Do you have a specific model or use case in mind?
