
HF pipeline based inference #111

Open
kamalkraj opened this issue Jul 5, 2022 · 9 comments
Labels
enhancement New feature or request

Comments

@kamalkraj
Contributor

Hi @pommedeterresautee,

Please let me know your thoughts on converting the text classification also to HF pipeline, similar to Token classification and QA pipeline. I can work on this feature.

Thanks

@pommedeterresautee
Member

Hi @kamalkraj,

Thank you for your proposition.

In token classification and QA there is a mechanism to transform the scores output by the model into something a bit more actionable (extracting spans, etc.).
It seems to me that classification is simpler: we can just reuse the scores directly.
What do you think would be the reason for switching to a pipeline-based model?

Kind regards,
Michaël

NB: I'm probably missing something obvious, as I have no experience with pipelines.

@pommedeterresautee pommedeterresautee added the enhancement New feature or request label Jul 5, 2022
@kamalkraj
Contributor Author

Hi @pommedeterresautee,

If the model outputs raw scores directly, the client using the model also needs to maintain the index-to-label mapping.
Whenever we change the model on the server, we also need to ensure the label-to-index mapping between client and server stays in sync.
If the server/model output is the exact label with a score, integration will be much easier and less error-prone.
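The sync problem above can be sketched in plain Python; `id2label` here is a hypothetical stand-in for the mapping a model config normally carries, and is exactly the piece the client would have to keep in sync if the server only returned raw scores:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical mapping the client must maintain when the server
# returns only raw scores; a pipeline would resolve this server-side.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

def classify(logits):
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}

print(classify([-1.2, 3.4]))  # → label 'POSITIVE'
```

If `id2label` on the client drifts from the model actually deployed, the client silently returns wrong labels, which is the failure mode a label-emitting server output avoids.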

Thanks

@kamalkraj
Contributor Author

kamalkraj commented Jul 5, 2022

Currently, this lib only supports single-sentence classification. We could also add support for models trained on sentence-pair data such as https://huggingface.co/datasets/snli
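As a rough illustration, a sentence-pair (SNLI-style) input differs from single-sentence classification mainly in how the input is assembled before tokenization. The `[SEP]` joining and the label set below are assumptions about a typical NLI model, not this library's API:

```python
# Typical three-way NLI label set (an assumption about the model config).
NLI_LABELS = {0: "entailment", 1: "neutral", 2: "contradiction"}

def make_pair_input(premise: str, hypothesis: str, sep: str = "[SEP]") -> str:
    # Single-sentence classifiers take one string; NLI models take the
    # premise and hypothesis joined as one sequence with a separator.
    return f"{premise} {sep} {hypothesis}"

print(make_pair_input("A man is eating.", "Someone is having food."))
# → "A man is eating. [SEP] Someone is having food."
```

In practice the tokenizer handles the pair encoding itself (two text arguments), but the server API would still need to accept two fields instead of one.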

@pommedeterresautee
Member

Hi @kamalkraj, to keep you updated: we are thinking about writing our own CUDA kernels and running them from PyTorch directly (without any ONNX / TRT in between), and we hope to reach decent performance (at least close to ONNX Runtime's). If this works (which is not guaranteed at all), we would no longer need to convert models from one framework to another.

Btw, what do you think of such an approach (if it works and is totally transparent to the final user, like `pip install XXX` and then `optimize(model)`)? Would it be an issue for your use cases not to have an ONNX/TRT plan artefact? How do you balance ease of use and performance?

@kamalkraj
Contributor Author

Hi @pommedeterresautee,

By your own CUDA kernels, do you mean something like DeepSpeed?

@pommedeterresautee
Member

pommedeterresautee commented Jul 18, 2022

Yes, but much simpler to use (a user who wants to should even be able to compose their own fused kernels without knowing CUDA) and, if possible, less monolithic (not layer-wide). Also more vanilla PyTorch (basically some fused kernels, with the original code replaced via FX).
Still... the same spirit as DeepSpeed inference and TorchDynamo: stay in Python during the move to production.

@kamalkraj
Contributor Author

Okay.
One more question: after optimization, how is the model integrated? Using Triton Inference Server, or direct integration into the program?

@pommedeterresautee
Member

What do you mean by direct integration?
For inference, whatever the user wants: maybe Triton through BLS, TorchServe, or Ray Serve (never tested).
The point is to be as light and invisible as possible.

@kamalkraj
Contributor Author

Thank you for the clarification.
But I don't know why somebody would write custom CUDA kernels to achieve performance similar to a model exported by torch.onnx.export. I know ONNX export has limitations, but most of the time it works fine.
Do you have a specific model or use case in mind?
