HF pipeline based inference #111
Hi @kamalkraj, thank you for your proposition. In token classification and QA there is a mechanism that transforms the scores output by the model into something a bit more actionable (extracting spans, etc.). Kind regards,
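For context, here is a minimal sketch of that kind of post-processing for token classification: grouping per-token BIO tags into entity spans. The tags and example tokens are illustrative, not this library's actual code.

```python
# Sketch: aggregate per-token BIO label predictions into entity spans.
# Labels and tokens below are illustrative placeholders.

def extract_spans(tokens, labels):
    """Group contiguous B-/I- tagged tokens into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            if current_tokens:  # close the previous entity, if any
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)
        else:  # "O" or an inconsistent tag closes any open span
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans

tokens = ["Hugging", "Face", "is", "in", "New", "York"]
labels = ["B-ORG", "I-ORG", "O", "O", "B-LOC", "I-LOC"]
print(extract_spans(tokens, labels))
# [('ORG', 'Hugging Face'), ('LOC', 'New York')]
```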
If the model directly outputs scores, every client that uses the model also has to maintain the index-to-label mapping. Thanks
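To illustrate the point, a rough sketch of what server-side label mapping could look like, so clients receive labels rather than raw scores. The `ID2LABEL` dict and the softmax step are illustrative assumptions (in HF models this mapping usually lives in the model config's `id2label`):

```python
import math

# Hypothetical index-to-label mapping; HF models store this in config.json (id2label).
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}

def postprocess(logits):
    """Turn raw logits into a labeled prediction so clients need no mapping."""
    # Numerically stable softmax over the logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}

print(postprocess([-1.2, 3.4]))  # {'label': 'POSITIVE', 'score': ~0.99}
```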
Currently, this lib only supports single-sentence classification. We could also add support for models trained on sentence-pair data like https://huggingface.co/datasets/snli
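For sentence-pair tasks like NLI, both segments are typically packed into a single input sequence. A toy sketch of the classic BERT-style pair encoding (the special tokens and pre-split tokens here are placeholders; a real tokenizer handles this):

```python
def encode_pair(premise_tokens, hypothesis_tokens):
    """BERT-style packing of a sentence pair into one input sequence.

    token_type_ids mark which segment (0 = premise, 1 = hypothesis)
    each position belongs to.
    """
    tokens = ["[CLS]"] + premise_tokens + ["[SEP]"] + hypothesis_tokens + ["[SEP]"]
    token_type_ids = ([0] * (len(premise_tokens) + 2)
                      + [1] * (len(hypothesis_tokens) + 1))
    return tokens, token_type_ids

tokens, segments = encode_pair(["a", "man", "sleeps"], ["a", "person", "rests"])
print(tokens)    # ['[CLS]', 'a', 'man', 'sleeps', '[SEP]', 'a', 'person', 'rests', '[SEP]']
print(segments)  # [0, 0, 0, 0, 0, 1, 1, 1, 1]
```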
Hi @kamalkraj, to keep you updated: we are thinking about writing our own CUDA kernels and running them on PyTorch directly (without any ONNX / TRT in between), and we hope to reach decent performance (at least close to ONNX Runtime's). If this works (which is not guaranteed at all), we would no longer need to convert models from one framework to another. Btw, what do you think of such an approach (if it works and is totally transparent to the final user, like ...)?
By "own CUDA kernels", do you mean something like DeepSpeed?
Yes, but much simpler to use (even a user who wants to should be able to compose their own fused kernels without knowing CUDA), and if possible less monolithic (not layer-wide). Also closer to vanilla PyTorch (basically some fused kernels, with the original code replaced via FX).
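The FX-based approach described above amounts to rewriting a model's computation graph, swapping subgraphs for fused kernels. A toy, framework-free sketch of that idea (the op names and the "fused" replacement are purely illustrative, not the actual `torch.fx` API):

```python
# Toy graph rewrite: replace a consecutive (add -> relu) pair with a single
# "fused_add_relu" node, mimicking what an FX transform would do on a real
# PyTorch graph before dispatching to a fused CUDA kernel.

def fuse_add_relu(graph):
    """graph is a list of (op, args) nodes; fuse consecutive add+relu pairs."""
    fused = []
    i = 0
    while i < len(graph):
        op, args = graph[i]
        nxt = graph[i + 1] if i + 1 < len(graph) else None
        if op == "add" and nxt is not None and nxt[0] == "relu":
            fused.append(("fused_add_relu", args))
            i += 2  # consume both the add and the relu node
        else:
            fused.append(graph[i])
            i += 1
    return fused

graph = [("matmul", ("x", "w")), ("add", ("h", "b")), ("relu", ("h",))]
print(fuse_add_relu(graph))
# [('matmul', ('x', 'w')), ('fused_add_relu', ('h', 'b'))]
```

The appeal of doing this at the graph level is exactly what the comment describes: the model code stays vanilla PyTorch, and the rewrite is transparent to the final user.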
Okay. |
What do you mean by direct integration? |
Thank you for the clarification. |
Hi @pommedeterresautee,
Please let me know your thoughts on also converting text classification to an HF pipeline, similar to the token classification and QA pipelines. I can work on this feature.
Thanks