Model warmup fails after adding Triton indexing kernels #2838
Comments
I have the same problem. I assume that you and the others who reported the issues below are using the Docker image, and that it has reverted to using Triton indexing kernels. The process compiles C files that call into Python, but the Python headers are not available to the C compiler, so we get an error. In simple terms, I assume Python.h is not available when these shared object files are compiled. After reviewing the Dockerfile, it appears that python3.11-dev is not included in the final image, which is why Python.h is missing. Just guessing, my "sure" value is about 0.6 😁🤷🏼‍♂️ It seems to be the same issue as the following: |
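If that guess is right, it is easy to check from inside the container. A minimal sketch, assuming an apt-based image whose Python is the distro's 3.11 (both of these are assumptions on my side, not confirmed from the Dockerfile):

```sh
# Check whether the CPython headers Triton needs when it JIT-compiles kernels are present.
python3 -c "import sysconfig, os; inc = sysconfig.get_paths()['include']; print(inc, os.path.exists(os.path.join(inc, 'Python.h')))"

# If Python.h is missing, installing the dev headers inside the image is one possible
# workaround (assumes an Ubuntu-based image where python3.11-dev is available via apt):
apt-get update && apt-get install -y python3.11-dev
```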
#2835 is not related... it's about a GPU split from 2 to 4 H100s, with no Python stacktrace. But thanks @KreshLaDoge |
Update: I was able to get it working by changing the base image to devel to match the builder image. I have to rebuild the image, which takes time and increases its size, but now it works!! |
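Roughly, the change amounts to swapping the base of the final stage for the devel variant, something like the sketch below (tags are illustrative, not the exact ones from the repository's Dockerfile):

```dockerfile
# Final stage: use the CUDA devel image (which ships the compiler toolchain and headers)
# instead of the slimmer base/runtime variant. Image tags here are illustrative only.
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04 AS base
# ... the rest of the final stage stays exactly as in the upstream Dockerfile ...
```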
I don't know how to reproduce the issue; Phi3.5 works perfectly under 3.0.1. Is everyone here using podman? I don't see why it should make any difference though... |
Can everyone also confirm they are using |
I can confirm the issue for me with 3.0.1, 3.0, and 2.4.0 |
I also had issues with 3.0.1. I suspect it's about the missing Python.h, which would also explain why it worked for @YaserJaradeh when he changed the Ubuntu base image to the devel variant. But it could be something else. Currently, I'm forced to assign GPUs to the container manually rather than through the NVIDIA container toolkit, so it might be related if others experiencing the same issue are using vGPUs, for example 🤷 |
Can you elaborate? It might be a potential culprit. |
Here is an example of my docker compose and the way we assign GPUs - don't judge me, there are reasons why I can't use the container toolkit 🤷 Anyway, I doubt that anyone else experiencing this issue has a similar configuration. |
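Since the compose file itself didn't make it into this thread, the following is only a hypothetical sketch of that kind of setup: handing the raw GPU device nodes to the container instead of going through the NVIDIA container toolkit. The image tag, device paths, and model id are all assumptions:

```yaml
# Hypothetical sketch -- not the actual compose file from this thread.
services:
  tgi:
    image: ghcr.io/huggingface/text-generation-inference:3.0.1
    command: --model-id microsoft/Phi-3.5-mini-instruct
    volumes:
      - ./data:/data
    devices:
      - /dev/nvidia0
      - /dev/nvidiactl
      - /dev/nvidia-uvm
    # Note: without the toolkit, the NVIDIA userspace driver libraries must also be
    # made available inside the container (e.g. bind-mounted), which the toolkit
    # would normally take care of.
```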
I get the same errors with the latest docker image. So far I have tested Mixtral 8x7B and Llama 3.3, and both had the same error. In short, this command
leads to the following error:
nvidia-smi:
I am wondering why there aren't more comments in this thread. Is there a workaround? |
@scriptator maybe you can try building the image that I have here #2848 and see if that works for you |
I can confirm that this change works for me. Thx a lot! |
@scriptator it is good to have confirmation that this works! In the PR I'm still trying to figure out the best way to do this without changing the base image to devel, because that increases the size of the final image, but I couldn't get it to work so far. |
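One possible direction (a hypothetical sketch, not what the PR does): keep the slim runtime base and add only the pieces Triton needs to JIT-compile kernels at runtime, i.e. a C compiler and the Python headers. The package names assume an Ubuntu-based final image whose Python is the distro's 3.11; if Python is provisioned differently (e.g. via conda), the headers would have to come from there instead.

```dockerfile
# Sketch: slim CUDA base plus only the runtime build dependencies Triton needs.
FROM nvidia/cuda:12.1.0-base-ubuntu22.04 AS base
RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc python3.11-dev && \
    rm -rf /var/lib/apt/lists/*
```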
Good point - the image I built with your pull request 2 hours ago is 20 GB, which is quite a lot compared to the 12.8 GB of the official image. |
Minimal reproduction command: |
It's starting to look like a podman bug... I cannot reproduce it with the minimal reproducer... Which version of podman are you using? |
I can't reproduce any of the issues even with podman on my end... What are the host configs? GPU, CUDA version, every potentially relevant service, drivers on the nodes, etc.? |
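For anyone hitting this, the details being asked for can be collected with standard commands (`nvidia-container-cli` is only present if libnvidia-container is installed):

```sh
# Container runtime version and configured OCI runtime
podman --version
podman info | grep -A 3 ociRuntime

# Host kernel, GPU, driver, and CUDA versions
uname -r
nvidia-smi
nvidia-smi --query-gpu=name,driver_version --format=csv

# NVIDIA container stack, if installed
nvidia-container-cli --version
```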
Getting some output for this: #2848 (comment) would be helpful in understanding the issue! |
In our (= @scriptator's) case, the problem has disappeared. We cannot be sure of the cause, but maybe my notes are of help to someone: The server where the problem occurred was running RHEL 9.5, but with a kernel from RHEL 9.3, which was required due to an issue with another inference framework. We could finally upgrade to a current kernel last week, and since then this issue does not occur for us anymore. However, at the same time, some Nvidia libraries were also upgraded. I don't know whether upgrading the kernel or the Nvidia libraries (or just the subsequent reboot) fixed the issue for us. |
System Info
I was using v2.3.1 via docker and everything was working. When I updated to later versions, including the latest, my TGI doesn't start due to an error:
This is my nvidia-smi output:

Information
Tasks
Reproduction
Here is the TGI env:
And here is how I'm running the container (via podman):
This command is generated on my system from a docker compose file.
Expected behavior
The TGI server should start correctly and normally, as it did before the Triton kernels were added!