[Bug]: Engine loop has died #419
Comments
Hi @warlock135, which version of HabanaAI vllm-fork are you using, 1.18.0 or habana_main?
I'm using the habana_main version.
@warlock135 we will try to find out what the issue is. In the meantime, please try using the v1.18.0 branch (tag v0.5.3.post1+Gaudi-1.18.0) and see if the issue is still present.
After switching to v0.5.3.post1+Gaudi-1.18.0 and applying the patch from this pull request, the error no longer occurred, even after several hours of inference.
@warlock135 glad to hear the original issue went away. Can you please file another ticket with repro steps for the BS 30 scenario? Is it the same command as here? We will look into this.
Setup: I followed the instructions provided here. The only difference is that I used the pytorch-installer-2.4.0 Docker image instead of pytorch-installer-2.4.1 (which resulted in an "unknown manifest" error during the pull). The command to run VLLM remains the same as in here.
Client: I’m using a Python aiohttp client to send requests to VLLM.
API: /v1/completions
Prompts: Each prompt contains approximately 3,500 words (with a chat template applied, without any system prompt).
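For anyone trying to reproduce this, the client described above might look roughly like the sketch below. It is not the reporter's actual script: the server address, model name, `max_tokens`, request count, and concurrency level are all placeholder assumptions, the prompt is filler text rather than a real chat-templated prompt, and `aiohttp` is a third-party package that has to be installed separately.

```python
import asyncio

# Placeholder assumptions, not taken from the original report.
BASE_URL = "http://localhost:8000"   # assumed server address
MODEL = "my-model"                   # hypothetical model name


def make_prompt(n_words: int = 3500) -> str:
    """Build a filler prompt of roughly n_words words (no chat template applied)."""
    return " ".join(["word"] * n_words)


def build_payload(prompt: str, model: str = MODEL, max_tokens: int = 256) -> dict:
    """Request body for the OpenAI-compatible /v1/completions endpoint."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}


async def send_one(session, sem: asyncio.Semaphore, prompt: str) -> int:
    """POST one completion request and return the HTTP status code."""
    async with sem:  # cap the number of in-flight requests
        async with session.post(
            f"{BASE_URL}/v1/completions", json=build_payload(prompt)
        ) as resp:
            await resp.read()
            return resp.status


async def run(concurrency: int = 30, total: int = 60) -> list:
    """Send `total` requests with at most `concurrency` in flight at once."""
    import aiohttp  # third-party; imported here so the helpers above work without it

    sem = asyncio.Semaphore(concurrency)
    timeout = aiohttp.ClientTimeout(total=None)  # long prompts can take a while
    async with aiohttp.ClientSession(timeout=timeout) as session:
        tasks = [send_one(session, sem, make_prompt()) for _ in range(total)]
        # return_exceptions=True keeps one failed request from aborting the rest
        return await asyncio.gather(*tasks, return_exceptions=True)
```

With a server running, `asyncio.run(run(concurrency=30))` would start the load; the semaphore only limits how many requests the client keeps open, so the batch size the engine actually sees may differ.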
I also encountered an engine crash today when attempting to cancel client requests (70 CCU/batch size). Below is the error log.
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
When running inference with vLLM (using the OpenAI API server), I got the error below:
After this, all current requests were released, and vLLM crashed when another request came in:
I started vLLM with the following command (inside a Docker container with the image vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest):