Issues: vllm-project/vllm
[Bug]: [TPU] Prefix caching + w8a8 + long context results in degraded performance and corrupted output (bug) #12371, opened Jan 23, 2025 by kiratp
Release v0.7.0 (release) #12365, opened Jan 23, 2025 by simon-mo
[Bug]: Inference with gguf returns garbage (bug) #12364, opened Jan 23, 2025 by q0dr
[RFC]: Refactor config-format and load-format as plugins (RFC) #12363, opened Jan 23, 2025 by maxdebayser
[New Model]: Request to support the microsoft/phi-4 model (new model) #12358, opened Jan 23, 2025 by yash7verma
[Usage]: When running models on multiple GPUs, workload does not get split (usage) #12354, opened Jan 23, 2025 by ArturDev42
[Bug]: Cannot serve Qwen2.5 in OpenVINO (bug) #12350, opened Jan 23, 2025 by cheng358
[Usage]: How to use tool calling with the auto option; setting a specific tool works (usage) #12349, opened Jan 23, 2025 by balachandarsv
[Usage]: Why does it consume so much memory? (usage) #12346, opened Jan 23, 2025 by HouLingLXH
[Usage]: fp8 sparse gemm in vllm/csrc/sparse/cutlass/sparse_scaled_mm__xxx (usage) #12344, opened Jan 23, 2025 by zhink
[Bug]: Why are the vLLM and Hugging Face Transformers inference results inconsistent? (bug) #12343, opened Jan 23, 2025 by Molasse
[Bug]: Running inference on multiple LLMs one by one with tensor parallelism always hangs on the second model in the list (bug) #12337, opened Jan 23, 2025 by alexhegit
[Usage]: How to log incoming requests (inputs and outputs) in vllm serve? (usage) #12336, opened Jan 23, 2025 by thangld201
[Bug]: Speculative decoding does not work (bug, speculative-decoding) #12323, opened Jan 22, 2025 by JohnConnor123
[Usage]: Is it possible to speed up generation by adding another GPU? (usage) #12322, opened Jan 22, 2025 by JohnConnor123
[Performance]: Unable to reproduce the throughput and latency results claimed on the vLLM v0 dashboard (performance) #12315, opened Jan 22, 2025 by Neildave999
[Performance]: It takes too much time to add a request (performance) #12314, opened Jan 22, 2025 by HuXinjing
[Usage]: File access error when using RunAI Model Streamer with S3 in vLLM (usage) #12311, opened Jan 22, 2025 by nskumz
[Bug]: Possible GPU memory utilization issue for embeddings model (bug) #12308, opened Jan 22, 2025 by mmubeen-6
[Bug]: CUDA exception on multiple GPUs with concurrent users (bug) #12307, opened Jan 22, 2025 by hahmad2008
[Usage]: How can I use LLMEngine to perform distributed inference for multimodal large models, such as Qwen-VL? (usage) #12305, opened Jan 22, 2025 by frederichen01
[Bug]: Linting pre-commit hook does not apply yapf fixes; yapf fails quietly (bug) #12302, opened Jan 22, 2025 by afeldman-nm
[Bug]: Docker build error (bug) #12300, opened Jan 22, 2025 by jordane95
[Feature]: DeepSeek-R1 tool choice and function calling (feature request) #12297, opened Jan 22, 2025 by warlockedward