Issues: vllm-project/vllm
[Bug]: Granite 3.0 disconnect between parser and example template
bug · #10379 · opened Nov 15, 2024 by wilbry

[Feature]: NVIDIA Triton GenAI Perf Benchmark
feature request, good first issue, help wanted · #10377 · opened Nov 15, 2024 by simon-mo

[Bug]: Guided Decoding Broken in Streaming mode
bug · #10376 · opened Nov 15, 2024 by JC1DA

[Bug]: Torch profiling does not stop and cannot get traces for all workers
bug · #10365 · opened Nov 15, 2024 by ruisearch42

[Bug]: Continues generation but does not return the output
bug · #10359 · opened Nov 15, 2024 by siyuyuan

[Bug]: Qwen2-VL takes only 18 GB of GPU memory when run with Hugging Face code, but the same model takes 38 GB with vLLM
bug · #10357 · opened Nov 15, 2024 by Samjith888

[Usage]: CUDA OOM when serving multiple tasks on the same server
usage · #10345 · opened Nov 15, 2024 by reneix

[Misc]: Snowflake Arctic out of memory error with TP-8
bug · #10344 · opened Nov 14, 2024 by rajagond

[Feature]: Allow head_size smaller than 128 on TPU with Pallas backend
feature request · #10343 · opened Nov 14, 2024 by manninglucas

[Bug]: KV cache error with kv_cache_dtype=FP8 and large sequence lengths: losing the model's context length
bug · #10337 · opened Nov 14, 2024 by amakaido28

[Bug]: Different output for the same prompt when inferred as a single sequence vs. concurrent requests on the vLLM OpenAI server, temperature = 0
bug · #10336 · opened Nov 14, 2024 by bhupendrathore

[Bug]: Out of Memory (OOM) Issues During MMLU Evaluation with lm_eval
bug · #10325 · opened Nov 14, 2024 by wchen61

[Bug]: Custom chat template sends [{'type': 'text', 'text': '...'}] to the model
bug · #10324 · opened Nov 14, 2024 by victorserbu2709

[Feature]: To adapt to the TTS task, I need to pass embeddings in directly. How should I modify vLLM?
feature request · #10323 · opened Nov 14, 2024 by 1nlplearner

[Installation]: Request to include vllm==0.6.2 for cuda 11.8
installation · #10319 · opened Nov 14, 2024 by amew0

[Performance]: Results from the vLLM blog article "How Speculative Decoding Boosts vLLM Performance by up to 2.8x" cannot be reproduced
performance · #10318 · opened Nov 14, 2024 by yeonjoon-jung01

[Bug]: FusedMoE kernel performance depends on input prompt length while decoding
bug · #10313 · opened Nov 14, 2024 by taegeonum

[Usage]: How to use vllm to output code only
usage · #10309 · opened Nov 14, 2024 by shaoyuyoung

[Installation]: Error when building the vllm environment
installation · #10303 · opened Nov 13, 2024 by Kawai1Ace

[Bug]: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12
bug · #10300 · opened Nov 13, 2024 by yananchen1989

[Bug]: Meaningless output when running long-context inference with the Qwen2.5 model and vllm>=0.6.3
bug · #10298 · opened Nov 13, 2024 by piamo

[Bug]: vLLM crashes when running Qwen/Qwen2.5-Coder-32B-Instruct on two H100 GPUs
bug · #10296 · opened Nov 13, 2024 by noamwies

[Usage]: What does "since, enforce-eager is enabled, async output processor cannot be used" mean exactly?
usage · #10295 · opened Nov 13, 2024 by Leon-Sander

[Feature]: Upstream the Quark quantization format to vLLM
feature request · #10294 · opened Nov 13, 2024 by kewang-xlnx

[Bug]: Can't use the YaRN RoPE config for long context in the Qwen2 model
bug · #10293 · opened Nov 13, 2024 by FlyCarrot