[V1] PR 1/N for v1 sample and prompt logprobs support #9880

Open
wants to merge 519 commits into base: main
Changes from 38 commits
Commits
519 commits
ff7d7d2
updated
robertgshaw2-redhat Jan 2, 2025
8aa8baa
update comment
robertgshaw2-redhat Jan 2, 2025
527228d
format
robertgshaw2-redhat Jan 2, 2025
f5d0b57
reduce length of comments
robertgshaw2-redhat Jan 2, 2025
711ff13
updated
robertgshaw2-redhat Jan 2, 2025
3a99615
reduce assets
robertgshaw2-redhat Jan 2, 2025
6bb6d34
updated
robertgshaw2-redhat Jan 2, 2025
d73010d
updated
robertgshaw2-redhat Jan 2, 2025
b8f40df
updated
robertgshaw2-redhat Jan 2, 2025
e806678
clean
robertgshaw2-redhat Jan 2, 2025
afef932
reduce cruft
robertgshaw2-redhat Jan 2, 2025
71580ae
revert cruft
robertgshaw2-redhat Jan 2, 2025
1d52a37
updated
robertgshaw2-redhat Jan 3, 2025
c8eef87
cleanup
robertgshaw2-redhat Jan 3, 2025
b501aed
updated
robertgshaw2-redhat Jan 3, 2025
ac070f8
updated
robertgshaw2-redhat Jan 3, 2025
9a28ddf
updated
robertgshaw2-redhat Jan 3, 2025
d1a956d
update comment
robertgshaw2-redhat Jan 3, 2025
5fd0060
updated
robertgshaw2-redhat Jan 3, 2025
433b93c
merge
robertgshaw2-redhat Jan 3, 2025
0d2f7c8
stash
robertgshaw2-redhat Jan 3, 2025
06b9aba
cleanup
robertgshaw2-redhat Jan 3, 2025
035e2c2
updated
robertgshaw2-redhat Jan 3, 2025
17e41c8
remove
robertgshaw2-redhat Jan 3, 2025
2cb4832
finish cleaning sampler.py
robertgshaw2-redhat Jan 3, 2025
92595a4
updated
robertgshaw2-redhat Jan 3, 2025
c82fc85
updated comment
robertgshaw2-redhat Jan 3, 2025
c3c4f9c
passing mypy!
robertgshaw2-redhat Jan 3, 2025
fec3d15
comment
robertgshaw2-redhat Jan 3, 2025
d002d67
todo -> fixme
robertgshaw2-redhat Jan 3, 2025
3157e8b
updated
robertgshaw2-redhat Jan 3, 2025
60125e3
fixed sampler bug
afeldman-nm Jan 4, 2025
5908cb1
fixed some sampler bugs
afeldman-nm Jan 5, 2025
c5f9565
merge
afeldman-nm Jan 5, 2025
fc52031
wip fixing detokenizer test
afeldman-nm Jan 5, 2025
7dc2756
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 6, 2025
6e57de4
wip
afeldman-nm Jan 6, 2025
599aae8
temporary hack to use pickling
afeldman-nm Jan 6, 2025
2aa1007
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 6, 2025
ae1e1b7
wip detokenizer test
afeldman-nm Jan 6, 2025
ae00145
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 6, 2025
a1c5b2e
fix: logprobs not being wrapped in an array
afeldman-nm Jan 6, 2025
7288370
sample logprobs work
afeldman-nm Jan 6, 2025
85e57d9
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 6, 2025
0e90ccb
detokenizer test passing for sample logprobs
afeldman-nm Jan 6, 2025
c2f48fb
detokenizer tests passing
afeldman-nm Jan 6, 2025
7993d08
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 6, 2025
13177d4
prompt logprobs with chunked prefill!
afeldman-nm Jan 6, 2025
05536f5
cleanup
afeldman-nm Jan 6, 2025
fa64529
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 6, 2025
0d17df8
light refactor
afeldman-nm Jan 6, 2025
f707191
torch serialization with msgpack via enc_/ext_hooks
afeldman-nm Jan 6, 2025
637c45c
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 7, 2025
cd5e7c6
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 8, 2025
8b1b995
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 8, 2025
3d00348
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 8, 2025
ce4f081
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 8, 2025
62d648a
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 9, 2025
3546639
wip
afeldman-nm Jan 9, 2025
a8c0167
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 9, 2025
0bba8f9
Merge branch 'v1_logprobs' into v1_logprobs_prompt
afeldman-nm Jan 9, 2025
69218ab
GPU returns num_prompt_logprobs + 1 prompt logprobs
afeldman-nm Jan 9, 2025
2505244
now prompt logprobs include prompt token
afeldman-nm Jan 9, 2025
e1058ac
wip making prompt logprobs line up with tok ids
afeldman-nm Jan 9, 2025
5f33902
partial req peek token
afeldman-nm Jan 9, 2025
199a834
refactoring
afeldman-nm Jan 9, 2025
879fc44
refactoring; non-blocking cpu->gpu transfer
afeldman-nm Jan 9, 2025
0f425fe
wip detokenizer tests
afeldman-nm Jan 9, 2025
1089127
detok test fix
afeldman-nm Jan 9, 2025
d2742d8
passing detok tests
afeldman-nm Jan 9, 2025
cf28c9b
Merge branch 'main' into v1_logprobs
afeldman-nm Jan 9, 2025
749be5a
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 10, 2025
a55e679
LLMEngine test working, wip AsyncLLM test
afeldman-nm Jan 10, 2025
b2c0c95
reverted unwanted changes
afeldman-nm Jan 10, 2025
9a40c5f
success
afeldman-nm Jan 10, 2025
ca94fd4
Merge branch 'main' into v1_logprobs_apc_merge
afeldman-nm Jan 10, 2025
465d984
added test_completion, switched model
afeldman-nm Jan 12, 2025
1f19724
wip test_completion
afeldman-nm Jan 12, 2025
33d3922
merge
afeldman-nm Jan 12, 2025
4ed0994
Merge branch 'v1_logprobs' into v1_logprobs_apc_merge
afeldman-nm Jan 12, 2025
33093ee
Merge branch 'main' into afeldman-nm/v1_logprobs
robertgshaw2-redhat Jan 13, 2025
435bb15
updated
robertgshaw2-redhat Jan 13, 2025
c996901
sort of fixed RequestState cyclical import; added logprobs, prompt_lo…
afeldman-nm Jan 14, 2025
ba9561a
actually fixed RequestState circular import
afeldman-nm Jan 14, 2025
34735be
woops
afeldman-nm Jan 14, 2025
6a501eb
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 14, 2025
49c2c8c
wip
afeldman-nm Jan 14, 2025
016e747
untested first-pass at logprobs integration into new output processin…
afeldman-nm Jan 15, 2025
b269a7a
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 15, 2025
cda2ba2
wip
afeldman-nm Jan 15, 2025
bf20f4b
passing with no sample/prompt logprobs
afeldman-nm Jan 15, 2025
4fae200
fix to get prompt logprobs tests passing (sample logprobs tests alrea…
afeldman-nm Jan 15, 2025
9deca70
sample and prompt logprobs optional in EngineCoreOutput; makes detoke…
afeldman-nm Jan 15, 2025
d96ec24
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 15, 2025
46e65ae
wip
afeldman-nm Jan 15, 2025
65b9b64
refactored output processor test vectors into utils and test fixtures
afeldman-nm Jan 15, 2025
6ddc4f9
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 15, 2025
8dad984
refactored test fixtures
afeldman-nm Jan 15, 2025
789d0a4
merge
afeldman-nm Jan 15, 2025
29f491f
format
afeldman-nm Jan 15, 2025
3302eae
format
afeldman-nm Jan 15, 2025
fb3c836
Merge branch 'v1_logprobs_apc' into v1_logprobs_test_merge
afeldman-nm Jan 15, 2025
110afd1
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 15, 2025
a6ecea4
mock engine includes logprobs
afeldman-nm Jan 15, 2025
c12f29a
progress integrating logprobs into output processor tests
afeldman-nm Jan 15, 2025
29dd713
non-logprobs output processor tests pass
afeldman-nm Jan 15, 2025
29f77e3
output processor tests passing without logprobs checks
afeldman-nm Jan 15, 2025
18a2162
added logprobs test; detokenizer test is just detokenizer
afeldman-nm Jan 15, 2025
2648a05
merge
afeldman-nm Jan 15, 2025
ab40e32
Merge branch 'v1_logprobs_proc_test' into v1_logprobs
afeldman-nm Jan 15, 2025
e16ea40
output processor tests almost finished
afeldman-nm Jan 15, 2025
89f7977
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 15, 2025
3e23b32
Merge branch 'v1_logprobs_merge' into v1_logprobs
afeldman-nm Jan 15, 2025
cf92387
Merge branch 'v1_logprobs_proc_test' into v1_logprobs
afeldman-nm Jan 15, 2025
c8fc3c3
wip
afeldman-nm Jan 16, 2025
bd2c36b
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 16, 2025
63d4484
_validate_logprobs progress
afeldman-nm Jan 16, 2025
4353f01
enhanced logprobs checks
afeldman-nm Jan 16, 2025
1c418a6
wip
afeldman-nm Jan 17, 2025
ab95d87
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 17, 2025
80d420d
Merge branch 'v1_logprobs' into v1_logprobs_proc_test_merge
afeldman-nm Jan 17, 2025
1a5850b
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 17, 2025
c554c5c
merge
afeldman-nm Jan 20, 2025
c24cfd6
cleanup
afeldman-nm Jan 20, 2025
0a8f9ae
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 22, 2025
201c1cd
Update vllm/v1/engine/detokenizer.py
afeldman-nm Jan 22, 2025
832d5c8
Update vllm/v1/engine/detokenizer.py
afeldman-nm Jan 22, 2025
1a59237
detokenize()
afeldman-nm Jan 22, 2025
a49ab7e
Update vllm/v1/sample/sampler.py
afeldman-nm Jan 22, 2025
2bf6829
removed unnecessary lines from Scheduler; array_like in EngineCoreOutput
afeldman-nm Jan 22, 2025
c008732
Merge branch 'afeldman-nm/v1_logprobs' of https://github.com/neuralma…
afeldman-nm Jan 22, 2025
bfce1d6
tuples
afeldman-nm Jan 22, 2025
982381d
redundant else's
afeldman-nm Jan 22, 2025
aedc1b8
Update vllm/v1/worker/gpu_input_batch.py
afeldman-nm Jan 22, 2025
6c0fe71
Merge branch 'afeldman-nm/v1_logprobs' of https://github.com/neuralma…
afeldman-nm Jan 22, 2025
f5f5954
tuple
afeldman-nm Jan 22, 2025
4480ec0
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 22, 2025
6adaa1f
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 22, 2025
ef2d33a
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 22, 2025
d09604e
merge
afeldman-nm Jan 22, 2025
f1d0234
modified reference detokenization impl
afeldman-nm Jan 22, 2025
135a585
don't use decode()
afeldman-nm Jan 22, 2025
753d7c7
don't use decode for prompt logprobs
afeldman-nm Jan 22, 2025
207b802
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 22, 2025
c0033a4
refactor
afeldman-nm Jan 22, 2025
a20fa58
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 23, 2025
c6b87b1
Update vllm/v1/worker/gpu_model_runner.py
afeldman-nm Jan 24, 2025
bdb1dbe
naive prompt logprobs fix doesn't work
afeldman-nm Jan 24, 2025
b2a779f
new prompt logprobs approach
afeldman-nm Jan 24, 2025
3d8c7fd
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 24, 2025
4435b4b
Merge branch 'v1_logprobs' into v1_logprobs_repeek
afeldman-nm Jan 24, 2025
ef233c4
Integrated next-chunk peek into input_ids
afeldman-nm Jan 24, 2025
e1a9c51
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 24, 2025
fde5c1a
Merge branch 'v1_logprobs' into v1_logprobs_repeek
afeldman-nm Jan 24, 2025
091c2e1
remove unnecessary code; refactor
afeldman-nm Jan 24, 2025
2e98cf0
Merge branch 'afeldman-nm/v1_logprobs' of https://github.com/neuralma…
afeldman-nm Jan 24, 2025
c83529e
Merge branch 'v1_logprobs_repeek' into v1_logprobs
afeldman-nm Jan 24, 2025
f9d9eb9
fixing lint failures
afeldman-nm Jan 24, 2025
0a012f3
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 24, 2025
0526a01
Merge branch 'v1_logprobs_merge' into v1_logprobs
afeldman-nm Jan 24, 2025
e33d8bb
partial_req_ids -> partial_req_id
afeldman-nm Jan 24, 2025
3562aec
partial_req_ids -> partial_req_id
afeldman-nm Jan 24, 2025
5eb3aa0
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 24, 2025
8564e79
bugfix
afeldman-nm Jan 24, 2025
7d0d6d8
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 24, 2025
3f8edfe
non-blocking copy to cpu
afeldman-nm Jan 24, 2025
f38a17c
Rework output processor logic
njhill Jan 24, 2025
ea2a005
Fix test
njhill Jan 26, 2025
1fc08f2
Merge branch 'afeldman-nm/v1_logprobs' of https://github.com/neuralma…
afeldman-nm Jan 27, 2025
955953d
fix basic import issue
afeldman-nm Jan 27, 2025
e6edcbe
cleanup
afeldman-nm Jan 28, 2025
94e120f
cleanup
afeldman-nm Jan 28, 2025
1c24f91
Update vllm/v1/worker/gpu_model_runner.py
afeldman-nm Jan 28, 2025
dd96496
cleanup
afeldman-nm Jan 28, 2025
7a44291
Merge remote-tracking branch 'origin/main' into afeldman-nm/v1_logprobs
njhill Jan 28, 2025
0ca162a
delta mode fix
afeldman-nm Jan 28, 2025
fe56625
WIP add ranks etc.
njhill Jan 29, 2025
3efb2df
more encapsulated prompt logprobs approach
afeldman-nm Jan 29, 2025
9bfccb8
fixes
afeldman-nm Jan 29, 2025
1b7fe30
pythonized engine core logprobs
afeldman-nm Jan 29, 2025
b812d17
merging serialization changes
afeldman-nm Jan 29, 2025
e0e0708
logprob ranks work
afeldman-nm Jan 29, 2025
c609a3d
refactor
afeldman-nm Jan 29, 2025
aaea609
merge
afeldman-nm Jan 29, 2025
b0a0451
0 logprobs test
afeldman-nm Jan 29, 2025
fb5add1
Merge remote-tracking branch 'origin/main' into afeldman-nm/v1_logprobs
njhill Jan 29, 2025
1d505d4
zero fix is probably in
afeldman-nm Jan 29, 2025
e73af7b
zero fix is almost in; updated logprobs test cases
afeldman-nm Jan 29, 2025
17b21ac
Merge branch 'afeldman-nm/v1_logprobs' of https://github.com/neuralma…
afeldman-nm Jan 29, 2025
fc79dfa
zero issue seems to be fixed
afeldman-nm Jan 30, 2025
2e05530
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 30, 2025
bcfa1d6
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 30, 2025
34b20f4
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 30, 2025
1e730ad
Clean-up; simplify Logprobs dict construction
njhill Jan 30, 2025
d797cbf
wip
afeldman-nm Jan 30, 2025
891604e
Updated logprobs processor unit tests to reflect new engine core outp…
afeldman-nm Jan 30, 2025
2d4a96e
Merge branch 'afeldman-nm/v1_logprobs' of https://github.com/neuralma…
afeldman-nm Jan 30, 2025
a9ecdf9
bugfix
afeldman-nm Jan 31, 2025
bec83f5
fixed test vector bug
afeldman-nm Jan 31, 2025
d663462
rank computation fix
afeldman-nm Jan 31, 2025
ce7c38c
wip
afeldman-nm Jan 31, 2025
f186fe3
reverting
afeldman-nm Jan 31, 2025
7c4b089
reverting
afeldman-nm Jan 31, 2025
5129c20
reverting
afeldman-nm Jan 31, 2025
cc8ce98
revert complete
afeldman-nm Jan 31, 2025
de94a16
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 31, 2025
ff3122a
fixed serialization bug
afeldman-nm Jan 31, 2025
2ca6b03
stop fix
afeldman-nm Jan 31, 2025
f559431
acknowledge broken invariant
afeldman-nm Jan 31, 2025
fe053b0
Merge branch 'v1_logprobs' into v1_logprobs_merge
afeldman-nm Jan 31, 2025
2896276
Merge branch 'main' into v1_logprobs_merge
afeldman-nm Jan 31, 2025
f61cf5c
fixed typing issue
afeldman-nm Jan 31, 2025
ad25c6d
woops zero logprob fix
afeldman-nm Jan 31, 2025
f934094
additional zero logprob fix
afeldman-nm Jan 31, 2025
bc92a80
merge
afeldman-nm Jan 31, 2025
696f890
small fix
afeldman-nm Jan 31, 2025
8f209f2
Merge branch 'v1_logprobs_apc' into v1_logprobs_test
afeldman-nm Jan 31, 2025
e74c0e4
echo tests pass
afeldman-nm Jan 31, 2025
21428e3
Simplify Logprobs dict construction
njhill Feb 1, 2025
c8ae49b
nit
robertgshaw2-redhat Feb 2, 2025
5883a70
revert changes
robertgshaw2-redhat Feb 2, 2025
c180b37
formats
robertgshaw2-redhat Feb 2, 2025
7c8e13d
update
robertgshaw2-redhat Feb 2, 2025
6486cdd
update comment
robertgshaw2-redhat Feb 2, 2025
22b07e2
update
robertgshaw2-redhat Feb 2, 2025
e7ba970
revert unnecessary change
robertgshaw2-redhat Feb 2, 2025
8c57385
cleanup spurious change
robertgshaw2-redhat Feb 2, 2025
b0af03b
cleanup spurious change
robertgshaw2-redhat Feb 2, 2025
3be08c8
simplify update sample logprobs logic
robertgshaw2-redhat Feb 2, 2025
d9dc980
mypy:)
robertgshaw2-redhat Feb 2, 2025
36b9a36
share more logic between sample and prompt logprobs
robertgshaw2-redhat Feb 2, 2025
4658be3
updated
robertgshaw2-redhat Feb 2, 2025
9ee7ac3
updated
robertgshaw2-redhat Feb 2, 2025
4d5f444
stash
robertgshaw2-redhat Feb 2, 2025
1f1f49a
updated
robertgshaw2-redhat Feb 2, 2025
b9632e2
remove
robertgshaw2-redhat Feb 2, 2025
52dd142
fail if prefix caching is enabled
robertgshaw2-redhat Feb 2, 2025
76a0324
mypy
robertgshaw2-redhat Feb 2, 2025
2448997
stash
robertgshaw2-redhat Feb 2, 2025
1fec265
updated
robertgshaw2-redhat Feb 2, 2025
8c1b89e
merge
afeldman-nm Feb 2, 2025
29da400
fix
robertgshaw2-redhat Feb 2, 2025
f70661f
Merge branch 'v1_logprobs_test_merge' into v1_logprobs
afeldman-nm Feb 2, 2025
1d19700
revert
robertgshaw2-redhat Feb 2, 2025
50b7660
updated
robertgshaw2-redhat Feb 2, 2025
2c0d7f3
Merge branch 'main' into afeldman-nm/v1_logprobs
robertgshaw2-redhat Feb 2, 2025
56aa97f
updated with header
robertgshaw2-redhat Feb 2, 2025
5cc977a
fix pref commit
robertgshaw2-redhat Feb 2, 2025
73653f1
missing test
robertgshaw2-redhat Feb 2, 2025
4c3ca35
revert msgpack
robertgshaw2-redhat Feb 2, 2025
Empty file added tests/v1/samplers/__init__.py
Empty file.
387 changes: 387 additions & 0 deletions tests/v1/samplers/test_logprobs.py
@@ -0,0 +1,387 @@
from typing import List, Tuple

import pytest
import torch

from tests.kernels.utils import override_backend_env_variable
from vllm import SamplingParams

from ...conftest import VllmRunner

MODELS = ["facebook/opt-125m"]


def _get_test_batch(batch_logprobs_composition: str) -> List[Tuple]:
"""Generate logprobs configs for a batch of requests

A given request's logprobs configuration is (1) num_sample_logprobs and (2)
num_prompt_logprobs. The batch logprobs configuration is the list of request
logprobs configs.

batch_logprobs_composition == "NONE" yields a batch with no sample or prompt
logprobs

batch_logprobs_composition == "SAMPLE" yields a batch with some requests
configured for sample logprobs only, and others configured for no logprobs

batch_logprobs_composition == "PROMPT" yields a batch with some requests
configured for prompt logprobs only, and others configured for no logprobs

batch_logprobs_composition == "SAMPLE_PROMPT" yields a batch with some
requests configured for sample logprobs and prompt logprobs, some configured
for only sample logprobs or only prompt logprobs, and some configured for
no logprobs

Args:
batch_logprobs_composition: types of logprobs configs to include in batch

Returns:

List of (Optional[num_sample_logprobs], Optional[num_prompt_logprobs])
tuples
"""
if batch_logprobs_composition == "NONE":
# No requests with sample or prompt logprobs
return [(None, None), (0, None), (None, 0), (0, 0)]
elif batch_logprobs_composition == "SAMPLE":
return [
(None, None),
(None, 0),
(0, None),
(0, 0),
(5, None),
(3, 0),
]
elif batch_logprobs_composition == "PROMPT":
return [
(None, 0),
(0, None),
(0, 0),
(None, 6),
(0, 5),
]
elif batch_logprobs_composition == "SAMPLE_PROMPT":
return [
(None, 0),
(0, None),
(0, 0),
(5, None),
(3, 0),
(6, 3),
(None, 6),
(0, 5),
]
else:
raise ValueError("Invalid logprobs batch configuration for test.")


def _test_case_get_logprobs_and_prompt_logprobs(
hf_runner,
vllm_runner,
model: str,
dtype: str,
detokenize: bool,
batch_logprobs_composition: str,
max_num_batched_tokens: int,
example_prompts,
monkeypatch,
) -> None:
test_prompts = example_prompts

# LLM engine v1
monkeypatch.setenv("VLLM_USE_V1", "1")
override_backend_env_variable(monkeypatch, "FLASH_ATTN")

max_num_seqs = 128
max_model_len = 128

max_tokens = 5
with hf_runner(model, dtype=dtype) as hf_model:
hf_outputs = hf_model.generate_greedy(
test_prompts,
max_tokens=max_tokens,
)
hf_logprobs = hf_model.generate_greedy_logprobs(
test_prompts,
max_tokens=max_tokens,
)

# Batch has mixed sample params
# (different logprobs/prompt logprobs combos)
logprob_prompt_logprob_list = _get_test_batch(batch_logprobs_composition)

# We rely on there being more prompts than combinations of
# logprobs & prompt logprobs which we want to test
assert len(test_prompts) >= len(logprob_prompt_logprob_list)
# Make sure there are sampling params for each prompt
num_extra_params = len(test_prompts) - len(logprob_prompt_logprob_list)
if num_extra_params > 0:
logprob_prompt_logprob_list = (
logprob_prompt_logprob_list +
logprob_prompt_logprob_list[-num_extra_params:])
# Now the number of prompts should match the number of sample params combos
assert len(test_prompts) == len(logprob_prompt_logprob_list)
# Generate SamplingParams
vllm_sampling_params = [
SamplingParams(max_tokens=max_tokens,
logprobs=lp,
prompt_logprobs=plp,
temperature=0.0,
detokenize=detokenize)
for lp, plp in logprob_prompt_logprob_list
]

with vllm_runner(
model,
dtype=dtype,
max_logprobs=7,
max_num_batched_tokens=max_num_batched_tokens,
max_num_seqs=max_num_seqs,
max_model_len=max_model_len,
enforce_eager=True,
) as vllm_model:
vllm_results = vllm_model.model.generate(
test_prompts, sampling_params=vllm_sampling_params)

for vllm_result, hf_logprob, hf_output, logprob_prompt_logprob in zip(
vllm_results, hf_logprobs, hf_outputs,
logprob_prompt_logprob_list):

# Extract request-level (prompt)logprobs config
num_top_logprobs = logprob_prompt_logprob[0]
num_top_prompt_logprobs = logprob_prompt_logprob[1]

# Test whether sampled token output is consistent between vLLM and HF
# vLLM prompt+completion should match HF output
assert (vllm_result.prompt_token_ids +
vllm_result.outputs[0].token_ids == hf_output[0])

# Validate sample logprobs
if num_top_logprobs is not None and num_top_logprobs > 0:
# Confirm that the structure of the sample logprobs in the result is
# correct
assert vllm_result.outputs[0].logprobs is not None
assert len(vllm_result.outputs[0].logprobs) == max_tokens
for logprobs in vllm_result.outputs[0].logprobs:
assert logprobs is not None
# If the sampled token is not among the requested top logprobs,
# one additional entry may be returned for it
assert (len(logprobs) == num_top_logprobs
or len(logprobs) == num_top_logprobs + 1)
output_text = vllm_result.outputs[0].text
output_string_from_most_likely_tokens_lst: List[str] = []
for top_logprobs in vllm_result.outputs[0].logprobs:
top_logprob = next(iter(top_logprobs.values()))
output_string_from_most_likely_tokens_lst.append(
top_logprob.decoded_token)

if detokenize:
output_string_from_most_likely_tokens = "".join(
output_string_from_most_likely_tokens_lst)
assert output_text == output_string_from_most_likely_tokens, (
"The output text from the top logprob for each token "
"position should be the same as the output text in the "
"result.")
else:
assert output_text == ''
assert output_string_from_most_likely_tokens_lst == (
[None] * max_tokens)

# Compare vLLM sample logprobs to HF
vllm_sample_logprobs = vllm_result.outputs[0].logprobs
for i, top_logprobs in enumerate(vllm_sample_logprobs):
for token_id, sample_logprob in top_logprobs.items():
logprob = sample_logprob.logprob
torch.testing.assert_close(
logprob,
hf_logprob[i][-1][token_id].item(),
atol=1e-2,
rtol=1e-2)
if detokenize:
assert isinstance(sample_logprob.decoded_token, str), (
"The token should be decoded by the time it is"
" returned to the user.")
else:
# Logprobs disabled for this request; should be None
assert vllm_result.outputs[0].logprobs is None

# Validate prompt logprobs
if (num_top_prompt_logprobs is not None
and num_top_prompt_logprobs > 0):
# Confirm that structure of prompt logprobs in result is correct
assert vllm_result.prompt_logprobs is not None
# - The first prompt logprob is always None
assert vllm_result.prompt_logprobs[0] is None
# - Prompt logprobs are returned for all indices in
# the prompt
assert len(vllm_result.prompt_logprobs) == len(
vllm_result.prompt_token_ids)
for prompt_logprobs in vllm_result.prompt_logprobs[1:]:
assert prompt_logprobs is not None
# - If the prompt token is not among the requested top logprobs,
#   one additional entry may be returned for it
assert (len(prompt_logprobs) == num_top_prompt_logprobs
or len(prompt_logprobs) == num_top_prompt_logprobs + 1)

# Compare prompt logprobs to HF
# The first prompt logprob is always None, so we compare from index 1 onwards.
vllm_prompt_logprobs = vllm_result.prompt_logprobs[1:]
for i, vllm_prompt_logprob_dict in enumerate(vllm_prompt_logprobs):
for token_id, logprob in vllm_prompt_logprob_dict.items():
torch.testing.assert_close(
logprob.logprob,
hf_logprob[0][i][token_id].item(),
atol=2e-2,
rtol=2e-2)
else:
assert vllm_result.prompt_logprobs is None


@pytest.mark.parametrize("model", MODELS)
@pytest.mark.parametrize("dtype",
["half"]) # needed for comparing logprobs with HF
@pytest.mark.parametrize("max_num_batched_tokens", [128, 256, 1024])
@pytest.mark.parametrize("batch_logprobs_composition",
["NONE", "SAMPLE", "PROMPT", "SAMPLE_PROMPT"])
def test_get_logprobs_and_prompt_logprobs(
hf_runner,
vllm_runner,
model: str,
dtype: str,
batch_logprobs_composition: str,
max_num_batched_tokens: int,
example_prompts,
monkeypatch,
) -> None:
"""Test V1 Engine logprobs & prompt logprobs

Exercise a variety of combinations of `logprobs` and `prompt_logprobs`
settings and validate that
* The generated logprobs and prompt logprobs are consistent with the
configuration settings, in terms of whether or not the logprobs
(of either type) were requested and how many were requested
* The generated logprobs are consistent with the generated tokens
* The generated (prompt)logprobs are consistent with HuggingFace
(prompt)logprobs, as a reference

batch_logprobs_composition controls the logprobs configurations for
requests in the batch under test.

Args:
hf_runner
vllm_runner
model
dtype
batch_logprobs_composition: logprobs configuration for test batch
max_num_batched_tokens: token budget for scheduling
example_prompts
monkeypatch
"""
_test_case_get_logprobs_and_prompt_logprobs(
hf_runner=hf_runner,
vllm_runner=vllm_runner,
model=model,
dtype=dtype,
detokenize=True,
batch_logprobs_composition=batch_logprobs_composition,
max_num_batched_tokens=max_num_batched_tokens,
example_prompts=example_prompts,
monkeypatch=monkeypatch)


@pytest.mark.parametrize("model", MODELS)
@pytest.mark.parametrize("dtype",
["half"]) # needed for comparing logprobs with HF
@pytest.mark.parametrize("max_num_batched_tokens", [128])
@pytest.mark.parametrize("batch_logprobs_composition",
["NONE", "SAMPLE", "PROMPT", "SAMPLE_PROMPT"])
def test_fast_get_logprobs_and_prompt_logprobs(
hf_runner,
vllm_runner,
model: str,
dtype: str,
batch_logprobs_composition: str,
max_num_batched_tokens: int,
example_prompts,
monkeypatch,
) -> None:
"""Fast test: V1 Engine logprobs & prompt logprobs

Faster version of `test_get_logprobs_and_prompt_logprobs` with
fewer test cases.
"""

_test_case_get_logprobs_and_prompt_logprobs(
hf_runner=hf_runner,
vllm_runner=vllm_runner,
model=model,
dtype=dtype,
detokenize=True,
batch_logprobs_composition=batch_logprobs_composition,
max_num_batched_tokens=max_num_batched_tokens,
example_prompts=example_prompts,
monkeypatch=monkeypatch)


def test_max_logprobs(monkeypatch):
"""vLLM v1 engine should fail a request with `logprobs > max_logprobs`

Should also fail for `prompt_logprobs > max_logprobs`

Args:
monkeypatch
"""
# LLM engine v1
monkeypatch.setenv("VLLM_USE_V1", "1")
override_backend_env_variable(monkeypatch, "FLASH_ATTN")

runner = VllmRunner("facebook/opt-125m", max_logprobs=1)
vllm_sampling_params = SamplingParams(logprobs=1)
# should pass
runner.generate(["Hello world"], sampling_params=vllm_sampling_params)

bad_sampling_params = SamplingParams(logprobs=2)
with pytest.raises(ValueError):
runner.generate(["Hello world"], sampling_params=bad_sampling_params)


@pytest.mark.parametrize("model", MODELS)
def test_none_logprobs(vllm_runner, model, example_prompts, monkeypatch):
"""Engine should return `logprobs` and `prompt_logprobs` as `None`

Args:
vllm_runner
model
example_prompts
monkeypatch
"""

# LLM engine v1
monkeypatch.setenv("VLLM_USE_V1", "1")
override_backend_env_variable(monkeypatch, "FLASH_ATTN")

max_num_seqs = 256
max_num_batched_tokens = None
max_tokens = 5

with vllm_runner(
model,
max_num_batched_tokens=max_num_batched_tokens,
max_num_seqs=max_num_seqs,
) as vllm_model:
sampling_params_logprobs_none = SamplingParams(max_tokens=max_tokens,
logprobs=None,
prompt_logprobs=None,
temperature=0.0)
results_logprobs_none = vllm_model.model.generate(
example_prompts, sampling_params=sampling_params_logprobs_none)

for i in range(len(results_logprobs_none)):
# Check sample logprobs are None
assert results_logprobs_none[i].outputs[0].logprobs is None
assert results_logprobs_none[i].outputs[0].cumulative_logprob is None
# Check prompt logprobs are None
assert results_logprobs_none[i].prompt_logprobs is None
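For reference, the logprobs behavior exercised by these tests can be driven through vLLM's offline LLM entrypoint. The snippet below is a minimal sketch, not part of this PR's diff: it assumes the V1 engine is selected via the VLLM_USE_V1 environment variable (as the tests above do) and mirrors the SamplingParams(logprobs=..., prompt_logprobs=...) usage from test_logprobs.py.

# Minimal sketch; assumption: VLLM_USE_V1=1 selects the V1 engine, as in the tests above.
import os

os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m", max_logprobs=7, enforce_eager=True)
params = SamplingParams(
    max_tokens=5,
    temperature=0.0,
    logprobs=3,         # top-3 logprobs for each sampled token
    prompt_logprobs=5,  # top-5 logprobs for each prompt token
)

outputs = llm.generate(["Hello, my name is"], sampling_params=params)
for out in outputs:
    # First prompt position has no logprob; later entries map token_id -> Logprob.
    print(out.prompt_logprobs)
    # One dict of token_id -> Logprob per generated token position.
    print(out.outputs[0].logprobs)

The tests in this file assert exactly these shapes: up to logprobs + 1 entries per sampled position (one extra when the sampled token falls outside the requested top-k), prompt_logprobs[0] is None, and None for both fields when they are not requested.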