
Spec Draft #3

Open: wants to merge 47 commits into main

Conversation

LiuXiaoxuanPKU
Owner

No description provided.

Collaborator

@pian13131 pian13131 left a comment

Please take a look at the comments!

                  token_id: int,
                  index: int):
    token_prob = sample.sd_draft_probs[index]
    assert token_id in token_prob
Collaborator

I am not sure how vllm itself handles this kind of assertion. Typically, when an assertion fails, it terminates the program, and you don't want to terminate the whole vllm server because of a single request. Please double-check how vllm handles this kind of case, and be careful when deciding whether to assert or not.
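
A minimal sketch of the kind of alternative being suggested here, assuming a hypothetical helper and error type (DraftTokenMismatchError is invented for illustration; the way vllm actually propagates per-request failures may differ):

# Hypothetical sketch: fail only the offending request instead of the whole server.
class DraftTokenMismatchError(ValueError):
    """Raised when a draft token is missing from the recorded draft probabilities."""


def get_draft_prob(sample, token_id: int, index: int) -> float:
    token_prob = sample.sd_draft_probs[index]
    if token_id not in token_prob:
        # The engine loop could catch this and abort the single sequence
        # instead of crashing the process on an AssertionError.
        raise DraftTokenMismatchError(
            f"draft token {token_id} not found at position {index}")
    return token_prob[token_id]

Whether to raise, log, or silently reject the draft token is a policy decision for the engine; the point is only that a bare assert takes the whole server down.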

@@ -566,6 +593,10 @@ def step(self) -> List[RequestOutput]:
blocks_to_swap_out=scheduler_outputs.blocks_to_swap_out,
blocks_to_copy=scheduler_outputs.blocks_to_copy,
)

if self.spec_dec_worker:
    # accept will set accepted_token_ids and accepted_token_probs in output
Collaborator

Same here: the description of a method/function should sit next to its own definition rather than at the place where you call it.
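
A minimal sketch of what that placement looks like, assuming the method in question is called accept (the signature below is a guess for illustration only):

class SpecDecWorker:
    def accept(self, output):
        """Verify the draft tokens for this step.

        Sets accepted_token_ids and accepted_token_probs on output.
        """
        ...

# The call site in the engine loop then needs no explanatory comment:
#     if self.spec_dec_worker:
#         self.spec_dec_worker.accept(output)

Only the location of the description matters; the wording of the docstring is a placeholder.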

PAD_TOKEN_ID = 0


class SpecDecWorker(Worker):
Collaborator

It would be better to add some comments to this quite complex and important class: why we need this class and how to use it. You could add something like "Users should call function xxx to do xxx when xxx, then call function xxx to xxx..." (describing only the public methods). A high-level usage explanation would help others (or even yourself in the future) understand your code.
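
A rough sketch of the kind of class-level docstring being asked for; the workflow and method names below are placeholders, not the PR's actual public API:

class SpecDecWorker:  # in the PR this subclasses Worker; the base class is omitted here
    """Worker that drives speculative decoding with a small draft model.

    Hypothetical usage sketch:
      1. Call propose() before the target-model step to generate draft tokens
         for each scheduled sequence.
      2. Run the target model on the proposed tokens as usual.
      3. Call accept(output) afterwards to verify the drafts and populate
         accepted_token_ids / accepted_token_probs on the output.

    Only the public methods should be described; internal helpers are
    implementation details.
    """

The exact wording is unimportant; what helps is stating who calls the worker, when, and in what order.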

    draft_tokens = [list(tp.keys())[0] for tp in self.draft_token_probs]
    return draft_tokens

def get_verified_token_ids(self) -> List[int]:
Collaborator

Perhaps get_accepted_token_ids would be better?

Comment on lines +160 to +161
def _delete_logical_block(self, block: LogicalTokenBlock) -> None:
    self.logical_token_blocks.remove(block)
Collaborator

Any reason to create this method instead of just using self.logical_token_blocks.remove(block) inline?

while n > 0:
    assert len(self.logical_token_blocks) > 0
    last_block = self.logical_token_blocks[-1]
    if last_block.num_tokens < n:
Collaborator

if last_block.num_tokens == n, I think you also need to call self._delete_logical_block(last_block)?
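
A minimal, self-contained sketch of the loop with the == n case handled, using a stand-in block type (the real LogicalTokenBlock API may differ; only the <= comparison is the point):

from dataclasses import dataclass
from typing import List

@dataclass
class _Block:
    num_tokens: int  # stand-in for LogicalTokenBlock


def truncate_last_tokens(blocks: List[_Block], n: int) -> None:
    """Remove the last n tokens, deleting blocks that become empty."""
    while n > 0:
        assert len(blocks) > 0
        last_block = blocks[-1]
        if last_block.num_tokens <= n:
            # Covers the == n case: the block is fully consumed, so it is
            # removed rather than left in the list with zero tokens.
            n -= last_block.num_tokens
            blocks.remove(last_block)
        else:
            last_block.num_tokens -= n
            n = 0

With < alone, a block whose token count exactly equals n would be left behind as an empty block.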

Comment on lines 272 to 273
verify_tokens = seq_data.get_verified_token_ids()
verify_len = len(verify_tokens)
Collaborator

I think you should use "accept" instead of "verify" everywhere, since using different words for the same concept introduces ambiguity.

sliding_window_blocks = (self.sliding_window //
                         self.block_size)
block_table = block_table[-sliding_window_blocks:]
if draft_len > 0:
Collaborator

I think you'd better add a comment here, something like:
"Speculative decoding enabled case: <explain the main difference from the normal case, e.g. that it needs to pass in the multiple tokens accepted in the previous run>"

@truenorth8

@LiuXiaoxuanPKU Are you also considering porting these changes to TGI (Hugging Face's inference project) at some point?
The current MR looks great, btw.

@LiuXiaoxuanPKU
Owner Author

@LiuXiaoxuanPKU Are you also considering porting these changes to TGI (Hugging Face's inference project) at some point? The current MR looks great, btw.

Thanks for the attention! The PR is still in progress, and the current version focuses mainly on correctness rather than performance, so there is still plenty of room for performance improvement. We are aiming to make the PR runnable this week; the vllm team will start reviewing late this week and next week.

We will not work on porting to TGI for now. Making it work in vllm is already complicated...


github-actions bot commented Nov 7, 2024

This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!

@github-actions github-actions bot added the stale label Nov 7, 2024