
[Question] Document/examples to enable draft model speculative decoding using c++ executor API #2424

Open
ynwang007 opened this issue Nov 7, 2024 · 2 comments

@ynwang007

Hi,

I am interested in using a draft model for speculative decoding, and the only example I found is https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/draft_target_model.

We use TensorRT-LLM (C++ runtime) with the Python executor interface: https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/tensorrt_llm/pybind/bindings.cpp. Can anyone provide instructions on how to support draft model speculative decoding on top of that?

If I understand correctly, we have to implement the logic ourselves to generate draft tokens at each iteration and then pass them to the target model executor? Is there an executor API that does this work for us? Thanks!

@hello-11 hello-11 added question Further information is requested triaged Issue has been triaged by maintainers labels Nov 8, 2024
@achartier

That's correct. You can find an example using ExternalDraftTokensConfig in https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/model_runner_cpp.py#L628

An example using the C++ executor API will be provided in a future update.
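
In the meantime, here is a rough sketch of such a loop on top of the Python executor bindings. Treat it as untested pseudocode rather than a drop-in solution: the engine paths and tuning constants are placeholders, and keyword names such as max_tokens, output_config, and external_draft_tokens_config can differ between releases, so check model_runner_cpp.py linked above for the exact signatures in your version.

```python
# Hypothetical draft/target speculative decoding loop using the
# tensorrt_llm executor bindings. Paths and constants are placeholders.
import tensorrt_llm.bindings.executor as trtllm

DRAFT_ENGINE_DIR = "draft_engine_dir"    # placeholder engine path
TARGET_ENGINE_DIR = "target_engine_dir"  # placeholder engine path
NUM_DRAFT_TOKENS = 4                     # draft tokens proposed per iteration
MAX_TOTAL_TOKENS = 256

draft = trtllm.Executor(DRAFT_ENGINE_DIR, trtllm.ModelType.DECODER_ONLY,
                        trtllm.ExecutorConfig(max_beam_width=1))
target = trtllm.Executor(TARGET_ENGINE_DIR, trtllm.ModelType.DECODER_ONLY,
                         trtllm.ExecutorConfig(max_beam_width=1))

# Only return newly generated tokens, not the echoed prompt.
out_cfg = trtllm.OutputConfig(exclude_input_from_output=True)

tokens = [1, 2, 3, 4]  # prompt token ids (placeholder)

while len(tokens) < MAX_TOTAL_TOKENS:
    # 1) Propose a few candidate tokens with the cheap draft model.
    draft_req = trtllm.Request(input_token_ids=tokens,
                               max_tokens=NUM_DRAFT_TOKENS,
                               output_config=out_cfg)
    rid = draft.enqueue_request(draft_req)
    draft_tokens = draft.await_responses(rid)[0].result.output_token_ids[0]

    # 2) Verify them with the target model in a single request; the target
    #    accepts a prefix of the draft and emits one corrected token.
    target_req = trtllm.Request(
        input_token_ids=tokens,
        max_tokens=NUM_DRAFT_TOKENS + 1,
        output_config=out_cfg,
        external_draft_tokens_config=trtllm.ExternalDraftTokensConfig(
            draft_tokens))
    rid = target.enqueue_request(target_req)
    accepted = target.await_responses(rid)[0].result.output_token_ids[0]

    if not accepted:  # EOS handling omitted for brevity
        break
    tokens += accepted
```

Note also that the target engine has to be built with external draft token support; the draft_target_model example linked in the issue description shows the required trtllm-build options.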
