
Jakiro

This repository is the official implementation of "Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE".

Paper (Jakiro)


Speedup ratios of different models on MT-bench under non-greedy settings.

Jakiro is an advanced approach designed to enhance speculative decoding (SD) for large language models. By integrating Mixture of Experts (MoE), Jakiro enables independent experts to generate diverse predictions, effectively decoupling correlations among candidates and addressing a key limitation of traditional tree-based sampling. Jakiro significantly boosts prediction accuracy and inference speed, setting a new state-of-the-art (SOTA) in speculative decoding. Extensive experiments across various models demonstrate its robustness and effectiveness in real-world applications.
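The official Jakiro code has not yet been released (see the Code section below), so the following is only a minimal PyTorch sketch of the idea of decoupled draft heads backed by independent experts; all class, function, and parameter names are hypothetical and are not taken from the actual implementation.

import torch
import torch.nn as nn

class DecoupledMoEDraftHead(nn.Module):
    # Hypothetical sketch: each expert is an independent MLP that maps the
    # model's hidden state to its own next-token logits, so candidates
    # proposed by different experts are not tied to a single shared head.
    def __init__(self, hidden_size, vocab_size, num_experts=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, hidden_size),
                nn.SiLU(),
                nn.Linear(hidden_size, vocab_size),
            )
            for _ in range(num_experts)
        ])

    def forward(self, hidden_states):
        # hidden_states: [batch, seq_len, hidden_size]
        # Returns one logits tensor per expert; top-k tokens from each expert
        # can then be merged into the candidate tree used by speculative decoding.
        return [expert(hidden_states) for expert in self.experts]

# Toy usage: propose 4 candidate tokens per expert for the latest position.
head = DecoupledMoEDraftHead(hidden_size=4096, vocab_size=32000)
hidden = torch.randn(1, 1, 4096)
per_expert_logits = head(hidden)
candidates = [logits.topk(k=4, dim=-1).indices for logits in per_expert_logits]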

Test demo

The following demos show the measured inference speed of Jakiro and EAGLE-2 with the Vicuna 7B model on a single RTX 4090 GPU (24 GB). Jakiro achieves a faster decoding speed and a higher compression ratio.

EAGLE-2 demo | Jakiro demo
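For reference, the two numbers reported in such demos are typically measured as sketched below; this is a generic illustration, not code from this repository, and baseline_generate / jakiro_generate are placeholders.

import time

def measure(generate_fn, prompt, max_new_tokens=256):
    # generate_fn is assumed to return (text, num_new_tokens, num_target_forwards).
    start = time.perf_counter()
    _, num_tokens, num_forwards = generate_fn(prompt, max_new_tokens)
    elapsed = time.perf_counter() - start
    tokens_per_second = num_tokens / elapsed
    # "Compression ratio": average number of tokens accepted per forward pass
    # of the target model (1.0 for vanilla autoregressive decoding).
    compression_ratio = num_tokens / max(num_forwards, 1)
    return tokens_per_second, compression_ratio

# base_tps, _ = measure(baseline_generate, "Tell me a story.")
# jakiro_tps, ratio = measure(jakiro_generate, "Tell me a story.")
# print(f"speedup = {jakiro_tps / base_tps:.2f}x, compression ratio = {ratio:.2f}")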

Code

The code is currently being organized and will be released soon. Stay tuned!

Reference

For technical details and full experimental results, please refer to the Jakiro paper.

@misc{huang2025jakiroboostingspeculativedecoding,
      title={Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE}, 
      author={Haiduo Huang and Fuwei Yang and Zhenhua Liu and Yixing Xu and Jinze Li and Yang Liu and Xuanwu Yin and Dong Li and Pengju Ren and Emad Barsoum},
      year={2025},
      eprint={2502.06282},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.06282}, 
}

Acknowledgements

This project has been influenced by many excellent projects in the LLM community, including EAGLE, Medusa, and FastChat. The logo was designed by GPT-4o.
