π™°πš πšŽπšœπš˜πš–πšŽ π™ΆπšŽπš—πšŽπš›πšŠπšπš’πš˜πš— π™°πšŒπšŒπšŽπš•πšŽπš›πšŠπšπš’πš˜πš—


πŸ”₯ News

  • 2024/10/12 πŸš€πŸš€ We release our work ToCa on accelerating diffusion transformers for FREE, achieving nearly lossless 2.36× acceleration on OpenSora!

  • 2024/07/15 πŸ€—πŸ€— We release an open-source repo "Awesome-Generation-Acceleration", which collects recent awesome generation acceleration papers!

πŸ“š Contents

  • Keywords
  • Papers: Fast Sampling, Pruning, Quantization, Distillation, Cache Mechanism, Deployment Optimization, Others
  • Training-free Generation Acceleration: Stable Diffusion, Diffusion Transformer, Auto-Regressive Generation
  • Contact

πŸ’¬ Keywords

πŸ“ Papers

Fast Sampling

  • [1] Denoising Diffusion Implicit Models, ICLR 2021.

    Song, Jiaming and Meng, Chenlin and Ermon, Stefano.

    [Paper] [Code]

  • [2] DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps, NeurIPS 2022.

    Lu, Cheng and Zhou, Yuhao and Bao, Fan and Chen, Jianfei and Li, Chongxuan and Zhu, Jun.

    [Paper] [Code]

  • [3] DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models, arXiv 2022.

    Lu, Cheng and Zhou, Yuhao and Bao, Fan and Chen, Jianfei and Li, Chongxuan and Zhu, Jun.

    [Paper] [Code]

  • [4] Jump Your Steps: Optimizing Sampling Schedule of Discrete Diffusion Models, arXiv 2024.

    Yong-Hyun Park and Chieh-Hsin Lai and Satoshi Hayakawa and Yuhta Takida and Yuki Mitsufuji.

    [Paper] [Code]

  • [5] AdaDiff: Adaptive Step Selection for Fast Diffusion, arXiv 2023.

    Hui Zhang, Zuxuan Wu, Zhen Xing, Jie Shao, Yu-Gang Jiang.

    [Paper] [Code]

  • [6] DuoDiff: Accelerating Diffusion Models with a Dual-Backbone Approach, arXiv 2024.

    Daniel Gallo FernΓ‘ndez, Rǎzvan-Andrei Matişan, Alejandro Monroy MuΓ±oz, Ana-Maria Vasilcoiu, Janusz Partyka, Tin HadΕΎi VeljkoviΔ‡, Metod Jazbec.

    [Paper] [Code]
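Most of the fast solvers above are available off the shelf in Hugging Face diffusers. A minimal sketch, assuming a standard Stable Diffusion checkpoint (the model id, prompt, and step count are illustrative), of swapping the default scheduler for the DPM-Solver++ multistep scheduler so sampling takes roughly 20 steps instead of the usual 50:

```python
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Replace the default scheduler with DPM-Solver++ (multistep), which needs
# far fewer function evaluations for comparable sample quality.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

image = pipe("a photo of an astronaut riding a horse",
             num_inference_steps=20).images[0]
image.save("astronaut.png")
```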

Pruning

  • [1] Token Merging for Fast Stable Diffusion, CVPRW 2023.

    Bolya, Daniel and Hoffman, Judy.

    [Paper] [Code]

  • [2] Structural Pruning for Diffusion Models, NeurIPS 2023.

    Fang, Gongfan and Ma, Xinyin and Wang, Xinchao.

    [Paper] [Code]

  • [3] Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models, CVPR 2024.

    Wang, Hongjie and Liu, Difan and Kang, Yan and Li, Yijun and Lin, Zhe and Jha, Niraj K and Liu, Yuchen.

    [Paper] [Code]

  • [4] LAPTOP-Diff: Layer Pruning and Normalized Distillation for Compressing Diffusion Models, arXiv 2024.

    Zhang, Dingkun and Li, Sijia and Chen, Chen and Xie, Qingsong and Lu, Haonan.

    [Paper] [Code]

  • [5] SparseDM: Toward Sparse Efficient Diffusion Models, arXiv 2024.

    Wang, Kafeng and Chen, Jianfei and Li, He and Mi, Zhenpeng and Zhu, Jun.

    [Paper] [Code]

  • [6] Token Fusion: Bridging the Gap between Token Pruning and Token Merging, WACV 2024.

    Kim, Minchul and Gao, Shangqian and Hsu, Yen-Chang and Shen, Yilin and Jin, Hongxia.

    [Paper] [Code]

  • [7] Token Caching for Diffusion Transformer Acceleration, arXiv 2024.

    Jinming Lou and Wenyang Luo and Yufan Liu and Bing Li and Xinmiao Ding and Weiming Hu and Jiajiong Cao and Yuming Li and Chenguang Ma.

    [Paper] [Code]

  • [8] Dynamic Diffusion Transformer, arXiv 2024.

    Wangbo Zhao and Yizeng Han and Jiasheng Tang and Kai Wang and Yibing Song and Gao Huang and Fan Wang and Yang You.

    [Paper] [Code]

  • [9] ToDo: Token Downsampling for Efficient Generation of High-Resolution Images, IJCAIW 2024.

    Smith, Ethan and Saxena, Nayan and Saha, Aninda.

    [Paper] [Code]

  • [10] Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Models, ECCV 2024.

    Ju, Chen and Wang, Haicheng and Li, Zeqian and Chen, Xu and Zhai, Zhonghua and Huang, Weilin and Xiao, Shuai.

    [Paper] [Code]

  • [11] F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis, AAAI 2024.

    Su, Sitong and Liu, Jianzhi and Gao, Lianli and Song, Jingkuan.

    [Paper] [Code]

  • [12] DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization, NeurIPS 2024.

    Zhu, Haowei and Tang, Dehua and Liu, Ji and Lu, Mingjie and Zheng, Jintu and Peng, Jinzhang and Li, Dong and Wang, Yu and Jiang, Fan and Tian, Lu and others.

    [Paper] [Code]

  • [13] Importance-based Token Merging for Diffusion Models, arXiv 2024.

    Wu, Haoyu and Xu, Jingyi and Le, Hieu and Samaras, Dimitris.

    [Paper] [Code]
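Several of the entries above (ToMe, Token Fusion, importance-based merging) cut cost by merging redundant tokens rather than dropping them. A toy sketch of the bipartite-matching idea, assuming a single sequence of transformer tokens; it illustrates the general technique, not any paper's released code:

```python
import torch
import torch.nn.functional as F

def merge_tokens(x: torch.Tensor, r: int) -> torch.Tensor:
    """Toy bipartite token merging: average the r most redundant tokens
    into their nearest neighbours, shrinking the sequence from N to N - r.
    x has shape (N, dim)."""
    src, dst = x[::2], x[1::2]                        # alternating split into two sets
    sim = F.normalize(src, dim=-1) @ F.normalize(dst, dim=-1).T
    best_sim, best_dst = sim.max(dim=-1)              # best destination per source token
    order = best_sim.argsort(descending=True)
    merge, keep = order[:r], order[r:]                # r most redundant sources get merged
    dst = dst.clone()
    # Average each merged source into its matched destination
    # (collisions are ignored here for simplicity).
    dst[best_dst[merge]] = (dst[best_dst[merge]] + src[merge]) / 2
    return torch.cat([src[keep], dst], dim=0)

x = torch.randn(197, 768)                  # e.g. one image's worth of ViT/DiT tokens
print(merge_tokens(x, r=32).shape)         # torch.Size([165, 768])
```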

Quantization

  • [1] Post-training Quantization on Diffusion Models, CVPR 2023.

    Shang, Yuzhang and Yuan, Zhihang and Xie, Bin and Wu, Bingzhe and Yan, Yan.

    [Paper] [Code]

  • [2] Temporal Dynamic Quantization for Diffusion Models, NeurIPS 2023.

    So, Junhyuk and Lee, Jungwon and Ahn, Daehyun and Kim, Hyungjun and Park, Eunhyeok.

    [Paper] [Code]

  • [3] QVD: Post-training Quantization for Video Diffusion Models, arXiv 2024.

    Tian, Shilong and Chen, Hong and Lv, Chengtao and Liu, Yu and Guo, Jinyang and Liu, Xianglong and Li, Shengxi and Yang, Hao and Xie, Tao.

    [Paper] [Code]

  • [4] VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers, arXiv 2024.

    Deng, Juncan and Li, Shuaiting and Wang, Zeyu and Gu, Hong and Xu, Kedong and Huang, Kejie.

    [Paper] [Code]

  • [5] DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing, arXiv 2024.

    Dong, Zhenyuan and Zhang, Sai Qian.

    [Paper] [Code]

  • [6] Q-DiT: Accurate Post-training Quantization for Diffusion Transformers, arXiv 2024.

    Chen, Lei and Meng, Yuan and Tang, Chen and Ma, Xinzhu and Jiang, Jingyan and Wang, Xin and Wang, Zhi and Zhu, Wenwu.

    [Paper] [Code]

  • [7] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models, arXiv 2024.

    Muyang Li and Yujun Lin and Zhekai Zhang and Tianle Cai and Xiuyu Li and Junxian Guo and Enze Xie and Chenlin Meng and Jun-Yan Zhu and Song Han.

    [Paper] [Code]
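The common starting point for the post-training methods above is quantizing weights to a low bit-width with per-channel scales and checking the reconstruction error. A minimal int8 sketch, assuming a generic linear-layer weight (the papers add calibration data, activation quantization, timestep-aware scales, outlier handling, and so on):

```python
import torch

def quantize_weight_int8(w: torch.Tensor):
    """Toy symmetric per-output-channel int8 post-training quantization.
    Returns the int8 weight plus per-channel scales for dequantization."""
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(1152, 1152)          # e.g. one DiT linear-layer weight (illustrative size)
q, scale = quantize_weight_int8(w)
err = (w - dequantize(q, scale)).abs().max()
print(f"int8 storage, max reconstruction error: {err:.4f}")
```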

Distillation

  • [1] Progressive Distillation for Fast Sampling of Diffusion Models, ICLR 2022.

    Salimans, Tim and Ho, Jonathan.

    [Paper] [Code]

  • [2] SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds, NeurIPS 2023.

    Li, Yanyu and Wang, Huan and Jin, Qing and Hu, Ju and Chemerys, Pavlo and Fu, Yun and Wang, Yanzhi and Tulyakov, Sergey and Ren, Jian.

    [Paper] [Code]

  • [3] BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion, ECCV 2024.

    Kim, Bo-Kyeong and Song, Hyoung-Kyu and Castells, Thibault and Choi, Shinkook.

    [Paper] [Code]

  • [4] Accelerating Diffusion Models with One-to-Many Knowledge Distillation, arXiv 2024.

    Linfeng Zhang and Kaisheng Ma.

    [Paper] [Code]

  • [5] Relational Diffusion Distillation for Efficient Image Generation, ACM MM 2024.

    Weilun Feng and Chuanguang Yang and Zhulin An and Libo Huang and Boyu Diao and Fei Wang and Yongjun Xu.

    [Paper] [Code]
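At its core, diffusion distillation trains a cheaper student (fewer steps or a smaller backbone) to match a frozen teacher's predictions. A minimal output-matching sketch, where `teacher`, `student`, and the tensor shapes are placeholders; the listed papers use considerably more elaborate objectives:

```python
import torch
import torch.nn.functional as F

def distill_step(teacher, student, optimizer, x_t, t, cond):
    """One toy knowledge-distillation step: the student is trained to
    reproduce the frozen teacher's prediction at the same noisy input."""
    with torch.no_grad():
        target = teacher(x_t, t, cond)        # teacher's noise / velocity prediction
    pred = student(x_t, t, cond)
    loss = F.mse_loss(pred, target)           # simple output-matching objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Tiny smoke test with stand-in networks that ignore t and cond.
teacher_net, student_net = torch.nn.Linear(16, 16), torch.nn.Linear(16, 16)
opt = torch.optim.AdamW(student_net.parameters(), lr=1e-4)
x_t, t, cond = torch.randn(4, 16), torch.rand(4), torch.randn(4, 16)
print(distill_step(lambda x, t, c: teacher_net(x),
                   lambda x, t, c: student_net(x), opt, x_t, t, cond))
```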

Cache Mechanism

  • [1] Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models, arXiv 2023.

    Li, Senmao and Hu, Taihang and Khan, Fahad Shahbaz and Li, Linxuan and Yang, Shiqi and Wang, Yaxing and Cheng, Ming-Ming and Yang, Jian.

    [Paper] [Code]

  • [2] DeepCache: Accelerating Diffusion Models for Free, CVPR 2024.

    Ma, Xinyin and Fang, Gongfan and Wang, Xinchao.

    [Paper] [Code]

  • [3] Δ-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers, arXiv 2024.

    Chen, Pengtao and Shen, Mingzhu and Ye, Peng and Cao, Jianjian and Tu, Chongjun and Bouganis, Christos-Savvas and Zhao, Yiren and Chen, Tao.

    [Paper] [Code]

  • [4] FORA: Fast-Forward Caching in Diffusion Transformer Acceleration, arXiv 2024.

    Selvaraju, Pratheba and Ding, Tianyu and Chen, Tianyi and Zharkov, Ilya and Liang, Luming.

    [Paper] [Code]

  • [5] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching, NeurIPS 2024.

    Ma, Xinyin and Fang, Gongfan and Mi, Michael Bi and Wang, Xinchao.

    [Paper] [Code]

  • [6] Cache Me if You Can: Accelerating Diffusion Models through Block Caching, CVPR 2024.

    Wimbauer, Felix and Wu, Bichen and Schoenfeld, Edgar and Dai, Xiaoliang and Hou, Ji and He, Zijian and Sanakoyeu, Artsiom and Zhang, Peizhao and Tsai, Sam and Kohler, Jonas and others.

    [Paper] [Code]

  • [7] Token Caching for Diffusion Transformer Acceleration, arXiv 2024.

    Jinming Lou and Wenyang Luo and Yufan Liu and Bing Li and Xinmiao Ding and Weiming Hu and Jiajiong Cao and Yuming Li and Chenguang Ma.

    [Paper] [Code]

  • [8] HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration, arXiv 2024.

    Yushi Huang and Zining Wang and Ruihao Gong and Jing Liu and Xinjie Zhang and Jun Zhang.

    [Paper] [Code]

  • [9] Accelerating Diffusion Transformers with Token-wise Feature Caching, arXiv 2024.

    Chang Zou and Xuyang Liu and Ting Liu and Siteng Huang and Linfeng Zhang.

    [Paper] [Code]

  • [10] FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality, arXiv 2024.

    Zhengyao Lv and Chenyang Si and Junhao Song and Zhenyu Yang and Yu Qiao and Ziwei Liu and Kwan-Yee K. Wong.

    [Paper] [Code]

  • [11] Adaptive Caching for Faster Video Generation with Diffusion Transformers, arXiv 2024.

    Kumara Kahatapitiya and Haozhe Liu and Sen He and Ding Liu and Menglin Jia and Michael S. Ryoo and Tian Xie.

    [Paper] [Code]

  • [12] Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing, arXiv 2024.

    Kaifeng Gao and Jiaxin Shi and Hanwang Zhang and Chunping Wang and Jun Xiao and Long Chen.

    [Paper] [Code]

  • [13] Accelerating Vision Diffusion Transformers with Skip Branches, arXiv 2024.

    Guanjie Chen and Xinyu Zhao and Yucheng Zhou and Tianlong Chen and Cheng Yu.

    [Paper] [Code]
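The caching methods above exploit the fact that features change slowly between adjacent denoising steps, so expensive blocks can be recomputed only occasionally. A toy wrapper in the spirit of DeepCache/FORA, assuming a generic PyTorch module; the real methods choose much more carefully what to cache (deep features, block residuals, attention maps) and when to refresh it:

```python
import torch

class CachedBlock(torch.nn.Module):
    """Toy feature cache: recompute the wrapped block only every `interval`
    denoising steps and reuse the cached output in between."""

    def __init__(self, block: torch.nn.Module, interval: int = 3):
        super().__init__()
        self.block = block
        self.interval = interval
        self.step = 0
        self.cache = None

    def forward(self, x):
        if self.cache is None or self.step % self.interval == 0:
            self.cache = self.block(x)        # full computation on "refresh" steps
        self.step += 1
        return self.cache                     # reused on the steps in between

block = CachedBlock(torch.nn.Linear(64, 64), interval=3)
for t in range(6):                            # stand-in for a denoising loop
    y = block(torch.randn(1, 64))             # only steps 0 and 3 run the Linear layer
```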

Deployment Optimization

  • [1] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models, CVPR 2024 Highlight.

    Li, Muyang and Cai, Tianle and Cao, Jiaxin and Zhang, Qinsheng and Cai, Han and Bai, Junjie and Jia, Yangqing and Li, Kai and Han, Song.

    [Paper] [Code]

  • [2] PipeFusion: Displaced Patch Pipeline Parallelism for Inference of Diffusion Transformer Models, arXiv 2024.

    Wang, Jiannan and Fang, Jiarui and Li, Aoyu and Yang, PengCheng.

    [Paper] [Code]

  • [3] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising, NeurIPS 2024.

    Chen, Zigeng and Ma, Xinyin and Fang, Gongfan and Tan, Zhenxiong and Wang, Xinchao.

    [Paper] [Code]

  • [4] Fast and Memory-Efficient Video Diffusion Using Streamlined Inference, NeurIPS 2024.

    Zheng Zhan and Yushu Wu and Yifan Gong and Zichong Meng and Zhenglun Kong and Changdi Yang and Geng Yuan and Pu Zhao and Wei Niu and Yanzhi Wang.

    [Paper] [Code]
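DistriFusion and PipeFusion split each denoising step across devices instead of changing the model. A toy patch-parallel step, assuming `torch.distributed` is already initialized (e.g. via torchrun), `model` is a placeholder denoiser, and the latent height divides evenly; the real systems also exchange boundary context between patches and overlap communication with computation:

```python
import torch
import torch.distributed as dist

def parallel_denoise_step(model, latent, t):
    """Toy patch-parallel denoising step: each rank denoises one horizontal
    slice of the latent, then the slices are gathered back together."""
    rank, world = dist.get_rank(), dist.get_world_size()
    patches = latent.chunk(world, dim=-2)                  # split along height
    local = model(patches[rank], t)                        # this rank's share of the work
    gathered = [torch.empty_like(local) for _ in range(world)]
    dist.all_gather(gathered, local)
    return torch.cat(gathered, dim=-2)                     # reassemble the full latent
```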

Others

  • [1] Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators, ECCV 2024.

    Yifan Pu and Zhuofan Xia and Jiayi Guo and Dongchen Han and Qixiu Li and Duo Li and Yuhui Yuan and Ji Li and Yizeng Han and Shiji Song and Gao Huang and Xiu Li.

    [Paper] [Code]

πšƒπš›πšŠπš’πš—πš’πš—πš-πšπš›πšŽπšŽ π™ΆπšŽπš—πšŽπš›πšŠπšπš’πš˜πš— π™°πšŒπšŒπšŽπš•πšŽπš›πšŠπšπš’πš˜πš—

πŸ“’ π™²πš˜πš•πš•πšŽπšŒπšπš’πš˜πš— 𝚘𝚏 π™°πš πšŽπšœπš˜πš–πšŽ πšƒπš›πšŠπš’πš—πš’πš—πš-πšπš›πšŽπšŽ π™ΆπšŽπš—πšŽπš›πšŠπšπš’πš˜πš— π™°πšŒπšŒπšŽπš•πšŽπš›πšŠπšπš’πš˜πš— πšπšŽπšœπš˜πšžπš›πšŒπšŽπšœ.

Training-free Stable Diffusion Acceleration

Base models: Stable Diffusion, Stable Video Diffusion and Text2Video-Zero.

  • [1] Token Merging for Fast Stable Diffusion, CVPRW 2023.

    Bolya, Daniel and Hoffman, Judy.

    [Paper] [Code]

  • [2] AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration, ICCV 2023.

    Li, Lijiang and Li, Huixia and Zheng, Xiawu and Wu, Jie and Xiao, Xuefeng and Wang, Rui and Zheng, Min and Pan, Xin and Chao, Fei and Ji, Rongrong.

    [Paper] [Code]

  • [3] Structural Pruning for Diffusion Models, NeurIPS 2023.

    Fang, Gongfan and Ma, Xinyin and Wang, Xinchao.

    [Paper] [Code]

  • [4] Faster Diffusion: Rethinking the Role of UNet Encoder in Diffusion Models, NeurIPS 2024.

    Li, Senmao and Hu, Taihang and Khan, Fahad Shahbaz and Li, Linxuan and Yang, Shiqi and Wang, Yaxing and Cheng, Ming-Ming and Yang, Jian.

    [Paper] [Code]

  • [5] DeepCache: Accelerating Diffusion Models for Free, CVPR 2024.

    Ma, Xinyin and Fang, Gongfan and Wang, Xinchao.

    [Paper] [Code]

  • [6] Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models, CVPR 2024.

    Wang, Hongjie and Liu, Difan and Kang, Yan and Li, Yijun and Lin, Zhe and Jha, Niraj K and Liu, Yuchen.

    [Paper] [Code]

  • [7] PFDiff: Training-free Acceleration of Diffusion Models through the Gradient Guidance of Past and Future, arXiv 2024.

    Guangyi Wang and Yuren Cai and Lijiang Li and Wei Peng and Songzhi Su.

    [Paper] [Code]

  • [8] Token Fusion: Bridging the Gap between Token Pruning and Token Merging, WACV 2024.

    Kim, Minchul and Gao, Shangqian and Hsu, Yen-Chang and Shen, Yilin and Jin, Hongxia.

    [Paper] [Code]

  • [9] Agent Attention: On the Integration of Softmax and Linear Attention, ECCV 2024.

    Han, Dongchen and Ye, Tianzhu and Han, Yizeng and Xia, Zhuofan and Song, Shiji and Huang, Gao.

    [Paper] [Code]

  • [10] T-GATE: Temporally Gating Attention to Accelerate Diffusion Model for Free!, arXiv 2024.

    Zhang, Wentian and Liu, Haozhe and Xie, Jinheng and Faccio, Francesco and Shou, Mike Zheng and Schmidhuber, Jürgen.

    [Paper] [Code]

  • [11] Faster Diffusion via Temporal Attention Decomposition, arXiv 2024.

    Liu, Haozhe and Zhang, Wentian and Xie, Jinheng and Faccio, Francesco and Xu, Mengmeng and Xiang, Tao and Shou, Mike Zheng and Perez-Rua, Juan-Manuel and Schmidhuber, Jürgen.

    [Paper] [Code]

  • [12] ToDo: Token Downsampling for Efficient Generation of High-Resolution Images, IJCAIW 2024.

    Smith, Ethan and Saxena, Nayan and Saha, Aninda.

    [Paper] [Code]

  • [13] Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Models, ECCV 2024.

    Ju, Chen and Wang, Haicheng and Li, Zeqian and Chen, Xu and Zhai, Zhonghua and Huang, Weilin and Xiao, Shuai.

    [Paper] [Code]

  • [14] F3-Pruning: A Training-Free and Generalized Pruning Strategy towards Faster and Finer Text-to-Video Synthesis, AAAI 2024.

    Su, Sitong and Liu, Jianzhi and Gao, Lianli and Song, Jingkuan.

    [Paper] [Code]

  • [15] Fast and Memory-Efficient Video Diffusion Using Streamlined Inference, NeurIPS 2024.

    Zheng Zhan and Yushu Wu and Yifan Gong and Zichong Meng and Zhenglun Kong and Changdi Yang and Geng Yuan and Pu Zhao and Wei Niu and Yanzhi Wang.

    [Paper] [Code]

  • [16] Importance-based Token Merging for Diffusion Models, arXiv 2024.

    Wu, Haoyu and Xu, Jingyi and Le, Hieu and Samaras, Dimitris.

    [Paper] [Code]
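Most entries in this list ship as drop-in patches for an existing Stable Diffusion pipeline. A rough usage sketch for the ToMe-for-SD plug-in ([1] above), following the tomesd project's documented usage at the time of writing; check each repo's README for the current API, and note that the model id, merge ratio, and prompt here are illustrative:

```python
import torch
import tomesd
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Merge roughly half of the tokens inside the attention blocks at inference
# time; no retraining or weight changes are involved.
tomesd.apply_patch(pipe, ratio=0.5)

image = pipe("a castle floating in the clouds",
             num_inference_steps=25).images[0]
image.save("castle.png")
```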

Training-free Diffusion Transformer Acceleration

Base models: DiT-XL for Image Generation, PixArt-α for Text2Image, Open-Sora and Open-Sora-Plan for Text2Video.

  • [1] Δ-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers, arXiv 2024.

    Chen, Pengtao and Shen, Mingzhu and Ye, Peng and Cao, Jianjian and Tu, Chongjun and Bouganis, Christos-Savvas and Zhao, Yiren and Chen, Tao.

    [Paper] [Code]

  • [2] FORA: Fast-Forward Caching in Diffusion Transformer Acceleration, arXiv 2024.

    Selvaraju, Pratheba and Ding, Tianyu and Chen, Tianyi and Zharkov, Ilya and Liang, Luming.

    [Paper] [Code]

  • [3] DiTFastAttn: Attention Compression for Diffusion Transformer Models, NeurIPS 2024.

    Yuan, Zhihang and Lu, Pu and Zhang, Hanling and Ning, Xuefei and Zhang, Linfeng and Zhao, Tianchen and Yan, Shengen and Dai, Guohao and Wang, Yu.

    [Paper] [Code]

  • [4] Real-Time Video Generation with Pyramid Attention Broadcast, arXiv 2024.

    Xuanlei Zhao and Xiaolong Jin and Kai Wang and Yang You.

    [Paper] [Code]

  • [5] Accelerating Diffusion Transformers with Token-wise Feature Caching, arXiv 2024.

    Chang Zou and Xuyang Liu and Ting Liu and Siteng Huang and Linfeng Zhang.

    [Paper] [Code]

  • [6] FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality, arXiv 2024.

    Zhengyao Lv and Chenyang Si and Junhao Song and Zhenyu Yang and Yu Qiao and Ziwei Liu and Kwan-Yee K. Wong.

    [Paper] [Code]

  • [7] Adaptive Caching for Faster Video Generation with Diffusion Transformers, arXiv 2024.

    Kumara Kahatapitiya and Haozhe Liu and Sen He and Ding Liu and Menglin Jia and Michael S. Ryoo and Tian Xie.

    [Paper] [Code]

Training-free Auto-Regressive Generation Acceleration

Base models: Anole and Lumina-mGPT for Text2Image.

  • [1] Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding, arXiv 2024.

    Yao Teng and Han Shi and Xian Liu and Xuefei Ning and Guohao Dai and Yu Wang and Zhenguo Li and Xihui Liu.

    [Paper] [Code]
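Speculative Jacobi decoding accelerates auto-regressive image generation by guessing a block of future tokens and refining the whole block with parallel forward passes until it stops changing. A toy greedy-decoding sketch of the underlying Jacobi iteration, where `model` is a placeholder that maps a token sequence to per-position next-token logits; the paper adds probabilistic (speculative) acceptance on top of this:

```python
import torch

def jacobi_decode_block(model, prefix, k, max_iters=10):
    """Toy Jacobi iteration for greedy parallel decoding: guess a block of k
    tokens, refine the whole block with one forward pass per iteration, and
    stop at the fixed point (which equals token-by-token greedy decoding)."""
    draft = torch.zeros(k, dtype=torch.long)               # arbitrary initial guess
    for _ in range(max_iters):
        logits = model(torch.cat([prefix, draft]))          # one parallel forward pass
        # Position p predicts token p + 1, so the draft tokens are read from
        # positions len(prefix) - 1 ... len(prefix) + k - 2.
        new_draft = logits[len(prefix) - 1 : len(prefix) + k - 1].argmax(dim=-1)
        if torch.equal(new_draft, draft):                   # fixed point reached
            break
        draft = new_draft
    return torch.cat([prefix, draft])
```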

πŸ’Œ Contact

For any questions, please email [email protected].
