This repo contains a comprehensive list of papers on model quantization for efficient deep learning, collected from AI conferences, journals, and arXiv. As a highlight, we categorize the papers by model structure and application scenario, and label the quantization methods with keywords.
This repo is being actively updated, and contributions in any form to make this list more comprehensive are welcome. Special thanks to collaborator Zhikai Li, and all researchers who have contributed to this repo!
If you find this repo useful, please consider giving it a ★ star, and feel free to share it with others!
[Update: Sep, 2024] Add new papers from ICML-24 and IJCAI-24.
[Update: Jul, 2024] Add new papers from CVPR-24.
[Update: May, 2024] Add new papers from ICLR-24.
[Update: Apr, 2024] Add new papers from AAAI-24.
[Update: Nov, 2023] Add new papers from NeurIPS-23.
[Update: Oct, 2023] Add new papers from ICCV-23.
[Update: Jul, 2023] Add new papers from AAAI-23 and ICML-23.
[Update: Jun, 2023] Add new arXiv papers uploaded in May 2023, especially in the fast-moving LLM quantization field.
[Update: Jun, 2023] This repo is reborn! New style, better experience!
Keywords: [PTQ]: post-training quantization | [Non-uniform]: non-uniform quantization | [MP]: mixed-precision quantization | [Extreme]: binary or ternary quantization
- "A Survey of Quantization Methods for Efficient Neural Network Inference", Book Chapter: Low-Power Computer Vision, 2021. [paper]
- "Full Stack Optimization of Transformer Inference: a Survey", arXiv, 2023. [paper]
- "A White Paper on Neural Network Quantization", arXiv, 2021. [paper]
- "Binary Neural Networks: A Survey", PR, 2020. [paper] [Extreme]
- "ERQ: Error Reduction for Post-Training Quantization of Vision Transformers", ICML, 2024. [paper] [PTQ]
- "Outlier-aware Slicing for Post-Training Quantization in Vision Transformer", ICML, 2024. [paper] [PTQ]
- "PTQ4SAM: Post-Training Quantization for Segment Anything", CVPR, 2024. [paper] [PTQ]
- "Instance-Aware Group Quantization for Vision Transformers", CVPR, 2024. [paper] [PTQ]
- "Bi-ViT: Pushing the Limit of Vision Transformer Quantization", AAAI, 2024. [paper] [Extreme]
- "AQ-DETR: Low-Bit Quantized Detection Transformer with Auxiliary Queries", AAAI, 2024. [paper]
- "LRP-QViT: Mixed-Precision Vision Transformer Quantization via Layer-wise Relevance Propagation", arXiv, 2023. [paper] [PTQ] [MP]
- "MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer", arXiv, 2023. [paper] [PTQ] [MP]
- "I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference", ICCV, 2023. [paper] [code]
- "RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers", ICCV, 2023. [paper] [code] [PTQ]
- "QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection", ICCV, 2023. [paper]
- "BiViT: Extremely Compressed Binary Vision Transformers", ICCV, 2023. [paper] [Extreme]
- "Jumping through Local Minima: Quantization in the Loss Landscape of Vision Transformers", ICCV, 2023. [paper]
- "PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile", NeurIPS, 2023. [paper]
- "Oscillation-free Quantization for Low-bit Vision Transformers", ICML, 2023. [paper] [code]
- "PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers", TNNLS, 2023. [paper]
- "Variation-aware Vision Transformer Quantization", arXiv, 2023. [paper]
- "NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers", CVPR, 2023. [paper] [PTQ]
- "Boost Vision Transformer with GPU-Friendly Sparsity and Quantization", CVPR, 2023. [paper]
- "Q-DETR: An Efficient Low-Bit Quantized Detection Transformer", CVPR, 2023. [paper]
- "Output Sensitivity-Aware DETR Quantization", 2023. [paper]
- "Q-HyViT: Post-Training Quantization for Hybrid Vision Transformer with Bridge Block Reconstruction", arXiv, 2023. [paper] [PTQ]
- "Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer", NeurIPS, 2022. [paper] [code]
- "Patch Similarity Aware Data-Free Quantization for Vision Transformers", ECCV, 2022. [paper] [code] [PTQ]
- "PTQ4ViT: Post-Training Quantization for Vision Transformers with Twin Uniform Quantization", ECCV, 2022. [paper] [code] [PTQ]
- "FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer", IJCAI, 2022. [paper] [code] [PTQ]
- "Q-ViT: Fully Differentiable Quantization for Vision Transformer", arXiv, 2022. [paper]
- "Post-Training Quantization for Vision Transformer", NeurIPS, 2021. [paper] [PTQ]
- "Evaluating Quantized Large Language Models", ICML, 2024. [paper]
- "SqueezeLLM: Dense-and-Sparse Quantization", ICML, 2024. [paper] [PTQ] [Non-uniform]
- "KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache", ICML, 2024. [paper]
- "LQER: Low-Rank Quantization Error Reconstruction for LLMs", ICML, 2024. [paper]
- "Extreme Compression of Large Language Models via Additive Quantization", ICML, 2024. [paper]
- "BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization", ICML, 2024. [paper]
- "BiLLM: Pushing the Limit of Post-Training Quantization for LLMs", ICML, 2024. [paper]
- "Compressing Large Language Models by Joint Sparsification and Quantization", ICML, 2024. [paper]
- "FrameQuant: Flexible Low-Bit Quantization for Transformers", ICML, 2024. [paper] [PTQ]
- "OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models", ICLR, 2024. [paper]
- "LoftQ: LoRA-Fine-Tuning-aware Quantization for Large Language Models", ICLR, 2024. [paper]
- "SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression", ICLR, 2024. [paper] [PTQ]
- "QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models", ICLR, 2024. [paper]
- "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models", ICLR, 2024. [paper] [PTQ]
- "PB-LLM: Partially Binarized Large Language Models", ICLR, 2024. [paper] [Extreme]
- "AffineQuant: Affine Transformation Quantization for Large Language Models", ICLR, 2024. [paper]
- "Rethinking Channel Dimensions to Isolate Outliers for Low-bit Weight Quantization of Large Language Models", ICLR, 2024. [paper]
- "LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models", ICLR, 2024. [paper]
- "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models", AAAI, 2024. [paper]
- "Norm Tweaking: High-Performance Low-Bit Quantization of Large Language Models", AAAI, 2024. [paper]
- "Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge", AAAI, 2024. [paper]
- "Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation", AAAI, 2024. [paper] [PTQ]
- "What Makes Quantization for Large Language Model Hard? An Empirical Study from the Lens of Perturbation", AAAI, 2024. [paper]
- "EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs", arXiv, 2024. [paper]
- "IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact", arXiv, 2024. [paper]
- "FlattenQuant: Breaking Through the Inference Compute-bound for Large Language Models with Per-tensor Quantization", arXiv, 2024. [paper]
- "A Comprehensive Evaluation of Quantization Strategies for Large Language Models", arXiv, 2024. [paper]
- "GPTVQ: The Blessing of Dimensionality for LLM Quantization", arXiv, 2024. [paper]
- "APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models", arXiv, 2024. [paper]
- "EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge", arXiv, 2024. [paper]
- "RepQuant: Towards Accurate Post-Training Quantization of Large Transformer Models via Scale Reparameterization", arXiv, 2024. [paper]
- "Accurate LoRA-Finetuning Quantization of LLMs via Information Retention", arXiv, 2024. [paper]
- "KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization", arXiv, 2023. [paper]
- "Extreme Compression of Large Language Models via Additive Quantization", arXiv, 2023. [paper]
- "ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks", arXiv, 2023. [paper] [PTQ]
- "CBQ: Cross-Block Quantization for Large Language Models", arXiv, 2023. [paper] [PTQ]
- "FP8-BERT: Post-Training Quantization for Transformer", arXiv, 2023. [paper] [PTQ]
- "Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge", arXiv, 2023. [paper]
- "SmoothQuant+: Accurate and Efficient 4-bit Post-Training WeightQuantization for LLM", arXiv, 2023. [paper] [PTQ]
- "A Speed Odyssey for Deployable Quantization of LLMs", arXiv, 2023. [paper]
- "AFPQ: Asymmetric Floating Point Quantization for LLMs", arXiv, 2023. [paper]
- "Enabling Fast 2-bit LLM on GPUs: Memory Alignment, Sparse Outlier, and Asynchronous Dequantization", arXiv, 2023. [paper]
- "QLoRA: Efficient Finetuning of Quantized LLMs", NeurIPS, 2023. [paper] [code]
- "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", NeurIPS, 2023. [paper] [code] [PTQ]
- "Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization", NeurIPS, 2023. [paper]
- "QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources", arXiv, 2023. [paper]
- "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models", arXiv, 2023. [paper]
- "Atom: Low-bit Quantization for Efficient and Accurate LLM Serving", arXiv, 2023. [paper]
- "ZeroQuant-HERO: Hardware-Enhanced Robust Optimized Post-Training Quantization Framework for W8A8 Transformers", arXiv, 2023. [paper]
- "LLM-FP4: 4-Bit Floating-Point Quantized Transformers", arXiv, 2023. [paper]
- "TEQ: Trainable Equivalent Transformation for Quantization of LLMs", arXiv, 2023. [paper]
- "Efficient Post-training Quantization with FP8 Formats", arXiv, 2023. [paper]
- "Probabilistic Weight Fixing: Large-scale training of neural network weight uncertainties for quantization", arXiv, 2023. [paper]
- "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs", arXiv, 2023. [paper]
- "Norm Tweaking: High-performance Low-bit Quantization of Large Language Models", arXiv, 2023. [paper]
- "Understanding the Impact of Post-Training Quantization on Large Language Models", arXiv, 2023. [paper]
- "QuantEase: Optimization-based Quantization for Language Models -- An Efficient and Intuitive Algorithm", arXiv, 2023. [paper]
- "FPTQ: Fine-grained Post-Training Quantization for Large Language Models", arXiv, 2023. [paper]
- "FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs", arXiv, 2023. [paper] [PTQ]
- "Gradient-Based Post-Training Quantization: Challenging the Status Quo", arXiv, 2023. [paper] [PTQ]
- "NUPES : Non-Uniform Post-Training Quantization via Power Exponent Search", arXiv, 2023. [paper] [Non-uniform]
- "ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Floating-Point Formats", arXiv, 2023. [paper]
- "Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models", arXiv, 2023. [paper]
- "Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study", arXiv, 2023. [paper]
- "INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation", arXiv, 2023. [paper]
- "QIGen: Generating Efficient Kernels for Quantized Inference on Large Language Models", arXiv, 2023. [paper] [code]
- "OWQ: Lessons learned from activation outliers for weight quantization in large language models", arXiv, 2023. [paper] [PTQ]
- "PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models", arXiv, 2023. [paper]
- "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration", arXiv, 2023. [paper] [PTQ]
- "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models", arXiv, 2023. [paper]
- "Outlier Suppression+: Accurate quantization of large language models by equivalent and optimal shifting and scaling", arXiv, 2023. [paper] [PTQ]
- "RPTQ: Reorder-based Post-training Quantization for Large Language Models", arXiv, 2023. [paper] [code] [PTQ]
- "The case for 4-bit precision: k-bit Inference Scaling Laws", ICML, 2023. [paper]
- "Quantized Distributed Training of Large Models with Convergence Guarantees", ICML, 2023. [paper]
- "Understanding Int4 Quantization for Language Models: Latency Speedup, Composability, and Failure Cases", ICML, 2023. [paper]
- "SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models", ICML, 2023. [paper] [code] [PTQ]
- "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers", ICLR, 2023. [paper] [code] [PTQ]
- "BiBERT: Accurate Fully Binarized BERT", ICLR, 2022. [paper] [code] [Extreme]
- "BiT: Robustly Binarized Multi-distilled Transformer", NeurIPS, 2022. [paper] [code] [Extreme]
- "Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models", NeurIPS, 2022. [paper] [code] [PTQ]
- "LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale", NeurIPS, 2022. [paper] [code]
- "Towards Efficient Post-training Quantization of Pre-trained Language Models", NeurIPS, 2022. [paper] [PTQ]
- "ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers", NeurIPS, 2022. [paper] [code] [PTQ]
- "Compression of Generative Pre-trained Language Models via Quantization", ACL, 2022. [paper]
- "I-BERT: Integer-only BERT Quantization", ICML, 2021. [paper] [code]
- "BinaryBERT: Pushing the Limit of BERT Quantization", ACL, 2021. [paper] [code] [Extreme]
- "On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers", ACL, 2021. [paper]
- "Understanding and Overcoming the Challenges of Efficient Transformer Quantization", EMNLP, 2021. [paper] [code]
- "KDLSQ-BERT: A Quantized Bert Combining Knowledge Distillation with Learned Step Size Quantization", arXiv, 2021. [paper]
- "TernaryBERT: Distillation-aware Ultra-low Bit BERT", EMNLP, 2020. [paper] [code] [Extreme]
- "Extremely Low Bit Transformer Quantization for On-Device Neural Machine Translation", EMNLP, 2020. [paper]
- "GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference", MICRO, 2020. [paper]
- "Towards Fully 8-bit Integer Inference for the Transformer Model", IJCAI, 2020. [paper]
- "Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT", AAAI, 2020. [paper]
- "Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model", ICML, 2019. [paper]
- "Q8BERT: Quantized 8Bit BERT", EMC2 Workshop, 2019. [paper]
- "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models", CVPR, 2024. [paper] [PTQ]
- "Towards Accurate Post-training Quantization for Diffusion Models", CVPR, 2024. [paper] [PTQ]
- "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models", ICLR, 2024. [paper]
- "QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning", arXiv, 2024. [paper]
- "Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models", arXiv, 2023. [paper]
- "Efficient Quantization Strategies for Latent Diffusion Models", arXiv, 2023. [paper] [PTQ]
- "Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models", arXiv, 2023. [paper]
- "Effective Quantization for Diffusion Models on CPUs", arXiv, 2023. [paper]
- "PTQD: Accurate Post-Training Quantization for Diffusion Models", NeurIPS, 2023. [paper] [PTQ]
- "Q-DM: An Efficient Low-bit Quantized Diffusion Model", NeurIPS, 2023. [paper]
- "Temporal Dynamic Quantization for Diffusion Models", NeurIPS, 2023. [paper]
- "Q-diffusion: Quantizing Diffusion Models", ICCV, 2023. [paper] [code] [PTQ]
- "Towards Accurate Data-free Quantization for Diffusion Models", arXiv, 2023. [paper] [PTQ]
- "Post-training Quantization on Diffusion Models", CVPR, 2023. [paper] [code] [PTQ]
- "Sharpness-Aware Data Generation for Zero-shot Quantization", ICML, 2024. [paper]
- "A2Q+: Improving Accumulator-Aware Weight Quantization", ICML, 2024. [paper]
- "HyQ: Hardware-Friendly Post-Training Quantization for CNN-Transformer Hybrid Networks", IJCAI, 2024. [paper] [PTQ]
- "Retraining-Free Model Quantization via One-Shot Weight-Coupling Learning", CVPR, 2024. [paper] [MP]
- "Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices", CVPR, 2024. [paper] [MP]
- "Enhancing Post-training Quantization Calibration through Contrastive Learning", CVPR, 2024. [paper] [PTQ]
- "Data-Free Quantization via Pseudo-label Filtering", CVPR, 2024. [paper]
- "Make RepVGG Greater Again: A Quantization-Aware Approach", AAAI, 2024. [paper]
- "MetaMix: Meta-State Precision Searcher for Mixed-Precision Activation Quantization", AAAI, 2024. [paper] [MP]
- "Robustness-Guided Image Synthesis for Data-Free Quantization", AAAI, 2024. [paper]
- "PTMQ: Post-training Multi-Bit Quantization of Neural Networks", AAAI, 2024. [paper] [PTQ]
- "Trainable Fixed-Point Quantization for Deep Learning Acceleration on FPGAs", arXiv, 2023. [paper]
- "StableQ: Enhancing Data-Scarce Quantization with Text-to-Image Data", arXiv, 2023. [paper]
- "Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers", NeurIPS, 2023. [paper] [Extreme]
- "TexQ: Zero-shot Network Quantization with Texture Feature Distribution Calibration", NeurIPS, 2023. [paper]
- "Overcoming Forgetting Catastrophe in Quantization-Aware Training", ICCV, 2023. [paper]
- "Causal-DFQ: Causality Guided Data-Free Network Quantization", ICCV, 2023. [paper] [code]
- "DenseShift: Towards Accurate and Efficient Low-Bit Power-of-Two Quantization", ICCV, 2023. [paper]
- "EQ-Net: Elastic Quantization Neural Networks", ICCV, 2023. [paper] [code]
- "A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance", ICCV, 2023. [paper]
- "EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization", ICCV, 2023. [paper] [MP]
- "Unified Data-Free Compression: Pruning and Quantization without Fine-Tuning", ICCV, 2023. [paper] [PTQ]
- "Few-bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction", ICML, 2023. [paper] [code]
- "FlexRound: Learnable Rounding based on Element-wise Division for Post-Training Quantization", ICML, 2023. [paper] [PTQ]
- "Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning", PR, 2023. [paper]
- "OMPQ: Orthogonal Mixed Precision Quantization", AAAI, 2023. [paper] [MP]
- "Rethinking Data-Free Quantization as a Zero-Sum Game", AAAI, 2023. [paper]
- "Quantized Feature Distillation for Network Quantization", AAAI, 2023. [paper]
- "Resilient Binary Neural Network", AAAI, 2023. [paper] [Extreme]
- "Fast and Accurate Binary Neural Networks Based on Depth-Width Reshaping", AAAI, 2023. [paper] [Extreme]
- "Efficient Quantization-aware Training with Adaptive Coreset Selection", arXiv, 2023. [paper]
- "One-Shot Model for Mixed-Precision Quantization", CVPR, 2023. [paper] [MP]
- "Adaptive Data-Free Quantization", CVPR, 2023. [paper]
- "Bit-shrinking: Limiting Instantaneous Sharpness for Improving Post-training Quantization", CVPR, 2023. [paper] [PTQ]
- "Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective", CVPR, 2023. [paper] [code] [PTQ]
- "GENIE: Show Me the Data for Quantization", CVPR, 2023. [paper] [code] [PTQ]
- "Bayesian asymmetric quantized neural networks", PR, 2023. [paper]
- "Distribution-sensitive Information Retention for Accurate Binary Neural Network", IJCV, 2023. [paper] [Extreme]
- "SDQ: Stochastic Differentiable Quantization with Mixed Precision", ICML, 2022. [paper] [MP]
- "Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks", ICML, 2022. [paper] [code]
- "GACT: Activation Compressed Training for Generic Network Architectures", ICML, 2022. [paper] [code]
- "Overcoming Oscillations in Quantization-Aware Training", ICML, 2022. [paper] [code]
- "Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation", CVPR, 2022. [paper] [code] [Non-uniform]
- "Learnable Lookup Table for Neural Network Quantization", CVPR, 2022. [paper] [code] [Non-uniform]
- "Mr.BiQ: Post-Training Non-Uniform Quantization based on Minimizing the Reconstruction Error", CVPR, 2022. [paper] [PTQ] [Non-uniform]
- "Data-Free Network Compression via Parametric Non-uniform Mixed Precision Quantization", CVPR, 2022. [paper] [Non-uniform] [MP]
- "IntraQ: Learning Synthetic Images With Intra-Class Heterogeneity for Zero-Shot Network Quantization", CVPR, 2022. [paper] [code]
- "Instance-Aware Dynamic Neural Network Quantization", CVPR, 2022. [paper]
- "Leveraging Inter-Layer Dependency for Post-Training Quantization", NeurIPS, 2022. [paper] [PTQ]
- "Theoretically Better and Numerically Faster Distributed Optimization with Smoothness-Aware Quantization Techniques", NeurIPS, 2022. [paper]
- "Entropy-Driven Mixed-Precision Quantization for Deep Network Design", NeurIPS, 2022. [paper] [MP]
- "Redistribution of Weights and Activations for AdderNet Quantization", NeurIPS, 2022. [paper]
- "FP8 Quantization: The Power of the Exponent", NeurIPS, 2022. [paper] [code]
- "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning", NeurIPS, 2022. [paper] [code] [PTQ]
- "ClimbQ: Class Imbalanced Quantization Enabling Robustness on Efficient Inferences", NeurIPS, 2022. [paper]
- "Non-Uniform Step Size Quantization for Accurate Post-Training Quantization", ECCV, 2022. [paper] [PTQ] [Non-uniform]
- "Towards Accurate Network Quantization with Equivalent Smooth Regularizer", ECCV, 2022. [paper]
- "BASQ: Branch-wise Activation-clipping Search Quantization for Sub-4-bit Neural Networks", ECCV, 2022. [paper] [code]
- "RDO-Q: Extremely Fine-Grained Channel-Wise Quantization via Rate-Distortion Optimization", ECCV, 2022. [paper]
- "Mixed-Precision Neural Network Quantization via Learned Layer-Wise Importance", ECCV, 2022. [paper] [code] [MP]
- "Symmetry Regularization and Saturating Nonlinearity for Robust Quantization", ECCV, 2022. [paper]
- "RAPQ: Rescuing Accuracy for Power-of-Two Low-bit Post-training Quantization", IJCAI, 2022. [paper] [code] [PTQ]
- "MultiQuant: Training Once for Multi-bit Quantization of Neural Networks", IJCAI, 2022. [paper]
- "F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization", ICLR, 2022. [paper]
- "8-bit Optimizers via Block-wise Quantization", ICLR, 2022. [paper] [code]
- "Information Bottleneck: Exact Analysis of (Quantized) Neural Networks", ICLR, 2022. [paper] [code]
- "QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization", ICLR, 2022. [paper] [code] [PTQ]
- "SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation", ICLR, 2022. [paper] [code] [PTQ]
- "FILM-QNN: Efficient FPGA Acceleration of Deep Neural Networks with Intra-Layer, Mixed-Precision Quantization", FPGA, 2022. [paper] [MP]
- "Accurate Post Training Quantization with Small Calibration Sets", ICML, 2021. [paper] [code] [PTQ]
- "How Do Adam and Training Strategies Help BNNs Optimization?", ICML, 2021. [paper] [code]
- "ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training", ICML, 2021. [paper] [code]
- "HAWQ-V3: Dyadic Neural Network Quantization", ICML, 2021. [paper] [code] [MP]
- "Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution", ICML, 2021. [paper] [MP]
- "Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators", ICML, 2021. [paper] [code]
- "Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples", NeurIPS, 2021. [paper] [code]
- "Post-Training Sparsity-Aware Quantization", NeurIPS, 2021. [paper] [code] [PTQ]
- "Diversifying Sample Generation for Accurate Data-Free Quantization", CVPR, 2021. [paper] [PTQ]
- "Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks", CVPR, 2021. [paper] [code]
- "Learnable Companding Quantization for Accurate Low-bit Neural Networks", CVPR, 2021. [paper]
- "Zero-shot Adversarial Quantization", CVPR, 2021. [paper] [code]
- "Network Quantization with Element-wise Gradient Scaling", CVPR, 2021. [paper] [code]
- "High-Capacity Expert Binary Networks", ICLR, 2021. [paper] [code] [Extreme]
- "Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural Networks by Pruning A Randomly Weighted Network", ICLR, 2021. [paper] [code] [Extreme]
- "BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction", ICLR, 2021. [paper] [code] [PTQ]
- "Neural gradients are near-lognormal: improved quantized and sparse training", ICLR, 2021. [paper]
- "Training with Quantization Noise for Extreme Model Compression", ICLR, 2021. [paper]
- "BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization", ICLR, 2021. [paper] [code] [MP]
- "Simple Augmentation Goes a Long Way: ADRL for DNN Quantization", ICLR, 2021. [paper]
- "Distribution Adaptive INT8 Quantization for Training CNNs", AAAI, 2021. [paper]
- "Stochastic Precision Ensemble: Self‐Knowledge Distillation for Quantized Deep Neural Networks", AAAI, 2021. [paper]
- "Optimizing Information Theory Based Bitwise Bottlenecks for Efficient Mixed-Precision Activation Quantization", AAAI, 2021. [paper] [MP]
- "OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization", AAAI, 2021. [paper]
- "Scalable Verification of Quantized Neural Networks", AAAI, 2021. [paper] [code]
- "Uncertainty Quantification in CNN through the Bootstrap of Convex Neural Networks", AAAI, 2021. [paper]
- "FracBits: Mixed Precision Quantization via Fractional Bit-Widths", AAAI, 2021. [paper] [MP]
- "Post-training Quantization with Multiple Points: Mixed Precision without Mixed Precision", AAAI, 2021. [paper] [PTQ] [MP]
- "ZeroQ: A Novel Zero Shot Quantization Framework", CVPR, 2020. [paper] [code] [PTQ]
- "LSQ+: Improving Low-bit Quantization Through Learnable Offsets and Better Initialization", CVPR, 2020. [paper]
- "HAWQ-V2: Hessian Aware trace-Weighted Quantization of Neural Networks", NeurIPS, 2020. [paper] [MP]
- "Learned step size quantization", ICLR, 2020. [paper]
- "HAWQ: Hessian AWare Quantization of Neural Networks With Mixed-Precision", ICCV, 2019. [paper] [MP]
- "Data-Free Quantization Through Weight Equalization and Bias Correction", ICCV, 2019. [paper] [PTQ]
- "HAQ: Hardware-Aware Automated Quantization with Mixed Precision", CVPR, 2019. [paper] [code] [MP]
- "PACT: Parameterized Clipping Activation for Quantized Neural Networks", arXiv, 2018. [paper]
- "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference", CVPR, 2018. [paper]
- "Reg-PTQ: Regression-specialized Post-training Quantization for Fully Quantized Object Detector", CVPR, 2024. [paper] [PTQ]
- "Improving Post-Training Quantization on Object Detection with Task Loss-Guided Lp Metric", arXiv, 2023. [paper] [PTQ]
- "AQD: Towards Accurate Quantized Object Detection", CVPR, 2021. [paper]
- "BiDet: An Efficient Binarized Object Detector", CVPR, 2020. [paper] [code] [Extreme]
- "Fully Quantized Network for Object Detection", CVPR, 2019. [paper]
- "QuantSR: Accurate Low-bit Quantization for Efficient Image Super-Resolution", NeurIPS, 2023. [paper]
- "Toward Accurate Post-Training Quantization for Image Super Resolution", CVPR, 2023. [paper] [code] [PTQ]
- "EBSR: Enhanced Binary Neural Network for Image Super-Resolution", arXiv, 2023. [paper] [Extreme]
- "CADyQ: Content-Aware Dynamic Quantization for Image Super-Resolution", ECCV, 2022. [paper] [code]
- "Dynamic Dual Trainable Bounds for Ultra-low Precision Super-Resolution Networks", ECCV, 2022. [paper] [code]
- "DAQ: Channel-Wise Distribution-Aware Quantization for Deep Image Super-Resolution Networks", WACV, 2022. [paper] [code]
- "Fully Quantized Image Super-Resolution Networks", ACM MM, 2021. [paper] [code]
- "PAMS: Quantized Super-Resolution via Parameterized Max Scale", ECCV, 2020. [paper] [code]
- "Training Binary Neural Network without Batch Normalization for Image Super-Resolution", AAAI, 2021. [paper] [Extreme]
- "LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection", ICLR, 2024. [paper] [PTQ]
- "Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis", arXiv, 2023. [paper] [Extreme]
- "BiPointNet: Binary Neural Network for Point Clouds", ICLR, 2021. [paper] [code] [Extreme]