- Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples
- Post-Training Quantization for Vision Transformer
- Post-Training Sparsity-Aware Quantization
- BatchQuant: Quantized-for-all Architecture Search with Robust Quantizer
- Pruning Randomly Initialized Neural Networks with Iterative Randomization
- Rethinking the Pruning Criteria for Convolutional Neural Network
- QPP: Real-Time Quantization Parameter Prediction for Deep Neural Networks
- Automated Log-Scale Quantization for Low-Cost Deep Neural Networks
- AQD: Towards Accurate Quantized Object Detection
- Diversifying Sample Generation for Accurate Data-Free Quantization
- Learnable Companding Quantization for Accurate Low-bit Neural Networks
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks
- Convolutional Neural Network Pruning with Structural Redundancy Reduction
- Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation
- Manifold Regularized Dynamic Network Pruning
- Network Pruning via Performance Maximization
- Towards Efficient Tensor Decomposition-Based DNN Model Compression with Optimization Framework
- Towards Compact CNNs via Collaborative Compression
- Content-Aware GAN Compression
- Bi-GCN: Binary Graph Convolutional Network
- Binary Graph Neural Networks
- Accelerate CNNs from Three Dimensions: A Comprehensive Pruning Framework
- SparseBERT: Rethinking the Importance Analysis in Self-attention
- Group Fisher Pruning for Practical Network Compression
- A Probabilistic Approach to Neural Network Pruning
- Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution
- HAWQ-V3: Dyadic Neural Network Quantization
- I-BERT: Integer-only BERT Quantization
- Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming
- Training Quantized Neural Networks to Global Optimality via Semidefinite Programming
- A Gradient Flow Framework for Analyzing Network Pruning
- Degree-Quant: Quantization-aware Training for Graph Neural Networks
- Training With Quantization Noise for Extreme Model Compression
- BRECQ: Pushing the Limit of Post-training Quantization by Block Reconstruction
- Neural Gradients Are Near-lognormal: Improved Quantized and Sparse Training
- Reducing the Computational Cost of Deep Generative Models with Binary Neural Networks
- BiPointNet: Binary Neural Network for Point Clouds
- Faster Binary Embeddings for Preserving Euclidean Distances
- Growing Efficient Deep Networks by Structured Continuous Sparsification
- CPT: Efficient Deep Neural Network Training Via Cyclic Precision
- MixKD: Towards Efficient Distillation of Large-scale Language Models
- Knowledge Distillation As Semiparametric Inference
- A Teacher-student Framework to Distill Future Trajectories
- Is Label Smoothing Truly Incompatible with Knowledge Distillation? An Empirical Study
- Rethinking Soft Labels for Knowledge Distillation: a Bias-variance Tradeoff Perspective
- Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks
- Knowledge Distillation Via Softmax Regression Representation Learning