- Revisiting Pruning at Initialization through the Lens of Ramanujan Graph
- Minimum Variance Unbiased N:M Sparsity for the Neural Gradients
- Pruning Deep Neural Networks from a Sparsity Perspective
- LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification
- A Unified Framework for Soft Threshold Pruning
- Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?
- Searching Lottery Tickets in Graph Neural Networks: A Dual Perspective
- Token Merging: Your ViT But Faster
- DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training
- Accurate Neural Training with 4-bit Matrix Multiplications at Standard Formats
- Aggregation-Aware Quantization for Graph Neural Networks
- OPTQ: Accurate Quantization for Generative Pre-trained Transformers
- Globally Optimal Training of Neural Networks with Threshold Activation Functions
- FIT: A Metric for Model Sensitivity
- Oscillation-Free Quantization for Low-Bit Vision Transformers
- Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases
- Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks
- Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction
- SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot
- Pruning via Sparsity-indexed ODE: a Continuous Sparsity Viewpoint
- Gradient-Free Structured Pruning with Unlabeled Data
- UPSCALE: Unconstrained Channel Pruning
- Why Random Pruning Is All We Need to Start Sparse
- Fast as CHITA: Neural Network Pruning with Combinatorial Optimization
- LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation
- Less is More: Task-aware Layer-wise Distillation for Language Model Compression
- Revisiting Data-Free Knowledge Distillation with Poisoned Teachers
- Understanding Self-Distillation in the Presence of Label Noise
- Dataset Distillation with Convexified Implicit Gradients
- Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory
- All in a Row: Compressed Convolution Networks for Graphs
- Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming
- COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
- Hardware-Aware Compression with Random Operation Access Specific Tile (ROAST) Hashing
- QLoRA: Efficient Finetuning of Quantized LLMs