Repositories list
3 repositories
FlexLLMGen (Public): Running large language models on a single GPU for throughput-oriented scenarios.
H2O (Public): [NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.
DejaVu (Public)