    Repositories list

    • UMbreLLa

      Public
      LLM Inference on consumer devices
      Python
      Apache License 2.0
      78393 Updated Feb 2, 2025
    • APE

      Public
      Python
      0000 Updated Jan 30, 2025
    • RULER

      Public
      This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
      Python
      Apache License 2.0
      64000 Updated Jan 30, 2025
    • Sequoia

      Public
      Scalable and robust tree-based speculative decoding algorithm
      Python
      3833273 Updated Jan 28, 2025
    • A framework for few-shot evaluation of language models.
      Python
      MIT License
      2.1k000 Updated Jan 11, 2025
    • S2FT

      Public
      Python
      21200 Updated Jan 3, 2025
    • S2FT-Page

      Public
      JavaScript
      0000 Updated Dec 30, 2024
    • MagicPIG

      Public
      [ICLR 2025] MagicPIG: LSH Sampling for Efficient LLM Generation
      Python
      Apache License 2.0
      1418491 Updated Dec 16, 2024
    • MagicDec

      Public
      [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
      Python
      Apache License 2.0
      610660 Updated Dec 4, 2024
    • JavaScript
      1000 Updated Dec 2, 2024
    • Factor

      Public
      0100 Updated Nov 7, 2024
    • Speculative decoding for high-throughput long-context inference
      JavaScript
      Apache License 2.0
      0000 Updated Sep 10, 2024
    • Sirius

      Public
      Sirius is an efficient correction mechanism that significantly boosts contextual sparsity models on reasoning tasks while maintaining their efficiency gains.
      Python
      42100 Updated Sep 10, 2024
    • MagicDec: Breaking the Latency-Throughput Tradeoff for Long Contexts with Speculative Decoding
      JavaScript
      Apache License 2.0
      0000 Updated Sep 5, 2024
    • TriForce

      Public
      [COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
      Python
      1523870 Updated Aug 31, 2024
    • JavaScript
      1000 Updated May 21, 2024