Awesome-SLM

Awesome-SLM: a curated list of Small Language Models

🔥 Small Language Models (SLMs) are streamlined versions of large language models, designed to retain much of the original models' capabilities while being more efficient and manageable. Here is a curated list of papers about small language models.

Table of Contents

Milestone Papers

| Date | Keywords | Institute | Paper | Publication |
| --- | --- | --- | --- | --- |
| 2017-06 | Transformers | Google | Attention Is All You Need | NeurIPS |
| 2018-06 | GPT 1.0 | OpenAI | Improving Language Understanding by Generative Pre-Training | |
| 2018-10 | BERT | Google | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | NAACL |
| 2019-10 | T5 | Google | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | JMLR |
| 2020-01 | Scaling Law | OpenAI | Scaling Laws for Neural Language Models | |
| 2021-09 | FLAN | Google | Finetuned Language Models are Zero-Shot Learners | ICLR |
| 2021-10 | T0 | HuggingFace et al. | Multitask Prompted Training Enables Zero-Shot Task Generalization | ICLR |
| 2022-01 | LaMDA | Google | LaMDA: Language Models for Dialog Applications | |
| 2022-06 | Emergent Abilities | Google | Emergent Abilities of Large Language Models | TMLR |
| 2023-02 | LLaMA | Meta | LLaMA: Open and Efficient Foundation Language Models | |
| 2023-03 | Alpaca | Stanford | Alpaca: A Strong, Replicable Instruction-Following Model | No paper |
| 2023-06 | Orca | Microsoft | Orca: Progressive Learning from Complex Explanation Traces of GPT-4 | |
| 2023-07 | LLaMA 2 | Meta | Llama 2: Open Foundation and Fine-Tuned Chat Models | |
| 2023-07 | Stable Beluga | stability.ai | Meet Stable Beluga 1 and Stable Beluga 2, Our Large and Mighty Instruction Fine-Tuned Language Models | No paper |
| 2023-09 | XGen 7B | Salesforce Research | XGen-7B Technical Report | |
| 2023-09 | Qwen | Alibaba | Qwen Technical Report | |
| 2023-10 | Mistral 7B | Mistral | Mistral 7B | |
| 2023-10 | Zephyr 7B | HuggingFace et al. | Zephyr: Direct Distillation of LM Alignment | |
| 2023-10 | MPT 7B | Mosaic Research | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs | No paper |
| 2023-11 | Falcon 7B | Technology Innovation Institute | The Falcon Series of Open Language Models | |
| 2023-11 | Orca 2 | Microsoft | Orca 2: Teaching Small Language Models How to Reason | |
| 2023-12 | Phi-2 | Microsoft | Phi-2: The surprising power of small language models | No paper |
| 2024-01 | TinyLlama 1.1B | StatNLP Group | TinyLlama: An Open-Source Small Language Model | |
| 2024-01 | LLaVA-Phi | Midea Group | LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model | |
| 2024-01 | H2O-Danube 1.8B | H2O.ai | H2O-Danube-1.8B Technical Report | |
| 2024-01 | TeleChat | chinatelecom | TeleChat Technical Report | |
| 2024-02 | Nemotron-4 15B | NVIDIA | Nemotron-4 15B Technical Report | |
| 2024-03 | Yi | 01.AI | Yi: Open Foundation Models by 01.AI | |
| 2024-03 | Gemma | Google DeepMind | Gemma: Open Models Based on Gemini Research and Technology | |
| 2024-03 | Jamba | AI21labs | Jamba: A Hybrid Transformer-Mamba Language Model | |
| 2024-04 | CT-LLM 2B | Multimodal Art Projection Research Community | Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model | |
| 2024-04 | RHO-1 | Microsoft | RHO-1: Not All Tokens Are What You Need | |
| 2024-04 | Phi-3 | Microsoft | Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone | |
| 2024-05 | Zamba 7B | Zyphra | Zamba: A Compact 7B SSM Hybrid Model | |
| 2024-05 | ChuXin 1.6B | HuggingFace et al. | ChuXin: 1.6B Technical Report | |
| 2024-05 | OpenBA-V2 3.4B | Soochow University | OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning | |
| 2024-05 | Aya 23 | Cohere | Aya 23: Open Weight Releases to Further Multilingual Progress | |
| 2024-06 | Xmodel-LM 1.1B | XiaoduoAI | Xmodel-LM Technical Report | |
| 2024-06 | SAMBA 3.8B | Microsoft | SAMBA: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling | |

Other Papers

If you're interested in SLMs, the milestone papers above are a good way to explore the field's history and state of the art. However, each direction of SLM research offers its own insights and contributions, which are essential to understanding the field as a whole. For papers in specific subfields, see the following table:

| Date | Keywords | Institute | Paper | Publication |
| --- | --- | --- | --- | --- |
| 2024-02 | Ensemble SLMs | Nanyang Technological University | Purifying Large Language Models by Ensembling a Small Language Model | |
| 2024-01 | Vary-toy 1.8B | MEGVII Technology | Small Language Model Meets with Reinforced Vision Vocabulary | |

SLM Leaderboard

  • Chatbot Arena Leaderboard - a benchmark platform for large language models (LLMs) that features anonymous, randomized battles in a crowdsourced manner.
  • AlpacaEval Leaderboard - an automatic evaluator for instruction-following language models.
  • Open LLM Leaderboard - aims to track, rank and evaluate LLMs and chatbots as they are released.
  • OpenCompass 2.0 LLM Leaderboard - OpenCompass is an LLM evaluation platform, supporting a wide range of models (InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Open SLM

SLM Data

  • LLMDatahub - a curated collection of datasets specifically designed for chatbot training, including links, size, language, usage, and a brief description of each dataset.
  • Zyda_processing - a dataset under a permissive license comprising 1.3 trillion tokens, assembled by integrating several major, respected open-source datasets into a single, high-quality corpus.

SLM Evaluation

  • lm-evaluation-harness - A framework for few-shot evaluation of language models.
  • lighteval - a lightweight LLM evaluation suite that Hugging Face has been using internally.
  • OLMO-eval - a repository for evaluating open language models.
  • instruct-eval - This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
  • simple-evals - Eval tools by OpenAI.
  • Giskard - a testing and evaluation library for LLM applications, in particular RAG pipelines.
  • LangSmith - a unified platform from the LangChain framework for evaluation, human-in-the-loop (HITL) collaboration, logging, and monitoring of LLM applications.
  • Ragas - a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines.

Miscellaneous

This repo contains an awesome LLM paper list, frameworks for LLM training, tools to deploy LLMs, courses and tutorials about LLMs, and all publicly available LLM checkpoints and APIs. Since SLMs share many of the same issues as LLMs, I recommend that you also look at the LLM-related content.

Contributing

This is an active repository and your contributions are always welcome!

I will keep some pull requests open if I'm not sure whether they are awesome for SLM. You can vote for them by adding 👍 to them.


If you have any questions about this opinionated list, do not hesitate to contact me at [email protected].