Awesome-SLM

Awesome-SLM: a curated list of Small Language Models

🔥 Small Language Models (SLMs) are streamlined versions of large language models, designed to retain much of the original models' capabilities while being more efficient and manageable. Here is a curated list of papers about small language models.

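Table of Contents

- Milestone Papers
- Other Papers
- SLM Leaderboard
- Open SLM
- SLM Data
- SLM Evaluation
- Miscellaneous
- Contributing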

Milestone Papers

Date	Keywords	Institute	Paper	Publication
2017-06	Transformers	Google	Attention Is All You Need	NeurIPS
2018-06	GPT 1.0	OpenAI	Improving Language Understanding by Generative Pre-Training
2018-10	BERT	Google	BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding	NAACL
2019-10	T5	Google	Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	JMLR
2020-01	Scaling Law	OpenAI	Scaling Laws for Neural Language Models
2021-09	FLAN	Google	Finetuned Language Models are Zero-Shot Learners	ICLR
2021-10	T0	HuggingFace et al.	Multitask Prompted Training Enables Zero-Shot Task Generalization	ICLR
2022-01	LaMDA	Google	LaMDA: Language Models for Dialog Applications
2022-06	Emergent Abilities	Google	Emergent Abilities of Large Language Models	TMLR
2023-02	LLaMA	Meta	LLaMA: Open and Efficient Foundation Language Models
2023-03	Alpaca	Stanford	Alpaca: A Strong, Replicable Instruction-Following Model	No paper
2023-06	Orca	Microsoft	Orca: Progressive Learning from Complex Explanation Traces of GPT-4
2023-07	LLaMA 2	Meta	Llama 2: Open Foundation and Fine-Tuned Chat Models
2023-07	Stable Beluga	Stability AI	Meet Stable Beluga 1 and Stable Beluga 2, Our Large and Mighty Instruction Fine-Tuned Language Models	No paper
2023-09	XGen 7B	Salesforce Research	XGen-7B Technical Report
2023-09	Qwen	Alibaba	Qwen Technical Report
2023-10	Mistral 7B	Mistral	Mistral 7B
2023-10	Zephyr 7B	HuggingFace et al.	Zephyr: Direct Distillation of LM Alignment
2023-10	MPT 7B	Mosaic Research	Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs	No paper
2023-11	Falcon 7B	Technology Innovation Institute	The Falcon Series of Open Language Models
2023-11	Orca 2	Microsoft	Orca 2: Teaching Small Language Models How to Reason
2023-12	Phi-2	Microsoft	Phi-2: The surprising power of small language models	No paper
2024-01	TinyLlama 1.1B	StatNLP Group	TinyLlama: An Open-Source Small Language Model
2024-01	LLaVA-Phi	Midea Group	LLaVA-Phi: Efficient Multi-Modal Assistant with Small Language Model
2024-01	H2O-Danube 1.8B	H2O.ai	H2O-Danube-1.8B Technical Report
2024-01	TeleChat	China Telecom	TeleChat Technical Report
2024-02	Nemotron-4 15B	NVIDIA	Nemotron-4 15B Technical Report
2024-03	Yi	01.AI	Yi: Open Foundation Models by 01.AI
2024-03	Gemma	Google DeepMind	Gemma: Open Models Based on Gemini Research and Technology
2024-03	Jamba	AI21 Labs	Jamba: A Hybrid Transformer-Mamba Language Model
2024-04	CT-LLM 2B	Multimodal Art Projection Research Community	Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model
2024-04	RHO-1	Microsoft	RHO-1: Not All Tokens Are What You Need
2024-04	Phi-3	Microsoft	Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
2024-05	Zamba 7B	Zyphra	Zamba: A Compact 7B SSM Hybrid Model
2024-05	ChuXin 1.6B	HuggingFace et al.	ChuXin: 1.6B Technical Report
2024-05	OpenBA-V2 3.4B	Soochow University	OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning
2024-05	Aya 23	Cohere	Aya 23: Open Weight Releases to Further Multilingual Progress
2024-06	Xmodel-LM 1.1B	XiaoduoAI	Xmodel-LM Technical Report
2024-06	SAMBA 3.8B	Microsoft	SAMBA: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling

Other Papers

If you're interested in the field of SLMs, the milestone papers listed above are a good way to explore its history and the state of the art. However, each research direction within SLMs offers its own insights and contributions, which are essential to understanding the field as a whole. Papers from various subfields are listed in the following table:

Date	Keywords	Institute	Paper	Publication
2024-02	Ensemble SLMs	Nanyang Technological University	Purifying Large Language Models by Ensembling a Small Language Model
2024-01	Vary-toy 1.8B	MEGVII Technology	Small Language Model Meets with Reinforced Vision Vocabulary

SLM Leaderboard

Chatbot Arena Leaderboard - a benchmark platform for large language models (LLMs) that features anonymous, randomized battles in a crowdsourced manner.
AlpacaEval Leaderboard - An Automatic Evaluator for Instruction-following Language Models using Nous benchmark suite.
Open LLM Leaderboard - aims to track, rank and evaluate LLMs and chatbots as they are released.
OpenCompass 2.0 LLM Leaderboard - OpenCompass is an LLM evaluation platform, supporting a wide range of models (InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Open SLM

Meta
- Llama 2-7|13|70B
- Llama 1-7|13|33|65B
- OPT-1.3|6.7|13|30|66B
Mistral AI
- Mistral-7B
Google
- Gemma-2|7B
- RecurrentGemma-2B
- T5
Apple
- OpenELM-1.1|3B
Microsoft
- Phi1-1.3B
- Phi2-2.7B
- Phi3-3.8|7|14B
AllenAI
- OLMo-7B
xAI
- Grok-1-314B-MoE
DeepSeek
- DeepSeek-Math-7B
- DeepSeek-Coder-1.3|6.7|7|33B
- DeepSeek-VL-1.3|7B
- DeepSeek-MoE-16B
Alibaba
- Qwen-1.8|7|14|72B
- Qwen1.5-1.8|4|7|14|32|72|110B
- CodeQwen-7B
- Qwen-VL-7B
- Qwen2-0.5|1.5|7|57-MoE|72B
01-ai
- Yi-34B
- Yi1.5-6|9|34B
- Yi-VL-6|34B
Baichuan
- Baichuan-7|13B
- Baichuan2-7|13B
BLOOM
- BLOOMZ&mT0
Zhipu AI
- GLM-2|6|10|13|70B
- CogVLM2-19B
OpenBMB
- MiniCPM-2B
- OmniLLM-12B
- VisCPM-10B
- CPM-Bee-1|2|5|10B
RWKV Foundation
- RWKV-v4|5|6
EleutherAI
- Pythia-1|1.4|2.8|6.9|12B
Stability AI
- StableLM-3B
- StableLM-v2-1.6|12B
- StableCode-3B
BigCode
- StarCoder-1|3|7B
- StarCoder2-3|7|15B
Databricks
- MPT-7B
Shanghai AI Laboratory
- InternLM2-1.8|7|20B
- InternLM-Math-7|20B
- InternLM-XComposer2-1.8|7B
- InternVL-2|6|14|26B

SLM Data

LLMDatahub - a curated collection of datasets specifically designed for chatbot training, including links, size, language, usage, and a brief description of each dataset
Zyda_processing - a dataset under a permissive license comprising 1.3 trillion tokens, assembled by integrating several major respected open-source datasets into a single, high-quality corpus

SLM Evaluation

lm-evaluation-harness - A framework for few-shot evaluation of language models.
lighteval - a lightweight LLM evaluation suite that Hugging Face has been using internally.
OLMO-eval - a repository for evaluating open language models.
instruct-eval - This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
simple-evals - Eval tools by OpenAI.
Giskard - Testing & evaluation library for LLM applications, in particular RAGs
LangSmith - a unified platform from the LangChain framework for evaluation, human-in-the-loop (HITL) collaboration, logging, and monitoring of LLM applications.
Ragas - a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines.

Miscellaneous

The following repo contains a curated LLM paper list, frameworks for LLM training, tools to deploy LLMs, courses and tutorials about LLMs, and all publicly available LLM checkpoints and APIs. Since SLMs share many of the same issues as LLMs, I recommend that you also look at the LLM-related material.

Awesome-LLM

Contributing

This is an active repository and your contributions are always welcome!

I will keep some pull requests open if I'm not sure whether they are awesome for SLM; you can vote for them by adding 👍 to them.

If you have any questions about this opinionated list, do not hesitate to contact me at ro_keonwoo@korea.ac.kr.
