v0.4.0

@jean-francoisreboud jean-francoisreboud released this 01 Sep 21:41
a6ca885

0.4.0 (2024-09-01)

Features

🚀 examples: integrate Gemma2-2B (#132)
✨ layer_seq: LLM sliding window (#131)
🚀 examples: 3 LLMs examples (#130)
✨ layer_seq: LLM generate (#128)
✨ layer_seq: MultiplySeq, SiLU & LLM test (#127)
✨ layer_seq: ValueCausalSeq (#126)
✨ layer_seq: QueryCausalSeq (#125)
✨ layer_seq: RoPESeq (#124)
✨ layer_seq: RMSNormSeq (#123)
✨ layer_seq: EmbeddingSeq (#122)
🪜 feat: LayerCAM2D -> VQGrad2D, LayerCAMSeq -> VQGradSeq (#117)
⚙️ core: GELU vs GELUApprox (#113)
🚀 perf: QuerySelf & ValueSelf (#112)
🚀 perf: benchmark ViT base model (#111)
⚙️ core: initForward,Backward model API (#109)
🪜 layer_1d: Dropout1D (#108)
🪜 feat: VQGrad, VQGradSeq (#107)

Bug Fixes

🐛 fix: run on Apple Silicon (#110)

Miscellaneous Tasks

📚 docs: LLM doc & split tests (#129)
🚀 perf: use half in Metal kernels (#121)
🔨 refactor: handle float16 along float on GPU (#120)
🚀 perf: copy & generate weights faster (#119)
🚀 perf: Convolution2D (#118)