0.4.0 (2024-09-01)
Features
examples: integrate Gemma2-2B (#132)
layer_seq: LLM sliding window (#131)
examples: 3 LLMs examples (#130)
layer_seq: LLM generate (#128)
layer_seq: MultiplySeq, SiLU & LLM test (#127)
layer_seq: ValueCausalSeq (#126)
layer_seq: QueryCausalSeq (#125)
layer_seq: RoPESeq (#124)
layer_seq: RMSNormSeq (#123)
layer_seq: EmbeddingSeq (#122)
feat: LayerCAM2D -> VQGrad2D, LayerCAMSeq -> VQGradSeq (#117)
core: GELU vs GELUApprox (#113)
perf: QuerySelf & ValueSelf (#112)
perf: benchmark ViT base model (#111)
core: initForward,Backward model API (#109)
layer_1d: Dropout1D (#108)
feat: VQGrad, VQGradSeq (#107)
Bug Fixes
fix: run on Apple Silicon (#110)
Miscellaneous Tasks
docs: LLM doc & split tests (#129)
perf: use half in Metal kernels (#121)
refactor: handle float16 along float on GPU (#120)
perf: copy & generate weights faster (#119)
perf: Convolution2D (#118)