Notable Changes
- ONNX runtime for CPU deployments: greatly improve CPU deployment throughput
- Add
/similarity
route
What's Changed
- tokenizer max limit on input size by @ErikKaum in #324
- docs: air-gapped deployments by @OlivierDehaene in #326
- feat(onnx): add onnx runtime for better CPU perf by @OlivierDehaene in #328
- feat: add
/similarity
route by @OlivierDehaene in #331 - fix(ort): fix mean pooling by @OlivierDehaene in #332
- chore(candle): update flash attn by @OlivierDehaene in #335
- v1.5.0 by @OlivierDehaene in #336
New Contributors
Full Changelog: v1.4.0...v1.5.0