Releases: onnx/turnkeyml
v6.0.0
Summary
This is a major release that introduces an OpenAI-compatible server in a completely new serve tool, support for Quark quantization in the new quark tool, and many other fixes/improvements.
Breaking Changes
New OpenAI-Compatible Server
The previous serve tool has been replaced by a new standalone serving command. This new server has OpenAI API compatibility and will add Ollama compatibility in the near future.
- Old usage: lemonade -i CHECKPOINT oga-load --args serve
- New usage: lemonade serve, then use REST APIs to control model loading, completions, etc. See https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md to learn more.
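Once the server is running, any OpenAI-compatible client should be able to talk to it. The sketch below uses the openai Python package; the base URL, port, and model name are assumptions for illustration only, so consult the server spec linked above for the actual endpoints:

# Minimal sketch: point an OpenAI client at a locally running `lemonade serve`.
# The base_url, port, and model name are assumptions, not taken from these notes.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/api/v0", api_key="none")  # local server; key is unused
completion = client.chat.completions.create(
    model="CHECKPOINT",  # placeholder model identifier
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)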
The server can also be installed and used with no code by running Lemonade_Server_Installer.exe, which is provided as a release asset in this and all future releases.
The server code was also moved out of tools/chat.py into its own file in tools/serve.py. We also renamed chat.py to prompt.py for clarity, since that file now only contains the prompting tool.
The LEAP name has been deprecated
In the interest of reducing naming confusion, the "LEAP API" is now simply the "high-level lemonade API".
- Old usage: from lemonade.leap import from_pretrained
- New usage: from lemonade.api import from_pretrained
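For example, loading and prompting a model through the renamed API might look like the following minimal sketch; the checkpoint name, the recipe argument, and the generate/decode calls are illustrative assumptions rather than something specified in these notes:

# Minimal sketch of the high-level lemonade API after the rename.
# The checkpoint, recipe, and generation settings are assumptions for illustration.
from lemonade.api import from_pretrained

model, tokenizer = from_pretrained("facebook/opt-125m", recipe="hf-cpu")
input_ids = tokenizer("Hello, my name is", return_tensors="pt").input_ids
response = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.decode(response[0]))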
Summary of Contributions
- The base checkpoint for models is retrieved from the Hugging Face API at loading time (@ramkrishna2910)
- The benchmarking tools (huggingface-bench, oga-bench, and llamacpp-bench) have been refactored to reduce code duplication and improve maintainability. They now also support a list of prompts (or prompt lengths) to be benchmarked, e.g. --prompts 128 256 512 (see the example after this list) (@amd-pworfolk)
- The avg_accuracy stat has been renamed to average_mmlu_accuracy for clarity with respect to non-MMLU accuracy tests (@jeremyfowers) (attn @apsonawane)
- Introduce Lemonade_Server_Installer.exe (@jeremyfowers)
- Implement an OpenAI-compatible server and remove the old serve tool (@danielholanda)
- Rename the chat module to prompt (@jeremyfowers)
- Improve the lemonade getting started documentation and remove the "LEAP" branding (@jeremyfowers)
- OGA 0.6.0 is the default package for CPU, CUDA, and DML (@jeremyfowers)
- Add support for Quark quantization with a new quark-quantize tool (@iswaryaalex)
- Clean up the lemonade getting started docs and remove some deprecated tools (@jeremyfowers)
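As an example of the new multi-prompt benchmarking, a sweep over several prompt lengths could look like the line below. This is a minimal sketch: the CHECKPOINT placeholder and the choice of oga-load followed by oga-bench are assumptions about a typical tool sequence, not commands quoted from this release.

lemonade -i CHECKPOINT oga-load oga-bench --prompts 128 256 512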
New Contributors
- @iswaryaalex made their first contribution in #290
Full Changelog: v5.1.1...v6.0.0
v5.1.1
What's Changed
- Fix broken lemonade link by @jeremyfowers in #278
- Update getting_started.md by @jeremyfowers in #282
- Avoid lemonade build cache collisions (@jeremyfowers).
  - All builds are now placed under <cache_dir>/builds/<build_name> instead of <cache_dir>/<build_name>
    - This creates a more hierarchical cache structure, where builds are peers to models and data.
  - All build names now include a timestamp
    - This ensures that build stats and logs will not collide with each other if we build the same model in the same cache, but with different parameters.
  - Revs the minor version number because all previous caches are invalidated.
- Enable ONNX model download for cpu and igpu in oga-load (@jeremyfowers)
- Improvements to memory tracking (@amd-pworfolk)
- Improve OGA testing (@jeremyfowers).
  - Run the server test last, since it is the most complex and has the worst telemetry
  - Stop deleting the entire cache directory between every test, since that deletes the model builder cache. Instead, just delete the cache/builds directory.
- Add average mmlu accuracy by @apsonawane in #287
- Update OGA LEAP recipes by @jeremyfowers in #289
Full Changelog: v5.0.5...v5.1.1
v5.0.5
What's Changed
- Early preview of new server interface by @danielholanda in #277
Full Changelog: v5.0.4...v5.0.5
v5.0.4
v5.0.3
What's Changed
- Bring OGA under test and fix OGA server. Improve llm-prompt. by @jeremyfowers in #272
- Always move HF tokenizer encodings to the target device by @jeremyfowers in #274
- Release v5.0.3: Lemonade installer and examples, repo reorg, and lots more by @jeremyfowers in #275
- Docs, tests, and examples have been moved into turnkey (CNNs and Transformers) vs. lemonade (LLMs) directories (@jeremyfowers)
  - For example: docs/lemonade/getting_started.md instead of docs/lemonade_getting_started.md
- Track the memory utilization of any lemonade or turnkey command and plot it on a graph by setting the --memory option (@amd-pworfolk).
- Add examples and demo applications for the high-level LEAP APIs in examples/lemonade (@jeremyfowers).
- Add LEAP support for all OGA backends (@jeremyfowers).
- Extend the llm-prompt tool to make it more useful for model and framework validation (@amd-pworfolk).
- Updates and fixes to lemonade test code in llm_api.py (@jeremyfowers).
- Fix not_enough_tokens bug on oga-bench (@danielholanda).
Full Changelog: v5.0.2...v5.0.3
v5.0.2
What's Changed
Re-issuing v5.0.1 to fix a pypi release bug.
- Moving HumanEval to pypi (@ramkrishna2910)
- Adds std dev for oga-bench (@amd-pworfolk)
- Updates build status monitor to change update frequency (@danielholanda)
- Fix linter issue (@ramkrishna2910)
- Fix llama.cpp issue introduced by their breaking change (@jeremyfowers)
- Polish llama.cpp implementation (@ramkrishna2910)
- Minor changes fixing onnxruntime_genai issue and input_path by @apsonawane in #267
New Contributors
- @apsonawane made their first contribution in #267
Full Changelog: v5.0.0...v5.0.2
v5.0.1
What's Changed
- Moving HumanEval to pypi (@ramkrishna2910)
- Adds std dev for oga-bench (@amd-pworfolk)
- Updates build status monitor to change update frequency (@danielholanda)
- Fix linter issue (@ramkrishna2910)
- Fix llama.cpp issue introduced by their breaking change (@jeremyfowers)
- Polish llama.cpp implementation (@ramkrishna2910)
- Minor changes fixing onnxruntime_genai issue and input_path by @apsonawane in #267
New Contributors
- @apsonawane made their first contribution in #267
Full Changelog: v5.0.0...v5.0.1
v5.0.0
What's Changed
- Improve documentation and LLM status clarity by @jeremyfowers in #261
- Move llm source code into src/lemonade dir. Add HumanEval. by @jeremyfowers in #262
- Adds llamacpp benchmarking support by @ramkrishna2910 in #263
Full Changelog: v4.0.11...v5.0.0
v4.0.11
What's Changed
- Hotfix: monitor progress bug by @jeremyfowers in #259
Full Changelog: v4.0.10...v4.0.11
v4.0.10
What's Changed
- Update ort_genai_hybrid.md by @jeremyfowers in #256
- Standardize Timestamps to Fixed Time Zone in TKML Runs by @danielholanda in #257
- Allow tools to display percent progress in the monitor by @jeremyfowers in #258
Full Changelog: v4.0.9...v4.0.10