Releases: aws-samples/foundation-model-benchmarking-tool
torch version 2.4
Ollama support
What's Changed
- Add BYO Ollama Support by @dheerajoruganty in #223
- Change Llama 3 8b and 70b Model IDs by @dheerajoruganty in #225
Full Changelog: v2.0.14...v2.0.15
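FMBench drives each endpoint it benchmarks through a YAML config file. The exact schema for the new bring-your-own Ollama endpoint isn't reproduced in these notes, so the fragment below is illustrative only; field names are assumptions, not FMBench's actual keys:

```yaml
# Illustrative sketch of a BYO Ollama endpoint entry -- field names are
# assumptions, not FMBench's documented schema.
experiments:
  - name: llama3-8b-ollama-byoe
    model_id: llama3:8b
    deploy: no                                      # bring-your-own: endpoint is already running
    ep_name: http://localhost:11434/api/generate    # Ollama's default local API port
    inference_spec:
      parameter_set: ollama
```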
FMBench orchestrator
What's Changed
- Configuration files for llama3.1 70b on large prompt payloads + longbench dataset by @madhurprash in #216
- adding support for llama3 summarization prompt by @madhurprash in #217
- changing file name for llama3 summarization prompt by @madhurprash in #218
- Config files for llama3.1 8b instruct on g6e instances by @madhurprash in #219
- All config files for llama3.1 8b on g6e instances using DJL by @madhurprash in #220
- make config file naming convention consistent for llama3.1 8b/70b on g6e by @madhurprash in #221
- Config files for all llama3.2 models - tested by @madhurprash in #222
Full Changelog: v2.0.13...v2.0.14
pricing.yml updates
What's Changed
- Update pricing.yml by @aarora79 in #210
- Rename config-llama3-8b-g6e.4xl-tp-2-mc-max-djl-ec2.yml to config-lla… by @aarora79 in #212
- add mixtral config file for AWQ version - g6e.48xl by @madhurprash in #214
- pricing update + retry logic added to bedrock predictor by @madhurprash in #215
Full Changelog: v2.0.11...v2.0.13
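PR #215 above adds retry logic to the Bedrock predictor. FMBench's actual implementation isn't shown in these notes; throttling retries for a hosted model API typically follow an exponential-backoff-with-jitter pattern, sketched below with illustrative names:

```python
import random
import time


def with_retries(fn, max_attempts=5, base_delay=1.0, retryable=(Exception,)):
    """Call fn(), retrying on retryable errors with exponential backoff.

    A hypothetical helper, not FMBench's code: waits base_delay * 2**attempt
    seconds (plus proportional jitter) between attempts, re-raising the last
    error once max_attempts is exhausted.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # exponential backoff: 1s, 2s, 4s, ... scaled by base_delay,
            # with jitter to avoid synchronized retry storms
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

In practice `fn` would wrap the Bedrock invocation and `retryable` would be narrowed to throttling exceptions rather than all errors.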
Llama3 with Triton+DJL on Neuron
Full Changelog: v2.0.10...v2.0.11
Llama3 on g6e
What's Changed
- Add support and pricing for g6e instances by @dheerajoruganty in #207
- Config file for llama3 8b on inf2 using triton with DJL by @madhurprash in #205
- Add config files for g6e instances by @dheerajoruganty in #208
- Add concurrency=3 for g6e instance configs by @dheerajoruganty in #209
Full Changelog: v2.0.9...v2.0.10
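PR #209 above raises the tested concurrency for the g6e configs to 3. FMBench sweeps concurrency per experiment in its config files; a fragment in the spirit of that change might look like the following (illustrative only, key names are assumptions):

```yaml
# Illustrative fragment -- key names are assumptions, not FMBench's schema.
experiments:
  - name: llama3-8b-g6e.2xl
    instance_type: g6e.2xlarge
    concurrency_levels: [1, 2, 3]   # benchmark at 1, 2, and 3 parallel requests
```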
Triton-DJL support, Tokenizer from HF
What's Changed
- Contains a configuration file trn and doc fix for triton on AWS chips by @madhurprash in #201
- Integration triton inference server with djl by @madhurprash in #204
Full Changelog: v2.0.8...v2.0.9
v2.0.8
What's Changed
- bug fix + updated triton vllm config file by @madhurprash in #200
- Add Support for Metrics While Benchmarking on Neuron and EC2 by @dheerajoruganty in #193
- Fix to correctly parse response in message API format for SageMaker inference by @fespigares in #165
- Tagging by @antara678 in #188
- adding platform identification for deployment by @madhurprash in #187
- update llama2 7b quick file by @madhurprash in #198
- Bug fix for evals by @madhurprash in #196
- Update config-ec2-llama3-8b.yml by @antara678 in #199
- Triton integration by @madhurprash in #194
New Contributors
- @fespigares made their first contribution in #165
Full Changelog: v2.0.7...v2.0.8
Triton inference server
What's Changed
- Add Support for Metrics While Benchmarking on Neuron and EC2 by @dheerajoruganty in #193
- Fix to correctly parse response in message API format for SageMaker inference by @fespigares in #165
- Tagging by @antara678 in #188
- adding platform identification for deployment by @madhurprash in #187
- update llama2 7b quick file by @madhurprash in #198
- Bug fix for evals by @madhurprash in #196
- Update config-ec2-llama3-8b.yml by @antara678 in #199
- Triton integration by @madhurprash in #194
New Contributors
- @fespigares made their first contribution in #165
Full Changelog: v2.0.6...v2.0.7
Multiple model copies on a single EC2 instance
What's Changed
- Fix EC2 container not running due to bad flag by @dheerajoruganty in #185
- remove unnecessary --runtime=nvidia from the docker command by @madhurprash in #186
- Multi model w djl by @aarora79 in #191
Full Changelog: v2.0.5...v2.0.6
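Running multiple copies of a model on one EC2 instance (#191) implies spreading benchmark requests across per-copy endpoints. FMBench's own dispatch code isn't shown in these notes; a minimal round-robin sketch, assuming each copy listens on a consecutive local port:

```python
from itertools import cycle


def make_endpoint_picker(base_port: int, num_copies: int):
    """Return a zero-argument function that yields endpoint URLs round-robin.

    Hypothetical helper: assumes model copies listen on base_port,
    base_port + 1, ... on localhost, each exposing an /invocations route.
    """
    endpoints = cycle(
        [f"http://localhost:{base_port + i}/invocations" for i in range(num_copies)]
    )
    return lambda: next(endpoints)


pick = make_endpoint_picker(8080, 3)
# successive pick() calls cycle through ports 8080, 8081, 8082, 8080, ...
```

Each benchmark request would then be sent to `pick()`, keeping load roughly even across the copies.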