Releases: triton-inference-server/triton_cli

0.0.11

05 Sep 22:22
1d872d7

What's Changed

  • chore: Add Llama3.1-8B support for vLLM and use KIND_MODEL for vLLM config by default by @rmccorm4 in #82 (see the config sketch after this list)
  • build: Upgrade to 24.08, TRT-LLM 0.12.0, and Triton CLI v0.0.11 by @rmccorm4 in #83
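
For context, the KIND_MODEL default from #82 refers to the instance_group kind in the generated config.pbtxt: KIND_MODEL leaves device placement to the backend, which suits vLLM since it manages GPU placement itself, whereas KIND_GPU pins instances to specific GPUs. A minimal sketch of the relevant snippet, with the instance count chosen only for illustration:

    instance_group [
      {
        count: 1
        kind: KIND_MODEL
      }
    ]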

Full Changelog: 0.0.10...0.0.11

0.0.10

06 Aug 19:07
a050ec1

What's Changed

  • Upgrade to 24.07, TRT-LLM 0.11.0, and Triton CLI v0.0.10 by @rmccorm4 in #81
  • Log infer inputs when using triton infer
  • Add more sensible TRT-LLM config.pbtxt template parsing values to engine_config_parser.py

Full Changelog: 0.0.9...0.0.10

0.0.9

26 Jul 23:11
449f6b8

What's Changed

  • chore: Update TRT-LLM checkpoint scripts to v0.10 and Fix Github Actions Pipeline by @KrishnanPrash in #78
  • test: Shorten genai-perf test time, fail fast on server startup, and upgrade to 24.06 by @rmccorm4 in #76
  • Tag 0.0.9 and update versions to 24.06 by @rmccorm4 in #79

New Contributors

  • @mc-nv made their first contribution in #74

Full Changelog: 0.0.8...0.0.9

0.0.8

11 Jun 22:49
8f577d3

What's Changed

  • Disable Echo (exclude input text from output text) in TRT-LLM by default by @nnshah1 in #58
  • Enable calls to GenAI-Perf for profile subcommand by @dyastremsky in #52
  • Fix wrong huggingface login command in readme by @matthewkotila in #60
  • Tweak test timeouts to account for testing Llama 2 and Llama 3 models by @rmccorm4 in #61
  • Add GitLab CI trigger in GitHub checks by @nvda-mesharma in #64
  • test: Unit Tests for triton {metrics, config, status} by @KrishnanPrash in #66
  • chore: Upgrade dependencies for 24.05 by @KrishnanPrash in #67
  • refactor: Simplify testing with ScopedTritonServer instead of pytest fixtures by @KrishnanPrash in #68
  • ci: Restrict numpy to version 1.x by @KrishnanPrash in #70
  • refactor: Add TritonCLIException to denote expected vs unexpected errors by @rmccorm4 in #69 (see the sketch after this list)
  • build: Update CLI version references to 0.0.8 and Triton references to 24.05 by @rmccorm4 in #72
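
The expected-vs-unexpected split from #69 can be pictured with a short sketch; the helper below is illustrative only (the subcommand body and messages are assumptions, not the CLI's actual code):

    import sys

    class TritonCLIException(Exception):
        """Expected, user-facing failure (bad arguments, missing model, etc.)."""

    def import_model(name):
        # Hypothetical subcommand body, used only to show the pattern.
        if not name:
            raise TritonCLIException("a model name is required (use -m/--model)")
        print(f"importing {name} ...")

    if __name__ == "__main__":
        try:
            import_model(sys.argv[1] if len(sys.argv) > 1 else "")
        except TritonCLIException as e:
            # Expected error: print a clean one-line message, no traceback.
            print(f"ERROR: {e}", file=sys.stderr)
            sys.exit(1)
        # Anything else propagates with a full traceback for debugging.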

Full Changelog: 0.0.7...0.0.8

0.0.7

08 May 02:58
8c491be

What's Changed

  • Sync with Triton 24.04
  • Bump TRT-LLM version to 0.9.0
  • Add support for llama-2-7b-chat, llama-3-8b, and llama-3-8b-instruct for both vLLM and TRT-LLM
  • Improve error checking and error messages of building TRT-LLM engines
  • Log the underlying convert_checkpoint.py and trtllm-build commands for reproducibility/visibility
  • Don't call convert_checkpoint.py if converted weights are already found
  • Call convert_checkpoint.py via subprocess to improve total memory usage
  • Attempt to clean up failed TRT-LLM models in the model repository if import or engine building fails, rather than leaving the model repository in an unfinished state
  • Update tests to wait for both HTTP and GRPC server endpoints to be ready before testing (see the readiness-polling sketch after this list)
    • Fixes intermittent ConnectionRefusedError in CI tests
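
A rough sketch of that readiness wait, assuming Triton's default ports (HTTP 8000, GRPC 8001) and the tritonclient Python package; the repository's test helper may differ in detail:

    import time

    import tritonclient.grpc as grpcclient
    import tritonclient.http as httpclient

    def wait_until_ready(timeout_s=120):
        """Poll both endpoints until the server reports ready, or raise on timeout."""
        http = httpclient.InferenceServerClient("localhost:8000")
        grpc = grpcclient.InferenceServerClient("localhost:8001")
        deadline = time.time() + timeout_s
        while time.time() < deadline:
            try:
                if http.is_server_ready() and grpc.is_server_ready():
                    return
            except Exception:
                pass  # server not accepting connections yet
            time.sleep(1)
        raise TimeoutError("Triton did not become ready on both HTTP and GRPC endpoints")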

Full Changelog: 0.0.6...0.0.7

0.0.6

24 Apr 00:53
039b165
Pre-release

What's Changed

  • GPT Engine Builder by @fpetrini15 in #24
  • Modularize TRT LLM Builders by @fpetrini15 in #26
  • Add --backend support to bench command and default to custom image by @rmccorm4 in #27
  • Fix model infer on TRT LLM with negative ints, and minor cleanup by @rmccorm4 in #28
  • Fix profile subcommand to account for offline (non-streaming) metrics and V1 batching by @rmccorm4 in #29
  • Minor Repo Optimizations by @fpetrini15 in #30
  • Bring back IFB default to TRT LLM models and bump to 24.01 by @rmccorm4 in #31
  • Bump cli version to 0.0.3, bump trtllm version to 0.7.1, and bump vllm version to 0.3.0 by @rmccorm4 in #32
  • Give GPT2 quicker build/load settings for demos, fix Dockerfile version syntax, bump CLI version to 0.0.4 by @rmccorm4 in #33
  • Add note on MPI dependencies by @rmccorm4 in #34
  • Add CLI subcommand tests to CI by @krishung5 in #35
  • Bump to v0.0.5 - CI testing working for 24.01 by @rmccorm4 in #38
  • Add extra tests for CLI by @krishung5 in #36
  • CLI TRT LLM v0.8.0 Refresh by @fpetrini15 in #37
  • Bump to v0.0.6 - CI testing working for 24.02 by @fpetrini15 in #39
  • Flatten CLI Args by @fpetrini15 in #40
  • Update README commands by @rmccorm4 in #42
  • Enable CLI Concurrent Testing by @fpetrini15 in #41
  • README Restructuring by @fpetrini15 in #43
  • Address some documentation issues by @rmccorm4 in #50

Full Changelog: 0.0.2...0.0.6

0.0.2

17 Jan 21:03
b87a553
Pre-release

Full Changelog: https://github.com/triton-inference-server/triton_cli/commits/0.0.2