Lemonade release v6.0.0: new OpenAI server, improvements, fixes (#291)
Co-authored-by: amd-pworfolk <[email protected]>
Co-authored-by: Daniel Holanda <[email protected]>
Co-authored-by: Ramakrishnan Sivakumar <[email protected]>
4 people committed Feb 27, 2025
1 parent 633913c commit 478bf5b
Showing 48 changed files with 2,110 additions and 1,272 deletions.
170 changes: 170 additions & 0 deletions .github/workflows/server_installer_windows_latest.yml
@@ -0,0 +1,170 @@
name: Server Installer Windows-Latest Build and Test

on:
  push:
    branches: ["main"]
    tags:
      - v*
  pull_request:
    branches: ["main"]
  workflow_dispatch:

jobs:
  make-server-installer:
    runs-on: windows-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install NSIS
        shell: PowerShell
        run: |
          # Download NSIS installer
          Invoke-WebRequest -UserAgent "Wget" -Uri "https://sourceforge.net/projects/nsis/files/NSIS%203/3.10/nsis-3.10-setup.exe" -OutFile "nsis.exe"

          # Install NSIS
          Start-Process nsis.exe -ArgumentList '/S' -Wait

      - name: Verify NSIS installation
        shell: PowerShell
        run: |
          # Check if NSIS is installed
          & 'C:\Program Files (x86)\NSIS\makensis.exe' /VERSION

      - name: Build the Lemonade Server installer
        shell: PowerShell
        run: |
          cd installer
          & 'C:\Program Files (x86)\NSIS\makensis.exe' 'Installer.nsi'

          if (Test-Path "Lemonade_Server_Installer.exe") {
            Write-Host "Lemonade_Server_Installer.exe has been created successfully."
          } else {
            Write-Host "Lemonade_Server_Installer.exe was not found."
            exit 1
          }

      - name: Upload Installer
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: LemonadeServerInstaller
          path: |
            installer\Lemonade_Server_Installer.exe

      - name: Attempt to install Lemonade Server using installer
        shell: cmd
        run: |
          cd installer
          Lemonade_Server_Installer.exe /S

      - name: Ensure the Lemonade server works properly
        shell: pwsh
        run: |
          Write-Host "Use a function to determine the underlying command from the lemonade server shortcut"
          function Get-ShortcutTarget {
            param (
              [string]$shortcutPath
            )
            $shell = New-Object -ComObject WScript.Shell
            $shortcut = $shell.CreateShortcut($shortcutPath)
            $targetPath = $shortcut.TargetPath
            $arguments = $shortcut.Arguments
            return "$targetPath $arguments"
          }

          Write-Host "ls of install directory to make sure the server is there"
          ls "$HOME\AppData\Local\lemonade_server"

          $shortcutPath = "$HOME\AppData\Local\lemonade_server\lemonade-server.lnk"
          $fullCommand = Get-ShortcutTarget -shortcutPath $shortcutPath
          Write-Host "Server shortcut full command: $fullCommand"

          $quotedCommand = "`"$fullCommand`""
          $outputFile = "output.log"
          $errorFile = "error.log"
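          # Run the shortcut's target and arguments through cmd /C so they execute
          # as a single command line; capture stdout/stderr to the log files above
          # so they can be dumped if any later check fails.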
          $serverProcess = Start-Process -FilePath "cmd.exe" -ArgumentList "/C $quotedCommand" -RedirectStandardOutput $outputFile -RedirectStandardError $errorFile -PassThru -NoNewWindow

          Write-Host "Wait for 30 seconds to let the server come up"
          Start-Sleep -Seconds 30

          Write-Host "Check if server process successfully launched"
          $serverRunning = Get-Process -Id $serverProcess.Id -ErrorAction SilentlyContinue
          if (-not $serverRunning) {
            Write-Host "Error: Server process isn't running, even though we just tried to start it!"
            Write-Host "Standard Output:"
            Get-Content $outputFile
            Write-Host "Standard Error:"
            Get-Content $errorFile
            exit 1
          } else {
            Write-Host "Server process is alive."
          }

          Write-Host "Wait for the server port to come up"
          while ($true) {
            $llmPortCheck = Test-NetConnection -ComputerName 127.0.0.1 -Port 8000
            if (-not $llmPortCheck.TcpTestSucceeded) {
              Write-Host "LLM server is not yet running on port 8000!"
              Write-Host "Standard Output:"
              Get-Content $outputFile
              Write-Host "Standard Error:"
              Get-Content $errorFile
            } else {
              Write-Host "LLM server is running on port 8000."
              break
            }
            Start-Sleep -Seconds 30
          }

          Write-Host "Checking the /health endpoint"
          $response = Invoke-WebRequest -Uri http://localhost:8000/api/v0/health -UseBasicParsing
          if ($response.StatusCode -eq 200) {
            Write-Output "Good: /health status code is 200"
          } else {
            Write-Output "Error: /health status code is not 200"
            Write-Host "Standard Output:"
            Get-Content $outputFile
            Write-Host "Standard Error:"
            Get-Content $errorFile
            exit 1
          }

          $jsonContent = $response.Content | ConvertFrom-Json
          if ($jsonContent) {
            Write-Output "Good: /health JSON content is not empty: $jsonContent"
          } else {
            Write-Output "Error: /health JSON content is empty"
            Write-Host "Standard Output:"
            Get-Content $outputFile
            Write-Host "Standard Error:"
            Get-Content $errorFile
            exit 1
          }

          Write-Host "Close the server process"
          function Kill-Tree {
            Param([int]$ppid)
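            # Recursively stop every child process of $ppid before stopping $ppid itself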
            Get-CimInstance Win32_Process | Where-Object { $_.ParentProcessId -eq $ppid } | ForEach-Object { Kill-Tree $_.ProcessId }
            Stop-Process -Id $ppid
          }
          Kill-Tree $serverProcess.Id

      - name: Release
        uses: softprops/action-gh-release@v2
        if: startsWith(github.ref, 'refs/tags/v')
        with:
          files: installer/Lemonade_Server_Installer.exe
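
For reference, the health check performed by the workflow above can be reproduced as a quick local smoke test. The sketch below is illustrative, not part of the commit: it assumes an installed server already running on `localhost:8000`, reuses the `/api/v0/health` route from the step above, and the `requests` dependency is this example's own choice.

```python
# Minimal local smoke test mirroring the workflow's /health check (illustrative).
import sys

import requests  # assumed dependency: pip install requests

resp = requests.get("http://localhost:8000/api/v0/health", timeout=10)
if resp.status_code != 200:
    print(f"Error: /health status code is {resp.status_code}, not 200", file=sys.stderr)
    sys.exit(1)

if not resp.json():
    print("Error: /health JSON content is empty", file=sys.stderr)
    sys.exit(1)

print("Good: /health returned 200 with content:", resp.json())
```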
17 changes: 7 additions & 10 deletions .github/workflows/test_lemonade.yml
@@ -46,13 +46,6 @@ jobs:
        run: |
          pylint src/lemonade --rcfile .pylintrc --disable E0401
          pylint examples --rcfile .pylintrc --disable E0401,E0611 --jobs=1
-      - name: Test HF+CPU server
-        if: runner.os == 'Windows'
-        timeout-minutes: 10
-        uses: ./.github/actions/server-testing
-        with:
-          conda_env: -n lemon
-          load_command: -i facebook/opt-125m huggingface-load
      - name: Run lemonade tests
        shell: bash -el {0}
        run: |
@@ -63,7 +56,11 @@
          python test/lemonade/llm_api.py

-          # Test high-level LEAP APIs
-          python examples/lemonade/leap_basic.py
-          python examples/lemonade/leap_streaming.py
+          # Test high-level APIs
+          python examples/lemonade/api_basic.py
+          python examples/lemonade/api_streaming.py
+
+          # Test server
+          python test/lemonade/server.py
14 changes: 3 additions & 11 deletions .github/workflows/test_lemonade_oga_cpu.yml
@@ -53,15 +53,7 @@ jobs:
          # Test low-level APIs
          python test/lemonade/oga_cpu_api.py

-          # Test high-level LEAP APIs
-          python examples/lemonade/leap_oga_cpu.py
-          python examples/lemonade/leap_oga_cpu_streaming.py
-      - name: Test OGA+CPU server
-        if: runner.os == 'Windows'
-        timeout-minutes: 10
-        uses: ./.github/actions/server-testing
-        with:
-          conda_env: -n lemon
-          load_command: -i TinyPixel/small-llama2 oga-load --device cpu --dtype int4
-          hf_token: "${{ secrets.HUGGINGFACE_ACCESS_TOKEN }}" # Required by OGA model_builder in OGA 0.4.0 but not future versions
+          # Test high-level APIs
+          python examples/lemonade/api_oga_cpu.py
+          python examples/lemonade/api_oga_cpu_streaming.py
2 changes: 2 additions & 0 deletions NOTICE.md
@@ -2,6 +2,8 @@ PORTIONS LICENSED AS FOLLOWS

\> TurnkeyML used code from the [MLAgility](https://github.com/groq/mlagility) and [GroqFlow](https://github.com/groq/groqflow) projects as a starting point. Much of that code was refactored, improved, or replaced by the time TurnkeyML was published.

+\> TurnkeyML uses the [Microsoft lemon emoji](https://github.com/microsoft/fluentui-emoji) as an icon for the lemonade tool.

>The MIT License
>
>Copyright 2023 Groq Inc.
2 changes: 1 addition & 1 deletion README.md
@@ -7,7 +7,7 @@

We are on a mission to make it easy to use the most important tools in the ONNX ecosystem. TurnkeyML accomplishes this by providing no-code CLIs and low-code APIs for both general ONNX workflows with `turnkey` as well as LLMs with `lemonade`.

-| [**Lemonade**](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md) | [**Turnkey**](https://github.com/onnx/turnkeyml/blob/main/docs/turnkey/getting_started.md) |
+| [**Lemonade SDK**](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md) | [**Turnkey**](https://github.com/onnx/turnkeyml/blob/main/docs/turnkey/getting_started.md) |
|:----------------------------------------------: |:-----------------------------------------------------------------: |
| Serve and benchmark LLMs on CPU, GPU, and NPU. <br/> [Click here to get started with `lemonade`.](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md) | Export and optimize ONNX models for CNNs and Transformers. <br/> [Click here to get started with `turnkey`.](https://github.com/onnx/turnkeyml/blob/main/docs/turnkey/getting_started.md) |
| <img src="https://github.com/onnx/turnkeyml/blob/main/img/llm_demo.png?raw=true"/> | <img src="https://github.com/onnx/turnkeyml/blob/main/img/classic_demo.png?raw=true"/> |
50 changes: 18 additions & 32 deletions docs/lemonade/getting_started.md
@@ -1,19 +1,11 @@
-# Lemonade
+# Lemonade SDK

-Welcome to the project page for `lemonade` the Turnkey LLM Aide!
-
-1. [Install](#install)
-1. [CLI Commands](#cli-commands)
-   - [Syntax](#syntax)
-   - [Chatting](#chatting)
-   - [Accuracy](#accuracy)
-   - [Benchmarking](#benchmarking)
-   - [Memory Usage](#memory-usage)
-   - [Serving](#serving)
-1. [API Overview](#api)
-1. [Code Organization](#code-organization)
-1. [Contributing](#contributing)
+The `lemonade` SDK provides everything needed to get up and running quickly with LLMs on OnnxRuntime GenAI (OGA).
+
+- [Quick installation from PyPI](#install).
+- [CLI with tools for prompting, benchmarking, and accuracy tests](#cli-commands).
+- [REST API with OpenAI compatibility](#serving).
+- [Python API based on `from_pretrained()` for easy integration with Python apps](#api).

# Install

@@ -85,9 +77,9 @@ Can be read like this:
The `lemonade -h` command will show you which options and Tools are available, and `lemonade TOOL -h` will tell you more about that specific Tool.


-## Chatting
+## Prompting

-To chat with your LLM try:
+To prompt your LLM try:

@@ -163,41 +155,35 @@ contains a figure plotting the memory usage over the build time. Learn more by
## Serving

-You can launch a WebSocket server for your LLM with:
+You can launch an OpenAI-compatible server with:

-OGA iGPU:
-```bash
-lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 serve
-```
-
-Hugging Face:
-```bash
-lemonade -i facebook/opt-125m huggingface-load serve
-```
+```bash
+lemonade serve
+```

Once the server has launched, you can connect to it from your own application, or interact directly by following the on-screen instructions to open a basic web app.

Visit the [server spec](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md) to learn more about the endpoints provided.
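
Since the new server speaks the OpenAI API, any OpenAI-compatible client should be able to connect. Below is a minimal sketch, not part of this commit, using the official `openai` Python package; it assumes the chat-completions route lives under the same `/api/v0` prefix as the health endpoint exercised in the installer workflow, and the model name is a hypothetical placeholder.

```python
# Illustrative client for the OpenAI-compatible server (assumptions noted above).
from openai import OpenAI

# Assumption: OpenAI-style routes share the /api/v0 prefix with /health.
client = OpenAI(base_url="http://localhost:8000/api/v0", api_key="not-needed")

completion = client.chat.completions.create(
    model="placeholder-model",  # hypothetical id; substitute a model your server serves
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(completion.choices[0].message.content)
```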
# API
Lemonade is also available via API.
-## LEAP APIs
+## High-Level APIs

-The lemonade enablement platform (LEAP) API abstracts loading models from any supported framework (e.g., Hugging Face, OGA) and backend (e.g., CPU, iGPU, Hybrid). This makes it easy to integrate lemonade LLMs into Python applications.
+The high-level lemonade API abstracts loading models from any supported framework (e.g., Hugging Face, OGA) and backend (e.g., CPU, iGPU, Hybrid) using the popular `from_pretrained()` function. This makes it easy to integrate lemonade LLMs into Python applications.
OGA iGPU:
```python
-from lemonade import leap
+from lemonade.api import from_pretrained

-model, tokenizer = leap.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", recipe="oga-igpu")
+model, tokenizer = from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", recipe="oga-igpu")
input_ids = tokenizer("This is my prompt", return_tensors="pt").input_ids
response = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.decode(response[0]))
```
-You can learn more about the LEAP APIs [here](https://github.com/onnx/turnkeyml/tree/main/examples/lemonade).
+You can learn more about the high-level APIs [here](https://github.com/onnx/turnkeyml/tree/main/examples/lemonade).
## Low-Level API
@@ -207,13 +193,13 @@ Here's a quick example of how to prompt a Hugging Face LLM using the low-level API

```python
import lemonade.tools.torch_llm as tl
-import lemonade.tools.chat as cl
+import lemonade.tools.prompt as pt
from turnkeyml.state import State
state = State(cache_dir="cache", build_name="test")
state = tl.HuggingfaceLoad().run(state, input="facebook/opt-125m")
-state = cl.Prompt().run(state, prompt="hi", max_new_tokens=15)
+state = pt.Prompt().run(state, prompt="hi", max_new_tokens=15)
print("Response:", state.response)
```