Lemonade release v6.0.0: new OpenAI server, improvements, fixes (#291)
Co-authored-by: amd-pworfolk <[email protected]>
Co-authored-by: Daniel Holanda <[email protected]>
Co-authored-by: Ramakrishnan Sivakumar <[email protected]>
4 people committed Feb 27, 2025
1 parent 633913c commit 478bf5b
Showing 48 changed files with 2,110 additions and 1,272 deletions.
170 changes: 170 additions & 0 deletions .github/workflows/server_installer_windows_latest.yml
@@ -0,0 +1,170 @@
name: Server Installer Windows-Latest Build and Test

on:
  push:
    branches: ["main"]
    tags:
      - v*
  pull_request:
    branches: ["main"]
  workflow_dispatch:

jobs:
  make-server-installer:
    runs-on: windows-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install NSIS
        shell: PowerShell
        run: |
          # Download NSIS installer
          Invoke-WebRequest -UserAgent "Wget" -Uri "https://sourceforge.net/projects/nsis/files/NSIS%203/3.10/nsis-3.10-setup.exe" -OutFile "nsis.exe"

          # Install NSIS
          Start-Process nsis.exe -ArgumentList '/S' -Wait

      - name: Verify NSIS installation
        shell: PowerShell
        run: |
          # Check if NSIS is installed
          & 'C:\Program Files (x86)\NSIS\makensis.exe' /VERSION

      - name: Build the Lemonade Server installer
        shell: PowerShell
        run: |
          cd installer
          & 'C:\Program Files (x86)\NSIS\makensis.exe' 'Installer.nsi'

          if (Test-Path "Lemonade_Server_Installer.exe") {
            Write-Host "Lemonade_Server_Installer.exe has been created successfully."
          } else {
            Write-Host "Lemonade_Server_Installer.exe was not found."
            exit 1
          }

      - name: Upload Installer
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: LemonadeServerInstaller
          path: |
            installer\Lemonade_Server_Installer.exe

      - name: Attempt to install Lemonade Server using installer
        shell: cmd
        run: |
          cd installer
          Lemonade_Server_Installer.exe /S

      - name: Ensure the Lemonade server works properly
        shell: pwsh
        run: |
          Write-Host "Use a function to determine the underlying command from the lemonade server shortcut"
          function Get-ShortcutTarget {
            param (
              [string]$shortcutPath
            )
            $shell = New-Object -ComObject WScript.Shell
            $shortcut = $shell.CreateShortcut($shortcutPath)
            $targetPath = $shortcut.TargetPath
            $arguments = $shortcut.Arguments
            return "$targetPath $arguments"
          }

          Write-Host "ls of install directory to make sure the server is there"
          ls "$HOME\AppData\Local\lemonade_server"

          $shortcutPath = "$HOME\AppData\Local\lemonade_server\lemonade-server.lnk"
          $fullCommand = Get-ShortcutTarget -shortcutPath $shortcutPath
          Write-Host "Server shortcut full command: $fullCommand"

          $quotedCommand = "`"$fullCommand`""
          $outputFile = "output.log"
          $errorFile = "error.log"
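          # Run the shortcut's target and arguments through cmd /C so they execute
          # as a single command line; capture stdout/stderr to the log files above
          # so they can be dumped if any later check fails.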
          $serverProcess = Start-Process -FilePath "cmd.exe" -ArgumentList "/C $quotedCommand" -RedirectStandardOutput $outputFile -RedirectStandardError $errorFile -PassThru -NoNewWindow

          Write-Host "Wait for 30 seconds to let the server come up"
          Start-Sleep -Seconds 30

          Write-Host "Check if server process successfully launched"
          $serverRunning = Get-Process -Id $serverProcess.Id -ErrorAction SilentlyContinue
          if (-not $serverRunning) {
            Write-Host "Error: Server process isn't running, even though we just tried to start it!"
            Write-Host "Standard Output:"
            Get-Content $outputFile
            Write-Host "Standard Error:"
            Get-Content $errorFile
            exit 1
          } else {
            Write-Host "Server process is alive."
          }

          Write-Host "Wait for the server port to come up"
          while ($true) {
            $llmPortCheck = Test-NetConnection -ComputerName 127.0.0.1 -Port 8000
            if (-not $llmPortCheck.TcpTestSucceeded) {
              Write-Host "LLM server is not yet running on port 8000!"
              Write-Host "Standard Output:"
              Get-Content $outputFile
              Write-Host "Standard Error:"
              Get-Content $errorFile
            } else {
              Write-Host "LLM server is running on port 8000."
              break
            }
            Start-Sleep -Seconds 30
          }

          Write-Host "Checking the /health endpoint"
          $response = Invoke-WebRequest -Uri http://localhost:8000/api/v0/health -UseBasicParsing
          if ($response.StatusCode -eq 200) {
            Write-Output "Good: /health status code is 200"
          } else {
            Write-Output "Error: /health status code is not 200"
            Write-Host "Standard Output:"
            Get-Content $outputFile
            Write-Host "Standard Error:"
            Get-Content $errorFile
            exit 1
          }

          $jsonContent = $response.Content | ConvertFrom-Json
          if ($jsonContent) {
            Write-Output "Good: /health JSON content is not empty: $jsonContent"
          } else {
            Write-Output "Error: /health JSON content is empty"
            Write-Host "Standard Output:"
            Get-Content $outputFile
            Write-Host "Standard Error:"
            Get-Content $errorFile
            exit 1
          }

          Write-Host "Close the server process"
          function Kill-Tree {
            Param([int]$ppid)
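            # Recursively stop every child process of $ppid before stopping $ppid itself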
            Get-CimInstance Win32_Process | Where-Object { $_.ParentProcessId -eq $ppid } | ForEach-Object { Kill-Tree $_.ProcessId }
            Stop-Process -Id $ppid
          }
          Kill-Tree $serverProcess.Id

      - name: Release
        uses: softprops/action-gh-release@v2
        if: startsWith(github.ref, 'refs/tags/v')
        with:
          files: installer/Lemonade_Server_Installer.exe
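
For reference, the health check performed by the workflow above can be reproduced as a quick local smoke test. The sketch below is illustrative, not part of the commit: it assumes an installed server already running on `localhost:8000`, reuses the `/api/v0/health` route from the step above, and the `requests` dependency is this example's own choice.

```python
# Minimal local smoke test mirroring the workflow's /health check (illustrative).
import sys

import requests  # assumed dependency: pip install requests

resp = requests.get("http://localhost:8000/api/v0/health", timeout=10)
if resp.status_code != 200:
    print(f"Error: /health status code is {resp.status_code}, not 200", file=sys.stderr)
    sys.exit(1)

if not resp.json():
    print("Error: /health JSON content is empty", file=sys.stderr)
    sys.exit(1)

print("Good: /health returned 200 with content:", resp.json())
```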
17 changes: 7 additions & 10 deletions .github/workflows/test_lemonade.yml
@@ -46,13 +46,6 @@ jobs:
        run: |
          pylint src/lemonade --rcfile .pylintrc --disable E0401
          pylint examples --rcfile .pylintrc --disable E0401,E0611 --jobs=1
-      - name: Test HF+CPU server
-        if: runner.os == 'Windows'
-        timeout-minutes: 10
-        uses: ./.github/actions/server-testing
-        with:
-          conda_env: -n lemon
-          load_command: -i facebook/opt-125m huggingface-load
      - name: Run lemonade tests
        shell: bash -el {0}
        run: |
@@ -63,7 +56,11 @@
          python test/lemonade/llm_api.py

-          # Test high-level LEAP APIs
-          python examples/lemonade/leap_basic.py
-          python examples/lemonade/leap_streaming.py
+          # Test high-level APIs
+          python examples/lemonade/api_basic.py
+          python examples/lemonade/api_streaming.py
+
+          # Test server
+          python test/lemonade/server.py
14 changes: 3 additions & 11 deletions .github/workflows/test_lemonade_oga_cpu.yml
@@ -53,15 +53,7 @@ jobs:
          # Test low-level APIs
          python test/lemonade/oga_cpu_api.py

-          # Test high-level LEAP APIs
-          python examples/lemonade/leap_oga_cpu.py
-          python examples/lemonade/leap_oga_cpu_streaming.py
-      - name: Test OGA+CPU server
-        if: runner.os == 'Windows'
-        timeout-minutes: 10
-        uses: ./.github/actions/server-testing
-        with:
-          conda_env: -n lemon
-          load_command: -i TinyPixel/small-llama2 oga-load --device cpu --dtype int4
-          hf_token: "${{ secrets.HUGGINGFACE_ACCESS_TOKEN }}" # Required by OGA model_builder in OGA 0.4.0 but not future versions
+          # Test high-level APIs
+          python examples/lemonade/api_oga_cpu.py
+          python examples/lemonade/api_oga_cpu_streaming.py
2 changes: 2 additions & 0 deletions NOTICE.md
@@ -2,6 +2,8 @@ PORTIONS LICENSED AS FOLLOWS

\> TurnkeyML used code from the [MLAgility](https://github.com/groq/mlagility) and [GroqFlow](https://github.com/groq/groqflow) projects as a starting point. Much of that code was refactored, improved, or replaced by the time TurnkeyML was published.

+\> TurnkeyML uses the [Microsoft lemon emoji](https://github.com/microsoft/fluentui-emoji) as an icon for the lemonade tool.

>The MIT License
>
>Copyright 2023 Groq Inc.
2 changes: 1 addition & 1 deletion README.md
@@ -7,7 +7,7 @@

We are on a mission to make it easy to use the most important tools in the ONNX ecosystem. TurnkeyML accomplishes this by providing no-code CLIs and low-code APIs for both general ONNX workflows with `turnkey` as well as LLMs with `lemonade`.

-| [**Lemonade**](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md) | [**Turnkey**](https://github.com/onnx/turnkeyml/blob/main/docs/turnkey/getting_started.md) |
+| [**Lemonade SDK**](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md) | [**Turnkey**](https://github.com/onnx/turnkeyml/blob/main/docs/turnkey/getting_started.md) |
|:----------------------------------------------: |:-----------------------------------------------------------------: |
| Serve and benchmark LLMs on CPU, GPU, and NPU. <br/> [Click here to get started with `lemonade`.](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/getting_started.md) | Export and optimize ONNX models for CNNs and Transformers. <br/> [Click here to get started with `turnkey`.](https://github.com/onnx/turnkeyml/blob/main/docs/turnkey/getting_started.md) |
| <img src="https://github.com/onnx/turnkeyml/blob/main/img/llm_demo.png?raw=true"/> | <img src="https://github.com/onnx/turnkeyml/blob/main/img/classic_demo.png?raw=true"/> |
50 changes: 18 additions & 32 deletions docs/lemonade/getting_started.md
@@ -1,19 +1,11 @@
-# Lemonade
+# Lemonade SDK

-Welcome to the project page for `lemonade` the Turnkey LLM Aide!
-
-1. [Install](#install)
-1. [CLI Commands](#cli-commands)
-   - [Syntax](#syntax)
-   - [Chatting](#chatting)
-   - [Accuracy](#accuracy)
-   - [Benchmarking](#benchmarking)
-   - [Memory Usage](#memory-usage)
-   - [Serving](#serving)
-1. [API Overview](#api)
-1. [Code Organization](#code-organization)
-1. [Contributing](#contributing)
+The `lemonade` SDK provides everything needed to get up and running quickly with LLMs on OnnxRuntime GenAI (OGA).
+
+- [Quick installation from PyPI](#install).
+- [CLI with tools for prompting, benchmarking, and accuracy tests](#cli-commands).
+- [REST API with OpenAI compatibility](#serving).
+- [Python API based on `from_pretrained()` for easy integration with Python apps](#api).

# Install

@@ -85,9 +77,9 @@ Can be read like this:
The `lemonade -h` command will show you which options and Tools are available, and `lemonade TOOL -h` will tell you more about that specific Tool.


-## Chatting
+## Prompting

-To chat with your LLM try:
+To prompt your LLM try:

@@ -163,41 +155,35 @@ contains a figure plotting the memory usage over the build time. Learn more by
## Serving

-You can launch a WebSocket server for your LLM with:
+You can launch an OpenAI-compatible server with:

-OGA iGPU:
-```bash
-lemonade -i microsoft/Phi-3-mini-4k-instruct oga-load --device igpu --dtype int4 serve
-```
-
-Hugging Face:
-```bash
-lemonade -i facebook/opt-125m huggingface-load serve
-```
+```bash
+lemonade serve
+```

Once the server has launched, you can connect to it from your own application, or interact directly by following the on-screen instructions to open a basic web app.

Visit the [server spec](https://github.com/onnx/turnkeyml/blob/main/docs/lemonade/server_spec.md) to learn more about the endpoints provided.
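
Since the new server speaks the OpenAI API, any OpenAI-compatible client should be able to connect. Below is a minimal sketch, not part of this commit, using the official `openai` Python package; it assumes the chat-completions route lives under the same `/api/v0` prefix as the health endpoint exercised in the installer workflow, and the model name is a hypothetical placeholder.

```python
# Illustrative client for the OpenAI-compatible server (assumptions noted above).
from openai import OpenAI

# Assumption: OpenAI-style routes share the /api/v0 prefix with /health.
client = OpenAI(base_url="http://localhost:8000/api/v0", api_key="not-needed")

completion = client.chat.completions.create(
    model="placeholder-model",  # hypothetical id; substitute a model your server serves
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(completion.choices[0].message.content)
```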
# API
Lemonade is also available via API.
-## LEAP APIs
+## High-Level APIs

-The lemonade enablement platform (LEAP) API abstracts loading models from any supported framework (e.g., Hugging Face, OGA) and backend (e.g., CPU, iGPU, Hybrid). This makes it easy to integrate lemonade LLMs into Python applications.
+The high-level lemonade API abstracts loading models from any supported framework (e.g., Hugging Face, OGA) and backend (e.g., CPU, iGPU, Hybrid) using the popular `from_pretrained()` function. This makes it easy to integrate lemonade LLMs into Python applications.
OGA iGPU:
```python
-from lemonade import leap
+from lemonade.api import from_pretrained

-model, tokenizer = leap.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", recipe="oga-igpu")
+model, tokenizer = from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", recipe="oga-igpu")
input_ids = tokenizer("This is my prompt", return_tensors="pt").input_ids
response = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.decode(response[0]))
```
-You can learn more about the LEAP APIs [here](https://github.com/onnx/turnkeyml/tree/main/examples/lemonade).
+You can learn more about the high-level APIs [here](https://github.com/onnx/turnkeyml/tree/main/examples/lemonade).
## Low-Level API
@@ -207,13 +193,13 @@ Here's a quick example of how to prompt a Hugging Face LLM using the low-level API

```python
import lemonade.tools.torch_llm as tl
-import lemonade.tools.chat as cl
+import lemonade.tools.prompt as pt
from turnkeyml.state import State
state = State(cache_dir="cache", build_name="test")
state = tl.HuggingfaceLoad().run(state, input="facebook/opt-125m")
-state = cl.Prompt().run(state, prompt="hi", max_new_tokens=15)
+state = pt.Prompt().run(state, prompt="hi", max_new_tokens=15)
print("Response:", state.response)
```