Commit 6f7637c (1 parent: a50e31e)

Co-authored-by: niklub <[email protected]>
Co-authored-by: Caitlin Wheeless <[email protected]>
Co-authored-by: Micaela Kaplan <[email protected]>
Co-authored-by: Micaela Kaplan <[email protected]>

Showing 14 changed files with 1,002 additions and 2 deletions.
A one-line ignore file:

```
*.md
```
```dockerfile
# syntax=docker/dockerfile:1
ARG PYTHON_VERSION=3.11

FROM python:${PYTHON_VERSION}-slim AS python-base
ARG TEST_ENV

WORKDIR /app

ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PORT=${PORT:-9090} \
    PIP_CACHE_DIR=/.cache \
    WORKERS=1 \
    THREADS=8

# Update the base OS
RUN --mount=type=cache,target="/var/cache/apt",sharing=locked \
    --mount=type=cache,target="/var/lib/apt/lists",sharing=locked \
    set -eux; \
    apt-get update; \
    apt-get upgrade -y; \
    apt-get install --no-install-recommends -y \
        git; \
    apt-get autoremove -y

# Install base requirements
COPY requirements-base.txt .
RUN --mount=type=cache,target=${PIP_CACHE_DIR},sharing=locked \
    pip install -r requirements-base.txt

# Install custom requirements
COPY requirements.txt .
RUN --mount=type=cache,target=${PIP_CACHE_DIR},sharing=locked \
    pip install -r requirements.txt

# Install test requirements only when TEST_ENV="true"
COPY requirements-test.txt .
RUN --mount=type=cache,target=${PIP_CACHE_DIR},sharing=locked \
    if [ "$TEST_ENV" = "true" ]; then \
        pip install -r requirements-test.txt; \
    fi

COPY . .

EXPOSE 9090

CMD gunicorn --preload --bind :$PORT --workers $WORKERS --threads $THREADS --timeout 0 _wsgi:app
```
# Integrate WatsonX to Label Studio

WatsonX offers a suite of machine learning tools, including access to many LLMs, prompt
refinement interfaces, and datastores via WatsonX.data. When you integrate WatsonX with Label Studio, you get
access to these models and can automatically keep your annotated data up to date in your WatsonX.data tables.
To run the integration, you'll need to pull this repo and host it locally or in the cloud. Then, you can link the model
to your Label Studio project under the **Model** section in the settings. To use the WatsonX.data integration,
set up a webhook in the settings under **Webhooks**, using the following structure for the link:
`<link to your hosted container>/data/upload`, and set the triggers to `ANNOTATION_CREATED` and `ANNOTATION_UPDATED`. For more
on webhooks, see [our documentation](https://labelstud.io/guide/webhooks).

See the configuration notes at the bottom for details on how to set up your environment variables to get the system to work.
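As a rough sketch of what arrives at that webhook, an `ANNOTATION_CREATED` event looks roughly like the payload below (field names follow the Label Studio webhook payload used by this integration; the values are illustrative, not from this repo):

```python
# Illustrative ANNOTATION_CREATED payload as POSTed to <host>/data/upload.
# Field names mirror the Label Studio webhook event; values are made up.
payload = {
    "action": "ANNOTATION_CREATED",
    "task": {
        "id": 42,
        "data": {"text": "The context passage shown to the annotator"},
    },
    "annotation": {
        "completed_by": 1,  # user id of the annotator
        "updated_by": 1,
        "result": [
            {
                "from_name": "response",
                "value": {"text": ["The generated answer"]},
            }
        ],
    },
}
```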
## Setting up your label_config

For this project, we recommend you start with the labeling config defined below, but you can always edit or expand it to
meet your needs! Crucially, there must be a `<TextArea>` tag for the model to insert its response into.

```xml
<View>
  <Style>
    .lsf-main-content.lsf-requesting .prompt::before { content: ' loading...'; color: #808080; }

    .text-container {
      background-color: white;
      border-radius: 10px;
      box-shadow: 0px 4px 6px rgba(0, 0, 0, 0.1);
      padding: 20px;
      font-family: 'Courier New', monospace;
      line-height: 1.6;
      font-size: 16px;
    }
  </Style>
  <Header value="Context:"/>
  <View className="text-container">
    <Text name="context" value="$text"/>
  </View>
  <Header value="Prompt:"/>
  <View className="prompt">
    <TextArea name="prompt"
              toName="context"
              rows="4"
              editable="true"
              maxSubmissions="1"
              showSubmitButton="false"
              placeholder="Type your prompt here then Shift+Enter..."
    />
  </View>
  <Header value="Response:"/>
  <TextArea name="response"
            toName="context"
            rows="4"
            editable="true"
            maxSubmissions="1"
            showSubmitButton="false"
            smart="false"
            placeholder="Generated response will appear here..."
  />
  <Header value="Overall response quality:"/>
  <Rating name="rating" toName="context"/>
</View>
```
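If you customize the config, one quick sanity check (a sketch, not part of this repo) is to confirm that it still parses as XML and still contains a `<TextArea>` for the response:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the config above; paste your full version in practice.
label_config = """
<View>
  <Text name="context" value="$text"/>
  <TextArea name="prompt" toName="context"/>
  <TextArea name="response" toName="context"/>
  <Rating name="rating" toName="context"/>
</View>
"""

root = ET.fromstring(label_config)
# The model writes its output into a TextArea, so at least one must exist
textareas = [el.get("name") for el in root.iter("TextArea")]
```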
## Setting up WatsonX.data

To use your WatsonX.data integration, follow the steps below.

1. First, get the host and port information of the engine you'll be using. Navigate to the Infrastructure Manager
on the left sidebar of your WatsonX.data page, and change to list view by clicking the symbol in
the upper right-hand corner. From there, click on the name of the engine you'll be using. This brings up a pop-up window
where you can see the host and port information under "host". The port is the part after the `:` at the end of the URL.
2. Next, make sure your catalog is set up. To create a new catalog, follow [these instructions](https://dataplatform.cloud.ibm.com/docs/content/wsj/catalog/create-catalog.html?context=wx&locale=en).
3. Once your catalog is set up, make sure that the correct schema is also set up. Navigate to your Data Manager and select `Create` to create a new schema.
4. With all of this information, you're ready to update the environment variables listed at the bottom of this page and get started with your WatsonX.data integration!
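The host, port, catalog, and schema collected in the steps above map onto the Presto connection parameters that the webhook app uses. A sketch of that mapping (the default values here are placeholders, not real endpoints):

```python
import os

# Placeholder defaults; in practice every value comes from docker-compose.yml
# via the environment variables listed in the Configuration section below.
presto_params = {
    "host": os.environ.get("WATSONX_ENG_HOST", "example.lakehouse.cloud.ibm.com"),
    "port": int(os.environ.get("WATSONX_ENG_PORT", "30912")),
    "user": os.environ.get("WATSONX_ENG_USERNAME", "ibmlhapikey"),
    "catalog": os.environ.get("WATSONX_CATALOG", "my_catalog"),
    "schema": os.environ.get("WATSONX_SCHEMA", "my_schema"),
    "http_scheme": "https",  # WatsonX.data engines are reached over HTTPS
}
```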
## Running with Docker (recommended)

1. Start the Machine Learning backend on `http://localhost:9090` with the prebuilt image:

   ```bash
   docker-compose up
   ```

2. Validate that the backend is running:

   ```bash
   $ curl http://localhost:9090/
   {"status":"UP"}
   ```

3. Create a project in Label Studio. Then from the **Model** page in the project settings, [connect the model](https://labelstud.io/guide/ml#Connect-the-model-to-Label-Studio). The default URL is `http://localhost:9090`.
## Building from source (advanced)

To build the ML backend from source, you have to clone the repository and build the Docker image:

```bash
docker-compose build
```

## Running without Docker (advanced)

To run the ML backend without Docker, you have to clone the repository and install all dependencies using pip:

```bash
python -m venv ml-backend
source ml-backend/bin/activate
pip install -r requirements.txt
```

Then you can start the ML backend:

```bash
label-studio-ml start ./dir_with_your_model
```
## Configuration

Parameters can be set in `docker-compose.yml` before running the container.

The following common parameters are available:

- `BASIC_AUTH_USER` - Specify the basic auth user for the model server.
- `BASIC_AUTH_PASS` - Specify the basic auth password for the model server.
- `LOG_LEVEL` - Set the log level for the model server.
- `WORKERS` - Specify the number of workers for the model server.
- `THREADS` - Specify the number of threads for the model server.

The following parameters allow you to link the WatsonX models to Label Studio:

- `LABEL_STUDIO_URL` - Specify the URL of your Label Studio instance. Note that this might need to be `http://host.docker.internal:8080` if you are running Label Studio in another Docker container.
- `LABEL_STUDIO_API_KEY` - Specify the API key for authenticating your Label Studio instance. You can find this by logging into Label Studio and [going to the **Account & Settings** page](https://labelstud.io/guide/user_account#Access-token).
- `WATSONX_API_KEY` - Specify the API key for authenticating into WatsonX. You can generate this by following [these instructions](https://www.ibm.com/docs/en/watsonx/watsonxdata/1.0.x?topic=started-generating-api-keys).
- `WATSONX_PROJECT_ID` - Specify the ID of the WatsonX project from which you will run the model. The project must have WML capabilities. You can find the ID in the `General` section of your project, which is accessible by clicking on the project from the WatsonX homepage.
- `WATSONX_MODELTYPE` - Specify the name of the WatsonX model you'd like to use. A full list can be found in [IBM's documentation](https://ibm.github.io/watsonx-ai-python-sdk/fm_model.html#TextModels:~:text=CODELLAMA_34B_INSTRUCT_HF).
- `DEFAULT_PROMPT` - If you want the model to automatically predict on new data samples, provide a default prompt or the path to a default prompt file.
- `USE_INTERNAL_PROMPT` - If using a default prompt, set to 0. Otherwise, set to 1.
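For example, the model-linking variables above might be set in `docker-compose.yml` like this (a sketch with placeholder values; the service name is illustrative and only the `environment:` block is shown):

```yaml
services:
  watsonx_llm:                # illustrative service name
    environment:
      - LABEL_STUDIO_URL=http://host.docker.internal:8080
      - LABEL_STUDIO_API_KEY=<your Label Studio access token>
      - WATSONX_API_KEY=<your WatsonX API key>
      - WATSONX_PROJECT_ID=<your WML-capable project id>
      - WATSONX_MODELTYPE=<model name from IBM's TextModels list>
```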
The following parameters allow you to use the webhook connection to transfer data from Label Studio to WatsonX.data:

- `WATSONX_ENG_USERNAME` - MUST be `ibmlhapikey` for the integration to work.

To get the host and port information below, you can follow the steps under [Pre-requisites](https://cloud.ibm.com/docs/watsonxdata?topic=watsonxdata-con-presto-serv#conn-to-prestjava).

- `WATSONX_ENG_HOST` - The host information for your WatsonX.data engine.
- `WATSONX_ENG_PORT` - The port information for your WatsonX.data engine.
- `WATSONX_CATALOG` - The name of the catalog for the table you'll insert your data into. Must already be created in the WatsonX.data platform.
- `WATSONX_SCHEMA` - The name of the schema for the table you'll insert your data into. Must already be created in the WatsonX.data platform.
- `WATSONX_TABLE` - The name of the table you'll insert your data into. Does not need to be created in advance.
```python
from werkzeug.middleware.dispatcher import DispatcherMiddleware
from flask import Flask

from data_wsgi import application as data
import model_wsgi as model

"""
Here, we create a Flask app to serve as a wrapper for both the ml-backend model API and the webhook API. By doing this,
we can host both behind the same endpoint, with the model accessible at <host_url>/ and the webhook accessible at
<host_url>/data/. We set app.wsgi_app in this way so that we can run our tests.
"""
app = Flask(__name__)

app.wsgi_app = DispatcherMiddleware(model.application.wsgi_app, {
    '/data': data.wsgi_app
})
```
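As a minimal, self-contained illustration of this dispatch pattern (toy apps with hypothetical names, not the real model and webhook apps):

```python
from flask import Flask, jsonify
from werkzeug.middleware.dispatcher import DispatcherMiddleware

# Two toy WSGI apps standing in for the model API and the webhook API.
model_app = Flask("model")
data_app = Flask("data")

@model_app.route("/")
def model_root():
    # Stands in for the ML backend served at <host_url>/
    return jsonify(service="model")

@data_app.route("/upload", methods=["GET", "POST"])
def upload():
    # Stands in for the webhook served at <host_url>/data/upload
    return jsonify(service="data-upload")

# Mount both behind one endpoint, exactly as the wrapper above does.
app = Flask(__name__)
app.wsgi_app = DispatcherMiddleware(model_app.wsgi_app, {"/data": data_app.wsgi_app})
```

Because `Flask.__call__` delegates to `self.wsgi_app`, replacing `wsgi_app` this way keeps `app.test_client()` working, which is why the wrapper assigns it rather than serving the middleware directly.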
**label_studio_ml/examples/watsonx_llm/data_transfer_app.py** (174 additions)
```python
import logging
import os
import traceback
from typing import List

import prestodb
from flask import Flask, request, jsonify
from label_studio_sdk.client import LabelStudio

logger = logging.getLogger(__name__)
_server = Flask(__name__)


def init_app():
    return _server


@_server.route('/health', methods=['GET'])
@_server.route('/', methods=['GET'])
def health():
    # Report liveness only; this webhook app does not wrap a model class.
    return jsonify({'status': 'UP'})


@_server.route('/upload', methods=['POST'])
def upload_to_watsonx():
    # First, collect data from the request object passed by Label Studio
    input_request = request.json
    action = input_request["action"]
    annotation = input_request["annotation"]
    task = input_request["task"]

    # Connect to Label Studio
    client = connect_ls()
    data = get_data(annotation, task, client)

    # Then, connect to WatsonX.data via prestodb
    eng_username = os.getenv("WATSONX_ENG_USERNAME")
    eng_password = os.getenv("WATSONX_API_KEY")
    eng_host = os.getenv("WATSONX_ENG_HOST")
    eng_port = os.getenv("WATSONX_ENG_PORT")
    catalog = os.getenv("WATSONX_CATALOG")
    schema = os.getenv("WATSONX_SCHEMA")
    table = os.getenv("WATSONX_TABLE")

    if None in [eng_username, eng_password, eng_host, eng_port, catalog, schema, table]:
        raise Exception("You must provide the required WATSONX variables in your docker-compose.yml file!")

    try:
        with prestodb.dbapi.connect(host=eng_host, port=eng_port, user=eng_username, catalog=catalog,
                                    schema=schema, http_scheme='https',
                                    auth=prestodb.auth.BasicAuthentication(eng_username, eng_password)) as conn:

            cur = conn.cursor()
            # Dynamically create the table schema
            table_create, table_info_keys = create_table(table, data)
            cur.execute(table_create)

            if action == "ANNOTATION_CREATED":
                # Upload the new annotation to WatsonX.data
                values = tuple([data[key] for key in table_info_keys])
                insert_command = f"""INSERT INTO {table} VALUES {values}"""
                logger.debug(insert_command)
                cur.execute(insert_command)

            elif action == "ANNOTATION_UPDATED":
                # Update an existing annotation by deleting the old row and inserting a new one
                delete = f"""DELETE from {table} WHERE ID={data["ID"]}"""
                logger.debug(delete)
                cur.execute(delete)
                values = tuple([data[key] for key in table_info_keys])
                insert_command = f"""INSERT INTO {table} VALUES {values}"""
                logger.debug(insert_command)
                cur.execute(insert_command)

            elif action == "ANNOTATIONS_DELETED":
                # Delete an existing annotation from WatsonX.data
                delete = f"""DELETE from {table} WHERE ID={data["ID"]}"""
                logger.debug(delete)
                cur.execute(delete)

            conn.commit()
    except Exception as e:
        logger.debug(traceback.format_exc())
        logger.debug(e)
        return jsonify({'error': str(e)}), 500

    # A Flask view must return a response; acknowledge the upload
    return jsonify({'status': 'ok'})


def connect_ls():
    try:
        base_url = os.getenv("LABEL_STUDIO_URL")
        api_key = os.getenv("LABEL_STUDIO_API_KEY")

        if None in [base_url, api_key]:
            raise Exception(
                "You must provide your LABEL_STUDIO_URL and LABEL_STUDIO_API_KEY in your docker-compose.yml file!")

        client = LabelStudio(
            base_url=base_url,
            api_key=api_key
        )

        return client

    except Exception as e:
        logger.debug(traceback.format_exc())
        logger.debug(e)


def get_data(annotation, task, client):
    """Collect the data to be uploaded to WatsonX.data"""
    info = {}

    try:
        users = client.users.list()
        task_id = task["id"]
        annotator_complete = annotation["completed_by"]
        annotator_update = annotation["updated_by"]
        annotator_complete = next((x.email for x in users if x.id == annotator_complete), "")
        annotator_update = next((x.email for x in users if x.id == annotator_update), "")
        info.update({"ID": int(task_id), "completed_by": annotator_complete, "updated_by": annotator_update})
        for key, value in task["data"].items():
            if isinstance(value, List):
                value = value[0]
            elif isinstance(value, str) and value.isnumeric():
                value = int(value)

            if isinstance(value, str):
                value = value.strip("\"")
            info.update({key: value})

        for result in annotation["result"]:
            logger.debug(result)
            val_dict_key = list(result["value"].keys())[0]
            value = result["value"][val_dict_key]
            key = result["from_name"]
            if isinstance(value, List):
                value = value[0]
            elif isinstance(value, str) and value.isnumeric():
                value = int(value)

            if isinstance(value, str):
                value = value.strip("\"")
            info.update({key: value})

        logger.debug(f"INFO {info}")
        return info
    except Exception as e:
        logger.debug(traceback.format_exc())
        logger.debug(e)


def create_table(table, data):
    """Create the command for building a new table"""
    table_info = {}
    for key, value in data.items():
        if isinstance(value, int):
            table_info.update({key: "bigint"})
        else:
            table_info.update({key: "varchar"})

    # Put the ID column first
    table_info_keys = sorted(table_info.keys())
    table_info_keys.insert(0, table_info_keys.pop(table_info_keys.index("ID")))
    nl = ",\n"
    strings = [f"{key} {table_info[key]}" for key in table_info_keys]
    table_create = f"""
    CREATE TABLE IF NOT EXISTS {table} ({nl.join(strings)})
    """
    logger.debug(table_create)
    return table_create, table_info_keys
```
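To illustrate the schema inference in `create_table` above, here is a standalone sketch of the same rules (integers become `bigint`, everything else `varchar`, with the `ID` column forced to the front; the function and sample record are hypothetical):

```python
def infer_table_ddl(table, data):
    # Same rules as create_table above, reimplemented as a self-contained sketch.
    col_types = {k: ("bigint" if isinstance(v, int) else "varchar") for k, v in data.items()}
    keys = sorted(col_types)
    keys.insert(0, keys.pop(keys.index("ID")))  # ID column first
    cols = ",\n".join(f"{k} {col_types[k]}" for k in keys)
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols})", keys

# Hypothetical annotation record: ID and rating are ints, prompt is text
ddl, keys = infer_table_ddl("annotations", {"ID": 42, "prompt": "Summarize the text", "rating": 4})
```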