Mmpie1

[Image: Discord Demo]

Awesome mmpie1 created by AndriiZelenko

Installation

$ git clone git@github.com:AndriiZelenko/mmpie1.git && cd mmpie1

You may use standard python tools (pip) as desired, but it is recommended to use Rye, in which case all you need to do is:

$ rye sync
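
If you prefer to stick with pip, an equivalent manual setup (a sketch, assuming a standard pyproject.toml-based build, which the rye build output below suggests) would be:

$ python -m venv .venv && source .venv/bin/activate
$ pip install -e .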

To run the end-to-end demo application, refer to the Demo section below.

Package Structure

The mmpie1 package is an end-to-end machine learning framework for training and deploying models. It is meant to help kickstart new projects, containing all the structural components necessary to train, manage and deploy models. Mmpie1 has a modular design, so if a particular component is not needed, it can simply be deleted.

At a high level, the mmpie1 package covers the following components of the ML lifecycle:

  • Training framework, built on top of PyTorch Lightning, including data, model, and configuration management;
  • Lifecycle management, built on top of MLFlow, including experiment tracking, model versioning, and model serving;
  • Packaging, including both python (pip install mmpie1) and docker (docker compose up) functionality;
  • Model serving, built on top of FastAPI, including a REST API for model access through a Discord client;
  • Quality of life tools, including unit tests, automated documentation, and linting.

The Mmpie1 package has been tested on Ubuntu 22.04 with Python 3.10.12.

Package Components

The mmpie1 package is organized into the following components:

Components
  • docs: Contains the build files for Sphinx documentation. Refer to the Documentation section for more details;
  • docker: Contains Dockerfiles for building the mmpie1 Docker image. Refer to the Docker module for more details;
  • mmpie1: The main package, containing all the core functionality of the project;
    • mmpie1/backend: Contains modules for running each of the discord, gateway, training and deployment servers;
    • mmpie1/configs: Contains the (Hydra) yaml files for model training;
    • mmpie1/core: Contains core mmpie1 packages, including:
      • Config: Provides access to the associated config.ini file, used for package-level configurations;
      • Mmpie1Base: A helper base class providing unified logging functionality, among other features;
    • mmpie1/data: Contains modules for individual pl.LightningDataModule datasets;
    • mmpie1/models: Contains modules for individual torch.nn.Module models; also includes a PyTorch Lightning wrapper class for torch.nn.Modules, which can be used to wrap models during training;
    • mmpie1/modules: Modules used within the package, including the model Registry and GPT classes;
    • mmpie1/utils: Utility functions used throughout the package. Notable utilities include:
      • conversions: Provides methods for converting and serializing PIL images, used to pass images between servers (a minimal sketch of this kind of conversion follows this list);
      • logging: Provides a default_logger method, which may be used to customize the logger provided by the Mmpie1Base class;
    • mmpie1/scripts: Standalone scripts built on top of the mmpie1 package; the project starts with a training script that demonstrates how to set up a training pipeline that stores models to the model registry;
  • tests: Contains unit tests for the mmpie1 package. Refer to the Unit Tests section for more details.
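
Since images need to cross server boundaries in the demo, the conversions utility serializes PIL images for transport. The exact mmpie1 API is not documented here, so the following is only a minimal sketch of the general technique (base64 over JSON/HTTP); the function names are illustrative, not mmpie1's actual interface:

import base64
import io

from PIL import Image


def encode_image(image: Image.Image, image_format: str = "PNG") -> str:
    # Illustrative helper (not mmpie1's API): serialize a PIL image to a base64 string.
    buffer = io.BytesIO()
    image.save(buffer, format=image_format)
    return base64.b64encode(buffer.getvalue()).decode("utf-8")


def decode_image(payload: str) -> Image.Image:
    # Reconstruct the PIL image from the base64 string on the receiving server.
    return Image.open(io.BytesIO(base64.b64decode(payload)))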

Package Management

It is highly recommended to use Rye as your package manager. In addition to handling your virtual environment and dependencies, Rye also exposes a number of useful project commands, which you can run with rye run <command>.

Linting

To lint your code with pylint, isort and black:

$ rye run lint
Output
pylint mmpie1/

------------------------------------
Your code has been rated at 10.00/10

pylint --disable=protected-access tests/

-------------------------------------------------------------------
Your code has been rated at 10.00/10


isort -l 120 --check mmpie1/

isort -l 120 --check tests/

black -l 120 --check mmpie1/
All done! ✨ 🍰 ✨
43 files would be left unchanged.

black -l 120 --check tests/
All done! ✨ 🍰 ✨
14 files would be left unchanged.

Unit Tests

To run unit tests with pytest:

$ rye run test
Output
============================= test session starts ==============================
platform linux -- Python 3.11.6, pytest-7.4.4, pluggy-1.3.0
rootdir: ~/mmpie1
plugins: cov-4.1.0, anyio-4.2.0, hydra-core-1.3.2
collected 13 items                                                             

tests/mmpie1/core/test_config.py ...                                 [ 23%]
...
tests/mmpie1/utils/test_timer_collection.py .                        [100%]

---------- coverage: platform linux, python 3.11.6-final-0 -----------
Name                                               Stmts   Miss  Cover   Missing
--------------------------------------------------------------------------------
mmpie1/__init__.py                                 3      0   100%
...
mmpie1/utils/timer_collection.py                  28      0   100%
--------------------------------------------------------------------------------
TOTAL                                               1134    854    25%


=========================== short test summary info ============================
SKIPPED [1] tests/mmpie1/core/test_registry.py:12: No runs found in MLflow.
======================== 12 passed, 1 skipped in 7.31s =========================

Auto-formatting

To auto-format your code with isort and black:

$ rye run format 
Output
isort -l 120 mmpie1/

black -l 120 mmpie1/
All done! ✨ 🍰 ✨
43 files left unchanged.

isort -l 120 tests/

black -l 120 tests/
All done! ✨ 🍰 ✨
14 files left unchanged.

Building the Docs

Build the docs using sphinx:

$ rye run docs

Both HTML and PDF docs will be built, located in docs/_build/html (open index.html in a browser) and docs/_build/simplepdf/Mmpie1.pdf, respectively.

Dependency Graph

To generate a dependency graph of the project, use pylint and graphviz. Make sure graphviz is installed:

apt-get install graphviz

And then run:

rye run graph-dependencies

This should generate two files in the root directory:

  • packages.png: the package dependency graph;
  • classes.png: the class diagram.

You may use these graphs to help get a quick overview of the project, delete superfluous code and avoid circular dependencies.
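
For reference, rye run graph-dependencies is most likely a thin wrapper around pylint's pyreverse tool (an assumption; check pyproject.toml for the exact command). The equivalent direct invocation would be roughly:

$ pyreverse -o png mmpie1/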

Building the Package

To build the package:

rye build
Output
building mmpie1
* Creating virtualenv isolated environment...
* Installing packages in isolated environment... (hatchling)
* Getting build dependencies for sdist...
* Building sdist...
* Building wheel from sdist
* Creating virtualenv isolated environment...
* Installing packages in isolated environment... (hatchling)
* Getting build dependencies for wheel...
* Building wheel...
Successfully built mmpie1-0.1.0.tar.gz and mmpie1-0.1.0-py3-none-any.whl
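
The built artifacts are placed in dist/. To sanity-check the wheel, you can install it into a fresh environment (file name taken from the build output above):

$ pip install dist/mmpie1-0.1.0-py3-none-any.whl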

Continuous Integration

You have a starter CI workflow in .github/workflows/ci.yml that will lint and test your project on Linux, macOS and Windows. By default it will run on every push and pull request, and results can be accessed directly from GitHub Actions.

End-to-End Demo

The end-to-end application demo allows training and deploying models through a private discord server. It is meant to showcase the full functionality of the mmpie1 package, and provide a starting point for deploying your own models.

After setting up the backend servers, an end user will be able to do the following through your private discord server:

  1. Train a model. The user may start a training job, passing in Hydra command line options to configure the training or run sweeps. The training job will be tracked in MLFlow, with logs and metrics available through Tensorboard. The user will be notified when the training job completes, with all resulting models stored in the model registry.
  2. Deploy a model. The user may deploy a model from the model registry, which will be served through the gateway server on subsequent requests.
  3. Run inference. The user may run the deployed model, passing an image through the discord bot to be classified.

The following helper commands are also exposed through the demo:

  1. Registry summary. The user may request a summary of the model registry, including all models and their associated metrics.
  2. Server logs. The user may request the server logs, which are kept separately by each backend server. In practice these logs would likely be accessed through an administrator or developer account.

Finally, the demo is set up to also make use of GPT, if an OpenAI API key is provided. In this case, the following additional commands are available:

  1. Chat. The user may chat with GPT, which will be used as a general chat agent.
  2. Debug. The user may request GPT to help debug any problem with the model or servers. In this case the server logs are automatically passed to GPT, which will provide debug advice on any errors to the user.

Application Structure

[Image: demo application component structure]

The overall component structure of the demo application is shown above. To run the demo, you will need to set up a private Discord server for the frontend, and then run each of the backend servers in turn. The instructions for each step are detailed below.

Set up your front-end deployment

The end-to-end demo uses Discord as the front-end deployment environment, as it is generally very easy to set up a new discord server and associated bot for deployment. Indeed, many companies have used Discord as a deployment environment for their products at scale to great success (e.g. consider Midjourney).

First, create a new discord server, and then create an associated new discord bot. Add the bot to your server as instructed.

Finally, add your discord bot token to your config.ini file under API_KEYS/DISCORD. You may find this key by going to the discord developer portal and navigating to the Bot tab. Click on Reset Token and then copy/paste the new token to your config.ini file.

Congrats! Your application now has a front-end deployment environment!

(Optional) Set up GPT

While not absolutely necessary, the demo is set up to incorporate GPT for end-client ease of use. By default, it will be used as a general chat agent for anyone DMing the bot, and can be prompted more directly by setting its system prompt. Moreover, as a concrete example, a sample debug command is provided, which will pass the server logs to GPT in an effort to get it to provide debug advice on any errors.

If you have not, sign up and log into the OpenAI Developer Platform and then navigate to your API keys. Create a new key and copy/paste it into your config.ini file under API_KEYS/OPENAI.
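
For reference, the resulting config.ini fragment should look roughly like this (the section and key names follow the API_KEYS/DISCORD and API_KEYS/OPENAI references above; the values are placeholders):

[API_KEYS]
DISCORD = <your-discord-bot-token>
OPENAI = <your-openai-api-key>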

Start the backend servers

The demo includes a relatively sophisticated end-to-end deployment architecture. While it has many components, they are all modular and designed so that they can be placed on separate machines, as appropriate, to scale to an actual production environment.

The backend consists of the following servers:

  1. MLFlow Tracking and Registry Server. The MLFlow server is used to track experiments and store models in the registry;
  2. Training Server (Optional). The training server is used to train models and store them in the registry; it is only needed if you wish to train models from discord (i.e. use the >train command in discord). It is not needed to train models locally.
  3. Deployment Server. The deployment server is used to serve models from the registry.
  4. Discord Client. The discord client receives requests from the discord channel and forwards them to the backend to be processed;
  5. Gateway Server. The gateway server is the entry point for user requests to the backend. It accepts any user requests not processed by the discord client (i.e. all requests relating to model inference/training and the model registry). Under the hood it communicates with the MLFlow, Training and Deployment servers, as appropriate;
  6. Tensorboard (Optional). Tensorboard is used to monitor training progress. It is not accessible from the discord client, but can be accessed directly through a local browser. It is only needed if you wish to monitor training progress.

All servers may be run locally, and may be started as described below.

Notes:

  • If you choose different ports for the servers, you will need to update the hosts listed in your config.ini file to match, or pass them in as command line arguments to each other server.
  • The default configs assume that output data (MLFlow registry, hydra training runs, tensorboard logs, standard debug logs, etc.) will be stored in ${HOME}/mmpie1 and, for unit tests, that the project has been installed in ${HOME}/projects/mmpie1. If you wish to change these locations, you will need to update the same config.ini file accordingly. To create the default directories automatically, you may run rye run create_project_directories, which will create all directories according to your config.ini file.
  • The number of workers on the training server (-w 4) determines how many simultaneous training runs may be done. If you set the number too high you may run out of memory. In practice, you will likely want to run the training server in a completely separate environment, and configure each training job to get a separate GPU.
  • If, at any point, you get an error saying mmpie1 cannot be found, remember to add the mmpie1 path to your PYTHONPATH variable. E.g.
PYTHONPATH=${HOME}/projects/mmpie1 [... continue command]
  1. Start the MLFlow Tracking and Registry server:
rye run mlflow_server
Without Rye
mlflow server --backend-store-uri ${HOME}/mmpie1/mlflow --port 8080

Once the MLFlow server is up and running, you may get local access by opening a browser to its address (http://localhost:8080 with the default port shown above):

[Image: MLFlow Model Registry]

  2. Start the Gateway server:
rye run gateway_server
Without Rye
python -m gunicorn -w 1 -b localhost:8081 -k uvicorn.workers.UvicornWorker "mmpie1.backend.gateway.gateway_server:app()"
  3. Start the Training server:
rye run training_server
Without Rye
python -m gunicorn -w 4 -b localhost:8082 -k uvicorn.workers.UvicornWorker "mmpie1.backend.training.training_server:app()"
  4. Start the Deployment server:
rye run deployment_server
Without Rye
python -m gunicorn -w 1 -b localhost:8083 -k uvicorn.workers.UvicornWorker "mmpie1.backend.deployment.deployment_server:app()"
  5. Start the Discord client:
rye run discord_client
Without Rye
python mmpie1/backend/discord/discord_client.py
  6. (Optional) Start Tensorboard:
tensorboard --logdir ${HOME}/mmpie1/tensorboard
[Image: Tensorboard Example]

Advanced Deployment

For a more streamlined deployment, follow the instructions in the docker readme, which let you configure the deployment through a single Docker Compose file. All the backend servers can then be deployed with a single Docker Compose call from the docker directory:

docker compose up

Demo

Once the backend servers are up and running, you may showcase your demo application through your discord server. You may run the following commands in any channel the bot has access to, or DM the bot directly. If DMing the bot directly, all non-command messages will equate to running the >chat command.

The demo comes with the following commands out-of-the-box.

List Commands

Train a model

Examples:

>train
>train --config-name train.yaml --model=cnn --dataset=mnist
>train --config-name train.yaml --model=cnn --dataset=mnist --trainer.max_epochs=2,5,10,15,20 --multirun

Any given arguments are directly passed to and parsed by Hydra. In particular, note that if you run a parameter sweep (by providing multiple values for a given parameter), you must also pass the --multirun flag. Training will run in the background on the training server, and store the trained models in the model registry when complete. You will be notified when the training job completes.

Deploy a model

When deploying a model, pass in the model name and version number. For example,

>load_model ModelName 1

Run inference

There are two commands for running inference:

>classify_id TEST_ID_INTEGER
>classify_image [UPLOAD IMAGE]

In the first case, you may specify an integer ID from the test dataset, which will be loaded and classified by the model. In the second case, you may upload an image (drag and drop the image file into your Discord message); example MNIST images may be found in the tests/resources directory.

Registry Summary

You may use the >registry_summary command at any time to get a summary of the model registry, including all models and their associated metrics.

Server Logs

You may use the >logs command at any time to get the server logs.

Chat

You may use the >chat command at any time to converse freely with GPT. This command is implied if you DM the bot with a general message.

Debug

You may use the >debug command at any time to request GPT's help debugging any problem with the model or servers. The server logs are automatically passed to GPT. The >debug command may be run alone, or you may optionally pass a message to GPT to help it understand the context of the debug request.

Examples:

>debug
>debug Check the training logs to see if the last training request is still running or returned an error.
