Skip to content

Latest commit

 

History

History
298 lines (184 loc) · 9.35 KB

README.md

File metadata and controls

298 lines (184 loc) · 9.35 KB

Sisyphus - GPU Package Build Automation Tool

Sisyphus

Sisyphus is a command-line automation tool that streamlines the process of building GPU (CUDA) enabled and other packages across Linux, Windows and other platforms. Sisyphus integrates with Anaconda's existing rocket-platform infrastructure to eliminate manual toil from the GPU package building process.

Key features:

  • Automated environment and host preparation including CUDA setup.
  • Remote build process management across Linux (x64, aarch64), Windows and other platforms.
  • Real-time build monitoring and logging.
  • Asynchronous operations to protect against local system, network or vpn issues resulting in lost work and time.
  • Automated package upload and distribution handling to specific channels.
  • Seamless integration with existing Anaconda build infrastructure

Sisyphus currently supports building the following packages with more planned:

  • llama.cpp: CPU, CUDA, and hardware optimized versions

Prerequisites

To use Sisyphus, you need:

  • Access to rocket-platform dev instances (documentation)
  • Active Anaconda VPN connection
  • Unix-like system (Linux, MacOS) for running the tool

Sisyphus automates the build processes documented in:

Setup & Usage

You will need to have conda and git installed.

conda create -n sisyphus -c conda-forge python pip
conda activate sisyphus
git clone [email protected]:anaconda/sisyphus.git
cd sisyphus
git submodule update --init
pip install -e .

TL;DR (full example)

Start a host:

sisyphus start-host --linux

Use --windows for a Windows host instead.

Below we'll assume the above command returned the host IP address 1.2.3.4.

Build llama.cpp on host 1.2.3.4:

sisyphus build -H 1.2.3.4 -P llama.cpp

This will connect to the host, determine if it's Linux or Windows, prepare it to run CUDA builds, prepare the build, run it, then show the progress in real-time.

If you lose connection to the host, which isn't unusual, you can resume watching the build process. Losing the connection will never interrupt builds.

sisyphus watch -H 1.2.3.4 -P llama.cpp

When the build completes, you can retrieve the built packages like this:

sisyphus download -H 1.2.3.4 -P llama.cpp

Stop the host when you're done:

sisyphus stop-host 1.2.3.4

Getting help

Get general help with:

> sisyphus --help

Get help for a specific command with:

> sisyphus <command> --help

For example:

> sisyphus build --help

Commands often have options not discussed here for the sake of brevity.

Start a new host

Start a new linux host with:

sisyphus start-host --linux

Start a new windows host with:

sisyphus start-host --windows

This will create a new GPU instance using rocket-platform and return its IP address. By default, it creates a Linux g4dn.4xlarge instance with a 24-hour lifetime.

You will need to provide a GitHub token for authentication (workflow scope, SSO authenticated). Either set the GITHUB_TOKEN environment variable or pass the --token option.

Note

The system may sometimes fail to retrieve the workflow run from rocket-platform. This is an infrequent but known bug which we haven't been able to resolve yet. When that happens, you will have to go to https://github.com/anaconda-distribution/rocket-platform/actions/workflows/start.yml to locate the workflow (look for your user name and the triggered time). Select it, click on Start, Start Instance, then locate the INSTANCE_IDS variable which will list your instance ID. Finally go to https://github.com/anaconda-distribution/rocket-platform/actions/workflows/stop.yml, and use this ID to run a workflow to stop the instance. Then run the start-host command again.

Prepare the host

This step is optional. The build command will automatically prepare the host if needed.

The prepare command will create a work directory, set up Conda, and create a Conda environment. On Windows, it will also install CUDA software and drivers, which takes more time. This only needs to be done once.

To prepare the host, run:

> sisyphus prepare -H <host>

Where <host> is the IP address or FQDN of your remote host.

Remember to make sure your SSH key is correctly configured (see rocket platform dev instance docs).

Note

You do not need to define the host type, Sisyphus will automatically detect if the remote host is Linux or Windows. It will immediately disconnect from the host but the preparation will continue on the host asynchronously protecting against local system, network or vpn issues.

Logs for the Conda and, on Windows, both CUDA jobs are saved on the remote host in the work directory. On Linux this is at /tmp/sisyphus, and on Windows it's at C:\tmp\sisyphus. When sshing to the host for checking these logs, remember you should login with the ec2-user name on Linux and dev-admin on Windows.

Watch the preparation progress

This step is for when you run the prepare command manually, and only useful in case you lose the connection to the host during the prepare process (unlikely but not impossible).

To watch the preparation process, do:

> sisyphus watch -H <host>

On the default logging level (info), the output will inform in real-time of the status. In case of failure, an error exit code will be returned for use in automation, or even a Shell script or one-liner.

Build the package

Start a build with:

> sisyphus build -H <host> -P <package>

<package> is the package name as written in the URL for the feedstock. For example, if the URL is https://github.com/AnacondaRecipes/llama.cpp-feedstock, then <package> is llama.cpp.

Sisyphus will prepare the host to run CUDA builds if needed, prepare all the data locally, upload it to the host, start the build, then show the build process in real-time (unless --no-watch is specified).

If you lose connection to the host during the build process, which isn't unusual, you can use the watch command like below to resume watching the build process. Losing the connection will never interrupt builds.

Watch the build process

This command is useful in case you lose the connection to the host during the build process, which is a common occurrence.

It's the same as above for the preparation, except this time we pass the package name.

> sisyphus watch -H <host> -P <package>

On the default logging level, sisyphus will show the build output in real-time. Here too, an exit code will be returned at the end for use in automation.

Check the build status

> sisyphus status -H <host> -P <package>

This will print one of the following statuses:

  • Not started: The build hasn't started yet
  • Building: The build is currently running
  • Complete: The build finished successfully
  • Failed: The build failed

The command returns immediately without waiting for the build to finish.

Wait for build completion

> sisyphus wait -H <host> -P <package>

Wait for the build to finish and return an exit code of 0 if the build succeeded, or 1 if it failed.

This is useful for automation and scripting.

Print or download the build log

> sisyphus log -H <host> -P <package>

Print the build log in your terminal. By default, it will wait for the build to finish before printing the log, unless --no-wait is specified. The output can, and probably should, be piped to a pager like less or be redirected to a file to save it.

Transmute packages

This step is optional. The download command will automatically transmute packages as needed before downloading them.

Transmute built packages with:

sisyphus transmute -H <host> -P <package>

Sisyphus will automaticaly convert all .tar.bz2 packages to .conda packages, and vice-versa, as needed.

Download built packages

Download built packages with:

sisyphus download -H <host> -P <package>

Sisyphus will automatically transmute packages as needed before downloading them.

Uploading packages to anaconda.org

Upload packages to anaconda.org with:

sisyphus upload -H <host> -P <package> -C <channel> -t <token>

Important

Windows packages need to be signed first. Upload the packages to a temporary channel, then run the code signing action at https://github.com/anaconda-distribution/rocket-platform/actions/workflows/codesign-windows.yml

Stop the host

Don't forget to stop the host when you're done. Hosts cost money per hour they run.

Stop a host with:

sisyphus stop-host  <host>

Where <host> can be either the IP address or the instance ID. If using the IP address, the tool will automatically retrieve the instance ID from the host before stopping it.

You will need to provide a GitHub token for authentication. Either set the GITHUB_TOKEN environment variable or pass the --token option.