feat: nvidia gpu operator and ca signed tls certs (#39)
Co-authored-by: Mark Van Aken <[email protected]>

## Features:

- add NVIDIA GPU operator Zarf package
- default to containerd configuration for NVIDIA runtime
- updates, additions, and improvements to README and Docs
- add CA cert creation and machine-level injection
- add ability to provide CA-signed TLS certs for different base domains

## Dependencies:

- upgrade to UDS Core 0.23.0
- upgrade to LFAI 0.9.1

## Fixes:

- fix RKE2 CoreDNS rewrites
- fix namespace and values organization
- fix DOMAIN variable in tasks and CoreDNS override
- fix LPP config and docs for multi-node, HA deployments

## Chores:

- removed Pepr exemptions race-condition workaround
- removed Supabase studio ConfigMap workaround
justinthelaw authored Jul 20, 2024
1 parent b4cbbb7 commit 3b8ec24
Showing 59 changed files with 2,212 additions and 502 deletions.
17 changes: 12 additions & 5 deletions .github/workflows/build-test.yaml → .github/workflows/e2e.yaml
@@ -54,11 +54,18 @@ jobs:
registry1Password: ${{secrets.IRON_BANK_ROBOT_PASSWORD}}
ghToken: ${{ secrets.GITHUB_TOKEN }}

- name: Create /opt/uds LVM
- name: Test Create Default LVM
run: |
sudo mkdir -p /opt/uds
sudo chown -Rv 65534:65534 /opt/uds
sudo uds run create:logical-volume
- name: Test the UDS RKE2 + Custom Zarf Init Bootstrap (`local-path`)
- name: Test Create Default TLS Cert
run: |
sudo uds run uds-rke2-local-path-test --no-progress --log-level warn -a amd64
sudo uds run create-tls-local-path-dev
- name: Test Deploy UDS RKE2
run: |
sudo uds run test:uds-rke2 --set VERSION=dev --log-level warn
- name: Test Deploy `local-path` Flavor Custom Zarf Init
run: |
sudo uds run test:local-path-minio-init --set VERSION=dev --log-level warn
4 changes: 2 additions & 2 deletions .github/workflows/tag-and-release.yaml
@@ -69,8 +69,8 @@ jobs:
echo "Publishing for tag: ${{ github.ref }}"
if [[ "${{ github.ref }}" == "refs/tags/dev" ]]; then
sudo uds run release-dev --set VERSION=dev --no-progress --no-log-file --log-level debug
sudo uds run release
else
sudo uds run release --no-progress --no-log-file --log-level debug
sudo uds run release-dev
fi
shell: bash
2 changes: 2 additions & 0 deletions .github/workflows/zarf-lint.yaml
@@ -51,3 +51,5 @@ jobs:
check-jsonschema packages/minio/zarf.yaml --schemafile zarf.schema.json
check-jsonschema packages/local-path/zarf.yaml --schemafile zarf.schema.json
check-jsonschema packages/rook-ceph/zarf.yaml --schemafile zarf.schema.json
check-jsonschema packages/leapfrogai/zarf.yaml --schemafile zarf.schema.json
check-jsonschema packages/nvidia-gpu-operator/zarf.yaml --schemafile zarf.schema.json
6 changes: 4 additions & 2 deletions .gitignore
@@ -11,8 +11,10 @@ tmp/
uds-bundle-*.tar.zst

# Secrets
**/*.cert
**/*.key
tls/
cert*/
**/*.cert*
**/*.key*

# Builds
build/
10 changes: 10 additions & 0 deletions .pre-commit-config.yaml
@@ -127,6 +127,16 @@ repos:
files: "deploy.yaml"
types: [yaml]
args: ["--schemafile", "tasks.schema.json"]
- id: check-jsonschema
name: "Validate Setup Tasks Against Schema"
files: "setup.yaml"
types: [yaml]
args: ["--schemafile", "tasks.schema.json"]
- id: check-jsonschema
name: "Validate Test Tasks Against Schema"
files: "test.yaml"
types: [yaml]
args: ["--schemafile", "tasks.schema.json"]
- repo: local
hooks:
- id: delete-schema
169 changes: 60 additions & 109 deletions README.md
@@ -9,56 +9,43 @@ This Zarf package serves as an air-gapped production environment for deploying [

See the [UDS RKE2 Mermaid diagram](docs/DIAGRAM.md) for visual representations of the tech stack's components and order of operations.

## Table of Contents

1. [Pre-Requisites](#pre-requisites)
2. [Usage](#usage)
- [Virtual Machines](#virtual-machines)
- [Bundles](#bundles)
- [Quick Start](#quick-start)
- [Latest](#latest)
- [Development](#development)
3. [Additional Info](#additional-info)

## Pre-Requisites

### Deployment Target
The following are requirements for an environment where a user is deploying UDS RKE2 and its custom components and applications.

- A base installation of [Ubuntu Server 20.04+](https://ubuntu.com/download/server) on the node's host system
- A base installation of [Ubuntu 20.04 or 22.04](https://ubuntu.com/download/server) on the node's host system (server or desktop)
- [UDS CLI](https://github.com/defenseunicorns/uds-cli/blob/main/README.md#install) using the versions specified in the [UDS Common repository](https://github.com/defenseunicorns/uds-common/blob/main/README.md#supported-tool-versions)
- See the RKE2 documentation for host system [pre-requisites](https://docs.rke2.io/install/requirements)
- See the Rook-Ceph documentation for the host system [pre-requisites](https://rook.io/docs/rook/latest-release/Getting-Started/Prerequisites/prerequisites/) based on the node's role and the cluster's configurations

### UDS CLI Aliasing

Below are instructions for adding UDS CLI aliases that are useful for deployments that occur in an air-gap with only the UDS CLI binary available to the delivery engineer.
- See the [Application-Specific](#application-specific) and [Flavor-Specific Infrastructure](#flavor-specific-infrastructure) configuration sections for setup instructions based on what is deployed atop UDS RKE2

For general CLI UX, put the following in your shell configuration (e.g., `/root/.bashrc`):

```bash
alias k="uds zarf tools kubectl"
alias kubectl="uds zarf tools kubectl"
alias zarf='uds zarf'
alias k9s='uds zarf tools monitor'
alias udsclean="uds zarf tools clear-cache && rm -rf ~/.uds-cache && rm -rf ~/.zarf-cache && rm -rf /tmp/uds* && rm -rf /tmp/zarf-*"
```

For fulfilling `xargs` and `kubectl` binary requirements necessary for running some of the _optional_ deployment helper scripts:

```bash
touch /usr/local/bin/kubectl
echo '#!/bin/bash\nuds zarf tools kubectl "$@"' > /usr/local/bin/kubectl
chmod +x /usr/local/bin/kubectl
```
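
One caveat about the wrapper snippet above: bash's builtin `echo` does not expand `\n` unless `-e` is passed, so the file would end up with a single line containing a literal `\n`. A minimal sketch of the same idea using `printf` follows. It writes to a scratch directory (`BIN_DIR` is an illustrative variable, not something the repository defines) rather than `/usr/local/bin`, so it can be tried without root:

```bash
#!/usr/bin/env bash
# Sketch: create a kubectl shim that forwards to the UDS CLI's bundled kubectl.
# BIN_DIR is a scratch directory for illustration; the README targets /usr/local/bin.
BIN_DIR="$(mktemp -d)"
WRAPPER="$BIN_DIR/kubectl"

# printf expands \n in its format string, so the shebang and the command
# end up on separate lines of the generated script.
printf '#!/bin/bash\nuds zarf tools kubectl "$@"\n' > "$WRAPPER"
chmod +x "$WRAPPER"

# Show the generated shim.
cat "$WRAPPER"
```

To install it for real, point `WRAPPER` at `/usr/local/bin/kubectl` and run the script as root.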

### Local Development
## Usage

- All pre-requisites listed in [Deployment Target](#deployment-target)
- [Docker](https://docs.docker.com/get-docker/) or [Podman](https://podman.io/getting-started/installation) for running, building, and pulling images
> [!IMPORTANT]
> This entire repository assumes that you have root access, and all scripts and actions are run as root. Use `sudo su` to activate a root shell.
## Usage
This section provides minimal context and instructions for quickly deploying the base UDS RKE2 capability. See the [DEVELOPMENT.md](docs/DEVELOPMENT.md) for instructions on how to further develop UDS RKE2.

### Virtual Machines

> [!CAUTION]
> Due to the disk formatting operations, networking and STIG configurations that are applied to a node's host, it is highly recommended that the contents of this repository are not directly installed on a personal machine.
> Due to the disk formatting and mount operations, networking and STIG configurations that are applied to a node's host, it is highly recommended that the contents of this repository are not directly installed on a personal machine.
The best way to test UDS RKE2 is to spin-up one or more nodes using a containerized method, such as virtual machines or networks.

[LeapfrogAI](https://github.com/defenseunicorns/leapfrogai), the main support target of this bundle, requires GPU passthrough to all worker nodes that will have a taint for attracting pods with GPU resource and workload requirements.

Please see the [VM setup documentation](./docs/VM.md) and VM setup scripts to learn more about manually creating development VM.

VM setup may not be necessary if using Longhorn or Local Path Provisioner, but it is highly recommended when using Rook-Ceph.
Please see the [VM setup documentation](./docs/VM.md) and VM setup scripts to learn more about manually creating a development VM.

### Bundles

@@ -68,125 +55,88 @@ There are 3 main "flavors" of the UDS RKE2 Core bundle, with 4 distinct flavors
2. (WIP) [Longhorn](./docs/LONGHORN.md) + [MinIO](./docs/MINIO.md)
3. (WIP) [Rook-Ceph](./docs/ROOK-CEPH.md)

Each bundle can also be experimented with using the Zarf package creation and deployment commands via the UDS tasks outlined in the sections below.

### Packages
### Quick Start

See the [Configuration section](#configuration) for more details on each specific package in each of the bundle flavors.
The following are quick starts for the `local-path` flavored UDS RKE2 bundle. This does not include the optional NVIDIA GPU operator and LeapfrogAI workarounds Zarf packages.

### UDS Tasks
#### Latest

This repository uses [UDS CLI](https://github.com/defenseunicorns/uds-cli)'s built-in [task runner](https://github.com/defenseunicorns/maru-runner) to perform all actions required to run, develop, and publish the UDS RKE2 tech stack.

Run the following to see all the tasks in the main [`tasks.yaml`](./tasks.yaml), and their descriptions:
1. Change directory to the bundle and deploy the bundle:

```bash
uds run --list-all
```

#### Create

See the UDS [`create` tasks](./tasks/create.yaml) file for more details.

To create all packages and bundles, do the following:

```bash
# Login to Registry1
set +o history
export REGISTRY1_USERNAME="YOUR-USERNAME-HERE"
export REGISTRY1_PASSWORD="YOUR-PASSWORD-HERE"
echo $REGISTRY1_PASSWORD | zarf tools registry login registry1.dso.mil --username $REGISTRY1_USERNAME --password-stdin
set -o history

# Login to ghcr
set +o history
export GHCR_USERNAME="YOUR-USERNAME-HERE"
export GHCR_PASSWORD="YOUR-PASSWORD-HERE"
echo $GHCR_PASSWORD | zarf tools registry login ghcr.io --username $GHCR_USERNAME --password-stdin
set -o history

uds run create:all
# use `ifconfig` to identify the NETWORK_INTERFACE for L2 advertisement
uds run uds-rke2-local-path-core --set NETWORK_INTERFACE=eth0
```

#### Deploy

> [!NOTE]
> The pre-deployment setup of the host machine is storage solution-dependent, so be sure to check the documentation for the package flavor you are deploying: [`local-path`](./docs/LOCAL-PATH.md), [`longhorn`](./docs/LONGHORN.md), or [`rook-ceph`](./docs/ROOK-CEPH.md).
See the UDS [`deploy` tasks](./tasks/deploy.yaml) file for more details.

To deploy a bundle (e.g., UDS RKE2 bootstrap with `local-path` flavor), do the following:
2. Modify your `/etc/hosts` according to your base IP on the Istio Tenant gateway

```bash
# LATEST
uds run uds-rke2-local-path-core
# /etc/hosts

# DEV
uds run uds-rke2-local-path-core-dev
192.168.0.200 keycloak.admin.uds.dev grafana.admin.uds.dev neuvector.admin.uds.dev
192.168.0.201 sso.uds.dev
```
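
The `/etc/hosts` step above can be scripted so that re-running it does not duplicate entries. The following is a minimal sketch, not part of the repository: `add_host_entry` is a hypothetical helper, and the example works on a scratch file instead of the real `/etc/hosts`. The IPs and hostnames are the ones from the example above.

```bash
#!/usr/bin/env bash
# Sketch: idempotently add host entries to a hosts file.
# add_host_entry is a hypothetical helper, not a UDS RKE2 task.
add_host_entry() {
  local hosts_file="$1" ip="$2"
  shift 2
  local line="$ip $*"
  # Append only when this exact line is not already present.
  if ! grep -qxF -- "$line" "$hosts_file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$hosts_file"
  fi
}

# Demonstrate against a scratch file rather than the real /etc/hosts.
HOSTS_FILE="$(mktemp)"
add_host_entry "$HOSTS_FILE" 192.168.0.200 keycloak.admin.uds.dev grafana.admin.uds.dev neuvector.admin.uds.dev
add_host_entry "$HOSTS_FILE" 192.168.0.201 sso.uds.dev
# A repeated call leaves the file unchanged.
add_host_entry "$HOSTS_FILE" 192.168.0.201 sso.uds.dev

cat "$HOSTS_FILE"
```

Pointing `HOSTS_FILE` at `/etc/hosts` (as root) applies the same entries to the host; substitute the IPs your Istio gateways actually receive.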

#### Publish

See the UDS [`publish` tasks](./tasks/publish.yaml) file for more details. Also see the `release` task in the main [`tasks.yaml`](./tasks.yaml).
#### Development

To publish all packages and bundles, do the following:
1. Login to GitHub Container Registry (GHCR) and [DoD's Registry1](https://registry1.dso.mil/):

```bash
# Login to GHCR
set +o history
export GHCR_USERNAME="YOUR-USERNAME-HERE"
export GHCR_PASSWORD="YOUR-PASSWORD-HERE"
echo $GHCR_PASSWORD | zarf tools registry login ghcr.io --username $GHCR_USERNAME --password-stdin
echo $GHCR_PASSWORD | uds zarf tools registry login ghcr.io --username $GHCR_USERNAME --password-stdin
set -o history

# if create:all was already run
uds run publish:all

# if create:all was not already run
uds run release
# Login to Registry1
set +o history
export REGISTRY1_USERNAME="YOUR-USERNAME-HERE"
export REGISTRY1_PASSWORD="YOUR-PASSWORD-HERE"
echo $REGISTRY1_PASSWORD | uds zarf tools registry login registry1.dso.mil --username $REGISTRY1_USERNAME --password-stdin
set -o history
```

#### Remove

Run the following to remove all Docker, Zarf and UDS artifacts from the host:
2. Build all necessary packages and then create and deploy the bundle

```bash
uds run setup:clean
# use `ifconfig` to identify the NETWORK_INTERFACE for L2 advertisement
uds run uds-rke2-local-path-core-dev --set NETWORK_INTERFACE=eth0
```

Run the following to completely destroy the UDS RKE2 node and all of UDS RKE2's artifacts from the node's host:
3. Modify your `/etc/hosts` according to your base IP on the Istio Tenant gateway

```bash
uds run setup:uds-rke2-destroy
```

#### Test

Run the following to run the E2E CI test(s):
# /etc/hosts

```bash
uds run uds-rke2-local-path-test
192.168.0.200 keycloak.admin.uds.local grafana.admin.uds.local neuvector.admin.uds.local
192.168.0.201 sso.uds.local
```

## Additional Info

Below are resources to explain some of the rationale and inner workings of the RKE2 cluster's infrastructure.
The following sub-sections outline all of the configuration documentation, which includes additional information, optional Zarf packages, and customization options for each component of UDS RKE2.

### Configuration
### Base Infrastructure

- [Operating System Configuration](docs/OS.md)
- [RKE2-Specific Configuration](docs/RKE2.md)
- [Operating System](docs/OS.md)
- [RKE2-Specific](docs/RKE2.md)
- [UDS-RKE2 Infrastructure and Exemptions](docs/UDS-RKE2.md)
- [MinIO Configuration](docs/MINIO.md)
- [Rook-Ceph Configuration](docs/ROOK-CEPH.md)
- [Longhorn Configuration](docs/LONGHORN.md)
- [Hosts, DNS and TLS Configuration](docs/DNS-TLS.md)

### Flavor-Specific Infrastructure

- [Rook-Ceph](docs/ROOK-CEPH.md)
- [Longhorn](docs/LONGHORN.md)
- [Local Path Provisioner](docs/LOCAL-PATH.md)
- [Custom Zarf Init](docs/INIT.md)
- [MinIO](docs/MINIO.md)

### Application-Specific

- [UDS Core](UDS-CORE.md)
- [LeapfrogAI](docs/LEAPFROGAI.md)
- [LeapfrogAI Workarounds](docs/LEAPFROGAI.md)
- [NVIDIA GPU Operator](docs/NVIDIA-GPU-OPERATOR.md)

### Virtual Machine Setup and Testing

@@ -204,3 +154,4 @@ Below are resources to explain some of the rationale and inner workings of the R
- [RKE2 Zarf Init](https://github.com/defenseunicorns/zarf-package-rke2-init)
- [Zarf Longhorn Init](https://github.com/defenseunicorns/zarf-init-longhorn)
- [UDS Rook-Ceph Capability](https://github.com/defenseunicorns/uds-capability-rook-ceph)
- [UDS Nutanix SWF Bundle](https://github.com/defenseunicorns/uds-bundle-software-factory-nutanix/tree/main)
17 changes: 17 additions & 0 deletions bundles/dev/ca.conf
@@ -0,0 +1,17 @@
[req]
distinguished_name = req_distinguished_name
x509_extensions = v3_ca
prompt = no
default_bits = 4096
default_md = sha256

[req_distinguished_name]
CN = UDS RKE2 Root CA
O = Defense Unicorns
OU = UDS RKE2 Product Team

[v3_ca]
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid:always,issuer
basicConstraints = critical, CA:true, pathlen:0
keyUsage = critical, digitalSignature, cRLSign, keyCertSign
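
As a rough illustration of how a config like this is typically consumed, the sketch below generates a self-signed root CA with `openssl req -x509`. The task this commit actually uses for CA creation is not shown in this excerpt, so the output file names (`ca.key`, `ca.cert`), the 365-day validity, and the scratch `WORK_DIR` are assumptions. The config is copied into a temporary file so the example is self-contained.

```bash
#!/usr/bin/env bash
# Sketch: create a self-signed root CA from the ca.conf shown above.
# WORK_DIR, ca.key, ca.cert and the validity period are illustrative assumptions.
set -e

WORK_DIR="$(mktemp -d)"

# Copy of bundles/dev/ca.conf so the example runs outside the repository.
cat > "$WORK_DIR/ca.conf" <<'EOF'
[req]
distinguished_name = req_distinguished_name
x509_extensions = v3_ca
prompt = no
default_bits = 4096
default_md = sha256

[req_distinguished_name]
CN = UDS RKE2 Root CA
O = Defense Unicorns
OU = UDS RKE2 Product Team

[v3_ca]
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid:always,issuer
basicConstraints = critical, CA:true, pathlen:0
keyUsage = critical, digitalSignature, cRLSign, keyCertSign
EOF

# -x509 emits a self-signed certificate instead of a CSR;
# -nodes leaves the private key unencrypted (acceptable for a dev CA only).
openssl req -x509 -new -nodes -newkey rsa:4096 \
  -config "$WORK_DIR/ca.conf" \
  -keyout "$WORK_DIR/ca.key" \
  -out "$WORK_DIR/ca.cert" \
  -days 365

# Inspect the result to confirm the CA extensions were applied.
CA_TEXT="$(openssl x509 -in "$WORK_DIR/ca.cert" -noout -text)"
echo "$CA_TEXT" | grep -A1 "Basic Constraints"
```

With `pathlen:0`, this CA can sign leaf certificates (for example, TLS certs for a base domain) but not intermediate CAs.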