spike: NVIDIA GPU operator Zarf package #321

justinthelaw · 2024-03-26T14:19:04Z

LFAI delivery requires a production-ready NVIDIA GPU operator Zarf package. The package should bootstrap containerized versions of the NVIDIA CUDA drivers, container toolkit, feature discovery, and device plugin components so that generative AI and ML applications running in a Kubernetes cluster can use NVIDIA GPUs.

How do I prepare an air-gappable Zarf package that contains the NVIDIA GPU operator?
How do I set up the NVIDIA GPU operator to be configurable at deploy time?
- Multi-instance GPU (logical separation of GPU resources)?
- Time slicing (shared GPU loading and usage)?
- Distributed node resource load balancing configuration?
How and where do I consistently test this on K3D to make sure it works?
How and where do I consistently test this on RKE2 to make sure it works?
How do I integrate this back into the LFAI infrastructure UDS bundle in issue #317?

See additional NVIDIA GPU operator context here: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/overview.html

justinthelaw · 2024-03-26T14:19:38Z

Some Defense Unicorns-related resources:

YrrepNoj · 2024-04-04T18:17:47Z

Commenting for personal tracking: part of this spike should involve evaluating whether to create our own version of this repo/container and publish it from our org for our own use.

justinthelaw · 2024-06-18T16:16:28Z

This will be tracked via the following PR: justinthelaw/uds-rke2#39

justinthelaw · 2024-07-11T18:33:32Z

The PR in the previous comment is the tracking PR, and it is tied to a Delivery issue.

justinthelaw self-assigned this Mar 26, 2024

justinthelaw added the tech-debt Not a feature, but still necessary label Mar 26, 2024

justinthelaw assigned gscallon Apr 10, 2024

justinthelaw assigned vanakema Jul 11, 2024

justinthelaw closed this as completed Jul 11, 2024
