conf: update for gpu in job template (#301)
The job template is updated to prefer a node with a GPU. If no GPU node is
available, other nodes are considered for scheduling.
myungjin authored Jan 4, 2023
1 parent abc06e7 commit 56ccf64
Showing 3 changed files with 69 additions and 2 deletions.
49 changes: 47 additions & 2 deletions docs/03-b-amzn2-gpu.md
@@ -66,7 +66,7 @@ curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stabl
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl

# install helm
-HELM_VERSION=v3.10.2-linux-amd64
HELM_VERSION=v3.10.2
curl -LO https://get.helm.sh/helm-$HELM_VERSION-linux-amd64.tar.gz
tar -zxvf helm-$HELM_VERSION-linux-amd64.tar.gz
sudo mv linux-amd64/helm /usr/local/bin/helm
@@ -125,7 +125,52 @@ An output should look similar to:
}
```

-### Step 3: Configuring addons
### Step 3: Install NVIDIA's GPU feature discovery resources
More details can be found in the [gpu-feature-discovery repository](https://github.com/NVIDIA/gpu-feature-discovery).

Deploy Node Feature Discovery (NFD) as a DaemonSet:
```bash
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/gpu-feature-discovery/v0.7.0/deployments/static/nfd.yaml
```

Deploy NVIDIA GPU Feature Discovery (GFD) as a DaemonSet:
```bash
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/gpu-feature-discovery/v0.7.0/deployments/static/gpu-feature-discovery-daemonset.yaml
```

Check that GFD has added the GPU labels to the node:
```bash
kubectl get nodes -o yaml
```
The output should look similar to the following (abridged):
```console
apiVersion: v1
items:
- apiVersion: v1
  kind: Node
  metadata:
    ...
    labels:
      ...
      nvidia.com/cuda.driver.major: "470"
      nvidia.com/cuda.driver.minor: "57"
      nvidia.com/cuda.driver.rev: "02"
      nvidia.com/cuda.runtime.major: "11"
      nvidia.com/cuda.runtime.minor: "4"
      nvidia.com/gfd.timestamp: "1672792567"
      nvidia.com/gpu.compute.major: "3"
      nvidia.com/gpu.compute.minor: "7"
      nvidia.com/gpu.count: "1"
      nvidia.com/gpu.family: kepler
      nvidia.com/gpu.machine: HVM-domU
      nvidia.com/gpu.memory: "11441"
      nvidia.com/gpu.product: Tesla-K80
      nvidia.com/gpu.replicas: "1"
      nvidia.com/mig.capable: "false"
      ...
  ...
```

### Step 4: Configuring addons
Next, the `ingress` and `ingress-dns` addons need to be installed with the following command:
```bash
sudo minikube addons enable ingress
11 changes: 11 additions & 0 deletions fiab/helm-chart/control/job/job-agent.yaml.mustache
@@ -51,4 +51,15 @@ spec:
- name: AWS_SECRET_ACCESS_KEY
  value: {{ .Values.secretAccessKey }}
restartPolicy: Never

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      preference:
        matchExpressions:
        - key: "nvidia.com/gpu.count"
          operator: Gt
          values:
          - "0"
<%={{ }}=%>
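The template uses a *preferred* (soft) node-affinity term, which is what lets the job fall back to non-GPU nodes. For comparison only, here is a minimal sketch of the *required* (hard) form; it is not part of this commit. With it, a job pod would stay `Pending` on any cluster where no node carries an `nvidia.com/gpu.count` label greater than zero.

```yaml
# Not part of this commit: hard-requirement variant, shown for comparison only.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        # Gt takes a single value parsed as an integer; nodes whose label
        # value is not > 0, or that lack the label entirely, never match.
        - key: "nvidia.com/gpu.count"
          operator: Gt
          values:
          - "0"
```

In the preferred form, the `weight` (1 to 100) only matters relative to other preferred terms on the same pod, and the node-affinity score is combined with the scheduler's other scoring criteria. Placement on a GPU node is therefore favored but not strictly guaranteed, even when a GPU node is available.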
11 changes: 11 additions & 0 deletions fiab/helm-chart/deployer/job/job-agent.yaml.mustache
@@ -51,4 +51,15 @@ spec:
- name: AWS_SECRET_ACCESS_KEY
  value: {{ .Values.secretAccessKey }}
restartPolicy: Never

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      preference:
        matchExpressions:
        - key: "nvidia.com/gpu.count"
          operator: Gt
          values:
          - "0"
<%={{ }}=%>
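To observe the preference without deploying the Helm chart, a standalone Pod carrying the same affinity block can be used. This is a sketch: the Pod name `gpu-affinity-test`, the container name `probe`, and the `busybox` image are hypothetical choices for illustration; only the `affinity` section is taken from the templates above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-affinity-test          # hypothetical name, for illustration only
spec:
  restartPolicy: Never
  containers:
  - name: probe                    # hypothetical container name
    image: busybox                 # assumption: any small image will do
    command: ["sh", "-c", "echo scheduled"]
  # Same soft preference as in job-agent.yaml.mustache: favor nodes whose
  # nvidia.com/gpu.count label is greater than 0, but allow any node.
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: "nvidia.com/gpu.count"
            operator: Gt
            values:
            - "0"
```

After applying the manifest with `kubectl apply -f`, the `NODE` column of `kubectl get pod gpu-affinity-test -o wide` shows which node the scheduler picked.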
