This repository has been archived by the owner on Jan 24, 2023. It is now read-only.

Cannot program the FPGA #96

Open
ravicorning opened this issue Mar 11, 2021 · 6 comments

Comments

@ravicorning

Hi

I am using this command to program the binary into the FPGA:
kubectl rsu program -f <signed_RTL_image> -n -d <RSU_PCI_bus_function_id>

The documentation doesn't specify the arguments clearly, so I assume signed_RTL_image is the name of the .bin file. As for the hostname, for some reason the command doesn't seem to work with the node name that the K8s controller sees, so I have to use the IP address. The PCI bus ID is what I get from running "lspci | grep acc". When I run this, the fpga-opae container stays in a Pending state and complains about a mismatch in the node selector. Is there a corresponding .yml I can modify to remove this filtering?

[root@corningopenness opt]# kubectl describe pods fpga-opae-10.12.87.80-0b30-xbrx2
Name:           fpga-opae-10.12.87.80-0b30-xbrx2
Namespace:      default
Priority:       0
Node:
Labels:         controller-uid=8a798e1c-5a1e-43d9-bb37-a9049458d61f
                job-name=fpga-opae-10.12.87.80-0b30
Annotations:
Status:         Pending
IP:
IPs:
Controlled By:  Job/fpga-opae-10.12.87.80-0b30
Containers:
  fpga-opae:
    Image:      fpga-opae-pacn3000:1.0
    Port:
    Host Port:
    Command:
      sudo
      -E
      /bin/bash
      -c
      --
    Args:
      ./check_if_modules_loaded.sh && fpgasupdate /root/images/20ww27.5-2x2x25G-5GLDPC-v1.6.1-3.0.0-unsigned.bin 0b30 && rsu bmcimg 0b30
    Environment:
      PYTHONIOENCODING:  utf-8
    Mounts:
      /root/images from image-dir (rw)
      /sys/devices from class (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-fzdsz (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  class:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/devices
    HostPathType:
  image-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /temp/vran_images
    HostPathType:
  default-token-fzdsz:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-fzdsz
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  kubernetes.io/hostname=10.12.87.80
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age   From               Message
  Warning  FailedScheduling  58m   default-scheduler  0/2 nodes are available: 2 node(s) didn't match node selector.
  Warning  FailedScheduling  58m   default-scheduler  0/2 nodes are available: 2 node(s) didn't match node selector

@aniket-intel

aniket-intel commented Mar 11, 2021

The node selector values do not match because you ran the program command with the IP address. Please try running it with the node hostname. This will be the same as the one in /etc/hostname on the node.

As for the kubectl rsu command, please run kubectl rsu discover before the program command, and copy the signed or unsigned image name and the device ID from its output. Then paste those values into the kubectl rsu program command.
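
The scheduler compares the pod's node selector (kubernetes.io/hostname=10.12.87.80 in the output above) against each node's kubernetes.io/hostname label, so the name passed to -n has to equal one of those label values. A minimal shell sketch for checking a candidate name first; has_hostname_label is a hypothetical helper, not part of kubectl or the rsu plugin, and the node name in the usage comment is only an example:

```shell
#!/bin/sh
# has_hostname_label NAME
# Reads one kubernetes.io/hostname label value per line on stdin and
# succeeds (exit status 0) only if NAME equals one of them exactly.
# Hypothetical helper for illustration.
has_hostname_label() {
    # -F: fixed string, -x: whole-line match, -q: no output, status only
    grep -Fxq -- "$1"
}

# Usage against a live cluster (assumes kubectl access):
#   kubectl get nodes \
#     -o jsonpath='{range .items[*]}{.metadata.labels.kubernetes\.io/hostname}{"\n"}{end}' \
#     | has_hostname_label opennesswkn-1 && echo "name matches a node label"
```

An IP address fails this check unless a node's hostname label happens to be that IP, which is consistent with the "didn't match node selector" events above.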

@ravicorning
Author

OK, sure. Can you tell me which one is the device ID here: 8086:0b30 or 54:00.0?
[root@corningopenness ravi]# kubectl rsu discover -n 10.12.87.80

Available RTL images:
[email protected]'s password:

Mar 10 44M 20ww27.5-2x2x25G-5GLDPC-v1.6.1-3.0.0-unsigned.bin

FPGA devices installed:

[email protected]'s password:
54:00.0 Processing accelerators [1200]: Intel Corporation Device [8086:0b30]
Subsystem: Intel Corporation Device [8086:0000]
Kernel driver in use: intel-fpga-pci
Kernel modules: intel_fpga_pci

@ravicorning
Author

I was able to download the image to the FPGA and configure the VFs in the FPGA, but I cannot see it in the available resources. Any idea what the problem might be? It says the resources should map to the ConfigMap.yml for the device plugin, but where is that correlated with the bb_config Helm chart provisioning?

[root@corningopenness helm-charts]# kubectl get node opennesswkn-1 -o json | jq '.status.allocatable'
{
"cpu": "46",
"devices.kubevirt.io/kvm": "110",
"devices.kubevirt.io/tun": "110",
"devices.kubevirt.io/vhost-net": "110",
"ephemeral-storage": "96589578081",
"hugepages-1Gi": "20Gi",
"intel.com/intel_sriov_netdevice": "12",
"memory": "110455600Ki",
"pods": "110"
}

[root@corningopenness helm-charts]# kubectl logs intel-fpga-cfg-opennesswkn-1-jwlpn
ERROR: Section (FLR) or name (flr_time_out) is not valid.
FEC FPGA RTL v3.0
UL.DL Weights = 3.3
UL.DL Load Balance = 128.128
Queue-PF/VF Mapping Table = READY
Ring Descriptor Size = 256 bytes

--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
        | PF  | VF0 | VF1 | VF2 | VF3 | VF4 | VF5 | VF6 | VF7 |
--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
UL-Q'00 |     |  X  |     |     |     |     |     |     |     |
UL-Q'01 |     |  X  |     |     |     |     |     |     |     |
UL-Q'02 |     |  X  |     |     |     |     |     |     |     |
UL-Q'03 |     |  X  |     |     |     |     |     |     |     |
UL-Q'04 |     |  X  |     |     |     |     |     |     |     |
UL-Q'05 |     |  X  |     |     |     |     |     |     |     |
UL-Q'06 |     |  X  |     |     |     |     |     |     |     |
UL-Q'07 |     |  X  |     |     |     |     |     |     |     |
UL-Q'08 |     |  X  |     |     |     |     |     |     |     |
UL-Q'09 |     |  X  |     |     |     |     |     |     |     |
UL-Q'10 |     |  X  |     |     |     |     |     |     |     |
UL-Q'11 |     |  X  |     |     |     |     |     |     |     |
UL-Q'12 |     |  X  |     |     |     |     |     |     |     |
UL-Q'13 |     |  X  |     |     |     |     |     |     |     |
UL-Q'14 |     |  X  |     |     |     |     |     |     |     |
UL-Q'15 |     |  X  |     |     |     |     |     |     |     |
UL-Q'16 |     |     |  X  |     |     |     |     |     |     |
UL-Q'17 |     |     |  X  |     |     |     |     |     |     |
UL-Q'18 |     |     |  X  |     |     |

@ravicorning
Author

ravicorning commented Mar 11, 2021 via email

@aniket-intel

Run the kubectl rsu discover command with the hostname.

Also, the device ID here is 54:00.0. The FPGA will show up in the list of allocatable resources only once the card is configured properly.
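
For reference, in the discover output 54:00.0 is the PCI bus:device.function address of this particular card, while 8086:0b30 is the vendor:device pair that identifies the card model. A small sketch that pulls the address out of such a line; pci_addr_for is a hypothetical helper name, the input is assumed to look like lspci -nn output, and the sample line is the one posted earlier in this thread:

```shell
#!/bin/sh
# pci_addr_for VENDOR:DEVICE
# Reads lspci -nn style lines on stdin and prints the first field (the PCI
# address) of every line that contains the literal text [VENDOR:DEVICE].
# Hypothetical helper for illustration.
pci_addr_for() {
    awk -v id="[$1]" 'index($0, id) { print $1 }'
}

line='54:00.0 Processing accelerators [1200]: Intel Corporation Device [8086:0b30]'
printf '%s\n' "$line" | pci_addr_for 8086:0b30   # prints 54:00.0
```

The match is a plain substring test rather than a regular expression, so the square brackets and the colon in the ID need no escaping.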

@ravicorning
Author

I got this working by reinstalling the worker node after programming the FPGA.
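
One way to confirm that a device plugin has advertised its resources is to look at the intel.com/ entries in the node's allocatable list. A rough sketch that filters the JSON printed by the kubectl get node ... | jq '.status.allocatable' command used earlier in this thread; list_intel_resources is a hypothetical name, and because the exact resource name registered for the FPGA is not stated in this thread, the filter matches only the intel.com/ prefix:

```shell
#!/bin/sh
# list_intel_resources
# Reads the allocatable JSON object on stdin and prints every key that
# starts with intel.com/, one per line. Hypothetical helper for illustration.
list_intel_resources() {
    grep -o '"intel\.com/[^"]*"' | tr -d '"'
}

# Usage (assumes kubectl and jq are available, node name as used in this thread):
#   kubectl get node opennesswkn-1 -o json | jq '.status.allocatable' | list_intel_resources
```

With the allocatable output posted earlier, this prints only intel.com/intel_sriov_netdevice, which matches the observation at that point that no FPGA resource was listed yet.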
