ProxMox 7+ Home Assistant Full + Frigate + 2x Mini PCIe Coral TPUs = /dev/apex_0 not found! #4485
-
I've got Home Assistant Full installed in a ProxMox VM and I'm attempting to use the Frigate Add-on (I tried both Frigate NVR and the "Full Access" version) + Frigate Integration + HACS Frigate Integration, but I'm stuck at "/dev/apex_0 not found".
Originally I followed this HAOS Install Guide to create the Home Assistant VM. After struggling to get the TPUs working, I read that there were problems with PCIe passthrough on i440fx, so I rebuilt the VM following a modified version of this guide. It didn't help. I've also turned off secure boot (#407), and of course I have turned on IOMMU (Smart Passthrough VT-d, and also Intel Virtualization Technology) in the Asus BIOS.
I followed the instructions on the Coral site, Get started with the M.2 or Mini PCIe Accelerator | Coral, to add the Debian package, install the PCIe driver and Edge TPU runtime packages, etc. I had a built-in version of Apex originally, so I blacklisted it before installing the new one, and later went back and removed the blacklisting. I've also installed PyCoral and TensorFlow Lite just for the heck of it, but since /dev/apex_0 and /dev/apex_1 are not appearing, the test of course fails when I try to run it.
So I throw myself at the mercy of you fine folks and share what I'm seeing, as it seems to be unique... maybe one of you can shed some light on a potential solution. Here's the relevant hardware I'm using:
Virtual Machine:
Here's the info which I think might be requested (run on the ProxMox host):
root@pve:~# uname -a
Linux pve 5.15.74-1-pve #1 SMP PVE 5.15.74-1 (Mon, 14 Nov 2022 20:17:15 +0100) x86_64 GNU/Linux
root@pve:~# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 11 (bullseye)
Release: 11
Codename: bullseye
root@pve:~# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 94
Model name: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Stepping: 3
CPU MHz: 4000.000
CPU max MHz: 4200.0000
CPU min MHz: 800.0000
BogoMIPS: 7999.96
Virtualization: VT-x
L1d cache: 128 KiB
L1i cache: 128 KiB
L2 cache: 1 MiB
L3 cache: 8 MiB
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: KVM: Mitigation: Split huge pages
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Vulnerable: No microcode
Vulnerability Tsx async abort: Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmpe
rf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ibrs ibpb st
ibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdsee
d adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp
root@pve:~# lspci -nnk
...
02:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
Subsystem: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
Kernel driver in use: vfio-pci
Kernel modules: apex
03:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
Subsystem: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
Kernel driver in use: vfio-pci
Kernel modules: apex
...
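The lspci output above reports vfio-pci as the driver in use for both TPUs on the host. While vfio-pci holds a device, apex is not bound to it on the host, so the host would not be expected to create an apex device node for it; the node would instead be expected inside the guest once gasket and apex load there. A minimal sketch for checking the bound driver straight from sysfs follows. The function name `pci_driver` and its optional second argument (a sysfs root, there only so the function can be pointed at a test directory) are illustrative assumptions, not part of any tool.

```shell
#!/bin/sh
# Print the name of the kernel driver bound to a PCI device, or "none".
# $1: full PCI address, e.g. 0000:02:00.0
# $2: optional directory to use instead of /sys/bus/pci/devices (for testing)
pci_driver() {
    addr="$1"
    root="${2:-/sys/bus/pci/devices}"
    link="$root/$addr/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink whose last path component
        # is the driver name (for example vfio-pci or apex).
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

Running `pci_driver 0000:02:00.0` on the host and then the equivalent address inside the guest shows which side, if any, has apex bound.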
root@pve:~# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/1/devices/0000:02:00.0
/sys/kernel/iommu_groups/12/devices/0000:03:00.0
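The two lines above place the TPUs in IOMMU groups 1 and 12, but they do not show whether other devices share those groups. A small sketch that prints every group with its members makes that easier to see. The function name `list_iommu_groups` and its optional directory argument are hypothetical and exist only for illustration and testing.

```shell
#!/bin/sh
# List each IOMMU group together with the PCI addresses it contains.
# $1: optional directory to use instead of /sys/kernel/iommu_groups (for testing)
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue       # the glob matched nothing
        group="${dev%/devices/*}"       # strip "/devices/<address>"
        group="${group##*/}"            # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}
```

Piping the result through `grep -e 'group 1:' -e 'group 12:'` narrows it to the two groups of interest.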
root@pve:~# dmesg | grep 2:00
[ 0.242964] pci 0000:02:00.0: [1ac1:089a] type 00 class 0x0000ff
[ 0.242981] pci 0000:02:00.0: reg 0x10: [mem 0xd2400000-0xd2403fff 64bit pref]
[ 0.242991] pci 0000:02:00.0: reg 0x18: [mem 0xd2300000-0xd23fffff 64bit pref]
[ 0.311489] pci 0000:02:00.0: Adding to iommu group 1
[ 13.449916] vfio-pci 0000:02:00.0: enabling device (0000 -> 0002)
[ 14.480070] vfio-pci 0000:02:00.0: vfio_ecap_init: hiding ecap 0x1e@0x110
root@pve:~# dmesg | grep 3:00
[ 0.251784] pci 0000:03:00.0: [1ac1:089a] type 00 class 0x0000ff
[ 0.251806] pci 0000:03:00.0: reg 0x10: [mem 0xd2200000-0xd2203fff 64bit pref]
[ 0.251820] pci 0000:03:00.0: reg 0x18: [mem 0xd2100000-0xd21fffff 64bit pref]
[ 0.311498] pci 0000:03:00.0: Adding to iommu group 12
[ 14.481661] vfio-pci 0000:03:00.0: enabling device (0000 -> 0002)
[ 15.504139] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x1e@0x110
root@pve:~# dpkg -l | grep gasket
ii gasket-dkms 1.0-18 all DKMS source for the gasket driver
root@pve:~# modinfo gasket
filename: /lib/modules/5.15.74-1-pve/updates/dkms/gasket.ko
author: Rob Springer <[email protected]>
license: GPL v2
version: 1.1.4
description: Google Gasket driver framework
srcversion: 2CA68DA0268ABC8C7117109
depends:
retpoline: Y
name: gasket
vermagic: 5.15.74-1-pve SMP mod_unload modversions
parm: dma_bit_mask:int
root@pve:~# dpkg -l | grep edgetpu
ii libedgetpu1-std:amd64 16.0 amd64 Support library for Edge TPU
root@pve:~# modinfo apex
filename: /lib/modules/5.15.74-1-pve/updates/dkms/apex.ko
author: John Joseph <[email protected]>
license: GPL v2
version: 1.2
description: Google Apex driver
srcversion: 700E8BBBE9CC23C6EC17712
alias: pci:v00001AC1d0000089Asv*sd*bc*sc*i*
depends: gasket
retpoline: Y
name: apex
vermagic: 5.15.74-1-pve SMP mod_unload modversions
parm: allow_power_save:int
parm: allow_sw_clock_gating:int
parm: allow_hw_clock_gating:int
parm: bypass_top_level:int
parm: trip_point0_temp:int
parm: trip_point1_temp:int
parm: trip_point2_temp:int
parm: hw_temp_warn1:int
parm: hw_temp_warn2:int
parm: hw_temp_warn1_en:bool
parm: hw_temp_warn2_en:bool
parm: temp_poll_interval:int
root@pve:~# ls /dev/apex_0
ls: cannot access '/dev/apex_0': No such file or directory
root@pve:~# lsmod | grep apex
^^^ returns no output.
The two TPUs show up as PCI devices I can and did pass through:
Here's my very basic starter frigate.yml:
mqtt:
  host: 10.0.20.10
  user: mqtt-user
  password: <password_removed>
detectors:
  coral0:
    type: edgetpu
    device: pci:0
  coral1:
    type: edgetpu
    device: pci:1
cameras:
  front_porch:
    rtmp:
      enabled: False
    ffmpeg:
      inputs:
        - path: rtsp://admin:<password_removed>@10.0.30.251:554/cam/realmonitor?channel=1&subtype=1
          roles:
            - detect
        - path: rtsp://admin:<password_removed>@10.0.30.251:554/cam/realmonitor?channel=1&subtype=0
          roles:
            - record
    detect:
      width: 704
      height: 480
      fps: 5
Starting the Frigate Add-On generates the following errors (this is the relevant excerpt)...
Note: if I remove the 'detectors' section of the frigate.yml, it starts up just fine and I can see video from my camera, but of course it's using the CPU and not the TPUs.
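Since the config declares two PCIe detectors, a quick check in the environment where Frigate runs is whether two apex device nodes exist at all. The sketch below assumes that `pci:0` and `pci:1` correspond to /dev/apex_0 and /dev/apex_1, which the "/dev/apex_0 not found" error suggests but which is not confirmed here. The function name `count_apex_nodes` and its optional directory argument are illustrative only.

```shell
#!/bin/sh
# Count the apex device nodes visible in a directory.
# $1: optional directory to use instead of /dev (for testing)
count_apex_nodes() {
    devdir="${1:-/dev}"
    count=0
    for node in "$devdir"/apex_*; do
        [ -e "$node" ] || continue      # the glob matched nothing
        count=$((count + 1))
    done
    echo "$count"
}
```

A result lower than the number of `edgetpu` detectors with a `pci:` device would point at the driver or passthrough layer rather than at the Frigate config.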
Replies: 5 comments 5 replies
-
Were you able to get your Coral PCIe to work with kernel 5.15?
-
I had this issue for quite some time, so I posted a description of what I did to make it work here. The issue of Coral not being detected was solved by dialing in the three files /etc/default/grub, /etc/modules, and /etc/modprobe.d/blacklist-apex.conf. You can try copying mine word-for-word, though you may not need or want the splitting of the iGPU, in which case you could probably leave out the gvt-related arguments. I'm using the 'stronger' half of the iGPU for the ubuntu/frigate VM, and the weaker half for the host.
-
Hi @monomolecular, my setup is a little different. I'm running Proxmox 7.2 and passing the M.2 PCIe Coral TPU through to a VM with Ubuntu 20.04 (the TPU is meant to be used with Shinobi instead of Frigate).
On Proxmox my /etc/modprobe.d/blacklist-apex.conf looks like:
options vfio-pci ids=1ac1:089a
On the Ubuntu VM I have followed the instructions from the Coral website. gasket and apex are properly loaded by the kernel, but dmesg gets flooded with the following messages, and it appears that this happens exactly when any program tries to access the /dev/apex_0 device.
ls -l /dev/apex*
I have tried everything... I'm out of ideas...
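The `ids=1ac1:089a` value in the vfio-pci options line is the vendor:device pair that `lspci -nn` prints in square brackets for the TPU. As a rough sketch, the value can be derived from that output instead of being typed by hand. The function name `coral_vfio_ids` is hypothetical, and matching on the text "Coral Edge TPU" is an assumption based on the lspci output shown in the original post.

```shell
#!/bin/sh
# Read `lspci -nn` output on stdin and print the unique vendor:device IDs
# of Coral Edge TPU entries as a comma-separated list.
coral_vfio_ids() {
    grep 'Coral Edge TPU' \
        | sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' \
        | sort -u \
        | paste -sd, -
}
```

For example, `echo "options vfio-pci ids=$(lspci -nn | coral_vfio_ids)"` prints a line in the same form as the one quoted above; it does not write to any file.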
-
Sorry to ask here, but would a Mini PCI-E to USB adapter like the one below work to run a Dual Edge TPU?
-
Hi, I get this error when I try to do an ls /dev/apex_0:
I had this issue for quite some time, so I posted a description of what I did to make it work here.
The issue of Coral not being detected was solved by dialing in the three files:
/etc/default/grub
/etc/modules
/etc/modprobe.d/blacklist-apex.conf
You can try copying mine word-for-word, though you may not need or want the splitting of the iGPU, in which case you could probably leave out the gvt related arguments. I'm using the 'stronger' half of the iGPU for the ubuntu/frigate VM, and the weaker half for the host.
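To compare a host against the three files named above, it can help to print only their active (non-comment, non-blank) lines in one go. The sketch below does just that and makes no claim about what the files should contain; the function name `show_passthrough_files` and its optional root-prefix argument are illustrative assumptions.

```shell
#!/bin/sh
# Print the active lines of the three files named in the reply.
# $1: optional path prefix, e.g. a mounted root or a test directory
show_passthrough_files() {
    root="${1:-}"
    for f in /etc/default/grub /etc/modules /etc/modprobe.d/blacklist-apex.conf; do
        printf '== %s ==\n' "$f"
        if [ -r "$root$f" ]; then
            # Drop comment lines and blank lines; grep exits non-zero when
            # nothing is left, which is not an error here.
            grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$root$f" || true
        else
            echo "(missing or unreadable)"
        fi
    done
}
```

Calling `show_passthrough_files` with no argument reads the live files on the machine it runs on.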