Update Error with Coral TPU Drivers

Hi everyone,

Joining the common frustration here, heh.

I'm new to Proxmox.
I have an M.2 Coral Dual TPU with the PCIe card from magic_smoke to make both TPUs accessible.

I've been converting over to Proxmox after having one TPU working in Frigate with 7 cameras for half a year.

I think I'm close, but I've run so many workarounds now that I'm not sure what state my install is in.

I am currently trying to get an unprivileged LXC to work with Frigate.
The LXC is Ubuntu Server 24.04.

My PC:
  • B550M Steel Legend
  • 5700G AM4
  • 32 GB RAM

I am trying to allow the LXC to access the onboard GPU for acceleration and the Coral for inference

I followed this guide most recently:
https://github.com/blakeblackshear/frigate/discussions/5773 but it's for an Intel CPU and a USB Coral.
I've also followed the forum thread we are in now and many, many others.

From Proxmox
Code:
pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.12-1-pve)
pve-manager: 8.2.4 (running version: 8.2.4/faa83925c9641325)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-1
proxmox-kernel-6.8.12-1-pve-signed: 6.8.12-1
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
amd64-microcode: 3.20240820.1~deb12u1
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx9
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.2
libpve-guest-common-perl: 5.1.4
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.9
libpve-storage-perl: 8.2.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.5.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.1.12
pve-docs: 8.2.3
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.13-1
pve-ha-manager: 4.0.5
pve-i18n: 3.2.2
pve-qemu-kvm: 9.0.2-2
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.4
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1

Grub
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt systemd.unified_cgroup_hierarchy=1 pcie_aspm=off"

I am not using Secure Boot
Code:
mokutil --sb-state
SecureBoot disabled
Platform is in Setup Mode

Code:
ls -l /dev/dri
total 0
drwxr-xr-x 2 root   root         80 Sep  2 14:00 by-path
crw-rw---- 1 root   video  226,   1 Sep  2 14:00 card1
crw-rw-rw- 1 100000 111000 226, 128 Sep  2 14:00 renderD128

My LXC config is
Code:
arch: amd64
cores: 4
features: nesting=1
hostname: secure
memory: 12128
nameserver: 192.168.2.11
net0: name=eth0,bridge=vmbr0,firewall=1,gw=192.168.2.1,hwaddr=BC:24:11:3A:E2:A8,ip=192.168.2.70/24,type=veth
net1: name=eth1,bridge=vmbr1,firewall=1,gw=192.168.3.1,hwaddr=BC:24:11:48:80:B3,ip=192.168.3.1/24,type=veth
ostype: ubuntu
rootfs: local-lvm:vm-105-disk-0,size=20G
swap: 2048
unprivileged: 1
lxc.cgroup2.devices.allow: c 120:0 rwm #coral 1
lxc.cgroup2.devices.allow: c 120:1 rwm #coral 2
lxc.cgroup2.devices.allow: c 226:0 rwm #igpu
lxc.cgroup2.devices.allow: c 226:128 rwm #igpu
lxc.mount.entry: /dev/apex_0 dev/apex_0 none bind,optional,create=file
lxc.mount.entry: /dev/apex_1 dev/apex_1 none bind,optional,create=file
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file 0,0 # iGPU (u=root g=render)
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.hook.pre-start: sh -c "chown 100000:111000 /dev/dri/renderD128" # create a host gid for lxc_gpu_shares

This passes through both apex TPUs, the GPU for video acceleration, and two network cards.

I don't have anything blacklisted
I've installed gasket from LiloBzH's comment and changed to the newer git repo from KyleGospo https://github.com/KyleGospo/gasket-dkms

Code:
lsmod | grep apex
apex                   28672  0
gasket                126976  1 apex

Code:
dmesg |grep gasket
[    4.488791] gasket: loading out-of-tree module taints kernel.
[    4.488798] gasket: module verification failed: signature and/or required key missing - tainting kernel

Not sure what to do with that last message. It possibly still works, but I'm not sure. There's a GitHub issue for it, but it seems to just get ignored in the end because the person is using a USB Coral.

Code:
modinfo gasket
filename:       /lib/modules/6.8.12-1-pve/updates/dkms/gasket.ko
author:         Rob Springer <rspringer@google.com>
license:        GPL v2
version:        1.1.4
description:    Google Gasket driver framework
import_ns:      DMA_BUF
srcversion:     EADC63F50EF98E8414DE268
depends:      
retpoline:      Y
name:           gasket
vermagic:       6.8.12-1-pve SMP preempt mod_unload modversions
sig_id:         PKCS#7
signer:         DKMS module signing key
sig_key:        7D:90:0A:1F:F3:EA:5A:AA:D8:EA:BC:D2:7A:30:B3:54:BA:08:0F:CF
sig_hashalgo:   sha512
signature:      62:95:43:51:83:F9:79:E5:41:A9:89:DB:ED:41:45:0B:51:26:A8:6D:
                BE:6C:28:E7:E4:0A:0F:C4:10:C2:62:CF:2D:89:25:BE:3C:4F:69:06:
                19:CF:FE:5A:7E:30:FF:02:1C:D1:3C:F1:84:8E:19:8D:8B:F5:9E:21:
                B4:3D:5C:0C:DC:DA:2C:CE:F2:B8:2E:AD:38:0B:86:97:FD:6F:A0:F3:
                40:BE:7A:FA:50:CD:C4:05:70:15:3F:B4:B2:7C:E2:33:F3:42:F9:B1:
                76:4E:90:EA:EC:A7:2C:6B:ED:D7:E7:E8:28:15:72:AD:7B:8B:3E:23:
                A2:DB:CF:FE:55:C4:86:41:DD:8A:44:01:FA:15:89:47:E3:8E:C0:73:
                CE:70:B5:78:E2:38:82:42:0F:65:FA:46:90:52:3F:27:48:A1:B7:93:
                08:8F:C9:E3:8E:FE:20:34:09:8C:E9:04:A2:9D:F0:3E:98:6E:3F:64:
                FD:E8:09:98:17:A7:F2:73:B4:12:CA:6D:2D:1E:56:4B:9E:D7:6B:11:
                64:80:30:2D:EB:72:01:7C:9B:A7:F3:EC:9E:F3:81:EC:CB:26:24:6B:
                13:DD:A1:54:B8:D4:AD:D2:FA:1C:33:1E:55:7B:43:FB:76:19:E8:0E:
                F6:42:55:B5:57:73:92:A1:E4:26:63:98:61:2B:C0:0E
parm:           dma_bit_mask:int


Code:
ls -la /dev/apex*
crwxr-xr-x 1 root apex 120, 0 Sep  2 14:00 /dev/apex_0
crwxr-xr-x 1 root apex 120, 1 Sep  2 14:00 /dev/apex_1

Code:
lspci -nn | grep 089a
05:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
06:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]

Code:
tail /etc/udev/rules.d/65-apex.rules
SUBSYSTEM=="apex", MODE="0755", GROUP="apex"
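
Looking at that rule again, I suspect the mode matters for the container. With 0755 only the owner (root on the host) has write access, while the group and everyone else get r-x. Processes in the unprivileged LXC run as host IDs 100000 and up, so as far as I can tell they fall into the "other" class. A minimal sketch of that check as plain arithmetic; `other_can_rw` is a helper name I made up, and it only decodes mode bits without touching any device:

```shell
#!/bin/sh
# Hypothetical helper: prints "yes" if the "other" class of an octal mode
# has both read (4) and write (2) permission, "no" otherwise.
other_can_rw() {
    # The extra leading 0 forces octal interpretation, so 755 and 0755 both work.
    if [ $(( 0$1 & 6 )) -eq 6 ]; then
        echo yes
    else
        echo no
    fi
}

other_can_rw 0755   # mode from my current udev rule: prints "no"
other_can_rw 0666   # world read/write: prints "yes"
```

If that is the problem, either a more permissive MODE in the rule or a host-side chown of the apex nodes to the container's mapped root might be what's missing, but I haven't confirmed which one is preferred.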

Code:
lspci
05:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU
06:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU
0a:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c8)
0a:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller

I separated all the PCIe groups earlier when I tried passing this through to a normal VM. I'm not sure if that config is still needed in GRUB and the BIOS.

From the LXC

Code:
uname -r
6.8.12-1-pve

Code:
ls -l /dev
total 0
crwxr-xr-x 1 nobody nogroup 120, 0 Sep  2 04:00 apex_0
crwxr-xr-x 1 nobody nogroup 120, 1 Sep  2 04:00 apex_1
drwxr-xr-x 3 nobody nogroup    100 Sep  2 04:00 dri

Is it an issue that they are owned by nobody and nogroup?
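
From what I've read, this may just be how an unprivileged container displays host-owned nodes. The default Proxmox mapping shifts container IDs 0-65535 to host IDs 100000-165535, and any host ID outside that range (such as host root, which owns the apex nodes) has no counterpart inside and is displayed as nobody/nogroup. A minimal sketch of that arithmetic, assuming the default mapping and no custom lxc.idmap entries; `host_to_container_id` is a name I made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: what a host UID/GID looks like from inside an
# unprivileged container.
# Assumption: default mapping, container 0..65535 <-> host 100000..165535.
host_to_container_id() {
    host_id=$1
    base=${2:-100000}
    range=${3:-65536}
    if [ "$host_id" -ge "$base" ] && [ "$host_id" -lt $((base + range)) ]; then
        echo $((host_id - base))
    else
        # Unmapped IDs appear as the overflow ID, shown as nobody/nogroup.
        echo 65534
    fi
}

host_to_container_id 0        # host root (owner of /dev/apex_*): prints 65534
host_to_container_id 100000   # owner set by my renderD128 hook: prints 0
host_to_container_id 111000   # group set by my renderD128 hook: prints 11000
```

So the nobody/nogroup display by itself is probably expected. What matters is whether the resulting permissions still allow read and write from inside.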

Code:
lsmod | grep apex
apex                   28672  0
gasket                126976  1 apex

Code:
sudo dmesg |grep gasket
dmesg: read kernel buffer failed: Operation not permitted
Not sure why this happens; perhaps because the container is unprivileged?

Code:
lspci -nn | grep 089a
05:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]
06:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]

I've just tried to install the gasket driver in the LXC, but that doesn't get past the debuild -us -uc -tc -b step, as it needs the kernel headers and you can't have the headers in the LXC.

Code:
sudo modinfo gasket
modinfo: ERROR: Module gasket not found.

Code:
vainfo
error: XDG_RUNTIME_DIR is invalid or not set in the environment.
error: can't connect to X server!
libva info: VA-API version 1.20.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_20
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.20 (libva 2.12.0)
vainfo: Driver version: Mesa Gallium driver 24.0.9-0ubuntu0.1 for AMD Radeon Graphics (radeonsi, renoir, LLVM 17.0.6, DRM 3.57, 6.8.12-1-pve)
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSlice
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileHEVCMain10             : VAEntrypointEncSlice
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileVP9Profile0            : VAEntrypointVLD
      VAProfileVP9Profile2            : VAEntrypointVLD
      VAProfileNone                   : VAEntrypointVideoProc

I've got Docker running successfully in the LXC. I'm also running mosquitto and portainer in the same container.

Frigate is currently restarting constantly, saying it can't find the Coral.

The Docker config is pretty much stock standard
Code:
services:
  frigate:
    container_name: frigate
    privileged: true # this may not be necessary for all setups
    restart: unless-stopped
    image: ghcr.io/blakeblackshear/frigate:stable
    shm_size: "850mb" # update for your cameras based on calculation above
    devices:
      #- /dev/bus/usb:/dev/bus/usb # passes the USB Coral, needs to be modified for other versions
      - /dev/apex_0:/dev/apex_0
      - /dev/apex_1:/dev/apex_1
      - /dev/dri/renderD128:/dev/dri/renderD128 # for intel hwaccel, needs to be updated for your hardware
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /home/frodo/scripts/docker/frigate:/db  
      - /home/frodo/scripts/docker/frigate/config/frigate.yml:/config/config.yml
      - /home/frodo/scripts/docker/frigate/config/go2rtc:/config/go2rtc
      - /mnt/CamFootage:/media/frigate
      - type: tmpfs # Optional: 1GB of memory, reduces SSD/SD Card wear
        target: /tmp/cache
        tmpfs:
          size: 1000000000
    networks:
      - eth1
    ports:
      - "5001:5000"
      - "1935:1935" # RTMP feeds
      - "8554:8554" # RTSP feeds
      - "8555:8555/tcp" # WebRTC over tcp
      - "8555:8555/udp" # WebRTC over udp
    environment:
      FRIGATE_RTSP_PASSWORD: "password"
      LIBVA_DRIVER_NAME: "radeonsi"
networks:
  eth1:

I'm using the latest version of go2rtc, which was downloaded through git.

Frigate config, with almost everything cut out as I ran out of space in the post. I had to comment out the second TPU when it was on bare metal, as Frigate couldn't detect it. I could switch to either TPU but not use both at the same time, which unfortunately seems to be a common issue even with the PCIe adapter.
Code:
ffmpeg:
  hwaccel_args: preset-vaapi
  output_args:
    record: preset-record-generic-audio-aac

detectors:
  coral1:
    type: edgetpu
    device: pci:0
#  coral2:
#    type: edgetpu
#    device: pci:1

Frigate errors
Code:
2024-09-02 05:24:12.043121495  [WARN] Using go2rtc binary from '/config/go2rtc' instead of the embedded one
2024-09-02 05:24:12.043125352  [INFO] Starting go2rtc...
2024-09-02 05:24:12.106608299  05:24:12.106 INF go2rtc platform=linux/amd64 revision=a4885c2 version=1.9.4
2024-09-02 05:24:12.106623487  05:24:12.106 INF config path=/dev/shm/go2rtc.yaml
...
2024-09-02 05:24:12.826444683  [2024-09-02 05:24:12] frigate.detectors.plugins.edgetpu_tfl INFO    : Attempting to load TPU as pci:0
2024-09-02 05:24:12.826452277  [2024-09-02 05:24:12] frigate.detectors.plugins.edgetpu_tfl ERROR   : No EdgeTPU was detected. If you do not have a Coral device yet, you must configure CPU detectors.
2024-09-02 05:24:12.826453720  Process detector:coral1:
2024-09-02 05:24:12.826454602  Traceback (most recent call last):
2024-09-02 05:24:12.826456004    File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 160, in load_delegate
2024-09-02 05:24:12.826456956      delegate = Delegate(library, options)
2024-09-02 05:24:12.826457798    File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 119, in __init__
2024-09-02 05:24:12.826472746      raise ValueError(capture.message)
2024-09-02 05:24:12.826473487  ValueError
2024-09-02 05:24:12.826487964
2024-09-02 05:24:12.826488796  During handling of the above exception, another exception occurred:
2024-09-02 05:24:12.826489407
2024-09-02 05:24:12.826489918  Traceback (most recent call last):
2024-09-02 05:24:12.826490659    File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
2024-09-02 05:24:12.826501820      self.run()
2024-09-02 05:24:12.826502622    File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
2024-09-02 05:24:12.826503283      self._target(*self._args, **self._kwargs)
2024-09-02 05:24:12.826504105    File "/opt/frigate/frigate/object_detection.py", line 102, in run_detector
2024-09-02 05:24:12.826504876      object_detector = LocalObjectDetector(detector_config=detector_config)
2024-09-02 05:24:12.826505557    File "/opt/frigate/frigate/object_detection.py", line 53, in __init__
2024-09-02 05:24:12.826506149      self.detect_api = create_detector(detector_config)
2024-09-02 05:24:12.826506830    File "/opt/frigate/frigate/detectors/__init__.py", line 18, in create_detector
2024-09-02 05:24:12.826520125      return api(detector_config)
2024-09-02 05:24:12.826520976    File "/opt/frigate/frigate/detectors/plugins/edgetpu_tfl.py", line 41, in __init__
2024-09-02 05:24:12.826521668      edge_tpu_delegate = load_delegate("libedgetpu.so.1.0", device_config)
2024-09-02 05:24:12.826522519    File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 162, in load_delegate
2024-09-02 05:24:12.826523241      raise ValueError('Failed to load delegate from {}\n{}'.format(
2024-09-02 05:24:12.826523892  ValueError: Failed to load delegate from libedgetpu.so.1.0
This config had been working for around 6 months for my cameras when I had it on a bare metal install, before trying Proxmox.
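
Since the error only says the delegate failed to load, one thing that might narrow it down is checking whether the device nodes can actually be opened for reading and writing from inside the LXC. A minimal sketch; `check_rw` is a name I made up, and the device paths are the ones from my LXC and Docker configs above:

```shell
#!/bin/sh
# Hypothetical helper: reports whether the shell's -r and -w tests pass
# for a path as the current user.
check_rw() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ -r "$1" ] && [ -w "$1" ]; then
        echo "ok"
    else
        echo "no-rw"
    fi
}

# Device paths taken from my LXC and Docker configs.
for dev in /dev/apex_0 /dev/apex_1 /dev/dri/renderD128; do
    echo "$dev: $(check_rw "$dev")"
done
```

The idea would be to run this inside the LXC first, and again inside the Frigate container if it stays up long enough. Anything other than "ok" for the apex nodes would point at permissions rather than the driver.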
 
