IOMMU / VT-d Error: TASK ERROR: Cannot open iommu_group: No such file or directory

Aussi

Hi All,

Recently I've started up a VM with Ubuntu Server on it.
I wanted to share the GPU with it so that it can be used for Frigate.

Sadly, when I tried to pass through the Intel HD Graphics 5000 I got the error stated in the topic title.

Digging around a bit, I found the following:
No IOMMU group number for the Intel card:
[screenshot]

No corresponding group found when running:
Bash:
find /sys/kernel/iommu_groups/ -type l
[screenshot]
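
For anyone comparing output: a small shell loop (just a sketch; it walks the same /sys/kernel/iommu_groups tree the find command above inspects) prints every PCI device together with its group, and prints nothing at all when no IOMMU is active:

Bash:
#!/bin/sh
# Print "group N: <pci-address>" for every device in an IOMMU group.
for dev in /sys/kernel/iommu_groups/*/devices/*; do
    [ -e "$dev" ] || continue   # glob matched nothing: no IOMMU groups exist
    group=$(basename "$(dirname "$(dirname "$dev")")")
    echo "group ${group}: $(basename "$dev")"
done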

Checked the BIOS and VT-d seems to be enabled properly. Also I've disabled legacy boot in the BIOS.
[screenshot]
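
For what it's worth, a quick way to verify on the PVE host that the kernel actually picked up VT-d (a sketch; the exact messages vary by kernel):

Bash:
# On the PVE host: check whether the kernel enabled the IOMMU at boot
dmesg | grep -e DMAR -e IOMMU
# For Intel you want to see "DMAR: IOMMU enabled". If nothing shows up, add
# intel_iommu=on to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and run
# update-grub (or edit /etc/kernel/cmdline and run proxmox-boot-tool refresh
# on systemd-boot installs), then reboot.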

Hope to get this to work, otherwise my current Intel NUC5i3RYH will be kind of useless...
 

Oh dear. That's a real shame then :( ...
Lots of devices do not work (well) with passthrough, especially integrated graphics, because it also requires main memory (and drivers don't expect it to be separate from the CPU). Did you find some information on the internet that this particular device can be passed through successfully?
 
Alright thanks, "uitroepteken" or "vraagteken" (exclamation mark or question mark) ;)
Will be playing with that tomorrow then!
 
@leesteken, would it help if I go the route of an LXC container?

I read that passthrough is not necessary then and that the GPU can be "shared"?
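
From what I read, with LXC the container shares the host kernel, so nothing gets passed through with VFIO; the host's /dev/dri device nodes are bind-mounted into the container instead. Something like this on the host should show the nodes involved (a sketch):

Bash:
# On the PVE host: the DRM device nodes that get bind-mounted into a container
ls -l /dev/dri/
# card0 = display node, renderD128 = render node (what VA-API uses)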
 
Alright guys, I've made an unprivileged LXC container instead, in the hope I can make use of the GPU.

I seem to be almost there...

but when I try to check whether I can execute intel_gpu_top, I get the following output:

Bash:
root@docker:/opt# intel_gpu_top
Failed to initialize PMU! (Permission denied)


I've shared the following things with the LXC container:
[screenshot]
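
(For reference, sharing /dev/dri with a container usually boils down to entries like these in /etc/pve/lxc/<vmid>.conf; the same pattern shows up in a config later in this thread, and the major/minor numbers depend on the host:)

Code:
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir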

And the output of vainfo is as follows:
[screenshot]


Am I missing something ??
 
I updated my 6.2.6-1-pve kernel to the latest 6.2.9-1-pve today and it broke intel_gpu_top with a similar error: Failed to initialize PMU! (No such file or directory)

I went back to 6.2.6-1 and it works again.
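
In case it helps anyone: to stay on the working kernel across reboots you can pin it (a sketch, assuming your proxmox-boot-tool version has the kernel pin subcommand):

Bash:
# List installed kernels, then pin the known-good one
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.2.6-1-pve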
 
I updated my 6.2.6-1-pve kernel to the latest 6.2.9-1-pve today and it broke intel_gpu_top with a similar error: Failed to initialize PMU! (No such file or directory)

I went back to 6.2.6-1 and it works again.
Edit:

Seems that I'm running the default kernel, so that probably won't be the problem :(

Code:
proxmox-ve: 7.4-1 (running kernel: 5.15.104-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.4-1
pve-kernel-5.15.104-1-pve: 5.15.104-2
pve-kernel-5.15.85-1-pve: 5.15.85-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
 
I just reread your earlier post. The intel_gpu_top command does not work within containers even when passthrough is all working correctly. Run it on the host.
 
I just reread your earlier post. The intel_gpu_top command does not work within containers even when passthrough is all working correctly. Run it on the host.
Running it on the PVE server/host works OK; I can see the "graphs", but not that the video engine is being used...

I tried running it in the LXC container because I'm running Docker in it, with Frigate as a Docker container that I want to share the GPU with.

In the Frigate logs I also see that the GPU isn't shared, so I thought 1+1=2...

Maybe you have any other suggestions?
 
Cool, so it sounds like you've got the GPU working OK on the host. This is the guide I followed for GPU passthrough to an unprivileged container for Plex: https://gist.github.com/packerdl/a4887c30c38a0225204f451103d82ac5

From a quick look at your posted container conf, it looks like you've done all of that. Try an ls on /dev/dri/renderD128 from the container to see if it's showing the required group and that the group has rw access. Mine shows as: crw-rw---- 1 nobody render 226, 128 Apr 23 13:42 renderD128
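
In other words, inside the container, something along these lines (a sketch; youruser is a placeholder, and the group name may differ per distro):

Bash:
# Inside the container: check group and permissions on the render node
ls -l /dev/dri/renderD128
# Add your non-root user to whatever group owns it:
usermod -aG render youruser
# Log out and back in so the new group membership takes effect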

If you're completely stuck, try the opt-in 6.2 kernel. I needed that to get my GPU working on my host, so I doubt it will help you, but it's worth a try given it's quick.

Remember to test the passthrough by actually trying to use it in the container rather than running intel_gpu_top. As I said earlier, that doesn't work in the container even with it all set up correctly (as root or as the user with group access):

Bash:
$ intel_gpu_top
Failed to initialize PMU! (Permission denied)
 
Thanks for the reply, really appreciated!

I tried to follow this page:
https://yoursunny.com/t/2022/lxc-vaapi/

And I don't have a separate user on the Docker LXC container; I'm always using root :)

Running the command gives me the following:

Bash:
root@docker:~# ls -l /dev/dri
total 0
drwxr-xr-x 2 nobody nogroup      80 Apr 21 17:27 by-path
crw-rw-rw- 1 nobody nogroup 226,   0 Apr 21 17:27 card0
crw-rw-rw- 1 nobody render  226, 128 Apr 21 17:27 renderD128

So I changed the group of renderD128 to render, because it was set to nogroup.
This was stated in the URL mentioned above.
But it seems to be no success :(
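
(Roughly what that change looks like when done on the PVE host; note a later reply points out it can't be done from inside the container, and it won't survive a reboot without a udev rule, since /dev is recreated at boot:)

Bash:
# On the PVE host, not inside the container:
chgrp render /dev/dri/renderD128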

The Frigate logs state the following:

Code:
2023-04-26 11:22:53.122695396 [2023-04-26 11:22:53] frigate.util ERROR : Unable to poll intel GPU stats: Failed to initialize PMU! (Operation not permitted)

Looking at the Frigate web page, it does detect some kind of GPU / hardware accel (intel-vaapi), but I think it still lacks permissions:
[screenshot]

I also followed the Frigate page on how to build the docker-compose file, setting that container to privileged, but that's also not working :(

Btw, do you have an example usage to check whether it's available in the container?
 
I haven't used it as root in my container, so I don't know if that works. Try adding root to the render group - I know you shouldn't normally need to do that when using the root account, but it might help in a container. If no luck, I would try getting it to work with a normal user. I did not need to create renderD128 or change its permissions.

Aside from Plex, my usage has been with ffmpeg. My container is Ubuntu 22.04. The Ubuntu version of ffmpeg did/does not support QSV, so I installed the Jellyfin version (though I had to fix dependencies, hence the apt command):
Bash:
dpkg -i jellyfin-ffmpeg6_6.0-1-jammy_amd64.deb    # first attempt fails on missing deps
apt --fix-broken install                          # pull in the missing dependencies
dpkg -i jellyfin-ffmpeg6_6.0-1-jammy_amd64.deb    # install now succeeds
/usr/lib/jellyfin-ffmpeg/ffmpeg -encoders | grep qsv   # confirm the QSV encoders exist

I could then convert a test video with (not as root but as my user account with the passthrough permission on renderD128):
Bash:
/usr/lib/jellyfin-ffmpeg/ffmpeg -hwaccel qsv -nostdin -i ~/test.mp4 -c:v hevc_qsv -preset slow -global_quality 22 -look_ahead 1 -map 0 -max_muxing_queue_size 9999 ./output.mkv

The host would then show activity in intel_gpu_top.

It would be quite a bit of hassle, but you could try creating an Ubuntu 22.04 unprivileged container, following the earlier passthrough guide for Plex (but with a different user account) and then doing the above with ffmpeg (as that normal user). If that works, then it may help you get it working with Frigate.
 
The Frigate usage stats error is shown because it tries to run intel_gpu_top, and that fails because, as mentioned, this command doesn't work in LXC containers (definitely not in an unprivileged LXC container, and I don't think it works in a privileged container either). However, this doesn't prevent Frigate from using the GPU once it is otherwise able to do so.

In the `ls` output in an earlier post, the /dev/dri/card0 device is owned by nogroup, which will prevent Frigate from using the card. Check the lxc.cgroup2.devices.allow and lxc.idmap settings in the LXC container's conf file. They are notoriously fiddly to get right in my experience, but at least once they work you should not have to change them again.

You can't chown the device from within the Docker or LXC containers to resolve this access issue - it must be fixed from outside the LXC container.
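
To illustrate the fiddly idmap part: for an unprivileged container, a mapping like the one below keeps the host's video (gid 44) and render groups unshifted, so the bind-mounted nodes stay accessible. This is a sketch, not a drop-in config: it assumes the host's render group is gid 104 (check with getent group render), the counts must add up to exactly 65536, and /etc/subgid needs matching root:44:1 and root:104:1 entries.

Code:
lxc.idmap: u 0 100000 65536
lxc.idmap: g 0 100000 44
lxc.idmap: g 44 44 1
lxc.idmap: g 45 100045 59
lxc.idmap: g 104 104 1
lxc.idmap: g 105 100105 65431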

This output is from my working setup, starting from within the LXC container:
Bash:
lxc$ docker exec -it frigate /bin/bash
root@frigate:~# ls -l /dev/dri
total 0
crw-rw---- 1 nobody video 226,   0 Jun  5 21:37 card0
crw-rw---- 1 nobody   105 226, 128 Jun  5 20:55 renderD128
root@frigate:~# ls -ln /dev/dri
total 0
crw-rw---- 1 65534  44 226,   0 Jun  5 21:37 card0
crw-rw---- 1 65534 105 226, 128 Jun  5 20:55 renderD128
root@frigate:~# id
uid=0(root) gid=0(root) groups=0(root),44(video),105
Note that root is a member of groups 44 and 105. I needed to add --group-add 44,105 to the docker run command to add those groups to root within the container. Even though you are root within the Docker container, using an unprivileged LXC container means that you are not subject to root's normal full access rights, so the root user also needs to be a member of the groups of the GPU devices.
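
For example, a docker run along these lines (a sketch: adjust the GIDs to whatever your ls -ln shows; the image tag and device list are the usual Frigate ones, not taken from this thread, and with docker-compose the equivalent key is group_add):

Bash:
docker run -d --name frigate \
  --device /dev/dri/card0 --device /dev/dri/renderD128 \
  --group-add 44 --group-add 105 \
  ghcr.io/blakeblackshear/frigate:stable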

Once the permissions and groups are right, then test whether root can read from the devices:
Bash:
root@frigate:~# cat /dev/dri/card0
root@frigate:~# cat /dev/dri/renderD128
Frigate should then also be able to use the GPU once this is enabled in the Frigate config.
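
For completeness, enabling it on the Frigate side is a small config.yml snippet (check the Frigate docs for your version; the preset-vaapi shorthand exists in Frigate 0.12+):

Code:
ffmpeg:
  hwaccel_args: preset-vaapi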
 
Please help. I spent the whole day reading and trying. When I run intel_gpu_top in my LXC I see the graphs. Everything works fine. When I run this inside the Frigate Docker container on my LXC I get

Code:
Failed to initialize PMU! (Permission denied)

Inside the Frigate Docker container I see:

Code:
root@ce5c0c90a9b8:/opt/frigate# ls -la /dev/dri
total 0
drwxr-xr-x 2 root root         80 Jul 17 12:53 .
drwxr-xr-x 6 root root        360 Jul 17 12:53 ..
crw-rw-rw- 1 root video  226,   0 Jul 17 12:53 card0
crw-rw-rw- 1 root render 226, 128 Jul 17 12:53 renderD128

root@ce5c0c90a9b8:/opt/frigate# id 
uid=0(root) gid=0(root) groups=0(root),44(video),103(render)

I added --group-add 44,103 manually, but it did not work because there were no such groups in the Docker container, so I had to create them.

The configuration of my LXC container is below, and I run it as privileged.

Code:
arch: amd64
cores: 1
features: nesting=1
hostname: frigate
memory: 2048
mp0: /tank/nvr,mp=/media/frigate
net0: name=eth0,bridge=vmbr0,firewall=1,gw=192.168.1.1,hwaddr=B2:50:AB:02:A5:6B,ip=192.168.1.108/24,type=veth
ostype: debian
rootfs: drives:108/vm-108-disk-0.raw,size=8G
swap: 1024
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.cgroup2.devices.allow: c 29:0 rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/dri/renderD128 dev/renderD128 none bind,optional,create=file

vainfo inside my Docker container:

Code:
root@ce5c0c90a9b8:/opt/frigate# vainfo
error: XDG_RUNTIME_DIR is invalid or not set in the environment.
error: can't connect to X server!
libva info: VA-API version 1.17.0
libva info: User environment variable requested driver 'i965'
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/i965_drv_video.so
libva info: Found init function __vaDriverInit_1_8
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.17 (libva 2.10.0)
vainfo: Driver version: Intel i965 driver for Intel(R) Coffee Lake - 2.4.1
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Simple            : VAEntrypointEncSlice
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSlice
      VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264Main               : VAEntrypointEncSliceLP
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileH264High               : VAEntrypointEncSliceLP
      VAProfileH264MultiviewHigh      : VAEntrypointVLD
      VAProfileH264MultiviewHigh      : VAEntrypointEncSlice
      VAProfileH264StereoHigh         : VAEntrypointVLD
      VAProfileH264StereoHigh         : VAEntrypointEncSlice
      VAProfileVC1Simple              : VAEntrypointVLD
      VAProfileVC1Main                : VAEntrypointVLD
      VAProfileVC1Advanced            : VAEntrypointVLD
      VAProfileNone                   : VAEntrypointVideoProc
      VAProfileJPEGBaseline           : VAEntrypointVLD
      VAProfileJPEGBaseline           : VAEntrypointEncPicture
      VAProfileVP8Version0_3          : VAEntrypointVLD
      VAProfileVP8Version0_3          : VAEntrypointEncSlice
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSlice
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileHEVCMain10             : VAEntrypointEncSlice
      VAProfileVP9Profile0            : VAEntrypointVLD
      VAProfileVP9Profile0            : VAEntrypointEncSlice
      VAProfileVP9Profile2            : VAEntrypointVLD

Please help, I am desperate.
 
Please help. I spent the whole day reading and trying. When I run intel_gpu_top in my LXC I see the graphs. Everything works fine. When I run this inside the Frigate Docker container on my LXC I get
As you'll find on this forum, there are problems with running Docker in a container. The Proxmox staff (and others on this forum) advise everybody to run Docker in a VM.
 
