Two GPUs and their assignment to /dev/dri/renderD...

reah

Member
Dec 31, 2023
Hello,

I am running a Proxmox host (9.1.7) with two GPUs, an AMD iGPU and an NVIDIA dGPU (three, actually, when counting the BMC's VGA). I use them in different LXC containers, which works flawlessly. The iGPU is used for Jellyfin transcoding and is assigned like this:
dev0: /dev/dri/card2,gid=44
dev1: /dev/dri/renderD129,gid=106

Now the issue: the DRM device nodes under /dev/dri/ (e.g. card0, card1, renderD128, renderD129) are assigned dynamically at boot, so the mapping between device nodes and physical GPUs can change from boot to boot. When it does, I have to find out which is which, update the container config and start Jellyfin manually.
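
For reference, one way to check which node belongs to which GPU after a boot is to read sysfs, where each /sys/class/drm/<node>/device link points at the PCI device behind that node. This is only a minimal sketch assuming the usual sysfs layout; the function name drm_pci_map and its optional directory argument are made up for illustration.

```shell
#!/bin/sh
# Print "<drm node> <PCI address>" for every card*/renderD* node.
# drm_pci_map and its optional argument (the sysfs DRM class directory)
# are illustrative names, not part of any existing tool.
drm_pci_map() {
    root="${1:-/sys/class/drm}"
    for node in "$root"/card* "$root"/renderD*; do
        # Skip connector entries such as card0-DP-1 and unmatched globs.
        case "${node##*/}" in
            card*[!0-9]*|renderD*[!0-9]*) continue ;;
        esac
        [ -e "$node/device" ] || continue
        # The device link resolves to the parent PCI device directory.
        dev=$(readlink -f "$node/device")
        printf '%s %s\n' "${node##*/}" "${dev##*/}"
    done
}

drm_pci_map
```

On the host this prints one line per node, so the entry showing 0000:30:00.0 would be the iGPU in my case.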

So what I need is a robust way to:
  1. Pass a specific GPU into my Jellyfin LXC container,
  2. Ensure the mapping remains stable across reboots,
  3. Have correct permissions (e.g. root:video) inside the container.
What would be the recommended approach to achieve stable, selective GPU passthrough for DRM devices in LXC/Proxmox environments?

Regards,
Reah
 
Set up udev rules on the host to map the devices to persistently named device nodes.
Then pass these consistently named device nodes to the containers.
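
A rough sketch of what such rules could look like, generated by a small shell helper so the PCI address only has to be typed once. The helper name gpu_udev_rules and the "amd" label are arbitrary, 0000:30:00.0 is the iGPU address mentioned later in this thread, and the DEVTYPE match is my assumption for keeping connector entries such as card0-DP-1 out of the rule.

```shell
#!/bin/sh
# Emit udev rules that add stable symlinks for the GPU at a PCI address.
# gpu_udev_rules is an illustrative helper name; the label is free-form.
gpu_udev_rules() {
    pci="$1"; label="$2"
    # card node: restrict to DRM minors (assumption: DEVTYPE is drm_minor).
    printf 'SUBSYSTEM=="drm", KERNEL=="card[0-9]*", ENV{DEVTYPE}=="drm_minor", KERNELS=="%s", SYMLINK+="dri/card-%s"\n' "$pci" "$label"
    # render node of the same PCI device.
    printf 'SUBSYSTEM=="drm", KERNEL=="renderD[0-9]*", KERNELS=="%s", SYMLINK+="dri/render-%s"\n' "$pci" "$label"
}

# Print the rules for the iGPU; nothing is written to disk here.
gpu_udev_rules 0000:30:00.0 amd
```

The output would go into a file under /etc/udev/rules.d/ (the file name is up to you), followed by `udevadm control --reload-rules` and `udevadm trigger` on the host.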
 
An alternative would be to use '/dev/dri/by-path', which encodes the PCI address and should therefore be more stable (I hope, at least).
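
To see what the by-path links currently point at, something like the following could be run on the host. It is just a sketch; dri_by_path is an illustrative name and the directory argument is optional.

```shell
#!/bin/sh
# Show which canonical node each by-path symlink currently resolves to.
# dri_by_path is an illustrative name, not an existing command.
dri_by_path() {
    dir="${1:-/dev/dri/by-path}"
    for link in "$dir"/*; do
        # Only symlinks are of interest; this also skips an unmatched glob.
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}

dri_by_path
```

The link names stay tied to the PCI address, while the target on the right-hand side is the node that may differ between boots.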
 
So far I have not been able to get my system to assign the card and render nodes differently again, so I won't call the following a solution, but it is an intriguing observation.

I had already tried the udev approach, and while it created
Code:
/dev/dri/card-amd
/dev/dri/card-nvidia
/dev/dri/render-amd
/dev/dri/render-nvidia
on the Proxmox host, mounting them as in
Code:
dev0: /dev/dri/render-amd,gid=106
in the LXC config, using it inside the container didn't work out:
Code:
vainfo --display drm --device /dev/dri/render-amd
Failed to a DRM display for the given device

ffmpeg -v debug -init_hw_device drm=dr:/dev/dri/render-amd -init_hw_device vulkan@dr
Unable to get device info from DRM fd: Invalid argument!
Device creation failed: -542398533.
Failed to set value 'vulkan@dr' for option 'init_hw_device': Generic error in an external library

Now dcsapak proposed '/dev/dri/by-path' and I tested it. But configuring
Code:
dev0: /dev/dri/by-path/pci-0000:30:00.0-card,gid=44
dev1: /dev/dri/by-path/pci-0000:30:00.0-render,gid=106
results in the same errors for vainfo and ffmpeg.

Then, more by chance than by intent, I configured the container as follows:
Code:
dev0: /dev/dri/card0,gid=44
dev1: /dev/dri/card1,gid=44
dev2: /dev/dri/card2,gid=44
dev3: /dev/dri/renderD128,gid=106
dev4: /dev/dri/renderD129,gid=106
dev5: /dev/dri/by-path/pci-0000:30:00.0-card,gid=44
dev6: /dev/dri/by-path/pci-0000:30:00.0-render,gid=106

To my surprise, vainfo and ffmpeg now work with pci-0000:30:00.0-render (the same goes for /dev/dri/render-amd, by the way).
So my hope is that pci-0000:30:00.0-render will be usable regardless of which card is renderD128 and which is renderD129, as long as both are mounted in the LXC.
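
In case someone wants to reproduce this, here is a small sketch that prints the dev entries for every card/render node plus the by-path links of one GPU. It only mirrors the configuration above and does not explain why it works. lxc_dri_lines is a made-up name, and gids 44 and 106 are simply the values from my container.

```shell
#!/bin/sh
# Print devN: lines covering every card*/renderD* node plus the by-path
# links of one GPU. lxc_dri_lines is an illustrative name; gids 44 and
# 106 are the values used in this thread and may differ elsewhere.
lxc_dri_lines() {
    dri="${1:-/dev/dri}"; pci="$2"; n=0
    for node in "$dri"/card[0-9]* "$dri"/renderD[0-9]*; do
        [ -e "$node" ] || continue
        case "${node##*/}" in card*) gid=44 ;; *) gid=106 ;; esac
        printf 'dev%d: %s,gid=%s\n' "$n" "$node" "$gid"
        n=$((n + 1))
    done
    # Without a PCI address only the canonical nodes are listed.
    [ -n "$pci" ] || return 0
    printf 'dev%d: %s/by-path/pci-%s-card,gid=44\n' "$n" "$dri" "$pci"
    printf 'dev%d: %s/by-path/pci-%s-render,gid=106\n' "$((n + 1))" "$dri" "$pci"
}

lxc_dri_lines /dev/dri 0000:30:00.0
```

The printed lines would then be pasted into the container config under /etc/pve/lxc/ on the host.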
 