Major number for nvidia-uvm changes after reboot

hlab

Mar 26, 2023
Hi,
I have LXCs working with `nvidia-uvm` passed through, but if I reboot the host, the major number changes.
Is there a way to keep the major number the same across reboots,
OR
a way to detect it and update the container config on reboot?
 
Run `ls -l /dev/nvidia-uvm` on the host. It shows the major and minor device numbers right after the group column.
In my LXC config I have, e.g.:
```
lxc.cgroup2.devices.allow = c 236:1 rwm
lxc.cgroup2.devices.allow = c 236:0 rwm
...
lxc.mount.entry = /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file 0 2
lxc.mount.entry = /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file 0 2
...
```
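If you'd rather script the lookup than read it off `ls -l`, here is a minimal sketch (not tested on an actual Proxmox host). `devnum` and `chr_major` are hypothetical helper names: `devnum` uses GNU `stat`'s `%t`/`%T` format sequences (major/minor in hex) and converts them to decimal, while `chr_major` looks up the major a driver registered under its name in `/proc/devices`.

```bash
#!/usr/bin/env bash
# Sketch: read the current device numbers from the system instead of from `ls -l`.
set -euo pipefail

# Print "major:minor" in decimal for a device node, i.e. the "508, 0" pair that
# `ls -l` shows. GNU stat's %t and %T give the major and minor numbers in hex.
devnum() {
  local hex_major hex_minor
  read -r hex_major hex_minor < <(stat -c '%t %T' "$1")
  printf '%d:%d\n' "0x$hex_major" "0x$hex_minor"
}

# Print the major number a driver registered under the given name, taken from
# the "Character devices:" section of /proc/devices.
chr_major() {
  awk -v name="$1" '
    /^Character devices:/ { in_chr = 1; next }
    /^Block devices:/     { in_chr = 0 }
    in_chr && $2 == name  { print $1 }
  ' /proc/devices
}
```

On the host, `devnum /dev/nvidia-uvm` should print the same pair `ls -l` shows (as `major:minor`), and `chr_major nvidia-uvm` just the major.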
 
OK, it finally happened again.
Yes, I know how to find the major and minor device numbers; the problem is that the major number changes across PVE reboots.
For example:
BEFORE:
```bash
root@pve:~# ls -l /dev/nvidia-uvm
crw-rw-rw- 1 root root 508, 0 Dec  4 15:03 /dev/nvidia-uvm
```
AFTER:
```bash
root@pve:~# ls -l /dev/nvidia-uvm
crw-rw-rw- 1 root root 505, 0 Dec  4 15:03 /dev/nvidia-uvm
```

Is there a way to keep the major device number the same, or to detect it at boot and update the LXC config?
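One way to "figure it out at boot and update the config" is to rebuild the numeric allow lines from the mount entries. This is a rough sketch, not a tested tool. It assumes the Proxmox `key: value` config format (it also reads raw-LXC `key = value` mount entries) and that every numeric `c major:minor` allow line belongs to a character device you also bind-mount individually, as in the configs in this thread. `regen_allow` is a hypothetical name. It prints the updated config instead of editing it in place; wildcard rules such as `c 226:* rwm` and everything from the first `[snapshot]` section on are left alone, but numeric allow lines for devices without their own `lxc.mount.entry` would be dropped, so diff before using the output.

```bash
#!/usr/bin/env bash
# Sketch: rebuild the numeric lxc.cgroup2.devices.allow lines of a container
# config from the host's *current* device numbers. Prints the result to stdout;
# the input file is not modified.
set -euo pipefail

# "major:minor" in decimal for a device node (stat's %t/%T are hex).
devnum() {
  local hex_major hex_minor
  read -r hex_major hex_minor < <(stat -c '%t %T' "$1")
  printf '%d:%d\n' "0x$hex_major" "0x$hex_minor"
}

regen_allow() {
  local conf=$1 src allow=""
  # Source path of every lxc.mount.entry ("key: value" or "key = value" form).
  while read -r src; do
    [ -c "$src" ] || continue   # skip directories and nodes missing on this host
    allow+="lxc.cgroup2.devices.allow: c $(devnum "$src") rwm"$'\n'
  done < <(sed -nE 's/^lxc\.mount\.entry *[=:] *(\/dev\/[^ ]+) .*/\1/p' "$conf")

  # Put the fresh lines where the first numeric "c MAJ:MIN" allow line was and
  # drop the other numeric ones. Wildcard rules (c 226:* rwm) are kept, and
  # nothing from the first [snapshot] section onward is touched.
  ALLOW="$allow" awk '
    /^\[/ { in_snap = 1 }
    !in_snap && /^lxc\.cgroup2\.devices\.allow *[=:] *c [0-9]+:[0-9]+ / {
      if (!replaced) { printf "%s", ENVIRON["ALLOW"]; replaced = 1 }
      next
    }
    { print }
  ' "$conf"
}
```

For example, `regen_allow /etc/pve/lxc/101.conf > /tmp/101.conf && diff /etc/pve/lxc/101.conf /tmp/101.conf` (101 is a placeholder container ID), run after the NVIDIA modules are loaded and before the container starts. Note that it writes `rwm` for every device.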
 
I also have this problem on 7.4, using cgroup2 to make my GPU available to some unprivileged LXCs. Did you ever make any progress?

It bites me just about every reboot. I'm wishing cgroup2 had an alternative to using device numbers, along the lines of identifying disks by their UUID. I don't see anything in the docs, but hope springs eternal.
 
Nope, I've just made a habit of checking all the numbers after every reboot or power cycle.
 
For anyone finding this later: this problem is solved in v8.1.

Proxmox 8.1 (and maybe 8.0?) supports explicit device passthrough by path. My example, sharing my GPU, now looks like this in the lxc.conf; no lxc.cgroup2.devices.allow or lxc.mount.entry lines are required anymore.

```
dev0: /dev/nvidia0
dev1: /dev/nvidiactl
dev2: /dev/nvidia-uvm
dev3: /dev/nvidia-uvm-tools
```
 
Hi, I do not understand your code.

Currently I have these lines in my lxc.conf for /dev/nvidia-uvm and /dev/nvidia-uvm-tools:
```
lxc.cgroup2.devices.allow: c 507:0 rw
lxc.cgroup2.devices.allow: c 507:1 rw
```

What should I change them to?

This is the full entry for my nvidia card:
```
lxc.cgroup2.devices.allow: c 195:0 rw
lxc.cgroup2.devices.allow: c 195:255 rw
lxc.cgroup2.devices.allow: c 195:254 rw
lxc.cgroup2.devices.allow: c 507:0 rw
lxc.cgroup2.devices.allow: c 507:1 rw
lxc.cgroup2.devices.allow: c 10:144 rw
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/nvram dev/nvram none bind,optional,create=file
```
 

It means replace everything you've described as the "full entry for my nvidia card" with just this:

```
dev0: /dev/nvidia0
dev1: /dev/nvidiactl
dev2: /dev/nvidia-uvm
dev3: /dev/nvidia-uvm-tools
dev4: /dev/nvidia-caps/nvidia-cap1
dev5: /dev/nvidia-caps/nvidia-cap2
```

I'm not sure which cgroup2 permissions that grants (rwm, perhaps?), but whatever the defaults are, they work for me.

Don't forget those nvidia-cap* devices, if your setup has them. Also, reading your configuration, you have /dev/nvram listed (that's also what your `c 10:144` allow line is for). I'm pretty sure you don't want to pass through your non-volatile RAM storage, which is the memory of the Real-Time Clock. You probably picked that up with an "nv" grep, but that "nv" means "non-volatile", not "nvidia". :)
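To do the conversion mechanically, a sketch like this prints one `devN:` line for each character device that an existing config bind-mounts via `lxc.mount.entry`. `mount_entries_to_dev` is a hypothetical name, and it only converts what's already there: it doesn't remove the old `lxc.cgroup2.devices.allow`/`lxc.mount.entry` lines, `/dev/nvidia-caps/*` nodes still have to be added by hand, and run against the config above it would also emit a line for `/dev/nvram` (if that node exists on the host), which you'd delete.

```bash
#!/usr/bin/env bash
# Sketch: turn the lxc.mount.entry lines of an old-style container config into
# devN lines. Prints them; the config file itself is left untouched.
set -euo pipefail

mount_entries_to_dev() {
  local conf=$1 src i=0
  # Source path of every lxc.mount.entry ("key: value" or "key = value" form).
  while read -r src; do
    [ -c "$src" ] || continue   # only character devices present on this host
    printf 'dev%d: %s\n' "$i" "$src"
    i=$((i + 1))
  done < <(sed -nE 's/^lxc\.mount\.entry *[=:] *(\/dev\/[^ ]+) .*/\1/p' "$conf")
}
```

For example, `mount_entries_to_dev /etc/pve/lxc/101.conf`, where 101 is a placeholder container ID.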
 