[SOLVED] Sharing GPU to LXC container - Failed to initialize NVML: Unknown Error

ntblade

Hi all,
I'm trying to share a GPU with a Debian Bullseye (11) container. I installed the NVIDIA driver from NVIDIA-Linux-x86_64-390.144.run on the Proxmox host and then in the container.
Host:
Code:
nvidia-smi
Sat Oct 30 22:27:21 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.144                Driver Version: 390.144                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro 600          Off  | 00000000:05:00.0 Off |                  N/A |
| 30%   62C    P0    N/A /  N/A |      0MiB /   963MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

/etc/modules-load.d/modules.conf:
Code:
# Nvidia modules
nvidia
nvidia_uvm

/etc/udev/rules.d/70-nvidia.rules:
Code:
KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 666 /dev/nvidia*'"
KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0666 /dev/nvidia-uvm*'"

Here are the device list and container config on the host:
Code:
ls -la /dev/nvid*
crw-rw-rw- 1 root root 195,   0 Oct 30 22:16 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Oct 30 22:16 /dev/nvidiactl
crw-rw-rw- 1 root root 239,   0 Oct 30 22:16 /dev/nvidia-uvm
crw-rw-rw- 1 root root 239,   1 Oct 30 22:16 /dev/nvidia-uvm-tools

lxc.cgroup.devices.allow: c 195:* rw
lxc.cgroup.devices.allow: c 239:* rw
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
And in the container:
Code:
ls -la /dev/nvidia*
crw-rw-rw- 1 root root 239,   0 Oct 30 21:16 /dev/nvidia-uvm
crw-rw-rw- 1 root root 239,   1 Oct 30 21:16 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Oct 30 21:16 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Oct 30 21:16 /dev/nvidiactl
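
For reference, the major numbers in the listings above (195 and 239) are what the devices.allow lines in the container config have to match. The nvidia-uvm major is, as far as I know, assigned dynamically, so it may differ on another host or after a driver change. A minimal sketch that derives the allow lines from an `ls -l` listing follows; the function name `allow_lines` and its optional key argument are made up for illustration and are not part of any tool.

```shell
#!/bin/sh
# Sketch: turn an `ls -l /dev/nvidia*` listing (read from stdin) into
# LXC device-allow lines, one per distinct major number.
# The config key is an optional first argument; the default below is the
# key used in this post.
allow_lines() {
    key="${1:-lxc.cgroup.devices.allow}"
    awk -v key="$key" '
        /^c/ {
            # Character-device rows start with "c"; field 5 is the major
            # number followed by a comma, e.g. "195,".
            major = $5
            sub(/,$/, "", major)
            if (!(major in seen)) {
                seen[major] = 1
                printf "%s: c %s:* rw\n", key, major
            }
        }'
}

# Demo with the host listing from this post.
allow_lines <<'EOF'
crw-rw-rw- 1 root root 195,   0 Oct 30 22:16 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Oct 30 22:16 /dev/nvidiactl
crw-rw-rw- 1 root root 239,   0 Oct 30 22:16 /dev/nvidia-uvm
crw-rw-rw- 1 root root 239,   1 Oct 30 22:16 /dev/nvidia-uvm-tools
EOF
```

On a live host the same function could be fed with `ls -l /dev/nvidia* | allow_lines`, and the result compared against the lines in the container config.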

However, when I run nvidia-smi in the container:
Code:
nvidia-smi
Failed to initialize NVML: Unknown Error
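
One thing that may be worth checking with this error is which cgroup layout the host uses. The post does not state it, so this is an assumption rather than a diagnosis. On a host with the unified cgroup v2 hierarchy, LXC expects device rules under `lxc.cgroup2.devices.allow` rather than `lxc.cgroup.devices.allow`. A rough sketch of that check, to be run on the host (the helper name `devices_key` is hypothetical, and `stat -fc %T` assumes GNU coreutils):

```shell
#!/bin/sh
# Sketch: pick the LXC device-allow key that matches the host's cgroup layout.
# Mapping assumed here: "cgroup2fs" means the unified cgroup v2 hierarchy;
# anything else (typically "tmpfs") is treated as legacy or hybrid cgroup v1.
devices_key() {
    case "$1" in
        cgroup2fs) echo "lxc.cgroup2.devices.allow" ;;
        *)         echo "lxc.cgroup.devices.allow" ;;
    esac
}

# Filesystem type mounted at /sys/fs/cgroup (GNU stat syntax).
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo unknown)
echo "cgroup filesystem: $fstype"
echo "matching key:      $(devices_key "$fstype")"
```

If the reported key differs from the one in the container config, the device nodes can still show up inside the container through the bind mounts while access to them is denied, which would be consistent with the listing above working and nvidia-smi failing.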

Is anyone able to help please?

Thanks

NTB
 