Giving an LXC direct GPU access for host/LXC CUDA + vGPU?

zenowl77

Member
Feb 22, 2024
Okay, so I have set up a merged driver with both KVM/vGPU and standard features, so the NVIDIA modules nvidia-modeset, nvidia-uvm and nvidia-uvm-tools are available on the Proxmox host machine. I have tried every guide, tutorial and help post I can find online, for every type of GPU, to find a way to make this work. The best I have gotten so far is that the processes showed up in the host's nvidia-smi list, but they error out with an unknown CUDA error...

The GPU is an NVIDIA Tesla P4. I have tried the 17.2/550.90.05 and 16.4/535.161.05 merged drivers and am currently on 16.4, since 17+ causes issues in Linux VMs with vGPU, with nvidia-smi claiming a driver mismatch on any driver version. (Windows is fine using 535, but I don't want to be stuck on Windows VMs only.)

I am guessing this may be a permissions issue, possibly solvable with lxc.idmap: and/or permission corrections on the host, but nothing I have tried works...

Does anyone have LXC CUDA/encoding in Docker/Jellyfin working at the same time as vGPU on an NVIDIA GPU? What did you have to do to get it working?

Docker lxc.conf:
Code:
arch: amd64
cores: 16
features: mknod=1,nesting=1
hostname: docker
memory: 8192
mp1: /mnt/10TB-2,mp=/mnt/10TB-2
mp2: /mnt/8TB,mp=/mnt/8TB
nameserver: 10.0.0.1
net0: name=eth0,bridge=vmbr0,gw=10.0.0.1,hwaddr=BC:24:11:15:95:AD,ip=10.0.0.220/24,type=veth
onboot: 1
ostype: debian
rootfs: local-lvm:vm-118-disk-0,size=640G
swap: 0
tags: proxmox-helper-scripts
lxc.cgroup2.devices.allow: a
lxc.cap.drop: 
lxc.cgroup2.devices.allow: c 188:* rwm
lxc.cgroup2.devices.allow: c 189:* rwm
lxc.cgroup2.devices.allow: c 29:0 rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/net dev/net none bind,create=dir
lxc.hook.pre-start: sh -c '[ ! -f /dev/nvidia0 ] && /usr/bin/nvidia-modprobe -c0 -u'
lxc.environment: NVIDIA_VISIBLE_DEVICES=all
lxc.environment: NVIDIA_DRIVER_CAPABILITIES=compute,utility,video
lxc.cap.drop: 
lxc.cgroup2.devices.allow: c 188:* rwm
lxc.cgroup2.devices.allow: c 189:* rwm
lxc.cgroup2.devices.allow: c 29:0 rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/net dev/net none bind,create=dir
lxc.cgroup2.devices.allow: c 10:* rwm
lxc.cgroup2.devices.allow: c 508:* rwm
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 506:* rwm
lxc.cgroup2.devices.allow: c 507:* rwm
lxc.cgroup2.devices.allow: c 510:* rwm
lxc.cgroup2.devices.allow: c 128:* rwm
lxc.cgroup2.devices.allow: c 129:* rwm
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/nvram dev/nvram none bind,optional,create=file
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
lxc.mount.entry: /dev/dri/renderD129 dev/dri/renderD128 none bind,optional,create=file
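
Since the suspicion is a permissions problem, one sanity check is whether the character-device numbers of the NVIDIA nodes on the host are actually covered by the lxc.cgroup2.devices.allow lines. Below is a minimal Python sketch, not a tested fix. The helper names (parse_allow_rules, rule_allows, is_allowed, host_device_numbers) are made up for illustration, and it assumes the simple "type major:minor access" rule form used in the config above. As far as I know, the majors for nodes such as /dev/nvidia-uvm are assigned dynamically, so values like 506-510 may not match after a reboot or driver change. Note that while the "lxc.cgroup2.devices.allow: a" line is present every device is allowed, so this check mainly matters once that line is removed.

```python
import glob
import os
import stat

ALLOW_KEY = "lxc.cgroup2.devices.allow"


def parse_allow_rules(conf_text):
    """Return the values of all lxc.cgroup2.devices.allow lines."""
    rules = []
    for line in conf_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == ALLOW_KEY:
            rules.append(value.strip())
    return rules


def rule_allows(rule, dev_type, major, minor):
    """Check one rule such as 'c 195:* rwm' (or 'a') against a device."""
    parts = rule.split()
    if not parts:
        return False
    if parts[0] == "a":  # 'a' allows every device
        return True
    if len(parts) < 2 or parts[0] != dev_type:
        return False
    rule_major, _, rule_minor = parts[1].partition(":")
    return rule_major in ("*", str(major)) and rule_minor in ("*", str(minor))


def is_allowed(conf_text, dev_type, major, minor):
    """True if any allow rule in the config covers the device."""
    rules = parse_allow_rules(conf_text)
    return any(rule_allows(r, dev_type, major, minor) for r in rules)


def host_device_numbers(pattern="/dev/nvidia*"):
    """Map each matching character device on this machine to (major, minor)."""
    found = {}
    for path in sorted(glob.glob(pattern)):
        st = os.stat(path)
        if stat.S_ISCHR(st.st_mode):  # skip directories and regular files
            found[path] = (os.major(st.st_rdev), os.minor(st.st_rdev))
    return found


if __name__ == "__main__":
    # Hypothetical excerpt; paste the real allow lines from the container config.
    sample_conf = """\
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 508:* rwm
"""
    for path, (major, minor) in host_device_numbers().items():
        ok = is_allowed(sample_conf, "c", major, minor)
        print(f"{path}: c {major}:{minor} -> {'allowed' if ok else 'NOT covered'}")
```

Run on the Proxmox host, this prints one line per NVIDIA device node. Any node reported as not covered would need its current major added to the allow list.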

The Docker LXC sees the GPU with nvidia-smi (although it randomly stops seeing it and has other issues):


Code:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P4                       Off | 00000000:17:00.0 Off |                  Off |
| N/A   33C    P0              22W /  75W |      0MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
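
The header of this output reports NVIDIA-SMI 535.161.07 against Driver Version 535.161.05, so the userspace tool in the container and the driver on the host are not the same build. Whether that has anything to do with the unknown CUDA error is not established here. The Python sketch below only shows one way to pull both versions out of saved nvidia-smi output and compare them. The names parse_smi_header and versions_match are hypothetical, and the regular expression assumes the header layout shown above.

```python
import re

# Matches the banner line printed at the top of nvidia-smi output.
HEADER_RE = re.compile(
    r"NVIDIA-SMI\s+(\S+)\s+Driver Version:\s+(\S+)\s+CUDA Version:\s+(\S+)"
)


def parse_smi_header(text):
    """Return the tool, driver and CUDA versions, or None if no banner is found."""
    match = HEADER_RE.search(text)
    if match is None:
        return None
    return {
        "smi": match.group(1),
        "driver": match.group(2),
        "cuda": match.group(3),
    }


def versions_match(text):
    """True only if the nvidia-smi tool and the driver report the same version."""
    header = parse_smi_header(text)
    return header is not None and header["smi"] == header["driver"]


if __name__ == "__main__":
    banner = (
        "| NVIDIA-SMI 535.161.07             Driver Version: 535.161.05"
        "   CUDA Version: 12.2     |"
    )
    print(parse_smi_header(banner))
    print("versions match:", versions_match(banner))
```

The same functions can be fed the output captured inside the container and on the host to see which side carries which version.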