okay so i have set up a merged driver with both KVM/vGPU and standard features, so the nvidia modules nvidia-modeset, nvidia-uvm & nvidia-uvm-tools are available on the proxmox host machine. i have tried every guide, tutorial, and help post i can find online, for every type of GPU, to find any way to make it work. the best i have gotten so far is the container's processes showing up in the host nvidia-smi list, but they error out with an unknown CUDA error.

GPU is an Nvidia Tesla P4, i have tried driver
17.2/550.90.05
and 16.4/535.161.05
merged drivers, currently on 16.4, since 17+ causes issues in linux VMs with vGPU: nvidia-smi claims a driver mismatch on every driver version. (windows is fine using 535, but i don't want to be stuck on windows VMs only.)

i am guessing this is maybe a permissions issue, possibly to be solved with lxc.idmap: and/or permission corrections on the host? but everything i have tried doesn't work....
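Before blaming permissions, one thing worth ruling out is a plain version desync between the host kernel module and the user-space driver the container sees. A rough diagnostic sketch, run on the host — the `/proc` line format is the usual "NVRM version: ... Kernel Module <version> ..." one, and 118 is the container ID from the config below (both assumptions about this setup):

```shell
#!/bin/sh
# Pull the kernel-module version out of /proc/driver/nvidia/version on the
# Proxmox host, e.g. from a line like:
#   NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.161.05  Sat ...
host_ver=$(awk '/Kernel Module/ {
    for (i = 1; i <= NF; i++)
        if ($i ~ /^[0-9]+\.[0-9]+/) { print $i; exit }
}' /proc/driver/nvidia/version 2>/dev/null)
echo "host kernel module: ${host_ver:-not loaded}"

# Compare against what user space inside the container reports; a mismatch
# here is exactly the "driver mismatch" complaint nvidia-smi makes.
pct exec 118 -- nvidia-smi --query-gpu=driver_version --format=csv,noheader
```

If the two versions disagree, the container's NVIDIA user-space libraries were installed from a different driver package than the host's merged kernel module.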
does anyone have CUDA/encoding in docker/jellyfin working in an LXC at the same time as vGPU on an nvidia GPU? what did you have to do to get it working?
Docker lxc.conf:
Code:
arch: amd64
cores: 16
features: mknod=1,nesting=1
hostname: docker
memory: 8192
mp1: /mnt/10TB-2,mp=/mnt/10TB-2
mp2: /mnt/8TB,mp=/mnt/8TB
nameserver: 10.0.0.1
net0: name=eth0,bridge=vmbr0,gw=10.0.0.1,hwaddr=BC:24:11:15:95:AD,ip=10.0.0.220/24,type=veth
onboot: 1
ostype: debian
rootfs: local-lvm:vm-118-disk-0,size=640G
swap: 0
tags: proxmox-helper-scripts
lxc.cgroup2.devices.allow: a
lxc.cap.drop:
lxc.cgroup2.devices.allow: c 188:* rwm
lxc.cgroup2.devices.allow: c 189:* rwm
lxc.cgroup2.devices.allow: c 29:0 rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir
lxc.mount.entry: /dev/net dev/net none bind,create=dir
lxc.hook.pre-start: sh -c '[ -e /dev/nvidia0 ] || /usr/bin/nvidia-modprobe -c0 -u'
lxc.environment: NVIDIA_VISIBLE_DEVICES=all
lxc.environment: NVIDIA_DRIVER_CAPABILITIES=compute,utility,video
lxc.cgroup2.devices.allow: c 10:* rwm
lxc.cgroup2.devices.allow: c 508:* rwm
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 506:* rwm
lxc.cgroup2.devices.allow: c 507:* rwm
lxc.cgroup2.devices.allow: c 510:* rwm
lxc.cgroup2.devices.allow: c 128:* rwm
lxc.cgroup2.devices.allow: c 129:* rwm
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
lxc.mount.entry: /dev/nvram dev/nvram none bind,optional,create=file
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
lxc.mount.entry: /dev/dri/renderD129 dev/dri/renderD129 none bind,optional,create=file
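As a sanity check on the hard-coded majors above: 195 is the fixed NVIDIA major, but the ones in the 5xx range (506, 507, 508, 510) are dynamically assigned and can change between driver builds or reboots, which silently breaks the allow list. A small sketch that generates the allow lines from what the kernel actually registered, assuming the standard `/proc/devices` "major name" layout:

```shell
#!/bin/sh
# Generate lxc.cgroup2.devices.allow lines from the NVIDIA character-device
# majors the running kernel actually registered, instead of hard-coding
# 195/506/507/508/510 (those can differ between driver builds and reboots).
awk '/nvidia/ { printf "lxc.cgroup2.devices.allow: c %s:* rwm\n", $1 }' /proc/devices
```

Comparing its output against the config after a host reboot shows quickly whether a renumbered major is the reason the container lost the GPU.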
the docker LXC sees the GPU with nvidia-smi (although it randomly stops seeing it, has issues, etc.)
Code:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07 Driver Version: 535.161.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla P4 Off | 00000000:17:00.0 Off | Off |
| N/A 33C P0 22W / 75W | 0MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+