To pass through a GPU to an LXC container you don't need a Tesla/Quadro GPU...
It works with every GPU...
Here is an example with a normal NVIDIA consumer GPU:
1. Install the drivers on the host...
- The best way is to install the drivers directly from NVIDIA, because the packaged drivers from apt don't include the nvidia-uvm tools. You probably don't even need them, but they are useful to see whether the container actually utilizes the GPU...
- For the NVIDIA drivers you need the kernel headers for your running kernel, plus gcc and make
- Example: apt install pve-headers-5.15 gcc make
- Then reboot, install the drivers and reboot again (rough sketch below).
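A rough sketch of the whole host-side install, assuming the 515.48.07 .run file used as an example later in this post (grab whatever the current version is from nvidia.com, the exact download URL may differ):
Bash:
# 1) build requirements for the running Proxmox kernel
apt install pve-headers-$(uname -r) gcc make
# 2) reboot so you are definitely running the kernel the headers belong to
reboot
# 3) download and run the NVIDIA installer (example version/URL, check nvidia.com)
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/515.48.07/NVIDIA-Linux-x86_64-515.48.07.run
chmod +x NVIDIA-Linux-x86_64-515.48.07.run
./NVIDIA-Linux-x86_64-515.48.07.run
# 4) reboot again so the module loads cleanly
reboot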
2. Check the device major numbers on the host:
Bash:
root@proxmox:~# ls -l /dev/nvidiactl
crw-rw-rw- 1 root root 195, 255 Jun 27 13:44 /dev/nvidiactl
root@proxmox:~# ls -l /dev/nvidia-uvm
crw-rw-rw- 1 root root 505, 0 Jun 27 13:44 /dev/nvidia-uvm
You can see I have 195 and 505 there...
You may have something different.
You have to change that accordingly in the cgroup2 lines below....
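If you want to double-check those major numbers another way, the kernel lists them too (just a sketch, the module names on your system should look similar):
Bash:
# character device majors registered by the nvidia modules
grep -i nvidia /proc/devices
# or list every nvidia device node in one go
ls -l /dev/nvidia*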
Shut down the LXC container...
Add to your LXC config:
Bash:
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 505:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
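These lines go into the container's config file on the Proxmox host under /etc/pve/lxc/. Assuming the container ID is 101 (replace with yours), roughly:
Bash:
# edit the config of CT 101 on the host and paste the lines above at the end
nano /etc/pve/lxc/101.conf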
Start the container/containers again...
SSH into the container...
Install the same driver version as on the host, but this time with an argument:
./NVIDIA-Linux-x86_64-515.48.07.run --no-kernel-module
You may need to reboot the container, and voilà, it's done...
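To quickly check from inside the container that the driver and the passed-through devices are visible (just a sketch):
Bash:
# the bind-mounted device nodes should show up in the container
ls -l /dev/nvidia*
# nvidia-smi comes with the .run installer and should list the GPU
nvidia-smi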
To check if it works, simply start something on the container, video encoding/decoding, whatever...
SSH into the Proxmox host again and execute: nvidia-smi
There you will see which container is doing what and how much it utilizes the GPU....
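If you need something quick to throw at the GPU from inside the container, an NVENC encode of a synthetic test video does the job (assuming ffmpeg with NVENC support is installed in the container; just a sketch):
Bash:
# encode a generated test pattern on the GPU, discard the output
ffmpeg -f lavfi -i testsrc2=duration=60:size=1920x1080:rate=30 -c:v h264_nvenc -f null -
# while it runs, nvidia-smi on the host should show that container's ffmpeg process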
For AMD cards this is similar, but you have to google a how-to...
Same for Intel.
However, in short, passing a GPU to an LXC container is easy af. And you can pass the same GPU to as many containers as you want.
The only limit is that NVIDIA, for example, restricts encoding/decoding to only 3-5 simultaneous sessions.
That means 3-5 containers can use the GPU for that at the same time, depending on the card.
However, passing a GPU to a VM is a whole different story.
To one VM, if it's a dedicated card, that's easy af too.
One card to multiple VMs = vGPU/MxGPU, and it's buggy as hell; in short, forget it.
The only hope I see at all is if Intel releases the Arc graphics cards with proper SR-IOV.
Because SR-IOV is the only reliable, properly working way to pass something through to multiple VMs without relying on broken drivers.
Cheers