Use vGPU on LXC

yezifeng

Oct 13, 2020
Hi, all

I have a Tesla P4 GPU. How can I use vGPU in an LXC container, just like using an mdev device on a VM? Is there any method or suggestion?

By the way, I tried GPU passthrough on LXC containers and it worked perfectly. But for vGPU I don't have a good approach.

best wishes
 
Hi, thanks for your reply.

Forgive my bad English :D


This is useful for GPU passthrough

However, I am trying to use vGPU on LXC, just like configuring 02:00.0,mdev=nvidia-64 on a VM.
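In other words, what I would like to reproduce on the container side is the kind of line a VM gets in its config file (the PCI address and profile here are just my example for the P4; the same syntax appears later in this thread):

Code:
hostpci0: 02:00.0,mdev=nvidia-64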

Also, I tried to configure vGPU like GPU passthrough.
Unfortunately, the vGPU management driver seems to lack nvidia-drm and nvidia-uvm modules.
And there are no similar device nodes like /dev/dri/cardX and /dev/dri/renderDXXX.

I don't know how to proceed.
 
To pass through a GPU to an LXC container you don't need a Tesla/Quadro GPU...
It works with every GPU...

Here is an example with a normal NVIDIA consumer GPU:

1. Install the drivers on the host...
- The best way is to install the drivers directly from NVIDIA, because the packaged drivers from apt don't include the NVIDIA UVM tools. You probably don't even need them, but they are useful to see whether the container actually utilizes the GPU...
- For the NVIDIA drivers you need the kernel headers of your kernel, gcc and make.
- Example: apt install pve-headers-5.15 gcc make
- Then reboot, install the drivers, and reboot again (a rough sketch of the whole step is below).

2.
Bash:
root@proxmox:~# ls -l /dev/nvidiactl
crw-rw-rw- 1 root root 195, 255 Jun 27 13:44 /dev/nvidiactl
root@proxmox:~# ls -l /dev/nvidia-uvm
crw-rw-rw- 1 root root 505, 0 Jun 27 13:44 /dev/nvidia-uvm

You can see that I have 195 and 505 there...
You may have something different.
You have to adjust that in the cgroup2 lines below accordingly.
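One way to see all the NVIDIA device nodes and their major numbers at once (just a convenience, same information as above):

Bash:
# list every NVIDIA device node with its major,minor numbers
ls -l /dev/nvidia*
# or print only the unique major numbers (the number before the comma)
ls -l /dev/nvidia* | grep '^c' | awk '{print $5}' | tr -d ',' | sort -un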

Shut down the LXC container...
Add this to your LXC config (/etc/pve/lxc/<CTID>.conf):
Bash:
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 505:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
Start the container/containers again...

SSH into the container...
Install the same driver version as on the host, but this time with an extra argument:
./NVIDIA-Linux-x86_64-515.48.07.run --no-kernel-module

You may need to reboot the container, and voilà, it's done...
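As an alternative to SSH, the same step can be done from the host with pct (container ID and installer filename are placeholders/examples):

Bash:
# copy the installer into the container and get a shell inside it
pct push <CTID> NVIDIA-Linux-x86_64-515.48.07.run /root/NVIDIA-Linux-x86_64-515.48.07.run
pct enter <CTID>
# inside the container: install only the user-space parts,
# the kernel module is provided by the host
chmod +x /root/NVIDIA-Linux-x86_64-515.48.07.run
/root/NVIDIA-Linux-x86_64-515.48.07.run --no-kernel-module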

To check that it works, simply start something in the container, e.g. video encoding/decoding, whatever...
SSH into the Proxmox host again and execute: nvidia-smi

There you will see which container is doing what and how much of the GPU it utilizes...

For AMD cards this is similar, but you will have to google a how-to...
Same for Intel.

However, in short, passing a GPU to an LXC container is really easy, and you can pass the same GPU to as many containers as you want.
The only limit is that NVIDIA, for example, restricts encoding/decoding to only 3-5 simultaneous sessions.
That means 3-5 containers can use the encoder at the same time, depending on the card (see the quick check below).
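If you want to see how close you are to that limit, something like this on the host shows the active NVENC sessions (the query fields are listed by nvidia-smi --help-query-gpu and need a reasonably recent driver):

Bash:
# watch the number of open encoder sessions and their average FPS, refreshed every second
watch -n 1 'nvidia-smi --query-gpu=encoder.stats.sessionCount,encoder.stats.averageFps --format=csv'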

However, passing a GPU to a VM is a whole different story.
To one VM, as a dedicated card, it's just as easy.
One card to multiple VMs = vGPU/MxGPU, and it's buggy as hell; in short, forget it.
The only hope I see at all is if Intel releases the Arc graphics cards with proper SR-IOV,
because SR-IOV is the only reliable, properly working way to pass something through to multiple VMs without relying on broken drivers.

Cheers ✌️
 
Hi friend, thanks for all this information. I'm facing some problems with using more than one container on the same GPU. Each container runs an application for real-time video rendering, and I can't afford a drop in FPS. To give you an idea, we got 3 video streams running on an RTX 5000, which I believe is little for what the card has to offer; if we try a 4th stream it affects the previous ones and everything crashes.

Basically this limitation forces me to either buy more GPUs or use a hypervisor that supports native vGPU. In that case the license is also paid, but it would cost less than buying more RTX 6000 cards. Thanks anyway.
 
It's either a driver bug, or you need another driver.
There are enterprise R515 drivers for such cards.
They should allow unlimited sessions.
But to be honest, if you run into problems with more than 3 sessions, it sounds like a bug.
It's nothing Proxmox related, just a general NVIDIA bug, because all consumer cards have a limit of 3 sessions. Your card shouldn't have a limit, but since the enterprise and consumer drivers are almost the same, that's what it sounds like to me.

Anyway, other than that I can't help. If you are by any chance still using the normal driver, try the R515; if you are already on R515, then there isn't much you can do.

Wish you luck & cheers ✌️
 
I understand. Exactly, I had seen before that the consumer drivers have this limitation, but I had forgotten about that. I'll try to use the NVIDIA GRID drivers as you suggested, thank you very much.
 
Friend, I decided to keep trying to get vGPU working. It doesn't need to be with LXC; it could be with VMs. I found some commands on the internet. Could you take a look and see if I'm on the right path, and help with what the next step would be? I believe it would be creating the vGPU profiles, right? See here
 
If you want vGPU to work for a virtual machine, you need to add a line like this to the VM configuration file:

hostpci0: 04:00.0,mdev=RTXA5000-4Q

You can also configure it in the web UI.
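If you are unsure which profiles your card and driver offer, the available mdev types can be read from sysfs once the vGPU host driver is loaded (the PCI address below is just the example from the line above):

Bash:
# each subdirectory is one vGPU profile (mdev type) the card supports
ls /sys/bus/pci/devices/0000:04:00.0/mdev_supported_types/
# show the human-readable name and remaining instances for each type
for t in /sys/bus/pci/devices/0000:04:00.0/mdev_supported_types/*; do
    echo "$(basename "$t"): $(cat "$t/name"), available: $(cat "$t/available_instances")"
done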
 
vGPU with a VM is working fine,
but in that case you need a separate VM for each container that needs some vGPU. I'm looking for a way to get vGPU working in LXC:
what kind of devices need to be passed to the LXC container to use it inside?

Right now I'm still considering installing Docker directly on the Proxmox host and using Portainer to manage it.
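If it comes to that, with Docker plus the nvidia-container-toolkit on the host a quick sanity check would be something like this (the CUDA image tag is only an example; pick one that matches your driver):

Bash:
# run nvidia-smi inside a throwaway container with GPU access
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi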
 
I'm after the same as @FancyBee. vGPU passthrough for VMs works fine, but how on earth do I do the same with LXCs? Everything I read about LXCs is basically about sharing the host's GPU driver, and that is not how the vGPU driver setup works. And before someone says "install the drivers on the host": you can't have both! With a vGPU configuration you intentionally prevent the host from using the GPU!
 
You do need drivers on the host for vGPU...

I'm using nvidia-container-toolkit; maybe it will work alongside the vGPU-enabled drivers? https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

<CTID>.conf additions:
Code:
lxc.hook.pre-start: sh -c '[ ! -f /dev/nvidia0 ] && /usr/bin/nvidia-modprobe -c0 -u'
lxc.environment: NVIDIA_VISIBLE_DEVICES=all
lxc.environment: NVIDIA_DRIVER_CAPABILITIES=compute,utility,video
lxc.hook.mount: /usr/share/lxc/hooks/nvidia
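After restarting the container, a quick check from the host (container ID is a placeholder) might look like this:

Bash:
pct start <CTID>
# run nvidia-smi inside the container without having to SSH in
pct exec <CTID> -- nvidia-smi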
 
I tried to install the NVIDIA container toolkit (it is part of the Ollama host prep manual), but if no devices are passed to the LXC container, the toolkit can't complete its initialization, etc., and after that the Docker container doesn't get any vGPU or GPU.
 
