GPU passthrough to Container (LXC)

pkr

New Member
Aug 22, 2023
loving all things proxmox thus far

I have created my first of many containers (Ubuntu 22.04) and have 2 x NVIDIA 2080 GPUs on my host, which I would like to use in the container.

What is the best way to do this?

Is there a detailed guide?
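
From what I've read so far, the usual recipe seems to be: install the NVIDIA driver on the Proxmox host, then bind-mount the device nodes into the container. A rough sketch of what would go into the container config, assuming a privileged container with ID 100 (the ID is just an example, and the nvidia-uvm major number is host-specific, so check it with 'ls -l /dev/nvidia-uvm'):

Code:
# /etc/pve/lxc/100.conf (100 is a hypothetical container ID)
lxc.cgroup2.devices.allow: c 195:* rwm
# nvidia-uvm uses a dynamic major number; replace 511 with yours
lxc.cgroup2.devices.allow: c 511:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidia1 dev/nvidia1 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file

Inside the container, the same driver version then gets installed with the installer's --no-kernel-module option, since the host already provides the kernel module.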
 
I support my lab colleagues running CUDA workloads on Docker. They want Docker and CUDA; I also want to run VMs.

On oVirt/RHV I solve that by passing the V100 GPUs through to one or two VMs, which then run the NVIDIA-enabled Docker for their CI/CD workloads.
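
A typical smoke test inside such a VM, assuming the NVIDIA Container Toolkit is installed and wired into Docker (the image tag is just an example; any CUDA base image works):

Code:
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi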

A previous attempt to run Docker + CUDA side by side with oVirt/RHV failed because Docker and oVirt were fighting over the network.

Likewise, a previous attempt to run CUDA inside an OpenVZ container failed because the CUDA runtime looks for certain device files and OpenVZ hides them from containers. I tried patching the OpenVZ kernel drivers, but without source code to the CUDA runtime, that effort seemed too much to pursue.

I am a bit curious whether LXC works with CUDA out of the box, but not enough to try it on my Proxmox test machines, where I have gone with passing the GPU to a VM, and toggling back and forth is rather involved.

And the user demand is Docker, which should work inside LXC but might not want to. The chances of LXC+Docker+CUDA working without messing with KVM seem remote.

If you can get it to work, I'd like to hear how it went.

But I can assure you that CUDA on a VM with GPU pass-through works very easily on Proxmox, or KVM in general, and might be a better approach ...once you get the pass-through GPU working.

I dislike "fat" VM virtualization when containers should do the job already. But when CUDA workloads suffer next to no VM overhead because of pass-through, pragmatism wins.
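
For reference, the pass-through side is just a couple of commands once IOMMU is enabled on the kernel command line (intel_iommu=on or amd_iommu=on); the VM ID and PCI address below are examples, so look up the real address with lspci:

Code:
# pcie=1 assumes a q35 machine type (qm set 101 -machine q35)
qm set 101 -hostpci0 0000:0a:00.0,pcie=1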
 
Has anyone had luck actually installing the "proper" NVIDIA drivers on the host?

E.g. on Proxmox 8 I download the appropriate .run file (for my Quadro P600) and the installation aborts and tells me to use the Debian package manager instead. This is even after blacklisting nouveau and running 'apt remove --purge libnvcuvid1 libnvidia-encode1' (the two packages specified in the Jellyfin install instructions for Debian systems).

Trying later to just use those packages does not work. I end up with only /dev/nvram, and not the other 5 items in the /dev directory shown in most tutorials.
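
As far as I understand, those nodes only appear once the nvidia kernel modules actually load, so my check looks something like this (standard commands, but they only succeed if the driver is properly installed):

Code:
modprobe nvidia
modprobe nvidia-uvm
nvidia-smi          # usually triggers creation of /dev/nvidia0 and /dev/nvidiactl
ls -l /dev/nvidia*  # expect nvidia0, nvidiactl, nvidia-uvm, nvidia-uvm-tools, ...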
 
Has anyone had luck actually installing the "proper" NVIDIA drivers on the host?

The biggest problem with luck is that it's not reliable.

Proxmox and NVIDIA software are constantly evolving, and even if you should be so lucky as to get things to work, any package update might undo them, and without mainline support for the combination your luck might run out.

I've fallen to the same temptation often enough, e.g. I tried to force CUDA to run on OpenVZ/Virtuozzo and found myself rewriting kernel code to offset what I'd still label 'CUDA runtime stupidities'... I ran out of time before I got it to work, but at least I learned something.

Containers are designed to share better what was already designed to be shared.

Computers weren't designed to be shared initially, not even memory holding both data and code before John von Neumann's memo spread the idea. But in the case of storage, CPUs, memory, and even networks (which I'd argue have never been 'natural' in Unix), resources became much easier to share thanks to replacing (slow) human operators with (fast) software operating systems, and to design retrofits such as mapping (storage), resumable traps (CPU), memory mapping and virtualization (RAM), and 'better' networking.

But GPUs have changed their functionality and mission so rapidly over the last decades that proper architecture, hardware abstractions, and software/hardware co-design haven't quite caught up, and I'm afraid they only ever will once GPUs slow down their evolution and become boring.

And one such aspect, IMHO, can be seen in the fact that many want to virtualize the GPU for transcoding... a VPU function that, until streaming and remote gaming became a thing, wound up on the GPU more or less by accident, or by marketing pushing ideas over thought.

VPU engines are fixed-function blocks, and evidently GPUs have them in varying numbers, relatively unrelated to the 'size' of a GPU. Sharing those should be pretty much like sharing a set of printers (limited-time exclusive round-robin), and even if Microsoft still struggles with it, that hasn't been rocket science for a long time.

Except that on x86, VPUs are tied to the GPUs by said accident and by a lack of proper abstractions, both in hardware and in software.

But it's much worse with the GPU proper (graphics, not video), which just isn't designed to be shared: at best it has partition support.

So putting it under the exclusive control of a dedicated VM is not just far easier now that PCIe pass-through mechanics have been devised and established; it's also the better way of dealing with a resource not designed for sharing.

IMHO, keeping it out of Proxmox's hair is just a more efficient use of everyone's resources until xPUs, chipsets, and operating systems have caught up.
 
Has anyone had luck actually installing the "proper" NVIDIA drivers on the host?

E.g. on Proxmox 8 I download the appropriate .run file (for my Quadro P600) and the installation aborts and tells me to use the Debian package manager instead. This is even after blacklisting nouveau and running 'apt remove --purge libnvcuvid1 libnvidia-encode1' (the two packages specified in the Jellyfin install instructions for Debian systems).

Trying later to just use those packages does not work. I end up with only /dev/nvram, and not the other 5 items in the /dev directory shown in most tutorials.

I had the same issue as you. I received the message about using the Debian package manager to install instead. To fix this, I ran 'apt remove --purge nvidia*'.

You have to remove all nvidia packages before running the installer.
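
Putting it together, the whole host-side sequence is roughly this (the driver filename/version is just an example, and the headers package name can differ between PVE releases):

Code:
apt remove --purge 'nvidia*'
apt install pve-headers-$(uname -r) build-essential
echo "blacklist nouveau" > /etc/modprobe.d/blacklist-nouveau.conf
update-initramfs -u && reboot
# after the reboot:
sh ./NVIDIA-Linux-x86_64-535.129.03.run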
 
