Isolation of tasks using GPU (not at the same time)

FredreikSchack

New Member
Aug 2, 2023
I wanted to isolate different tasks in VMs (Windows for gaming, something with Portainer for Docker containers, and a Linux desktop for Python AI work) that all use one GPU, and I'm a bit surprised at how challenging that seems to be, considering that virtualization is such a big thing today. This is partly caused by Nvidia preventing vGPU on consumer-grade cards; it's technically possible to work around the vGPU barrier, at least in Proxmox, but it makes things more difficult. It's also caused by Windows' limited support for Linux in Hyper-V, by the performance penalty for Docker containers in WSL, and not least by the fact that Microsoft doesn't seem to provide any nimble solution for GPU virtualization.

If I use Hyper-V in Windows, there will be several layers of abstraction, each of which will cost performance. Running WSL alongside Hyper-V doesn't seem like a viable solution either, as WSL might conflict with Hyper-V over GPU resources.

Running the free, command-line-only Hyper-V Server is a nightmare because of all the security measures; it can be hard to reach the Hyper-V host at all, and it still doesn't support Linux very well.

To me it seems like the best solution is Proxmox, which probably supports Windows better than Hyper-V supports Linux. I understand that Proxmox has a feature called GPU hotplugging, and combined with Looking Glass it should give me almost seamless, automatic hotplugging? That would come with the benefit of being able to stream to thin clients, which would be a nice-to-have :)
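
For a concrete picture of what that setup involves on the Proxmox side, here is a minimal sketch using the third-party proxmoxer Python client. The host, credentials, node, VM ID and PCI address are made-up placeholders, and the ivshmem args line follows the pattern described in the Looking Glass documentation for Proxmox, so take it as a starting point rather than a verified recipe:

```python
from proxmoxer import ProxmoxAPI  # third-party client for the Proxmox VE REST API

# Made-up host, credentials, node and VM ID -- adjust for your own cluster.
pve = ProxmoxAPI("pve.example.lan", user="root@pam", password="secret", verify_ssl=False)

# Configure VM 101 for whole-GPU passthrough plus the shared-memory device
# that Looking Glass uses to hand frames from the Windows guest to the host.
pve.nodes("pve").qemu(101).config.put(
    machine="q35",
    bios="ovmf",
    hostpci0="0000:01:00,pcie=1,x-vga=1",   # assumed PCI address of the GPU
    args="-device ivshmem-plain,memdev=ivshmem,bus=pcie.0 "
         "-object memory-backend-file,id=ivshmem,share=on,"
         "mem-path=/dev/shm/looking-glass,size=32M",
)
```

The same lines can just as well go straight into /etc/pve/qemu-server/101.conf or be set with qm set; the API call is only one way to script it.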

I've never used Proxmox or Looking Glass, so I'd like to hear your experiences with this combination and whether it would suit my use case. The gaming is for my son and won't run simultaneously with the other workloads.
 
There are some attempts at sharing a local GPU, like VirtGL and Juice Labs, but they both have a huge overhead and are really not worth it.

I can run multiple machines at the same time, but only one with the GPU. I think I can make two VMs using one VHD, one with GPU passthrough and one without, so I can run either depending on my needs.
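
In case it helps, here is a minimal sketch of that two-configs-one-disk idea, again using the proxmoxer client with made-up node, VM IDs, storage and volume names. The key point is that both VM configs reference the same volume; if your Proxmox version refuses to attach a volume owned by another VM through the API, the equivalent line can be added to the second VM's config file by hand:

```python
from proxmoxer import ProxmoxAPI  # third-party client for the Proxmox VE REST API

# Made-up host, credentials and IDs -- adjust for your own setup.
pve = ProxmoxAPI("pve.example.lan", user="root@pam", password="secret", verify_ssl=False)
node = pve.nodes("pve")

# VM 101: "gaming" profile of the Windows install, with the GPU passed through.
node.qemu(101).config.put(
    scsi0="local-lvm:vm-101-disk-0",        # the shared Windows system disk
    hostpci0="0000:01:00,pcie=1,x-vga=1",   # assumed PCI address of the GPU
)

# VM 102: "headless" profile pointing at the very same disk, but without the GPU.
node.qemu(102).config.put(
    scsi0="local-lvm:vm-101-disk-0",        # same volume as VM 101
)
# Never start both profiles at once: two VMs writing to one disk will corrupt it.
```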
 
At some point in the 1960s, and with the help of virtual memory, CPUs became so fast that [human] operators could no longer keep up.
They were replaced by operating systems, which could partition the (virtualized) memory space and then juggle the CPU between tasks.

It helped that the CPUs did not hold a lot of state, so context switches were quick; so quick that they created an illusion of concurrency, and time-sharing was born.

A GPU today contains not just thousands of cores; each core also comes with vector register files that may measure in kilobytes, and that adds up across thousands of cores. So doing the equivalent of a full context switch would mean pumping quite a lot of data around, without any immediate benefit and at terrible context-switch latencies.

GPUs are also designed not as independent cores but to work in tightly coordinated wave-fronts of thousands of cores, using hundreds of thousands of registers at terabytes per second of bandwidth and megabytes of caches, so they rarely have to touch GDDRx or HBM memory; from their perspective that memory is roughly what a hard disk was to one of the early virtual memory systems.

And since that GPU RAM is still critically tight, a context switch would mean swapping it out to CPU RAM. Even if modern GPUs can access CPU RAM directly, that happens over PCIe and thus at the speed of "tape": perhaps 70 GB/s instead of the 1000 GB/s of GDDR6X or the 3000 GB/s of HBM3.
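
A quick back-of-the-envelope calculation makes that gap concrete. All the numbers below are rough assumptions (SM count, register file size, bus and memory bandwidths), not measurements of any particular card:

```python
# Rough cost of a "full GPU context switch", using assumed round numbers.

sm_count = 128                    # assumed SMs on a large consumer GPU
regfile_per_sm = 256 * 1024       # ~256 KB register file per SM
shared_l1_per_sm = 128 * 1024     # ~128 KB shared memory / L1 per SM

on_chip_state = sm_count * (regfile_per_sm + shared_l1_per_sm)
print(f"pure on-chip state: ~{on_chip_state / 2**20:.0f} MiB")                 # ~48 MiB

vram_in_use = 20e9                # assume a job keeps ~20 GB resident in VRAM
pcie_bw = 64e9                    # ~64 GB/s, PCIe 5.0 x16 in one direction
gddr6x_bw = 1000e9                # ~1 TB/s on-card memory bandwidth

print(f"swap 20 GB over PCIe:     {vram_in_use / pcie_bw:.2f} s each way")     # ~0.31 s
print(f"same 20 GB at VRAM speed: {vram_in_use / gddr6x_bw * 1000:.0f} ms")    # ~20 ms
```

So every switch would cost hundreds of milliseconds, against the microseconds of a CPU context switch, which is the whole argument in a nutshell.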

In other words: OS-style multi-tasking, even if Linux had an idea of how to do it (and the GPU cores made it possible), would simply be too inefficient for practical use.

That leaves you with soft or hard partitioning the GPU between jobs. The first is quite normal even with CUDA: you can run several applications at once, but they have to agree on how they'll partition the resources. The second is splitting the GPU into parts that look like distinct GPUs to the OS. The hardware is in all Nvidia chips, but only enabled for those who paid the vGPU tax.
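
The soft variant is the everyday case and needs no hypervisor at all. As a minimal sketch, assuming two PyTorch jobs that have agreed on a 50/50 split of device 0 (nothing enforces the cap on the other process, the jobs just have to cooperate):

```python
import torch

# Each cooperating process runs something like this with its agreed share;
# the cap only limits this process's caching allocator, nobody else's.
torch.cuda.set_per_process_memory_fraction(0.5, device=0)

x = torch.randn(4096, 4096, device="cuda:0")
y = x @ x  # fine, as long as this job stays within its agreed half
print(f"{torch.cuda.memory_allocated(0) / 2**20:.0f} MiB allocated on the shared GPU")
```

Caps like that keep cooperative jobs from starving each other, but they give you no isolation between untrusted users; that is exactly what the hard, vGPU-style partitioning adds.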

The latter would allow you to run different VMs and operating systems on those partitioned GPUs, but the resources are fixed at configuration time; you don't get the wonderful overcommit a multi-tasking OS gives you for CPUs.

In most private use cases, like the one you're describing, getting a separate GPU and perhaps a separate system to go with it will almost always be cheaper and easier to handle.

Except when you can decide which one to run at any given time: then you just launch one VM or the other.

Of course, someone could write an abstraction layer for things like VirtGL, where a third VM or a container manages the GPU resources for client VMs, but abstractions that work both for real-time game engines and for batch-style machine learning training and inference jobs are difficult to design.
 
