VDI Options | Post VMware Roadmap / Decisions

vasquezmi

Member
Aug 26, 2022
I am looking to understand the roadmap as well as the options in our current Proxmox VE setup (8.1.3).
There are limited resources in PVE for VDI, and they all seem stale. Deskpool seemed to have its client on a good path, but I think development has stopped.

VMware's future is uncertain, and organizations will likely sour on it over time after the recent buyout.
Most organizations of size leverage VDI pools with desktop (golden) images.

So my question is: what is the best VDI client tool to use today? What are the options?
Does the Proxmox development community have the appetite to open up these options and/or work with volunteers to try out enhancements? I will put my hat and environment in for the work.
 
Mind sharing the issues you ran into? Were they specific to the AMD chipset or to Proxmox?
Well, I don't think they were specific to AMD, but who knows...

There were mainly three bigger issues, and I'm not fully clear on what causes them:

* on some older platforms (Haswell era) the board was not able to boot with the card in it at all; no clue as to why
* in some configurations, virtual functions could not be created; it seems this was fixed by moving (other, unrelated) PCI devices to different slots or removing them altogether (this was on an EPYC board)
* sometimes the VFs seem to crash or get extremely slow and won't work properly afterwards (e.g. one can see GPU hangs in dmesg on the host); it goes from e.g. 140 FPS in the guest to 1 FPS and finally hangs/crashes on guest reboot

The last issue may be due to thermal constraints or a hardware/firmware issue; it's currently not clear.

Also, the backport driver currently does not compile on 6.8 (nor on our newest 6.5 :( ), but I hope that will be fixed in the future.
AFAIU the longer-term plan is to integrate those drivers into the upstream kernel (though I don't have any idea when this will be).
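For anyone hitting the second or third issue, here is a minimal sketch of the checks involved: creating VFs through sysfs and scanning the host kernel log for the hangs mentioned above. The PCI address below is a placeholder; look yours up with `lspci` first.

```shell
#!/bin/sh
# Sketch only: 0000:03:00.0 is a placeholder PCI address; find the real one
# with `lspci | grep -i display` first.
GPU=/sys/bus/pci/devices/0000:03:00.0

if [ -e "$GPU/sriov_totalvfs" ]; then
    cat "$GPU/sriov_totalvfs"      # how many VFs the card/driver supports
    echo 4 > "$GPU/sriov_numvfs"   # create four virtual functions
    ls /sys/bus/pci/devices/       # the new VF devices should appear here
else
    echo "no SR-IOV capability exposed at $GPU (card missing or driver not loaded?)"
fi

# Scan the host kernel log for the hangs described above.
dmesg 2>/dev/null | grep -iE 'gpu hang|i915' | tail -n 20
```

If `sriov_numvfs` refuses the write, that matches the second issue above; GPU-hang lines in the dmesg output match the third.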
 
Thanks for the feedback, Dominik.

I've encountered weird, unrelated issues with Proxmox and the latest-generation EPYCs on Supermicro boards, specifically around IOMMU, hence my question about AMD. I found that disabling IOMMU in the BIOS resolved those issues.

I have a new Intel system being shipped towards the end of this month with Flex 170 cards in it, so I will run through some tests and report back on my findings.
 
@dcsapak I think I know the answer to this, but can VMs that have a GPU shard mapped be live migrated?

No. Unfortunately, VMs with a vGPU cannot be live migrated. You must shut them down to migrate them. Someone can correct me if I'm wrong, but it's my understanding that there's currently no way to live migrate the occupied memory in the GPU to the new host.
 
AFAIU the virtual functions of these Intel GPUs don't have that capability, but it does work e.g. when using NVIDIA GPUs with their GRID/vGPU technology.

There are pending patches to enable support for those live migrations, but they're not applied/reviewed yet: https://lists.proxmox.com/pipermail/pve-devel/2024-April/063463.html
So in general, all the plumbing for that is there in QEMU and the kernel; it only depends on hardware and driver capability.
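In practice that means an attached VF currently forces an offline migration. A hedged sketch with `qm` (the VM ID 100 and the node name `pve-node2` are placeholders):

```shell
#!/bin/sh
# Illustration only: VM ID 100 and node name pve-node2 are placeholders.
if command -v qm >/dev/null 2>&1; then
    # Online migration is refused while a non-migratable VF is attached:
    qm migrate 100 pve-node2 --online || echo "online migration refused"

    # Offline path: shut down, migrate, then start again on the target node.
    qm shutdown 100 && qm migrate 100 pve-node2
else
    echo "qm not found: run this on a Proxmox VE node"
fi
```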

By the way, it turned out the stability issue with the Intel Flex was in fact a thermal issue: our fan speeds were set too low in the tested servers, and increasing them made it stable.
 
Are you guys working on an improved experience and better support for the Intel Flex cards?

Thanks for letting me know about the heating issue, @dcsapak; that makes more sense.
 
Aside from documenting the driver install process (once it works with current kernels) and how to configure the VFs, what did you have in mind?
If the VFs are there, using them is pretty straightforward: add a 'resource mapping', then add it to the VM.
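For the record, the same 'resource mapping -> add to VM' flow can be sketched from the CLI. All names here (mapping id, VM ID, node, PCI path, device id) are placeholders, and the `pvesh` arguments are an assumption from memory; the GUI (Datacenter -> Resource Mappings) is the straightforward way to create the mapping.

```shell
#!/bin/sh
# CLI sketch of 'resource mapping -> add to VM'. The mapping name (flex-vf),
# VM ID (100), node name, PCI path and device id are placeholders, and the
# pvesh arguments are an assumption -- double-check against the API viewer.
if command -v qm >/dev/null 2>&1; then
    pvesh create /cluster/mapping/pci --id flex-vf \
        --map node=pve1,path=0000:03:00.1,id=8086:56c0

    # Attach the mapped VF to the guest as its first hostpci device.
    qm set 100 --hostpci0 mapping=flex-vf
else
    echo "qm/pvesh not found: commands shown for illustration"
fi
```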
 
Just checking to see if you have any documentation on the driver install to share yet?

I see it's well documented for NVIDIA; it would be nice to have the same for the Intel Flex 170s:

https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE
 
Not yet; we're working on it, but there's no timeframe yet. Installing the driver is more or less straightforward (from https://github.com/intel-gpu/intel-gpu-i915-backports), but activating the virtual functions needs a bit of configuration (on Ubuntu it's possible with the xpu-smi tool from Intel, but that's currently not available for Debian stable).
 
Hi,

Could you provide details on which kernel and drivers you've used in Proxmox? I'm currently trying to get a Flex 170 running on my Minisforum ar900i with an i9-13900HX, but I'm struggling to find working drivers for the GPU.
 
I tested with 6.8.8-3-pve (but I think all 6.8 kernels should work) and the driver from here:

https://github.com/intel-gpu/intel-gpu-i915-backports/tree/backport/main

Use the 'backport/main' branch, install the dependencies (dkms, and a few others I can't remember right now), and run 'make i915dkmsdeb-pkg'. This produces a .deb file which contains the DKMS driver.

But as I said, you also want to use the 'xpu-smi' utility, which we're currently trying to figure out how to get onto PVE (since Intel only provides the packages for Ubuntu, not Debian).
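Putting those steps into one script. Package names besides dkms are my assumption, and the filename of the produced .deb varies with the driver version, so adjust before running.

```shell
#!/bin/sh
set -e
# Script version of the steps above. Package names besides dkms are an
# assumption; the produced .deb filename varies with the driver version.
if command -v pveversion >/dev/null 2>&1; then
    apt-get install -y build-essential dkms git flex bison

    git clone --branch backport/main \
        https://github.com/intel-gpu/intel-gpu-i915-backports.git
    cd intel-gpu-i915-backports

    # Builds a .deb containing the DKMS-packaged backport driver.
    make i915dkmsdeb-pkg

    # Install whatever package the build produced one directory up.
    apt install -y ../intel-i915-dkms_*.deb
else
    echo "pveversion not found: steps shown for reference only"
fi
```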
 
The latest commit (as of this writing) in backport/main does not build with kernel 6.8.12-1-pve for me.
The same commit seems to build OK and work fine with kernel 6.8.8-3-pve.

Intel Flex 140 with FW: DG02_2.2356
 
