Advice needed for working with GPUs in an HA Cluster

surfrock66

Feb 10, 2020
I have a 3-node HA cluster of enterprise hardware that runs a small home business and a bunch of personal VMs, backed by iSCSI ZFS LUNs. It's extremely underutilized at this point, but I'm looking to expand my usage.

I have a bit of money to spend on upgrades, and I am looking into the feasibility of procuring some GPUs. My budget is in the neighborhood of $600-1000. The main factor is that I have 3 hosts. Because I have HA (and have experimented with ProxLB, which I am not using right now), all of my VMs are built as if they can freely migrate. That's been fine so far, but all my reading about adding GPUs has left me more confused than when I started. As for starting goals, I want to add GPU transcoding to my Jellyfin instance, dedicate a GPU to Immich for doing ML on my library, add a GPU to a VDI VM I use as a virtual workspace, and potentially use local language models for some development work that is not yet defined.

I've read a ton on using NVIDIA GPUs with vGPU, but I have a lot of concerns. I do not have the funds to buy an actual vGPU license, and doing the fastapi-dls thing feels a little too unreliable to justify a large spend. A P40 would be in the budget range, but without tensor cores and with the licensing complexity I'm not sure it's a good investment. Additionally, the documentation on how that would work with HA hasn't clicked; I don't even see HA mentioned here: https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE . Most threads on using non-NVIDIA GPUs have led me to believe it isn't really worth it.

Another option I thought of was using multiple lower-cost GPUs (like a Tesla P4) and just assigning full cards to the VMs. From my reading, though, that is going to break things with HA, and I'll have to use resource mapping, which is more complex and outside my comfort zone: https://pve.proxmox.com/wiki/QEMU/KVM_Virtual_Machines#resource_mapping . I also don't get tensor cores with this option.

It seems like I have a lot of options, but at this point I'm not sure what the best way to proceed is. I don't want to spend $600 and end up with something that causes more headaches than the value it brings. I'd like to hear people's advice, experience, and recommendations, especially if there's something I'm missing here that my reading hasn't uncovered.