Help Needed: Assigning Multiple vGPU Instances to a Single VM in Proxmox VE!

nhocit

New Member
Feb 2, 2023
Hi everyone,

I'm having an issue with vGPU assignments in Proxmox VE and could really use some help.

I have two NVIDIA A30 Tensor Core GPUs in my server, each with 24 GB of memory. I've set up vGPU, and I want to assign two vGPU instances, one from each physical GPU, to a single VM so that the VM gets a total of 48 GB of GPU memory. But every time I try this, my entire Proxmox VE host crashes.
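
For reference, a setup like this is usually expressed as one hostpci entry per vGPU in the VM configuration. The sketch below is only an illustration of that layout: the PCI addresses and the mdev type name nvidia-XXX are placeholders (assumptions, not values from this server), and the helper hostpci_lines is hypothetical.

```python
def hostpci_lines(devices):
    """Build Proxmox-style 'hostpciN: ...' lines from (pci_address, mdev_type) pairs."""
    lines = []
    for index, (address, mdev_type) in enumerate(devices):
        entry = f"hostpci{index}: {address}"
        if mdev_type:
            # mdev=<type> selects the mediated device (vGPU) profile on that device
            entry += f",mdev={mdev_type}"
        lines.append(entry)
    return lines


# Placeholder addresses and profile name; substitute the values from your own host.
example = hostpci_lines([
    ("0000:41:00.0", "nvidia-XXX"),
    ("0000:81:00.0", "nvidia-XXX"),
])
print("\n".join(example))
```

The printed lines have the shape of the entries that would go into the VM's config file, one per physical GPU.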

Has anyone managed to assign multiple vGPU instances from different physical GPUs to a single VM in Proxmox VE? If so, how did you do it? Are there specific steps or configurations needed to make this work without crashing the system?

Any tips or advice would be greatly appreciated!

Thanks!

The errors look like this:
Mar 12 09:02:30 mlspai01 kernel: [413021.864886] [nvidia-vgpu-vfio] €…Ñš÷ÿ€…Ñš÷ÿ€ìaÁÿÿÿÿ€…Ñš÷ÿ¸¹Á@æ
Mar 12 09:02:30 mlspai01 kernel: [413021.864886] µ: Failed to unpin all 0x1 pages Total pages unpinned: 0x0 status: 0x59 ret: -19
Mar 12 09:02:30 mlspai01 kernel: [413021.865887] [nvidia-vgpu-vfio] €…Ñš÷ÿ€…Ñš÷ÿ€ìaÁÿÿÿÿ€…Ñš÷ÿ¸¹Á@æ
Mar 12 09:02:30 mlspai01 kernel: [413021.865887] µ: Failed to unpin all 0x1 pages Total pages unpinned: 0x0 status: 0x59 ret: -19
Mar 12 09:02:30 mlspai01 kernel: [413021.866889] [nvidia-vgpu-vfio] €…Ñš÷ÿ€…Ñš÷ÿ€ìaÁÿÿÿÿ€…Ñš÷ÿ¸¹Á@æ
Mar 12 09:02:30 mlspai01 kernel: [413021.866889] µ: Failed to unpin all 0x1 pages Total pages unpinned: 0x0 status: 0x59 ret: -19
Mar 12 09:02:30 mlspai01 kernel: [413021.868291] [nvidia-vgpu-vfio] €…Ñš÷ÿ€…Ñš÷ÿ€ìaÁÿÿÿÿ€…Ñš÷ÿ¸¹Á@æ
Mar 12 09:02:30 mlspai01 kernel: [413021.868291] µ: Failed to unpin all 0x1 pages Total pages unpinned: 0x0 status: 0x59 ret: -19
Mar 12 09:02:30 mlspai01 kernel: [413021.869292] [nvidia-vgpu-vfio] €…Ñš÷ÿ€…Ñš÷ÿ€ìaÁÿÿÿÿ€…Ñš÷ÿ¸¹Á@æ
Mar 12 09:02:30 mlspai01 kernel: [413021.869292] µ: Failed to unpin all 0x1 pages Total pages unpinned: 0x0 status: 0x59 ret: -19
Mar 12 09:02:30 mlspai01 kernel: [413021.870294] [nvidia-vgpu-vfio] €…Ñš÷ÿ€…Ñš÷ÿ€ìaÁÿÿÿÿ€…Ñš÷ÿ¸¹Á@æ
Mar 12 09:02:30 mlspai01 kernel: [413021.870294] µ: Failed to unpin all 0x1 pages Total pages unpinned: 0x0 status: 0x59 ret: -19
Mar 12 09:02:30 mlspai01 kernel: [413021.871440] [nvidia-vgpu-vfio] €…Ñš÷ÿ€…Ñš÷ÿ€ìaÁÿÿÿÿ€…Ñš÷ÿ¸¹Á@æ
Mar 12 09:02:30 mlspai01 kernel: [413021.871440] µ: Failed to unpin all 0x1 pages Total pages unpinned: 0x0 status: 0x59 ret: -19
 
Hmm, the line:

Mar 12 09:02:30 mlspai01 kernel: [413021.864886] [nvidia-vgpu-vfio] €…Ñš÷ÿ€…Ñš÷ÿ€ìaÁÿÿÿÿ€…Ñš÷ÿ¸¹Á@æ
seems to indicate a problem inside the NVIDIA driver. Which driver version are you using?
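
If it helps, here is a small sketch for collecting the relevant versions on the host. It assumes pveversion and nvidia-smi are on the PATH; the helper tool_output is hypothetical and simply returns None when a tool is missing.

```python
import shutil
import subprocess


def tool_output(cmd):
    """Run cmd and return its stdout as text, or None if the executable is not installed."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout.strip()


# Proxmox VE package versions (includes the running kernel)
pve = tool_output(["pveversion", "-v"])
# NVIDIA host driver version as reported by nvidia-smi
driver = tool_output(["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"])

print("pveversion:", pve if pve is not None else "not found")
print("nvidia driver:", driver if driver is not None else "not found")
```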

Also, since you seem to be passing the entire GPU through anyway, why not go that route directly? Simply pass through the whole devices and use the regular NVIDIA driver in the guest. Shouldn't that work too?
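
As a rough sketch of that route, the VM config would carry one plain hostpci entry per GPU, without an mdev option. The PCI addresses below are placeholders (assumptions), the helper passthrough_lines is hypothetical, and pcie=1 assumes the VM uses the q35 machine type.

```python
def passthrough_lines(addresses, pcie=True):
    """Build 'hostpciN: ...' lines for full-device passthrough (no mdev/vGPU profile)."""
    lines = []
    for index, address in enumerate(addresses):
        options = [address]
        if pcie:
            # pcie=1 presents the device as PCIe; this needs a q35 machine type
            options.append("pcie=1")
        lines.append(f"hostpci{index}: " + ",".join(options))
    return lines


# Placeholder addresses; look up the real ones with lspci on the host.
full = passthrough_lines(["0000:41:00.0", "0000:81:00.0"])
print("\n".join(full))
```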
 
@dcsapak, I upgraded from Proxmox VE 7.4 to 8, and now it works perfectly!
Thank you very much for your support!
Best regards,
Hung