Hi everyone,
I’m currently running a Proxmox VE cluster with two physical servers, each equipped with its own GPU. I’m looking to set up a virtual machine that can leverage both GPUs simultaneously, even though they’re on separate nodes.
I know GPU passthrough is possible on a per-node basis using VFIO, but is there any way (officially supported or workaround) to make a single VM access and utilize GPUs from two different physical servers?
Some context:
- The GPUs are different models but both NVIDIA.
- I want to use this for AI training workloads (e.g., PyTorch/TensorFlow).
- Ideally, I’d like to avoid splitting the workload manually unless there’s no alternative.
Is there a cluster-aware GPU pooling solution or passthrough trick that could help? Or should I just create two separate VMs and use a distributed compute framework?
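
For reference, if I end up going the two-VM route, I assume the split would look roughly like standard PyTorch DDP across the two VMs (one GPU each). This is just a rough sketch of what I have in mind, with placeholder IPs/hostnames, and I realize the mismatched GPU models would make the faster card wait on the slower one:

```python
# Rough sketch: PyTorch DDP with one process per VM (one GPU each).
# Launch the same script on both VMs with torchrun, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=1 --node_rank=<0 or 1> \
#            --master_addr=<VM1 IP> --master_port=29500 train.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT for us,
    # so the default env:// initialization picks them up automatically.
    dist.init_process_group(backend="nccl")
    local_rank = 0  # one GPU per VM, so the local device index is always 0
    torch.cuda.set_device(local_rank)

    # Toy model just to illustrate the wiring; the real training code would go here.
    model = torch.nn.Linear(1024, 1024).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = torch.nn.MSELoss()

    for step in range(10):
        x = torch.randn(32, 1024, device=local_rank)
        y = torch.randn(32, 1024, device=local_rank)
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(x), y)
        loss.backward()   # gradients are all-reduced over the network between the two VMs
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

I'd rather not have to manage this across two VMs if there's a cleaner cluster-level option, so any pointers are appreciated.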