Hi Everyone,
We have a 3-node cluster where one of the nodes has an NVIDIA A16 GPU. The installation is working perfectly and we are able to see the GPU when running nvidia-smi.
We can even select the GPU and the part we want when creating the VM. The problem is that every time we try to have 2 vGPUs in the same VM, it gives this error:
kvm: -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/00000000-0000-0000-0000-000999004004,id=hostpci0,bus=pci.0,addr=0x10: warning: vfio 00000000-0000-0000-0000-000999004004: Could not enable error recovery for the device
kvm: -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/00000001-0000-0000-0000-000999004004,id=hostpci1,bus=pci.0,addr=0x11: vfio 00000001-0000-0000-0000-000999004004: error getting device from group 344: Input/output error
According to the NVIDIA documentation, we should be able to do this using the C and Q versions, but we are unable to. Has anyone had the same problem?
Best Regards