KVM hardware virtualization affects nvidia-smi

vtrandal · Feb 2, 2016

In my Proxmox VE 4.1 installations, disabling KVM hardware virtualization fixes an nvidia-smi error, and I would like to understand why. Here is the error message that results from running nvidia-smi on the Linux guest with KVM hardware virtualization enabled:

Unable to determine the device handle for GPU 0000:0B:00.0: Unknown Error

But when KVM hardware virtualization is disabled, the error message does not appear and the GPU is operational in the Linux guest. Disabling KVM hardware virtualization results in a very slow guest; otherwise I would consider it a solution.

So then I want to understand how disabling KVM hardware virtualization seemingly fixes the above device handle error and makes the GPU operational. It's like disabling KVM hardware virtualization solves an initial condition problem in the GPU that only becomes a problem when KVM hardware virtualization is enabled.

The GPU is an Nvidia Tesla M2090 with various recent driver versions, and the guest is Ubuntu Server 14.04.3. There's a ton of other detail I could add here, but I'm not sure what would be most helpful, except perhaps the following list of hardware on which this error occurs.

motherboards: Asrock EPC612D8 and Asus Z10PE-D8.
processor: Intel Xeon E5-2609v3

manu · Feb 3, 2016

Hi vtrandal

Can you please post the output of "lspci | grep VGA" on the host and the guest, and the configuration of the VM (/etc/pve/qemu-server/<VMID>.conf)?

This looks like a pci passthrough problem.

vtrandal · Feb 3, 2016

Hello manu. It is a passthrough problem. Passthrough works if "KVM hardware virtualization" is disabled. We would like to understand why that is the case. Disabling "KVM hardware virtualization" makes the VM very slow; otherwise we would consider it a solution.

To be clear about the device we are working with we will provide "lspci | grep VGA" as well as "lspci | grep -i nvidia" output on the HOST and the GUEST.

[HOST] root@asrack1:~# lspci | grep -i "vga\|nvidia"
02:00.0 3D controller: NVIDIA Corporation GF110GL [Tesla M2090] (rev a1)
07:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)

[GUEST] vtrandal@asrack1-vm110:~$ lspci | grep -i "vga\|nvidia"
00:01.0 VGA compatible controller: Red Hat, Inc. QXL paravirtual graphic card (rev 04)
01:00.0 3D controller: NVIDIA Corporation GF110GL [Tesla M2090] (rev a1)

So then the device that we are passing through is the Nvidia M2090.

root@asrack1:/etc/pve/qemu-server# cat 110.conf
boot: c
bootdisk: ide0
cores: 4
hostpci0: 02:00.0,pcie=1
ide0: local:110/vm-110-disk-1.qcow2,size=68719476736
ide2: local:iso/ubuntu-14.04.3-server-amd64.iso,media=cdrom
kvm: 0
machine: q35
memory: 4096
name: asrack1-vm110
net0: e1000=32:65:35:65:34:64,bridge=vmbr0
numa: 0
ostype: l26
smbios1: uuid=d2667581-e174-403f-a01f-194fb2e58dbe
sockets: 1
vga: qxl

Again, passthrough of the M2090 works if "KVM hardware virtualization" is disabled but the guest VM is very slow. If the VM were not so slow we would take that as a solution.
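
For readers following along: the "kvm: 0" line in the configuration above is the setting that corresponds to "KVM hardware virtualization" being disabled. With it, QEMU runs the guest in software emulation, which is consistent with the slowness described. Below is a minimal Python sketch of checking that flag in a config. The helper names parse_vm_conf and kvm_enabled are hypothetical (they are not part of any Proxmox tool), the sketch assumes a plain "key: value" file without snapshot sections, and it treats a missing kvm line as enabled.

```python
# Hypothetical helpers for illustration only; not part of Proxmox.
# Assumes a simple "key: value" config with no snapshot sections.

SAMPLE_CONF = """\
cores: 4
hostpci0: 02:00.0,pcie=1
kvm: 0
machine: q35
vga: qxl
"""

# Values treated as "disabled" in this sketch (an assumption).
FALSE_VALUES = {"0", "no", "off", "false"}


def parse_vm_conf(text):
    """Parse 'key: value' lines into a dict, skipping blanks and comments."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values such as PCI
        # addresses ("02:00.0") keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            options[key.strip()] = value.strip()
    return options


def kvm_enabled(options):
    """Return False only if the config explicitly turns KVM off."""
    return options.get("kvm", "1").lower() not in FALSE_VALUES


if __name__ == "__main__":
    conf = parse_vm_conf(SAMPLE_CONF)
    print("KVM hardware virtualization enabled:", kvm_enabled(conf))
```

Run against the excerpt above, this reports that KVM is disabled, which matches the slow guest.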

Soulgriever · Feb 4, 2016

Hi vtrandal, I'm not sure I fully understand what your problem is, as I haven't played with Tesla cards (yet),
but could you perhaps try removing "vga: qxl" and changing "hostpci0: 02:00.0,pcie=1" to "hostpci0: 02:00,pcie=1,x-vga=on"?

I hope this helps

vtrandal · Feb 5, 2016

It's not necessary to have a Tesla card to understand or reproduce our primary problem. Just create a Linux VM with "KVM hardware virtualization" disabled and measure how long it takes for the VM to boot. We find it takes about 10 minutes, but with "KVM hardware virtualization" enabled it boots in about 10 seconds. Hence, we say the VM with "KVM hardware virtualization" disabled is much too slow. Otherwise, we would consider disabling "KVM hardware virtualization" to be a solution, because only then is the Tesla GPU card operational in the guest, i.e. it has a valid device handle and the driver can communicate with it.

vtrandal · Feb 12, 2016

@Soulgriever, the solution was to do as you suggested. In the /etc/pve/qemu-server/<vmid>.conf file we removed "vga: qxl" and added x-vga=on. We must note that we could not add x-vga=on until after the operating system was installed in the guest. So this thread can be closed as resolved if there are no further comments or questions.
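
For anyone wanting to see the change spelled out, here is a minimal Python sketch of the edit described above: drop the "vga" line and append x-vga=on to the hostpci0 entry. The function name apply_passthrough_fix is hypothetical and the sample text is only an excerpt of the config posted earlier. The sketch keeps the PCI address exactly as written, so it does not shorten 02:00.0 to 02:00 as in Soulgriever's suggestion. Treat it as an illustration of the text change, not as a Proxmox tool.

```python
# Hypothetical helper for illustration only; not part of Proxmox.

BEFORE = """\
cores: 4
hostpci0: 02:00.0,pcie=1
machine: q35
vga: qxl
"""


def apply_passthrough_fix(text, slot="hostpci0"):
    """Remove the 'vga' line and add x-vga=on to the given hostpci slot."""
    out = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key == "vga":
            # Drop the emulated display adapter line, e.g. "vga: qxl".
            continue
        if sep and key == slot:
            parts = [p.strip() for p in value.split(",") if p.strip()]
            # Only append the option if it is not already present,
            # so running the function twice changes nothing further.
            if not any(p.startswith("x-vga=") for p in parts):
                parts.append("x-vga=on")
            line = f"{slot}: {','.join(parts)}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(apply_passthrough_fix(BEFORE), end="")
```

As noted above, in our case x-vga=on could only be added after the operating system had been installed in the guest, so an edit like this would be applied after installation.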

However, while passthrough for Nvidia Tesla M2090 is working, it is not working for Nvidia Tesla K80 as noted in this thread:
https://forum.proxmox.com/threads/v...passthrough-device-is-nvidia-tesla-k80.26005/
