vtrandal

New Member
Dec 27, 2015
In my Proxmox VE 4.1 installations, disabling KVM hardware virtualization fixes an nvidia-smi error message, and I would like to understand why. Here is the error produced by running nvidia-smi on the Linux guest with KVM hardware virtualization enabled:

Unable to determine the device handle for GPU 0000:0B:00.0: Unknown Error

But when KVM hardware virtualization is disabled, the error does not occur and the GPU is operational in the Linux guest. However, disabling KVM hardware virtualization results in a very slow guest; otherwise I would consider it a solution.
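For reference, this toggle corresponds to the kvm option in the VM's config file under the standard Proxmox path (a minimal sketch):

```
# /etc/pve/qemu-server/<VMID>.conf
# kvm: 1 (the default) runs the guest with KVM hardware acceleration;
# kvm: 0 falls back to pure QEMU software emulation (TCG), which is far slower.
kvm: 0
```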

So I want to understand how disabling KVM hardware virtualization seemingly fixes the above device handle error and makes the GPU operational. It is as if disabling KVM hardware virtualization solves an initial-condition problem in the GPU that only surfaces when KVM hardware virtualization is enabled.

The GPU is an Nvidia Tesla M2090 (tested with several recent driver versions), and the guest is Ubuntu Server 14.04.3. There is plenty of other detail I could add here, but I'm not sure what would be most helpful, except perhaps the following list of hardware on which this error occurs.

Motherboards: ASRock EPC612D8 and ASUS Z10PE-D8
Processor: Intel Xeon E5-2609 v3
 

manu

Proxmox Staff Member
Retired Staff
Mar 3, 2015
Hi vtrandal

Can you please post the output of "lspci | grep VGA" on the host and the guest, and the configuration of the VM ( /etc/pve/qemu-server/<VMID>.conf )?

This looks like a PCI passthrough problem.
 

vtrandal

New Member
Dec 27, 2015
Hello manu. It is a passthrough problem. Passthrough works if "KVM hardware virtualization" is disabled, and we would like to understand why that is the case. Disabling "KVM hardware virtualization" makes the VM very slow; otherwise we would consider it a solution.

To be clear about the device we are working with, we will provide the output of lspci, filtered for VGA and Nvidia devices, on the HOST and the GUEST.

[HOST] root@asrack1:~# lspci | grep -i "vga\|nvidia"
02:00.0 3D controller: NVIDIA Corporation GF110GL [Tesla M2090] (rev a1)
07:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 30)

[GUEST] vtrandal@asrack1-vm110:~$ lspci | grep -i "vga\|nvidia"
00:01.0 VGA compatible controller: Red Hat, Inc. QXL paravirtual graphic card (rev 04)
01:00.0 3D controller: NVIDIA Corporation GF110GL [Tesla M2090] (rev a1)

So the device we are passing through is the Nvidia Tesla M2090. Here is the VM configuration:
root@asrack1:/etc/pve/qemu-server# cat 110.conf
boot: c
bootdisk: ide0
cores: 4
hostpci0: 02:00.0,pcie=1
ide0: local:110/vm-110-disk-1.qcow2,size=68719476736
ide2: local:iso/ubuntu-14.04.3-server-amd64.iso,media=cdrom
kvm: 0
machine: q35
memory: 4096
name: asrack1-vm110
net0: e1000=32:65:35:65:34:64,bridge=vmbr0
numa: 0
ostype: l26
smbios1: uuid=d2667581-e174-403f-a01f-194fb2e58dbe
sockets: 1
vga: qxl

Again, passthrough of the M2090 works if "KVM hardware virtualization" is disabled, but then the guest VM is very slow. If the VM were not so slow, we would take that as a solution.
 

Soulgriever

New Member
Feb 1, 2016
Hi vtrandal, I'm not sure I fully understand what your problem is, as I haven't played with Tesla cards (yet),
but could you perhaps try removing "vga: qxl" and changing "hostpci0: 02:00.0,pcie=1" to "hostpci0: 02:00,pcie=1,x-vga=on"?
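In config-file terms, this suggestion amounts to the following change in /etc/pve/qemu-server/110.conf (a sketch; x-vga=on tells Proxmox to present the passed-through GPU as the guest's primary VGA device instead of an emulated one):

```
# before
vga: qxl
hostpci0: 02:00.0,pcie=1

# after: drop the emulated qxl display and pass the GPU through as primary VGA
hostpci0: 02:00,pcie=1,x-vga=on
```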

I hope this helps
 

vtrandal

New Member
Dec 27, 2015
It's not necessary to have a Tesla card to understand or reproduce our primary problem. Just create a Linux VM with "KVM hardware virtualization" disabled and measure how long it takes for the VM to boot. We find it takes about 10 minutes. But with "KVM hardware virtualization" enabled it boots in about 10 seconds. Hence we say the VM with "KVM hardware virtualization" disabled is much too slow. Otherwise, we would consider disabling "KVM hardware virtualization" to be a solution, because only then is the Tesla GPU operational in the guest, i.e. it has a valid device handle and the driver can communicate with it.
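For anyone who wants a rough number rather than a stopwatch, one simple check (an illustrative sketch, not from the original posts) is to read the guest's uptime right after the login prompt appears:

```shell
# Seconds since the kernel started, read right after first login.
# With kvm: 1 the guest is up in roughly 10 seconds here; with kvm: 0
# (TCG software emulation) the same guest takes roughly 10 minutes.
awk '{print "booted in roughly", $1, "seconds"}' /proc/uptime
```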
 

vtrandal

New Member
Dec 27, 2015
@Soulgriever, the solution was to do as you suggested. In the /etc/pve/qemu-server/<vmid>.conf file we removed vga: qxl and added x-vga=on. We must note that we could not add x-vga=on until after the operating system was installed in the guest. So this thread can be closed as resolved if there are no further comments or questions.
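For future readers, the relevant lines of the resolved configuration would look roughly like this (a sketch based on the config posted earlier; we assume kvm is left at its default of 1 so hardware virtualization stays enabled):

```
machine: q35
hostpci0: 02:00,pcie=1,x-vga=on
# vga: qxl removed; no kvm: 0 line, so KVM acceleration remains on
```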

However, while passthrough is working for the Nvidia Tesla M2090, it is not working for the Nvidia Tesla K80, as noted in this thread:
https://forum.proxmox.com/threads/v...passthrough-device-is-nvidia-tesla-k80.26005/
 
