GVT-g mdev missing after updating to 6.2-6

everwisher

Member
I just ran apt upgrade and a GVT-g error followed. Here's the log:

Use of uninitialized value $vmid in sprintf at /usr/share/perl5/PVE/SysFSTools.pm line 358.
kvm: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:00:02.0/00000001-0000-0000-0000-000000000000,id=hostpci1,bus=pci.0,addr=0x11: vfio /sys/bus/pci/devices/0000:00:02.0/00000001-0000-0000-0000-000000000000: no such host device: No such file or directory
TASK ERROR: start failed: QEMU exited with code 1

I checked the associated files and found that the folder "/sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_8/devices" is still present, but it is now empty, unlike before the latest upgrade.

I manually created an mdev from the command line:
Code:
echo '00000001-0000-0000-0000-000000000000' > /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_8/create
and the VM using GVT-g works like it did before.
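For reference, here's a sketch of how I inspect and recreate the mdev from the shell (paths assume the iGPU at 0000:00:02.0 and the i915-GVTg_V5_8 type from above; the checks are guarded so they simply skip on a host without GVT-g):

```shell
DEV=/sys/bus/pci/devices/0000:00:02.0
TYPE=i915-GVTg_V5_8
UUID='00000001-0000-0000-0000-000000000000'

# how many more instances of this type the driver will still allow
if [ -r "$DEV/mdev_supported_types/$TYPE/available_instances" ]; then
    cat "$DEV/mdev_supported_types/$TYPE/available_instances"
fi

# create the mdev if the type is exposed and that UUID doesn't exist yet
if [ -w "$DEV/mdev_supported_types/$TYPE/create" ] && [ ! -e "$DEV/$UUID" ]; then
    echo "$UUID" > "$DEV/mdev_supported_types/$TYPE/create"
fi

# list the mdevs that exist now (empty output here is the symptom)
ls "$DEV/mdev_supported_types/$TYPE/devices" 2>/dev/null || true
```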

How can I fix this glitch?
 

dcsapak

Proxmox Staff Member
Is there anything in dmesg? Can you try with an older kernel?
 

BobMccapherey

Member
I have the same error. The device does seem to be created, yet the VM still fails to start.

It looks like the mdev is being created with the right UUID (it ends in my VMID, 401):

Code:
[ 1277.732498] vfio_mdev 00000000-0000-0000-0000-000000000401: Adding to iommu group 18
[ 1277.732499] vfio_mdev 00000000-0000-0000-0000-000000000401: MDEV: group_id = 18
[ 1278.087861] device tap401i0 entered promiscuous mode
[ 1278.093354] custnet: port 2(tap401i0) entered blocking state
[ 1278.093355] custnet: port 2(tap401i0) entered disabled state
[ 1278.093414] custnet: port 2(tap401i0) entered blocking state
[ 1278.093415] custnet: port 2(tap401i0) entered forwarding state
[ 1278.238873] custnet: port 2(tap401i0) entered disabled state
[ 1278.688167] vfio_mdev 00000000-0000-0000-0000-000000000401: Removing from iommu group 18
[ 1278.688170] vfio_mdev 00000000-0000-0000-0000-000000000401: MDEV: detaching iommu

However, QEMU is handed the wrong path (an all-zero UUID) when the VM is started:

Code:
Use of uninitialized value $vmid in sprintf at /usr/share/perl5/PVE/SysFSTools.pm line 358.
kvm: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:00:02.0/00000000-0000-0000-0000-000000000000,id=hostpci0,bus=pci.0,addr=0x10,rombar=0: vfio /sys/bus/pci/devices/0000:00:02.0/00000000-0000-0000-0000-000000000000: no such host device: No such file or directory
TASK ERROR: start failed: QEMU exited with code 1


Now here's the weird thing: I have 3 other hosts in the cluster that work fine. The host having issues is a newly provisioned one that was recently configured, updated, and added to the existing cluster. When the VM fails to start on this node, HA migrates it to another node, where it starts with no issues. All hosts run the newest packages from the same repo.
 

jollyrogr

New Member
My error message was:
Code:
root@pve02:~# qm start 100
Use of uninitialized value $vmid in sprintf at /usr/share/perl5/PVE/SysFSTools.pm line 358.
kvm: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:00:02.0/00000000-0000-0000-0000-000000000000,id=hostpci0,bus=pci.0,addr=0x10,rombar=0: vfio /sys/bus/pci/devices/0000:00:02.0/00000000-0000-0000-0000-000000000000: no such host device: No such file or directory
start failed: QEMU exited with code 1

I tried
Code:
echo '00000000-0000-0000-0000-000000000000' > /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create

The VM still won't start; the error is now:
Use of uninitialized value $vmid in sprintf at /usr/share/perl5/PVE/SysFSTools.pm line 358.
pci device '0000:00:02.0' has no available instances of 'i915-GVTg_V5_4'
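(For anyone else hitting "no available instances": a guarded sketch of checking the instance count and removing a stale mdev to free it again, assuming the type and all-zero UUID from above.)

```shell
DEV=/sys/bus/pci/devices/0000:00:02.0
TYPE=i915-GVTg_V5_4
UUID='00000000-0000-0000-0000-000000000000'

# each created mdev consumes one instance; 0 here explains the error above
if [ -r "$DEV/mdev_supported_types/$TYPE/available_instances" ]; then
    cat "$DEV/mdev_supported_types/$TYPE/available_instances"
fi

# removing a stale mdev frees its instance for re-creation
if [ -w "/sys/bus/mdev/devices/$UUID/remove" ]; then
    echo 1 > "/sys/bus/mdev/devices/$UUID/remove"
fi
```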
 

BobMccapherey

Member
jollyrogr said:
> my error message was: […]
> I tried
> Code:
> echo '00000000-0000-0000-0000-000000000000' > /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create
> VM still wont start, error is now: […]

Well, the problem there is that you can't just keep creating virtual GPUs: once the type's instances are exhausted, the driver won't let you create more. Likewise, it won't let you reuse a GUID that already exists. You'd have to remove the vGPU and re-add it (or just reuse the existing one).

The problem is that Proxmox generates the GUID from the VMID and puts the VMID at the end of it. For some reason that variable is not getting defined, so the GUID comes out all zeroes. For example, if your VM ID is 100, the GUID should end in 100; in my case the VM ID is 401, but there's no 401 in the GUID QEMU uses. Either way, something broke so that the VM ID is no longer passed to the code that derives the vGPU's GUID, and the VM can't start.
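The scheme described above is easy to check from the shell; this sketch (hostpci index 0 assumed, and `mdev_uuid` is just an illustrative helper, not a Proxmox command) shows what the GUID should look like for VMIDs 100 and 401:

```shell
# the hostpci index fills the first field, the VMID fills the last one
mdev_uuid() { printf '%08d-0000-0000-0000-%012d\n' "$1" "$2"; }

mdev_uuid 0 100   # hostpci0 on VM 100 -> 00000000-0000-0000-0000-000000000100
mdev_uuid 0 401   # hostpci0 on VM 401 -> 00000000-0000-0000-0000-000000000401
```

With the bug, the VMID is undefined, so the last field collapses to all zeroes, which is exactly the GUID in the failing QEMU command line.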
 

jollyrogr

New Member
For giggles I tried:
Code:
echo '00000000-0000-0000-0000-000000000100' > /sys/bus/pci/devices/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create

and the error that came back was:
Use of uninitialized value $vmid in sprintf at /usr/share/perl5/PVE/SysFSTools.pm line 358.
mdev instance '00000000-0000-0000-0000-000000000100' already existed, using it.
kvm: -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:00:02.0/00000000-0000-0000-0000-000000000000,id=hostpci0,bus=pci.0,addr=0x10,rombar=0: vfio /sys/bus/pci/devices/0000:00:02.0/00000000-0000-0000-0000-000000000000: no such host device: No such file or directory
start failed: QEMU exited with code 1

So Bob I think you're on to something but I dunno what the problem is. It's over my head.

dmesg:
[19427.926188] vfio_mdev 00000000-0000-0000-0000-000000000100: Removing from iommu group 14
[19427.926194] vfio_mdev 00000000-0000-0000-0000-000000000100: MDEV: detaching iommu
 

BobMccapherey

Member
jollyrogr said:
> For giggles I tried: […]
> and the error that came back was: […]
> So Bob I think you're on to something but I dunno what the problem is. It's over my head.
> dmesg: […]
So at first glance it looks like it wants to use the correct mdev GUID, but when it actually runs QEMU it uses the wrong one. I looked at the conf files in /etc/pve/qemu-server, but the GUID isn't defined there; it's generated by Proxmox. So there's definitely a bug where whatever calls generate_mdev_uuid (the sub at /usr/share/perl5/PVE/SysFSTools.pm line 358) isn't passing the vmid. The problem is there's no clear indication of what calls this function, because the command's log output doesn't include a full stack trace.

Perl:
# encode the hostpci index and vmid into the uuid
sub generate_mdev_uuid {
    my ($vmid, $index) = @_;

    my $string = sprintf("%08d-0000-0000-0000-%012d", $index, $vmid);

    return $string;
}
 

BobMccapherey

Member
Apr 25, 2020
33
0
6
41
Is there an easy way to downgrade to 6.2-4? It seems the newer versions have completely hosed starting GVT-g mdev machines.
 

BobMccapherey

Member
I think I figured out the issue:

/usr/share/perl5/PVE/QemuServer/PCI.pm line 425 references a variable $vmid, but it's not declared or defined anywhere else in the file.
 

BobMccapherey

Member
Had to do this in order to fix things:

Code:
apt install libpve-common-perl=6.1-3 qemu-server=6.2-3 pve-manager=6.2-6 pve-container=3.1-8 pve-qemu-kvm=5.0.0-4
and then reboot every node in the cluster.

I based this on my recent apt logs to downgrade to a known working version.
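In case it helps anyone repeating this: the old version numbers can be recovered from /var/log/apt/history.log. A sketch of extracting the version to pin (the sample line mimics the log's "Upgrade: pkg (old, new)" format):

```shell
# a typical upgrade record from /var/log/apt/history.log
line='Upgrade: qemu-server:amd64 (6.2-3, 6.2-8)'

# the first version in parentheses is the one to pin when downgrading
old=$(printf '%s\n' "$line" | sed 's/.*(\([^,]*\),.*/\1/')
echo "apt install qemu-server=$old"

# on a real node: grep -h 'Upgrade:' /var/log/apt/history.log  (use zgrep for rotated .gz logs)
```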
 

jollyrogr

New Member
I just updated the node today and the issue appears to be fixed with the new updates. :)
 
