Failed to destroy vGPU device.

DuyQuy

Hi all.

I have an error with vGPU.
My machine configuration is E5-2696 v4 and Nvidia Tesla M40.

If I shut down a VM with the Shutdown button in the Proxmox GUI, I get the error "Failed to destroy vGPU device":

[screenshot: "Failed to destroy vGPU device" task error]

And I cannot start this VM again:

[screenshot: error when starting the VM again]

I am running kernel 5.15.74-1-pve and PVE version 7.3-4.

Config file /etc/default/grub:

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt initcall_blacklist=sysfb_init"

I have attached the dmesg output below.

Checking deeper with nvidia-smi, I see the vGPU for this VM is still running; it was not destroyed:

[screenshot: nvidia-smi still showing the VM's vGPU]

And mdevctl does not list it:

[screenshot: mdevctl list output, device missing]

This doesn't happen if I choose Stop instead of Shutdown for the VM.

Please help me. Thanks.
 

Attachments

  • dmesg.txt
what does
Code:
ls -l /sys/bus/mdev/devices
show?
is it visible there? (should be a directory for every existing mediated device with its uuid, in your example that would be 00000000-0000-0000-0000-000000000100)
if yes, can you remove it with:

Code:
echo 1 > /sys/bus/mdev/devices/UUID/remove
?
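e.g. as a rough sketch (uuid taken from your example above; adjust to your vmid):

Code:
# check for the mediated device and remove it if it is still there
UUID=00000000-0000-0000-0000-000000000100
if [ -d "/sys/bus/mdev/devices/$UUID" ]; then
    echo 1 > "/sys/bus/mdev/devices/$UUID/remove"
else
    echo "mdev $UUID is not present"
fi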
 
I can reproduce this issue using nvidia grid driver >= 15.0.

To get around this, use 14.x drivers.



Nevertheless my environment:

Code:
root@myhost:~# uname -a
Linux hostname 5.15.83-1-pve #1 SMP PVE 5.15.83-1 (2022-12-15T00:00Z) x86_64 GNU/Linux

Code:
root@myhost:~# pveversion -v
proxmox-ve: 7.3-1 (running kernel: 5.15.83-1-pve)
pve-manager: 7.3-4 (running version: 7.3-4/d69b70d4)
pve-kernel-helper: 7.3-2
pve-kernel-5.15: 7.3-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-1
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.3.2-1
proxmox-backup-file-restore: 2.3.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-2
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.7-pve3


Code:
root@myhost:~# nvidia-smi -q

==============NVSMI LOG==============

Timestamp                                 : Fri Jan 27 10:32:00 2023
Driver Version                            : 525.85.07
CUDA Version                              : Not Found
vGPU Driver Capability
        Heterogenous Multi-vGPU           : Supported

Attached GPUs                             : 1
GPU 00000000:C2:00.0
    Product Name                          : Quadro RTX 8000
    Product Brand                         : NVIDIA
    Product Architecture                  : Turing
    Display Mode                          : Enabled
    Display Active                        : Disabled
    Persistence Mode                      : Enabled
    vGPU Device Capability
        Fractional Multi-vGPU             : Supported
        Heterogeneous Time-Slice Profiles : Supported
        Heterogeneous Time-Slice Sizes    : Not Supported
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Enabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : 0321119003514
    GPU UUID                              : GPU-68178fda-8450-cdf7-3278-75bfba276169
    Minor Number                          : 1
    VBIOS Version                         : 90.02.28.00.0B
    MultiGPU Board                        : No
    Board ID                              : 0xc200
    Board Part Number                     : 900-5G150-0300-000
    GPU Part Number                       : 1E30-875-A1
    Module ID                             : 1
    Inforom Version
        Image Version                     : G150.0500.00.03
        OEM Object                        : 1.1
        ECC Object                        : 5.0
        Power Management Object           : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GSP Firmware Version                  : N/A
    GPU Virtualization Mode
        Virtualization Mode               : Host VGPU
        Host VGPU Mode                    : Non SR-IOV
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0xC2
        Device                            : 0x00
        Domain                            : 0x0000
        Device Id                         : 0x1E3010DE
        Bus Id                            : 00000000:C2:00.0
        Sub System Id                     : 0x129E103C

...

As you can see in nvidia-smi vgpu below, the VM "workstation" (VMID 171) has two instances. This is produced by starting the VM and then backing it up (in stop mode): a new vGPU gets allocated, but the old one is still listed (and not released by the driver?).
This is only possible because I am using an 8GB vGPU profile and still have some instances left (4/6 in use). So after this reboot 5/6 are used, and after another start/stop 6/6 were taken.

After stopping the VM through the Proxmox GUI again, there is no mdev belonging to 171 listed in /sys/bus/mdev/devices/.

Code:
root@myhost:~# nvidia-smi vgpu
Fri Jan 27 10:33:20 2023      
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.07              Driver Version: 525.85.07                 |
|---------------------------------+------------------------------+------------+
| GPU  Name                       | Bus-Id                       | GPU-Util   |
|      vGPU ID     Name           | VM ID     VM Name            | vGPU-Util  |
|=================================+==============================+============|
|   0  Quadro RTX 8000            | 00000000:C2:00.0             |   1%       |
|      3251634194  GRID RTX800... | 0000...  windows,debug-th... |      0%    |
|      3251634200  GRID RTX800... | 0000...  workstation,debu... |      0%    |
|      3251634312  GRID RTX800... | 0000...  docker-cuda-1,de... |      0%    |
|      3251634318  GRID RTX800... | 0000...  docker-cuda-2,de... |      0%    |
|      3251634698  GRID RTX800... | 0000...  workstation,debu... |      0%    |
+---------------------------------+------------------------------+------------+




Code:
[  963.693888] fwbr171i0: port 2(tap171i0) entered disabled state
[  963.745113] fwbr171i0: port 1(fwln171i0) entered disabled state
[  963.746298] vmbr0: port 8(fwpr171p0) entered disabled state
[  963.764476] device fwln171i0 left promiscuous mode
[  963.765187] fwbr171i0: port 1(fwln171i0) entered disabled state
[  963.807137] device fwpr171p0 left promiscuous mode
[  963.807851] vmbr0: port 8(fwpr171p0) entered disabled state
[  964.128222] [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000171: vGPU destroy failed: 0x38
[  964.128876] [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000171: Failed to destroy vGPU device, ret: -1
[  964.152619] [nvidia-vgpu-vfio] No vGPU device found, close failed
[  964.219130] ------------[ cut here ]------------
[  964.219790] WARNING: CPU: 21 PID: 23456 at drivers/vfio/vfio_iommu_type1.c:2592 vfio_iommu_type1_detach_group+0x6de/0x6f0 [vfio_iommu_type1]
[  964.220975] Modules linked in: tcp_diag inet_diag overlay sctp ip6_udp_tunnel udp_tunnel vfio_pci vfio_pci_core vfio_virqfd xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nf_tables veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter softdog bonding tls nfnetlink_log nfnetlink ipmi_ssif intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd nvidia_vgpu_vfio(O) kvm_amd kvm irqbypass crct10dif_pclmul ghash_clmulni_intel aesni_intel snd_hda_codec_hdmi ast drm_vram_helper snd_hda_intel drm_ttm_helper nvidia(PO) snd_intel_dspcfg crypto_simd ttm snd_intel_sdw_acpi cryptd rapl snd_hda_codec pcspkr efi_pstore drm_kms_helper snd_hda_core cec rc_core snd_hwdep snd_pcm rndis_host i2c_algo_bit ucsi_ccg fb_sys_fops cdc_ether snd_timer mdev syscopyarea typec_ucsi vfio_iommu_type1 snd usbnet sysfillrect acpi_ipmi soundcore
[  964.221013]  joydev input_leds typec vfio mii corsair_psu sysimgblt ccp ptdma k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c simplefb usbmouse hid_generic usbhid hid crc32_pclmul i2c_nvidia_gpu xhci_pci ahci xhci_pci_renesas libahci bnxt_en nvme xhci_hcd i2c_piix4 nvme_core
[  964.230901] CPU: 21 PID: 23456 Comm: kvm Tainted: P           O      5.15.83-1-pve #1
[  964.231790] Hardware name: Supermicro Super Server/H12SSL-NT, BIOS 1.0 08/19/2020
[  964.232682] RIP: 0010:vfio_iommu_type1_detach_group+0x6de/0x6f0 [vfio_iommu_type1]
[  964.233587] Code: 05 ff ff ff 48 83 7b 78 00 74 0a eb 24 48 89 df e8 77 e9 ff ff 4c 89 ef e8 9f 84 cf e8 48 89 c6 48 85 c0 75 e8 e9 fd fd ff ff <0f> 0b e9 09 ff ff ff 0f 0b eb e0 e8 a2 67 3e e9 66 90 0f 1f 44 00
[  964.235466] RSP: 0018:ffffbbffeeebfcf8 EFLAGS: 00010286
[  964.236442] RAX: ffff90a330a5cd80 RBX: ffff90a330a5cd80 RCX: 000000000080005b
[  964.237381] RDX: ffff90a4e7ced758 RSI: 000000000080005b RDI: ffff90a4e7ced740
[  964.238322] RBP: ffffbbffeeebfd68 R08: 0000000000000001 R09: 0000000000000000
[  964.239275] R10: ffff90a25f15e400 R11: ffff90a27307c710 R12: ffff90a25f15e400
[  964.240246] R13: 0000000000000000 R14: ffffbbffeeebfd18 R15: ffff90a36e0c2e00
[  964.241208] FS:  0000000000000000(0000) GS:ffff90e08e940000(0000) knlGS:0000000000000000
[  964.242174] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  964.243142] CR2: 000000a954b3c000 CR3: 00000001f71f2000 CR4: 0000000000350ee0
[  964.244109] Call Trace:
[  964.245078]  <TASK>
[  964.246043]  ? dentry_free+0x37/0x80
[  964.247022]  __vfio_group_unset_container+0x4d/0x190 [vfio]
[  964.248013]  vfio_group_try_dissolve_container+0x2d/0x40 [vfio]
[  964.248997]  vfio_group_fops_release+0x25/0x60 [vfio]
[  964.249964]  __fput+0x9f/0x280
[  964.250936]  ____fput+0xe/0x20
[  964.251897]  task_work_run+0x6d/0xb0
[  964.252867]  do_exit+0x354/0xa20
[  964.253818]  ? wake_up_state+0x10/0x20
[  964.254772]  do_group_exit+0x3b/0xb0
[  964.255714]  __x64_sys_exit_group+0x18/0x20
[  964.256652]  do_syscall_64+0x5c/0xc0
[  964.257590]  ? irqentry_exit_to_user_mode+0x9/0x20
[  964.258516]  ? irqentry_exit+0x1d/0x30
[  964.259427]  ? exc_page_fault+0x89/0x170
[  964.260329]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[  964.261231] RIP: 0033:0x7f47dbd18f99
[  964.262112] Code: Unable to access opcode bytes at RIP 0x7f47dbd18f6f.
[  964.262999] RSP: 002b:00007ffc003c2ae8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[  964.263869] RAX: ffffffffffffffda RBX: 00007f47dbe1b880 RCX: 00007f47dbd18f99
[  964.264723] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
[  964.265554] RBP: 0000000000000000 R08: ffffffffffffdb60 R09: 0000000000000000
[  964.266364] R10: 00007f47d1c54f21 R11: 0000000000000246 R12: 00007f47dbe1b880
[  964.267163] R13: 0000000000000304 R14: 00007f47dbe20e08 R15: 0000000000000000
[  964.267934]  </TASK>
[  964.268678] ---[ end trace 5b03edcbc7fbb087 ]---
[  964.269429] ------------[ cut here ]------------
 
OK, after commenting out lines 6127 and 6128 (which are 6099 and 6100 in my local /usr/share/perl5/PVE/QemuServer.pm), cleanup_pci_devices now looks like this:

Perl:
sub cleanup_pci_devices {
    my ($vmid, $conf) = @_;

    foreach my $key (keys %$conf) {
        next if $key !~ m/^hostpci(\d+)$/;
        my $hostpciindex = $1;
        my $uuid = PVE::SysFSTools::generate_mdev_uuid($vmid, $hostpciindex);
        my $d = parse_hostpci($conf->{$key});
        if ($d->{mdev}) {
            sleep(3); # also added by me!
           # NOTE: avoid PVE::SysFSTools::pci_cleanup_mdev_device as it requires PCI ID and we
           # don't want to break ABI just for this two liner
#           my $dev_sysfs_dir = "/sys/bus/mdev/devices/$uuid";
#           PVE::SysFSTools::file_write("$dev_sysfs_dir/remove", "1") if -e $dev_sysfs_dir;
        }
    }
    PVE::QemuServer::PCI::remove_pci_reservation($vmid);
}

With that, the shutdown works as expected.

1. Start PVE-Host
2. Auto-Start Guests were started
3. cat /sys/bus/pci/devices/0000\:c2\:00.0/mdev_supported_types/nvidia-269/available_instances shows 2 (which is correct)
4. Shutdown guest via Proxmox-GUI
5. cat /sys/bus/pci/devices/0000\:c2\:00.0/mdev_supported_types/nvidia-269/available_instances shows 3 (which is correct, before we had 2)
6. Start guest via Proxmox-GUI again
7. cat /sys/bus/pci/devices/0000\:c2\:00.0/mdev_supported_types/nvidia-269/available_instances shows 2 (which is correct, before we had 1)
 
I can reproduce this issue using nvidia grid driver >= 15.0.
sadly i don't currently have access to the newest drivers, so i can't really test that


OK, after commenting out lines 6127 and 6128 in cleanup_pci_devices, the shutdown works as expected. [...] (full workaround and test steps quoted above)
if what you write here is true, that's really bad, since nvidia seemingly tries to automatically manage the mdev devices on vm shutdown (which we currently do ourselves)
and seemingly does not allow doing that manually anymore..

i am not sure how we can handle this, i don't really want to write new versions of the code for every driver version
i'll check if the nvidia driver (<= 14) also cleans that up itself, and if it does we could check the vendor id of the pci card for the cleanup (intel for example does not clean it up)
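e.g. a quick sketch of reading the vendor id from sysfs (device address taken from the example earlier in this thread; 0x10de is nvidia's pci vendor id):

Code:
# read the pci vendor id of the gpu providing the mdev
cat /sys/bus/pci/devices/0000:c2:00.0/vendor
# prints e.g. 0x10de for nvidia cards; the manual cleanup could be skipped or kept based on that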
 
sadly i don't currently have access to the newest drivers, so i can't really test that
I've been running the 14.4 drivers for a few weeks now, and 14.3 before that; both were really stable and did not clean up automatically. This behaviour is also new to me with drivers 15.0 and 15.1.

I have no urgent reason to upgrade to the 15.0/15.1 drivers (since 14.x is working great with Proxmox), but I'm in the vgpu_unlock community, where this problem is raised more and more often, especially with 15.x. So I just looked to see if this was a bigger problem.

I am not allowed to offer you the new grid drivers as this is against nvidia's license terms, but there is a way to get them.

if what you write here is true, that's really bad, since nvidia seemingly tries to automatically manage the mdev devices on vm shutdown (which we currently do ourselves)
that's what I thought, too.

Or is it simply possible to check if /sys/bus/mdev/devices/$uuid is already released and does not exist? Or is that what PVE::SysFSTools::file_write("$dev_sysfs_dir/remove", "1") if -e $dev_sysfs_dir; does? Maybe then the sleep is enough to prevent a race condition between Proxmox removing the mdev and nvidia ... Should I test it?

If you want me to test something, just let me know. I have a small test system with options for 14.4, 15.0 and 15.1.
 
I am not allowed to offer you the new grid drivers as this is against nvidia's license terms, but there is a way to get them.
yeah, i already tried multiple times (through various channels) to ask nvidia whether they would give us drivers to test, but i never got a response.. (it's not really necessary nor practical for us to buy licenses, we literally only use them for testing currently...)
we had a trial license, but that ran out...

Or is it simply possible to check if /sys/bus/mdev/devices/$uuid is already released and does not exist? Or is that what PVE::SysFSTools::file_write("$dev_sysfs_dir/remove", "1") if -e $dev_sysfs_dir; does? Maybe then the sleep is enough to prevent a race condition between Proxmox removing the mdev and nvidia ... Should I test it
that line writes '1' into the remove property of the mdev (which is the way to remove it normally) if it exists
if it's already removed by the nvidia driver, it won't get executed

i don't really want to add a timeout here, that can only go wrong: instead of failing every time, it'll only fail sometimes, and you can't set the timeout high enough to be completely sure it won't interfere....

is there maybe an option in the kernel module to disable that behaviour?
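e.g. (just a guess where to look) whether the vgpu vfio module exposes any parameters could be checked with something like:

Code:
# list module parameters, if the module exposes any
modinfo -p nvidia_vgpu_vfio
ls /sys/module/nvidia_vgpu_vfio/parameters 2>/dev/null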
 
yeah i already tried multiple times (through various channels) to contact nvidia if they want to give us drivers to test, but i never got a response.
If you DM me I can give you a hint. I wrote this here because I got an error saying I cannot message you.

that line writes '1' into the remove property of the mdev (which is the way to remove it normally) if it exists
if it's already removed by the nvidia driver, it won't get executed
mh, I don't understand why there is an error, and no error when these lines are commented out. Maybe the error is somewhere else? How do I print to the log in Perl? Maybe I can figure out which other code fails.

EDIT: OK, maybe Proxmox is "faster" than nvidia, so Proxmox calls remove on the mdev, and then nvidia's mdev release fails, so it stays allocated as shown in nvidia-smi above.

is there maybe an option in the kernel module to disable that behaviour?
I’ll dive into the docs from nvidia … but the “troubleshoot” section is very short :rolleyes:
 
EDIT: OK, maybe Proxmox is "faster" than nvidia, so Proxmox calls remove on the mdev, and then nvidia's mdev release fails, so it stays allocated as shown in nvidia-smi above.
yes that is probably the issue. IMHO the driver could just ignore a device that already disappeared, but i don't know the internals of the nvidia driver ofc.

I’ll dive into the docs from nvidia … but the “troubleshoot” section is very short :rolleyes:
can you post the output of 'modinfo nvidia' and 'modinfo nvidia_vgpu_vfio' ?
 
Code:
filename:       /lib/modules/5.15.83-1-pve/kernel/drivers/video/nvidia.ko
firmware:       nvidia/525.85.07/gsp_tu10x.bin
firmware:       nvidia/525.85.07/gsp_ad10x.bin
alias:          char-major-195-*
version:        525.85.07
supported:      external
license:        NVIDIA
srcversion:     0D4DAA775E6C1A10E833A66
alias:          pci:v000010DEd*sv*sd*bc06sc80i00*
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:        drm
retpoline:      Y
name:           nvidia
vermagic:       5.15.83-1-pve SMP mod_unload modversions
parm:           NvSwitchRegDwords:NvSwitch regkey (charp)
parm:           NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp)
parm:           NVreg_ResmanDebugLevel:int
parm:           NVreg_RmLogonRC:int
parm:           NVreg_ModifyDeviceFiles:int
parm:           NVreg_DeviceFileUID:int
parm:           NVreg_DeviceFileGID:int
parm:           NVreg_DeviceFileMode:int
parm:           NVreg_InitializeSystemMemoryAllocations:int
parm:           NVreg_UsePageAttributeTable:int
parm:           NVreg_EnablePCIeGen3:int
parm:           NVreg_EnableMSI:int
parm:           NVreg_TCEBypassMode:int
parm:           NVreg_EnableStreamMemOPs:int
parm:           NVreg_RestrictProfilingToAdminUsers:int
parm:           NVreg_PreserveVideoMemoryAllocations:int
parm:           NVreg_EnableS0ixPowerManagement:int
parm:           NVreg_S0ixPowerManagementVideoMemoryThreshold:int
parm:           NVreg_DynamicPowerManagement:int
parm:           NVreg_DynamicPowerManagementVideoMemoryThreshold:int
parm:           NVreg_EnableGpuFirmware:int
parm:           NVreg_EnableGpuFirmwareLogs:int
parm:           NVreg_OpenRmEnableUnsupportedGpus:int
parm:           NVreg_EnableUserNUMAManagement:int
parm:           NVreg_MemoryPoolSize:int
parm:           NVreg_KMallocHeapMaxSize:int
parm:           NVreg_VMallocHeapMaxSize:int
parm:           NVreg_IgnoreMMIOCheck:int
parm:           NVreg_NvLinkDisable:int
parm:           NVreg_EnablePCIERelaxedOrderingMode:int
parm:           NVreg_RegisterPCIDriver:int
parm:           NVreg_EnableDbgBreakpoint:int
parm:           NVreg_RegistryDwords:charp
parm:           NVreg_RegistryDwordsPerDevice:charp
parm:           NVreg_RmMsg:charp
parm:           NVreg_GpuBlacklist:charp
parm:           NVreg_TemporaryFilePath:charp
parm:           NVreg_ExcludedGpus:charp
parm:           NVreg_DmaRemapPeerMmio:int
parm:           rm_firmware_active:charp


nvidia_vgpu_vfio is listed (see the lspci output below), but modinfo nvidia_vgpu_vfio fails with "modinfo: ERROR: Module nvida_vgpu not found".

Code:
lspci  -s c2:00.0 -v
c2:00.0 VGA compatible controller: NVIDIA Corporation TU102GL [Quadro RTX 6000/8000] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: Hewlett-Packard Company TU102GL [Quadro RTX 6000/8000]
        Flags: bus master, fast devsel, latency 0, IRQ 627, IOMMU group 2
        Memory at f3000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 17fe0000000 (64-bit, prefetchable) [size=256M]
        Memory at 17ff0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        Expansion ROM at f4000000 [virtual] [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] Secondary PCI Express
        Capabilities: [bb0] Physical Resizable BAR
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_vgpu_vfio, nvidia
 
Following closely. Having the same issue on an HP DL380p (Xeons E-2640, Tesla P4, NVIDIA GRID 525.85.07 - same version as the OP).

Looks like some profiles trigger fewer issues (A69, which gives 4GB per VM, was pretty stable, while A67 = 1GB per VM crashes most of the time on VM shutdown).
 
OK, after commenting out lines 6127 and 6128 in cleanup_pci_devices, the shutdown works as expected. [...] (full workaround and test steps quoted above)
So do I need to add this to my system?
 
So do I need to add this to my system?
This is just a temporary fix and may break your system/mdevs. Only use this if you're using nvidia mdevs and no other mdevs (network cards, ...).
Additionally, this will be overwritten by Proxmox updates.
 
So do I need to add this to my system?
If I'm getting this right, adding this to your system would cause other PCI devices to not clean up properly. I believe we should rather add an "if" condition to skip the two lines if the device is our GPU. Not sure what the variable to check would be, though.
 
If I'm getting this right, adding this to your system would cause other PCI devices to not clean up properly. I believe we should rather add an "if" condition to skip the two lines if the device is our GPU. Not sure what the variable to check would be, though.
it's not that simple, unfortunately. First you have to check whether it is an Nvidia GPU, and then whether the Nvidia driver is >= 15.0. In addition, only < 16.0 should be matched, in case nvidia changes something again in the future.
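A very rough sketch of what such a check could look like (assumptions: the PCI address from my host, and that the 525.x host driver branch corresponds to GRID 15.x; an upper bound for 16.0 would still be needed once those branches are known):

Code:
# sketch only: decide whether to skip the manual mdev removal for this gpu
VENDOR=$(cat /sys/bus/pci/devices/0000:c2:00.0/vendor)                        # 0x10de = nvidia
DRIVER=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)
BRANCH=${DRIVER%%.*}                                                          # e.g. 525 for grid 15.x
if [ "$VENDOR" = "0x10de" ] && [ "$BRANCH" -ge 525 ]; then
    echo "grid 15.x host driver detected: let the driver clean up the mdev itself"
else
    echo "remove the mdev manually as before"
fi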
 
Noted, thanks. I have applied it for now and can reinstate the 2 lines if needed. It's an otherwise clean install, so hopefully it's fixed properly soon. Will a solution be posted in this thread?
 
Noted, thanks. I have applied it for now and can reinstate the 2 lines if needed. It's an otherwise clean install, so hopefully it's fixed properly soon. Will a solution be posted in this thread?
Hopefully. It's a fresh install on my end as well. I'm in no way, shape, or form associated with the Proxmox team. Maybe @dcsapak can enlighten us with a suggestion for a temporary "if" condition we could "safely" add to our systems for the time being. I know it would be a temporary fix, but as-is the whole server needs to be rebooted every time we shut down a VM. On my end, I actually need to hard-reset the server, as Proxmox won't shut it down on its own. (Smurf voice) I don't like hard resets!
 
the diff posted looks harmless enough, but i don't really want to do that as a permanent fix (as mentioned). i'll think about how we can handle that in all situations gracefully....
 
OK, apparently the mdevs are also allocated during a backup. For example, I have a VM which never actually runs, but it has a vGPU assigned to it.

I now have the problem that I have ghost mdevs from VMs that are not running.

Probably this is because I have commented out the cleanup, and since the machine does not really boot/shut down during the backup, the cleanup from nvidia also does not take effect.


Here, mdev 00000000-0000-0000-0000-000000000107 belongs to a VM that hasn't booted in a while, and on last night's backup job of VM 107 the mdev got assigned and not cleaned up.

Code:
root@myhost:~# ls /sys/bus/pci/devices/0000:c2:00.0

00000000-0000-0000-0000-000000000107 
00000000-0000-0000-0000-000000000113 
00000000-0000-0000-0000-000000000161 
00000000-0000-0000-0000-000000000171 
00000000-0000-0000-0000-000000000881
00000000-0000-0000-0000-000000000882

A simple echo 1 > 00000000-0000-0000-0000-000000000107/remove solved my problem.
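For anyone who collects more of these ghost devices, a small sketch that removes every mdev whose owning VM is not running (it assumes the PVE-generated UUIDs where the last group encodes the VMID, as in the listing above; double-check before running):

Code:
# sketch: remove mdevs of vms that are currently not running
for dev in /sys/bus/mdev/devices/*; do
    [ -e "$dev" ] || continue
    uuid=$(basename "$dev")
    vmid=$((10#${uuid##*-}))                     # last uuid group, leading zeros stripped
    if ! qm status "$vmid" 2>/dev/null | grep -q running; then
        echo "removing stale mdev $uuid (vm $vmid is not running)"
        echo 1 > "$dev/remove"
    fi
done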
 
