NVIDIA P4 VM config: mdevctl unavailable

let's_go

Hi all, I am new to PVE. I just installed an NVIDIA P4 in my PVE host, but I have a problem with the VM config: mdevctl doesn't show anything for the card.
Code:
root@pve:~# cat /etc/pve/qemu-server/101.conf
args: -uuid 00000000-0000-0000-0000-000000000101
bios: ovmf
boot: order=sata0
cores: 10
cpu: host
efidisk0: local-lvm:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
ide2: none,media=cdrom
machine: pc-q35-8.0
memory: 14096
meta: creation-qemu=8.0.2,ctime=1688742827
name: Win11
net0: virtio=56:24:53:CE:23:89,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
sata0: local-lvm:vm-101-disk-3,size=1T
scsihw: virtio-scsi-single
smbios1: uuid=0535d794-2850-4c6f-aa4d-9fa68f58b374
sockets: 1
tpmstate0: local-lvm:vm-101-disk-1,size=4M,version=v2.0
unused0: local-lvm:vm-101-disk-2
usb0: host=2-1.3,usb3=1
usb1: host=3-1.4,usb3=1
usb2: spice
usb3: host=10c4:ea60,usb3=1
vmgenid: 72fa323e-0154-4c6b-a7d2-cb43915ea3bd



Code:
root@pve:~# pveversion -v
proxmox-ve: 8.0.1 (running kernel: 6.2.16-15-pve)
pve-manager: 8.0.3 (running version: 8.0.3/bbf3993334bfa916)
pve-kernel-6.2: 8.0.5
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx2
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-3
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.0
libpve-access-control: 8.0.3
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.5
libpve-guest-common-perl: 5.0.3
libpve-http-server-perl: 5.0.3
libpve-rs-perl: 0.8.3
libpve-storage-perl: 8.0.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 2.99.0-1
proxmox-backup-file-restore: 2.99.0-1
proxmox-kernel-helper: 8.0.2
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.5
pve-cluster: 8.0.1
pve-container: 5.0.3
pve-docs: 8.0.3
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.2
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.4
pve-qemu-kvm: 8.0.2-3
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1
mdevctl shows nothing, even though nvidia-smi has output.
Code:
root@pve:~# mdevctl list

root@pve:~#

Code:
root@pve:~# mdevctl types

root@pve:~#
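For reference, mdev types only appear when the NVIDIA vGPU (GRID) host driver owns the card, not the plain datacenter driver and not vfio-pci. A quick sketch of checks (assuming the P4 sits at 0000:03:00.0, as in the lspci output further down):
Code:
# which kernel driver is currently bound to the NVIDIA card?
lspci -d 10de: -nnk
# are NVIDIA / vfio modules loaded at all?
lsmod | grep -E 'nvidia|vfio'
# does the device expose any mdev types? (only the vGPU host driver creates this directory)
ls /sys/bus/pci/devices/0000:03:00.0/mdev_supported_types 2>/dev/null \
    || echo "no mdev types exposed for this device"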
The driver has been installed, but after a reboot or some operations nvidia-smi returns an error:
Code:
root@pve:~# nvidia-smi
Failed to initialize NVML: Unknown Error
root@pve:~#

/usr/share/perl5/PVE/QemuServer.pm
Code:
6165  sub cleanup_pci_devices {
  6166      my ($vmid, $conf) = @_;
  6167
  6168      foreach my $key (keys %$conf) {
  6169          next if $key !~ m/^hostpci(\d+)$/;
  6170          my $hostpciindex = $1;
  6171          my $uuid = PVE::SysFSTools::generate_mdev_uuid($vmid, $hostpciindex);
  6172          my $d = parse_hostpci($conf->{$key});
  6173          if ($d->{mdev}) {
  6174              # NOTE: avoid PVE::SysFSTools::pci_cleanup_mdev_device as it requires PCI ID and we
  6175              # don't want to break ABI just for this two liner
  6176              my $dev_sysfs_dir = "/sys/bus/mdev/devices/$uuid";
  6177
  6178              # some nvidia vgpu driver versions want to clean the mdevs up themselves, and error
  6179              # out when we do it first. so wait for 10 seconds and then try it
  6180              if ($d->{ids}->[0]->[0]->{vendor} =~ m/^(0x)?10de$/) {
  6181                  sleep 10;
  6182              }
  6183
  6184              PVE::SysFSTools::file_write("$dev_sysfs_dir/remove", "1") if -e $dev_sysfs_dir;
  6185          }
  6186      }
  6187      PVE::QemuServer::PCI::remove_pci_reservation($vmid);
  6188  }
  6189
  6190  sub vm_stop_cleanup {
  6191      my ($storecfg, $vmid, $conf, $keepActive, $apply_pending_changes) = @_;
  6192
  6193      eval {
  6194
  6195          if (!$keepActive) {
--
  8644  sub cleanup_pci_devices {
  8645      my ($vmid, $conf) = @_;
  8646
  8647      foreach my $key (keys %$conf) {
  8648          next if $key !~ m/^hostpci(\d+)$/;
  8649          my $hostpciindex = $1;
  8650          my $uuid = PVE::SysFSTools::generate_mdev_uuid($vmid, $hostpciindex);
  8651          my $d = parse_hostpci($conf->{$key});
  8652          if ($d->{mdev}) {
  8653              # NOTE: avoid PVE::SysFSTools::pci_cleanup_mdev_device as it requires PCI ID and we
  8654              # don't want to break ABI just for this two liner
  8655              my $dev_sysfs_dir = "/sys/bus/mdev/devices/$uuid";
  8656
  8657              # some nvidia vgpu driver versions want to clean the mdevs up themselves, and error
  8658              # out when we do it first. so wait for 10 seconds and then try it
  8659              my $pciid = $d->{pciid}->[0]->{id};
  8660              my $info = PVE::SysFSTools::pci_device_info("$pciid");
  8661              if ($info->{vendor} eq '10de') {
  8662                  sleep 10;
  8663              }
  8664              PVE::SysFSTools::file_write("$dev_sysfs_dir/remove", "1") if -e $dev_sysfs_dir;
  8665          }
  8666      }
  8667      PVE::QemuServer::PCI::remove_pci_reservation($vmid);
  8668  }
  8669
  8670  sub del_nets_bridge_fdb {
  8671      my ($conf, $vmid) = @_;
  8672
  8673      for my $opt (keys %$conf) {
  8674          next if $opt !~ m/^net(\d+)$/;


root@pve:~# lspci -k
Code:
03:00.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
        Subsystem: NVIDIA Corporation GP104GL [Tesla P4]
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

Code:
root@pve:~# qm start 101
Subroutine cleanup_pci_devices redefined at /usr/share/perl5/PVE/QemuServer.pm line 8644, <DATA> line 960.
swtpm_setup: Not overwriting existing state file.
Subroutine cleanup_pci_devices redefined at /usr/share/perl5/PVE/QemuServer.pm line 8644, <DATA> line 960.
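(The "Subroutine cleanup_pci_devices redefined" warning is presumably because QemuServer.pm now contains that sub twice, as in the excerpt above. A rough sketch of going back to the packaged file, assuming no other local edits need to be kept:)
Code:
# QemuServer.pm is shipped by the qemu-server package
dpkg -S /usr/share/perl5/PVE/QemuServer.pm
# reinstalling restores the stock file (this drops the hand-edited copy)
apt install --reinstall qemu-server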

Please help, and let me know if you need more info; I am not sure what information to provide for diagnostics. Thanks!
 
You're halfway to where you need to be.
mdevctl is for vGPU, but you have configured the whole adapter for the VM and you are using the vfio-pci driver.
You should pass the PCI device in the config like this:

hostpci0: 0000:01:00.0,pcie=1
args: -cpu 'host,-hypervisor,kvm=off'

My mdevctl types and mdevctl list both come back empty too, because I don't initialize the device in Proxmox; it is passed through to the VM, where it is initialized with the NVIDIA drivers, etc.
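A minimal sketch of applying that to the VM above (assuming the P4 is still at 0000:03:00.0, as shown by lspci, and that VM 101 is the target):
Code:
# pass the whole card through as a PCIe device
qm set 101 --hostpci0 0000:03:00.0,pcie=1
# then replace the existing "args:" line in /etc/pve/qemu-server/101.conf with:
#   args: -cpu 'host,-hypervisor,kvm=off'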
 
Right now I'm stuck on vGPU support for the P4, so if you're interested I can write up a short set of instructions for you once I'm finished.
 
