NVIDIA P4 VM config: mdevctl unavailable

let's_go

Hi all, I am new to PVE. I just installed an NVIDIA P4 in PVE, but I have a problem with the VM config: mdevctl doesn't show anything for the device.
Code:
root@pve:~# cat /etc/pve/qemu-server/101.conf
args: -uuid 00000000-0000-0000-0000-000000000101
bios: ovmf
boot: order=sata0
cores: 10
cpu: host
efidisk0: local-lvm:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
ide2: none,media=cdrom
machine: pc-q35-8.0
memory: 14096
meta: creation-qemu=8.0.2,ctime=1688742827
name: Win11
net0: virtio=56:24:53:CE:23:89,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
sata0: local-lvm:vm-101-disk-3,size=1T
scsihw: virtio-scsi-single
smbios1: uuid=0535d794-2850-4c6f-aa4d-9fa68f58b374
sockets: 1
tpmstate0: local-lvm:vm-101-disk-1,size=4M,version=v2.0
unused0: local-lvm:vm-101-disk-2
usb0: host=2-1.3,usb3=1
usb1: host=3-1.4,usb3=1
usb2: spice
usb3: host=10c4:ea60,usb3=1
vmgenid: 72fa323e-0154-4c6b-a7d2-cb43915ea3bd



Code:
root@pve:~# pveversion -v
proxmox-ve: 8.0.1 (running kernel: 6.2.16-15-pve)
pve-manager: 8.0.3 (running version: 8.0.3/bbf3993334bfa916)
pve-kernel-6.2: 8.0.5
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx2
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-3
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.0
libpve-access-control: 8.0.3
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.5
libpve-guest-common-perl: 5.0.3
libpve-http-server-perl: 5.0.3
libpve-rs-perl: 0.8.3
libpve-storage-perl: 8.0.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 2.99.0-1
proxmox-backup-file-restore: 2.99.0-1
proxmox-kernel-helper: 8.0.2
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.5
pve-cluster: 8.0.1
pve-container: 5.0.3
pve-docs: 8.0.3
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.2
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.4
pve-qemu-kvm: 8.0.2-3
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1
mdevctl is unavailable even though nvidia-smi has output.
Code:
root@pve:~# mdevctl list

root@pve:~#

Code:
root@pve:~# mdevctl types

root@pve:~#
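As far as I understand, mdevctl only lists types when the GPU is bound to a driver that exposes mediated devices. A quick sysfs check (assuming the card at 0000:03:00.0, as in the lspci output below):
Code:
# hedged check: this sysfs directory only exists when an mdev-capable (vGPU)
# driver owns the card; with vfio-pci bound it should be absent
ls /sys/bus/pci/devices/0000:03:00.0/mdev_supported_types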
The driver has been installed, but after a reboot or some operations nvidia-smi returns an error:
Code:
root@pve:~# nvidia-smi
Failed to initialize NVML: Unknown Error
root@pve:~#
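For reference, the checks I know to run after such a failure (a sketch; I'm not sure which is relevant here):
Code:
# is the nvidia kernel module loaded at all?
lsmod | grep nvidia
# any driver initialization errors in the kernel log?
dmesg | grep -i nvidia
# which driver currently owns the card?
lspci -nnk -s 03:00.0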

/usr/share/perl5/PVE/QemuServer.pm
Code:
  6165  sub cleanup_pci_devices {
  6166      my ($vmid, $conf) = @_;
  6167
  6168      foreach my $key (keys %$conf) {
  6169          next if $key !~ m/^hostpci(\d+)$/;
  6170          my $hostpciindex = $1;
  6171          my $uuid = PVE::SysFSTools::generate_mdev_uuid($vmid, $hostpciindex);
  6172          my $d = parse_hostpci($conf->{$key});
  6173          if ($d->{mdev}) {
  6174              # NOTE: avoid PVE::SysFSTools::pci_cleanup_mdev_device as it requires PCI ID and we
  6175              # don't want to break ABI just for this two liner
  6176              my $dev_sysfs_dir = "/sys/bus/mdev/devices/$uuid";
  6177
  6178              # some nvidia vgpu driver versions want to clean the mdevs up themselves, and error
  6179              # out when we do it first. so wait for 10 seconds and then try it
  6180              if ($d->{ids}->[0]->[0]->{vendor} =~ m/^(0x)?10de$/) {
  6181                  sleep 10;
  6182              }
  6183
  6184              PVE::SysFSTools::file_write("$dev_sysfs_dir/remove", "1") if -e $dev_sysfs_dir;
  6185          }
  6186      }
  6187      PVE::QemuServer::PCI::remove_pci_reservation($vmid);
  6188  }
  6189
  6190  sub vm_stop_cleanup {
  6191      my ($storecfg, $vmid, $conf, $keepActive, $apply_pending_changes) = @_;
  6192
  6193      eval {
  6194
  6195          if (!$keepActive) {
--
  8644  sub cleanup_pci_devices {
  8645      my ($vmid, $conf) = @_;
  8646
  8647      foreach my $key (keys %$conf) {
  8648          next if $key !~ m/^hostpci(\d+)$/;
  8649          my $hostpciindex = $1;
  8650          my $uuid = PVE::SysFSTools::generate_mdev_uuid($vmid, $hostpciindex);
  8651          my $d = parse_hostpci($conf->{$key});
  8652          if ($d->{mdev}) {
  8653              # NOTE: avoid PVE::SysFSTools::pci_cleanup_mdev_device as it requires PCI ID and we
  8654              # don't want to break ABI just for this two liner
  8655              my $dev_sysfs_dir = "/sys/bus/mdev/devices/$uuid";
  8656
  8657              # some nvidia vgpu driver versions want to clean the mdevs up themselves, and error
  8658              # out when we do it first. so wait for 10 seconds and then try it
  8659              my $pciid = $d->{pciid}->[0]->{id};
  8660              my $info = PVE::SysFSTools::pci_device_info("$pciid");
  8661              if ($info->{vendor} eq '10de') {
  8662                  sleep 10;
  8663              }
  8664              PVE::SysFSTools::file_write("$dev_sysfs_dir/remove", "1") if -e $dev_sysfs_dir;
  8665          }
  8666      }
  8667      PVE::QemuServer::PCI::remove_pci_reservation($vmid);
  8668  }
  8669
  8670  sub del_nets_bridge_fdb {
  8671      my ($conf, $vmid) = @_;
  8672
  8673      for my $opt (keys %$conf) {
  8674          next if $opt !~ m/^net(\d+)$/;


root@pve:~# lspci -k
Code:
03:00.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
        Subsystem: NVIDIA Corporation GP104GL [Tesla P4]
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

Code:
root@pve:~# qm start 101
Subroutine cleanup_pci_devices redefined at /usr/share/perl5/PVE/QemuServer.pm line 8644, <DATA> line 960.
swtpm_setup: Not overwriting existing state file.
Subroutine cleanup_pci_devices redefined at /usr/share/perl5/PVE/QemuServer.pm line 8644, <DATA> line 960.
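That "Subroutine cleanup_pci_devices redefined" warning fits the listing above: QemuServer.pm contains two definitions of cleanup_pci_devices (lines 6165 and 8644). If I understand correctly, the duplicate can be confirmed with:
Code:
# the "redefined" warning means this matches twice; a stock file matches once
grep -n 'sub cleanup_pci_devices' /usr/share/perl5/PVE/QemuServer.pm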

Please help, and let me know if you need more info; I don't know what to provide for diagnostics. Thanks!
 
You're halfway to where you need to be.
mdevctl is for vGPU, but you have configured the whole adapter for the VM and you are using the vfio-pci driver.
You should pass the PCI device in the config like this:

Code:
hostpci0: 0000:01:00.0,pcie=1
args: -cpu 'host,-hypervisor,kvm=off'
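The same can also be done from the CLI; a sketch assuming VM 101 and the P4 at 0000:03:00.0 from your lspci output:
Code:
# attach the whole GPU to the VM as a PCIe device
qm set 101 --hostpci0 0000:03:00.0,pcie=1
# note: pcie=1 needs a q35 machine type, which your config already has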

My mdevctl types and mdevctl list both come back empty, because I don't initialize the device in Proxmox; it is passed through to the VM, where it is initialized with the NVIDIA drivers.
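In case it helps: for whole-GPU passthrough the card stays bound to vfio-pci on the host, which is why host-side nvidia tools and mdevctl see nothing. A minimal sketch of that binding; the 10de:1bb3 ID is my assumption for the Tesla P4, so verify it with lspci -nn first:
Code:
# /etc/modprobe.d/vfio.conf -- claim the P4 with vfio-pci at boot
# hedged: confirm the vendor:device pair with `lspci -nn -s 03:00.0`
options vfio-pci ids=10de:1bb3
# apply with: update-initramfs -u -k all && reboot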
 
Right now I'm stuck with vGPU support on the P4, so if you're interested I can write a small set of instructions for you once I'm finished.