Hello,
I have a PVE 8.1.4 setup with 2x Nvidia L40S cards in it, as well as a pair of VMs with Windows 11 installed.
Host and guest drivers, as well as the DLS server, are set up and working. I can confirm the vGPUs work and can be used in the guest VMs.
Setup here:
Code:
root@proxmox:~# pveversion -h
Unknown option: h
USAGE: pveversion [--verbose]
root@proxmox:~# pveversion -v
proxmox-ve: 8.1.0 (running kernel: 6.5.11-8-pve)
pve-manager: 8.1.4 (running version: 8.1.4/ec5affc9e41f1d79)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.5: 6.5.11-8
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
ceph-fuse: 17.2.7-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.0.7
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.4-1
proxmox-backup-file-restore: 3.1.4-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.4
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-3
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.2.0
pve-qemu-kvm: 8.1.5-2
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve1
Code:
root@proxmox:~# qm config 200
balloon: 0
bios: ovmf
boot: order=scsi0;ide0
cores: 4
cpu: x86-64-v2-AES
efidisk0: ss-lvm-pool:vm-200-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: mapping=vGPUpool,mdev=nvidia-1155,pcie=1
machine: pc-q35-8.1
memory: 16384
meta: creation-qemu=8.1.5,ctime=1716915047
name: VM-win11test1
net0: virtio=BC:24:11:5F:9D:85,bridge=vmbr20,firewall=1
net1: virtio=BC:24:11:E5:27:88,bridge=vmbr50,firewall=1
numa: 0
ostype: win11
scsi0: ss-lvm-pool:vm-200-disk-1,iothread=1,size=100G
scsihw: virtio-scsi-single
smbios1: uuid=4f344fe7-460e-4913-9715-251c871a5252
sockets: 2
tpmstate0: ss-lvm-pool:vm-200-disk-2,size=4M,version=v2.0
vmgenid: c5f7655c-c19c-4c0d-bb86-208ca0a77f1c
The only (minor) nuisance I have left is a blue screen that Windows occasionally throws on boot: DRIVER_POWER_STATE_FAILURE.
I have noticed that this happens only when no other VM with an allocated vGPU instance is running.
Say both VM 200 and VM 201 have a vGPU allocated.
If VM 200 is running and I boot VM 201, everything is smooth.
If both VMs are off and I boot either one, I get:
- DRIVER_POWER_STATE_FAILURE (see attached); the VM reboots
- on the second boot, PAGE_FAULT_IN_NONPAGED_AREA (what failed: nvlddmkm.sys); the VM reboots
- on the third boot, the VM starts correctly.
It appears as if the L40S needs to already be powered up and initialized before the VM boots. I have looked all over the Nvidia documentation but can't find any option relevant to my case.
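One thing I'm tempted to try (untested, and purely a guess that the crashes come from the card idling between VM starts) is enabling NVIDIA persistence mode on the host so the driver keeps the GPU initialized, triggered from a Proxmox hookscript. The snippet name and VM ID below are just placeholders for my setup:

```shell
#!/bin/bash
# /var/lib/vz/snippets/gpu-prestart.sh  (chmod +x after creating)
# Proxmox calls hookscripts with two arguments: the VM ID and the phase.
vmid="$1"
phase="$2"

if [ "$phase" = "pre-start" ]; then
    # Enable persistence mode so the NVIDIA driver stays loaded and the
    # GPU stays initialized even with no vGPU client attached, i.e. the
    # card is not "cold" when the guest driver comes up.
    nvidia-smi -pm 1
fi
exit 0
```

It would be attached with `qm set 200 --hookscript local:snippets/gpu-prestart.sh`. Alternatively, persistence mode could just be enabled once at host boot (e.g. `nvidia-smi -pm 1` from a systemd unit) rather than per-VM.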
Has anyone had a similar experience?
thanks