Hello everyone,
A Windows Server 2022 VM no longer starts after a backup job in stop mode. Other servers are not affected. What sets this VM apart is that it has a GPU passed through, and I suspect that this is the cause.
Initially, the VM went into the state "internal error" during the backup.
I found the following entries in the log:
Code:
Nov 05 14:50:56 proxmox kernel: pcieport 0000:00:01.0: AER: Uncorrected (Non-Fatal) error received: 0000:01:00.1
Nov 05 14:50:56 proxmox kernel: vfio-pci 0000:01:00.1: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Nov 05 14:50:56 proxmox kernel: vfio-pci 0000:01:00.1: device [10de:228b] error status/mask=00100000/00000000
Nov 05 14:50:56 proxmox kernel: vfio-pci 0000:01:00.1: [20] UnsupReq (First)
Nov 05 14:50:56 proxmox kernel: vfio-pci 0000:01:00.1: AER: TLP Header: 40000001 0000000f 426254e8 f7f7f7f7
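For anyone reading along, the "error status/mask=00100000/00000000" line can be decoded by hand: each set bit in the status word corresponds to one AER uncorrectable error type. Below is a minimal Python sketch of that decoding. The bit table is a subset of the short names the Linux kernel prints, and the function name decode_aer_status is only illustrative, not part of any tool.

```python
# Bit positions of the PCIe AER Uncorrectable Error Status register,
# using the short names the Linux kernel prints (subset only).
AER_UNCORRECTABLE_BITS = {
    4: "DLP",
    5: "SDES",
    12: "TLP",
    13: "FCP",
    14: "CmpltTO",
    15: "CmpltAbrt",
    16: "UnxCmplt",
    17: "RxOF",
    18: "MalfTLP",
    19: "ECRC",
    20: "UnsupReq",
    21: "ACSViol",
}


def decode_aer_status(status_hex: str, mask_hex: str = "0") -> list[tuple[int, str, bool]]:
    """Return (bit, name, masked) for every bit set in the status word."""
    status = int(status_hex, 16)
    mask = int(mask_hex, 16)
    result = []
    for bit in range(32):
        if status & (1 << bit):
            name = AER_UNCORRECTABLE_BITS.get(bit, "unknown")
            result.append((bit, name, bool(mask & (1 << bit))))
    return result


# Values taken from the "error status/mask=00100000/00000000" log line.
decoded = decode_aer_status("00100000", "00000000")
print(decoded)
```

The only bit set is bit 20, which matches the "[20] UnsupReq (First)" line in the log, and the mask of zero means the error was not masked.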
I then added the parameter pcie_aspm=off to /etc/kernel/cmdline. Since then, the error above no longer occurs.
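To make the change concrete: /etc/kernel/cmdline holds the kernel command line as a single line, and the parameter is appended to it. The following Python sketch only shows the string manipulation and does not touch any file. The example line and the helper name add_kernel_param are hypothetical, and the real content of the file differs per host.

```python
def add_kernel_param(cmdline: str, param: str) -> str:
    """Append param to a one-line kernel command line unless it is already present."""
    tokens = cmdline.split()
    if param in tokens:
        return " ".join(tokens)
    return " ".join(tokens + [param])


# Hypothetical example line, NOT the actual content of this host's file.
example = "root=ZFS=rpool/ROOT/pve-1 boot=zfs"
updated = add_kernel_param(example, "pcie_aspm=off")
print(updated)
```

As far as I understand, on hosts that boot via systemd-boot the edited file only takes effect after running proxmox-boot-tool refresh and rebooting. Afterwards, /proc/cmdline should show the new parameter.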
However, when I now run a backup and the affected VM is started again after its backup has finished, the start fails. I then find the following error in the log:
Code:
Nov 05 17:09:42 proxmox kernel: Out of memory: Killed process 1032905 (kvm) total-vm:17847964kB, anon-rss:15412572kB, file-rss:256kB, shmem-rss:0kB, UID:0 pgtables:30508kB oom_score_adj:0
Nov 05 17:09:42 proxmox systemd[1]: 100.scope: A process of this unit has been killed by the OOM killer.
Nov 05 17:09:42 proxmox systemd[1]: 100.scope: Failed with result 'oom-kill'.
Nov 05 17:09:42 proxmox systemd[1]: 100.scope: Consumed 9.382s CPU time.
Nov 05 17:09:42 proxmox kernel: vmbr0: port 4(tap100i0) entered disabled state
Nov 05 17:09:42 proxmox kernel: vmbr0: port 4(tap100i0) entered disabled state
Nov 05 17:09:42 proxmox pvedaemon[1032793]: stopping swtpm instance (pid 1032897) due to QEMU startup error
Nov 05 17:09:42 proxmox pvedaemon[1032768]: start failed: QEMU exited with code 1
Nov 05 17:09:42 proxmox pvedaemon[2860]: <root@pam> end task UPID:proxmox:000FC240:000184E5:6547BE3C:qmstart:100:root@pam: start failed: QEMU exited with code 1
Nov 05 17:09:42 proxmox pvestatd[2825]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - unable to connect to VM 100 qmp socket - Connection refused
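To put the sizes in the OOM line into perspective, here is a small Python sketch that extracts the kB fields and converts them to GiB. It assumes the kernel's "kB" means units of 1024 bytes. The helper name parse_oom_sizes is just illustrative, and the input is the line from the log above with the timestamp prefix removed.

```python
import re

# OOM killer line from the log above (timestamp and host prefix removed).
OOM_LINE = (
    "Out of memory: Killed process 1032905 (kvm) total-vm:17847964kB, "
    "anon-rss:15412572kB, file-rss:256kB, shmem-rss:0kB, UID:0 "
    "pgtables:30508kB oom_score_adj:0"
)


def parse_oom_sizes(line: str) -> dict[str, float]:
    """Extract every 'name:<n>kB' field and convert it from kB to GiB."""
    fields = re.findall(r"([\w-]+):(\d+)kB", line)
    return {name: int(kb) / (1024 * 1024) for name, kb in fields}


sizes = parse_oom_sizes(OOM_LINE)
for name, gib in sizes.items():
    print(f"{name}: {gib:.2f} GiB")
```

So the kvm process already had roughly 14.7 GiB resident (anon-rss) when it was killed after about 9 seconds of CPU time. That would be consistent with guest memory being allocated and pinned up front for PCI passthrough, although this is an assumption on my part and not something the log states.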
Installed package versions:
Code:
proxmox-ve: 8.0.1 (running kernel: 6.2.16-3-pve)
pve-manager: 8.0.3 (running version: 8.0.3/bbf3993334bfa916)
pve-kernel-6.2: 8.0.2
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx2
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-3
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.0
libpve-access-control: 8.0.3
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.5
libpve-guest-common-perl: 5.0.3
libpve-http-server-perl: 5.0.3
libpve-rs-perl: 0.8.3
libpve-storage-perl: 8.0.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 2.99.0-1
proxmox-backup-file-restore: 2.99.0-1
proxmox-kernel-helper: 8.0.2
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.5
pve-cluster: 8.0.1
pve-container: 5.0.3
pve-docs: 8.0.3
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.2
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.4
pve-qemu-kvm: 8.0.2-3
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1
Any suggestions? Thanks and regards.