Windows 2022 BSOD with NUMA and Hotplug after upgrade

hanoon

Renowned Member
Jul 1, 2014
In our Windows 2022 template we have CPU/memory hotplug enabled, with NUMA enabled on the processor to support it. Recently all our Server 2022 VMs have been getting a BSOD on boot.

I searched the forum and found multiple threads; the workaround to get the VM to boot is to disable NUMA and CPU/memory hotplug, after which it boots normally.
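A minimal sketch of that workaround from the host shell, using the VMID 5555 from the config below (this drops memory and cpu from the hotplug list and turns NUMA off in one step):

# qm set 5555 --numa 0 --hotplug disk,network,usb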

Additionally, I tried changing the CPU type to host and the machine type from the default i440fx to q35; neither resolved the issue.


Some of the articles referred to the pci_hotplug module, which they said is enabled by default; in our case, however, it cannot be loaded:
# modprobe pci_hotplug
modprobe: FATAL: Module pci_hotplug not found in directory /lib/modules/5.15.104-1-pve
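
As far as I can tell, pci_hotplug is built into these kernels rather than shipped as a separate module, which would explain the modprobe failure. Assuming the standard kernel config file is installed under /boot, it can be verified with:

# grep -i hotplug_pci /boot/config-$(uname -r)
CONFIG_HOTPLUG_PCI=y

(the exact output above is my assumption; a =y means the support is compiled in, so there is no pci_hotplug module to load)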

We have the QEMU guest agent running and the latest virtio drivers installed in the guest Windows OS: virtio-win-0.1.229
============

HW details:
Dell PowerEdge R750xd
96 x Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz (2 Sockets)
1024GB memory
============

VM Configs:
============
agent: 1
args: -machine type=q35,kernel_irqchip=on
balloon: 0
bios: seabios
boot: order=virtio0;net0;ide0
cores: 48
cpu: kvm64
cpuunits: 2048
hotplug: disk,network,usb
ide0: templates:iso/virtio-win-0.1.229.iso,media=cdrom,size=522284K
machine: pc-q35-7.2
memory: 16384
name: test-delete
net0: virtio=C6:8D:CX:X4:44:1B,bridge=vmbr1
numa: 1
ostype: win11
scsihw: virtio-scsi-pci
smbios1: uuid=905801c7-5421-4c18-bea5-b9c26c7d62db
sockets: 2
vcpus: 1
virtio0: local:5555/vm-5555-disk-0.qcow2,size=40G
vmgenid: 30817340-93fe-4a74-96ba-76c5b192da98

PVE Version
=============================
proxmox-ve: 7.4-1 (running kernel: 5.15.104-1-pve)
pve-manager: 7.4-13 (running version: 7.4-13/46c37d9c)
pve-kernel-5.15: 7.4-3
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.104-1-pve: 5.15.104-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph: 17.2.6-pve1
ceph-fuse: 17.2.6-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-network-perl: 0.7.3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.2-1
proxmox-backup-file-restore: 2.4.2-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-4
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-4
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1

Any assistance or help is much appreciated!
 
The newest version of the virtio tools is virtio-win-0.1.229.iso.

You can change i440fx to q35 easily, but it's like swapping your motherboard: the drivers are missing and Windows can't access the files it needs. Try changing it back.
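A sketch of how to revert from the host shell (this removes the machine option for VMID 5555 from this thread, so the VM falls back to the default i440fx):

# qm set 5555 --delete machine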
 
Hi,
IIRC this is an issue @fweber also came across; it is caused by a recent Windows update in combination with an "empty" NUMA node due to the low vCPU count. Can you check if the issue goes away if you increase the vCPU count to 25 (one more than half)?
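For example, from the host shell (5555 being the VMID from your disk path):

# qm set 5555 --vcpus 25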
 
I tried it and got the same result. Here is the configuration after the change, in case something is missing:

agent: 1
args: -machine type=q35,kernel_irqchip=on
balloon: 0
bios: seabios
boot: order=virtio0;net0;ide0
cores: 48
cpu: kvm64
cpuunits: 2048
hotplug: disk,network,usb,memory,cpu
ide0: templates:iso/virtio-win-0.1.229.iso,media=cdrom,size=522284K
machine: pc-q35-7.2
memory: 16384
name: test-delete
net0: virtio=C6:8D:CA::1B,bridge=vmbr0
numa: 1
ostype: win11
scsihw: virtio-scsi-pci
smbios1: uuid=905801c7-5421-4c18-bea5-b9c26c7d62db
sockets: 2
vcpus: 25
virtio0: local:5555/vm-5555-disk-0.qcow2,size=40G
vmgenid: 30817340-93fe-4a74-96ba-76c5b192da98
 
args: -machine type=q35,kernel_irqchip=on
Is there a special reason for setting this?

I searched the forum and found multiple threads; the workaround to get the VM to boot is to disable NUMA and CPU/memory hotplug, after which it boots normally.
What if you disable only CPU hotplug? What if you disable NUMA and memory hotplug, but leave CPU hotplug on?
 
Is there a special reason for setting this?
No, I tried different variations; this just happened to be the one in place when I sent the config. Let me know if it should be something else.
What if you disable only CPU hotplug? What if you disable NUMA and memory hotplug, but leave CPU hotplug on?
Case 1
NUMA on
Memory HP on
CPU HP off
= BSOD

Case 2
NUMA on
Memory HP off
CPU HP on
= BSOD

Case 3
NUMA on
Memory HP off
CPU HP off
= BSOD

Case 4
NUMA off
Memory HP off
CPU HP off
= Boot Normally

Case 5
NUMA off
Memory HP on
CPU HP off
= TASK ERROR: NUMA needs to be enabled for memory hotplug (No Start)

Case 6
NUMA off
Memory HP off
CPU HP on
= Boot Normally
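
For reference, a sketch of how these combinations can be toggled from the host shell (VMID 5555); note that memory has to come out of the hotplug list before NUMA can be turned off, as Case 5 shows:

Case 3 (NUMA on, no memory/CPU hotplug):
# qm set 5555 --numa 1 --hotplug disk,network,usb
Case 4 (NUMA off, no memory/CPU hotplug, boots):
# qm set 5555 --numa 0 --hotplug disk,network,usb
Case 6 (NUMA off, CPU hotplug on, boots):
# qm set 5555 --numa 0 --hotplug disk,network,usb,cpu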
 
If you downgrade your virtio driver, it works again. It is only a workaround, because the issue depends on much more, including a bad Windows update.
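For example, mount an older ISO from the host and reinstall the drivers in the guest (virtio-win-0.1.217 here is just one earlier release I'm using as an example; use whatever older version is in your ISO storage):

# qm set 5555 --ide0 templates:iso/virtio-win-0.1.217.iso,media=cdrom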
 
I just tested it, and I get the same BSOD after downgrading the virtio driver. The issue actually started after the Proxmox update, not after a virtio driver update.
 
One socket with the same number of cores works.
Sounds like an issue triggered by NUMA then and not by hotplug. But if it's caused by the Windows update (and not just exposing another pre-existing bug), I'm not sure we can even do much about it.
 
Do we know which update KB is causing the issue, so we can remove it and resolve this?
Sorry, I don't. If nobody else does, you could take a snapshot and try different ones.
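A sketch of that approach with qm snapshots (the snapshot name pre-kb-test is arbitrary; 5555 is the VMID from this thread):

# qm snapshot 5555 pre-kb-test
... install one candidate KB in the guest, reboot, check for the BSOD ...
# qm rollback 5555 pre-kb-test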
 
