[SOLVED] Proxmox 8 with Windows Server 2022 continual BSOD and disk corruption

prometheus76

New Member
Feb 22, 2024
Hey all, I'm new to Proxmox and have my hands on 2 x HPE DL360 G10 servers and an HPE MSA 2040 SAS array.

Took a while, but I got the 2 nodes up and running with multipath to the MSA; bit of a pain TBH.

But my issue is with installing Server 2022 onto the Proxmox nodes: about 7 out of 10 times when creating the VM, Windows setup says it cannot be installed on the configured hardware. I've tried all the different hardware and processor types, but even when a VM eventually builds, it will BSOD, mainly during updates. Now when I console on, they are all pretty much sitting at a BSOD with the Memory Management stop code.
If I try to install a role on them, it fails because of corrupt log files.

At this stage it's pretty much unusable as it's too unreliable.

Anyone know what the issue could be here? I can post logs.

P.S. I'm not very experienced with Linux :-0

Thanks
 
I've installed another Server 2022 VM on the LVM-Thin storage, which is on local SSDs, and it took the updates without a BSOD. I'll keep monitoring.

The question is: why would multipath to the MSA 2040 be causing the issues? Is there anywhere I can check for errors?

Thanks
 
Can you please post
  • the error messages you get in the Windows VM
  • and the output of the following commands (replace VMID with the VMID of the Windows VM):
    Code:
    pveversion -v
    qm config VMID --current
    multipath -ll
Is the SAN connected via iSCSI or Fibre Channel?
 
[attached screenshot: 1708700100049.png]

root@proxmox01:~# pveversion -v
proxmox-ve: 8.1.0 (running kernel: 6.5.11-8-pve)
pve-manager: 8.1.4 (running version: 8.1.4/ec5affc9e41f1d79)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.5: 6.5.11-8
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
ceph-fuse: 17.2.7-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.1
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.4-1
proxmox-backup-file-restore: 3.1.4-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.4
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-4
pve-firewall: 5.0.3
pve-firmware: 3.9-1
pve-ha-manager: 4.0.3
pve-i18n: 3.2.0
pve-qemu-kvm: 8.1.5-2
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve1


root@proxmox01:~# qm config 101 --current
agent: 1
bios: ovmf
boot: order=virtio0;ide2;net0
cores: 2
cpu: host
cpulimit: 4
efidisk0: Virtual_Machines:vm-101-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
ide2: none,media=cdrom
machine: pc-q35-8.1
memory: 4096
meta: creation-qemu=8.1.5,ctime=1708277528
name: PR-MH-DC-01
net0: virtio=BC:24:11:23:9D:5C,bridge=vmbr1,firewall=1
numa: 0
ostype: win11
scsihw: virtio-scsi-single
smbios1: uuid=623ec75c-7c58-4bba-ad57-2e2cca1cd57c
sockets: 2
tpmstate0: Virtual_Machines:vm-101-disk-1,size=4M,version=v2.0
virtio0: Virtual_Machines:vm-101-disk-2,cache=writeback,discard=on,iothread=1,size=100G
vmgenid: e0f6807a-55a8-4ecf-a2f3-24913e135cc0

root@proxmox01:~# multipath -ll
Data1 (3600c0ff0001e398b1daec06501000000) dm-10 HP,MSA 2040 SAS
size=1.8T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 0:0:0:4 sde 8:64 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
`- 0:0:1:4 sdh 8:112 active ready running
Data2 (3600c0ff0001e3a6459a5cc6501000000) dm-8 HP,MSA 2040 SAS
size=559G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 0:0:1:3 sdg 8:96 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
`- 0:0:0:3 sdc 8:32 active ready running
ISOs (3600c0ff0001e398b3cc0c06501000000) dm-7 HP,MSA 2040 SAS
size=838G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 0:0:0:2 sdd 8:48 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
`- 0:0:1:2 sdf 8:80 active ready running
Virtual_Machines (3600c0ff0001e3a64acb4bf6501000000) dm-6 HP,MSA 2040 SAS
size=1.8T features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=50 status=active
| `- 0:0:1:1 sdb 8:16 active ready running
`-+- policy='round-robin 0' prio=10 status=enabled
`- 0:0:0:1 sda 8:0 active ready running
root@proxmox01:~#



It's a SAS-attached HPE MSA 2040 SAN.

Many thanks
 
Thanks. The virtio0 disk of the VM has cache mode "writeback" and discard enabled. Can you try resetting both options to their defaults, i.e., cache mode "none" and discard disabled? With discard enabled, there seem to be occasional data corruption issues with some SANs, see [1].

[1] https://forum.proxmox.com/threads/91267/#post-399279
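
For reference, the same change could also be made on the CLI instead of via the GUI. This is just a minimal sketch, assuming the VM is shut down first and re-using the existing volume from the config above (leaving out cache= falls back to the default "none", and leaving out discard disables it):
    Code:
    qm set 101 --virtio0 Virtual_Machines:vm-101-disk-2,iothread=1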
 
Thanks fweber, I created a new Win 2022 VM and patched it, and no BSOD! It does seem a good bit slower, though.

I guess from the link you posted above I'll have to blow away the already-created machines and build them from scratch?

Is it worth trying with writeback enabled but discard left disabled?

Thanks again
 
Thanks fweber, I created a new Win 2022 VM and patched it, and no BSOD! It does seem a good bit slower, though.
Interesting! Thanks for reporting back.
I guess from the link you posted above I'll have to blow away the already-created machines and build them from scratch?
I would assume that some VM disk corruption already happened, so starting from scratch seems like the safer choice.
Is it worth trying with writeback enabled but discard left disabled?
Currently I'd suspect that discard=on is the culprit here, so yes, trying with writeback enabled but discard disabled is worth a try. If you do, it would be great if you could report back with your results.
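
If you test it on the CLI, something like the following should work. This is just a sketch, assuming VM 101 and the same volume as above; cache=writeback is kept and discard is simply left out, which corresponds to the default discard=ignore:
    Code:
    qm set 101 --virtio0 Virtual_Machines:vm-101-disk-2,cache=writeback,iothread=1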
 
Just as an update, discard=on and Write Back looks to have sorted the issue.

I've reinstalled all the Windows Server 2022 VMs and updated them fully with no issues, and no corruption is being reported on the disks.

@fweber many thanks for your help it was invaluable.
 
Thanks for reporting back! Glad to hear the issue is solved.
Just as an update, discard=on and Write Back looks to have sorted the issue.
Just to be sure: do you mean that disabling discard, i.e., setting discard=ignore while keeping writeback, has sorted out the issue?
 
Lol sorry, yes, discard=off and writeback=on is what I have it set to, and it's all working perfectly.

Again, grateful for all your help, Friedrich.
 