I have a Proxmox node that has been running for about a year without a hitch. Last night it crashed, and I haven't been able to determine why. One of the VMs runs Windows 10 and hosts Plex. Another VM runs Home Assistant with the Tautulli add-on, which sends webhooks to Discord. My wife and I were watching a movie on Plex when, at about 20:44 ET, I got a notification from the Tautulli monitoring that the Plex server was down. I couldn't log into Proxmox (I tried the mobile app and Chrome on a laptop). The VMs then became inaccessible one after another: I could still reach HA when the notification came through, but not about two minutes later. I eventually hard-restarted the node and everything came back up just fine. That was at ~20:53.
Code:
pveversion --verbose
proxmox-ve: 8.2.0 (running kernel: 6.8.4-2-pve)
pve-manager: 8.2.2 (running version: 8.2.2/9355359cd7afbae4)
proxmox-kernel-helper: 8.1.0
pve-kernel-6.2: 8.0.5
proxmox-kernel-6.8: 6.8.4-2
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
proxmox-kernel-6.5.13-5-pve-signed: 6.5.13-5
proxmox-kernel-6.5: 6.5.13-5
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
proxmox-kernel-6.5.11-6-pve-signed: 6.5.11-6
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
proxmox-kernel-6.2.16-19-pve: 6.2.16-19
proxmox-kernel-6.2.16-12-pve: 6.2.16-12
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
intel-microcode: 3.20230808.1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.6
libpve-cluster-perl: 8.0.6
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.1
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.2.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.2-1
proxmox-backup-file-restore: 3.2.2-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.6
pve-container: 5.0.11
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.0
pve-firewall: 5.0.6
pve-firmware: 3.11-1
pve-ha-manager: 4.0.4
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve2
Code:
journalctl --since "2024-05-07 20:39:00" --until "2024-05-07 20:54:00"
May 07 20:39:15 prox pvedaemon[1438197]: Use of uninitialized value in pattern match (m//) at /usr/share/perl5/PVE/Diskmanage.pm line 284.
May 07 20:39:22 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 333460
May 07 20:39:22 prox kernel: device-mapper: block manager: btree_node validator check failed for block 333460
May 07 20:39:22 prox kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
May 07 20:39:22 prox kernel: Buffer I/O error on dev dm-9, logical block 12750464, lost async page write
May 07 20:39:22 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 333460
May 07 20:39:22 prox kernel: device-mapper: block manager: btree_node validator check failed for block 333460
May 07 20:39:22 prox kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
May 07 20:39:22 prox kernel: Buffer I/O error on dev dm-9, logical block 12750465, lost async page write
May 07 20:39:22 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 333460
May 07 20:39:22 prox kernel: device-mapper: block manager: btree_node validator check failed for block 333460
May 07 20:39:22 prox kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
May 07 20:39:22 prox kernel: Buffer I/O error on dev dm-9, logical block 12750477, lost async page write
May 07 20:39:35 prox kernel: pvestatd[1200]: segfault at 10 ip 00005d5d934572d1 sp 00007ffe443ce1e0 error 4 in perl[5d5d93331000+195000] likely on CPU 0 (core 0, socket 0)
May 07 20:39:35 prox kernel: Code: de 48 89 ef e8 e0 10 f4 ff 49 8b 45 10 48 89 85 d0 00 00 00 49 8b 07 48 63 50 60 48 8b 43 08 48 8b 04 d0 48 89 85 20 01 00 00 <48> 8b 40 10 >
May 07 20:39:35 prox systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
May 07 20:39:35 prox systemd[1]: pvestatd.service: Failed with result 'signal'.
May 07 20:39:35 prox systemd[1]: pvestatd.service: Consumed 31min 49.101s CPU time.
May 07 20:41:38 prox pvedaemon[1470396]: <root@pam> successful auth for user 'root@pam'
May 07 20:44:40 prox pvedaemon[1438197]: Use of uninitialized value in pattern match (m//) at /usr/share/perl5/PVE/Diskmanage.pm line 284.
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:41 prox IPCC.xs[1470396]: pam_unix(proxmox-ve-auth:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost=::ffff:10.22.86.2 user=root
May 07 20:44:43 prox pvedaemon[1470396]: authentication failure; rhost=::ffff:10.22.86.2 user=root@pam msg=Authentication failure
May 07 20:44:43 prox kernel: pvedaemon worke[1470396]: segfault at 8 ip 00005862ca8bf115 sp 00007fff17338950 error 6 in perl[5862ca7e4000+195000] likely on CPU 1 (core 1, sock>
May 07 20:44:43 prox kernel: Code: 00 00 48 8b 40 10 48 89 43 10 84 c9 74 46 4c 8b 00 48 8b 83 b8 00 00 00 48 8b 40 10 48 8b 50 28 48 89 55 58 4c 89 f2 4c 29 e2 <41> 83 40 08 >
May 07 20:44:43 prox pvedaemon[1215]: worker 1470396 finished
May 07 20:44:43 prox pvedaemon[1215]: starting 1 worker(s)
May 07 20:44:43 prox pvedaemon[1215]: worker 1495679 started
May 07 20:44:52 prox kernel: node_check: 10 callbacks suppressed
May 07 20:44:52 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 332780
May 07 20:44:52 prox kernel: dm_bm_validate_buffer: 10 callbacks suppressed
May 07 20:44:52 prox kernel: device-mapper: block manager: btree_node validator check failed for block 332780
May 07 20:44:52 prox kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
May 07 20:44:52 prox kernel: buffer_io_error: 9 callbacks suppressed
May 07 20:44:52 prox kernel: Buffer I/O error on dev dm-9, logical block 10434564, lost async page write
May 07 20:44:52 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 332780
May 07 20:44:52 prox kernel: device-mapper: block manager: btree_node validator check failed for block 332780
May 07 20:44:52 prox kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
May 07 20:44:52 prox kernel: Buffer I/O error on dev dm-9, logical block 10434565, lost async page write
-- Boot f40b682d0dbd41d3bb3105b8d486aa77 --
Seeing that the errors started with dev dm-9, I checked and it's the main disk for the Windows VM.
Code:
dmsetup info /dev/dm-9
Name: pve-vm--103--disk--1
State: ACTIVE
Read Ahead: 256
Tables present: LIVE
Open count: 1
Event number: 0
Major, minor: 252, 9
Number of targets: 1
UUID: LVM-sB6ehUpfKjdwQOs7rDWHf1ZplSSde4USIf0RFdwwllHKgl9J8l8u2l6KNMgUcpMa
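The journal excerpt above also shows read errors on dm-7, so the same lookup is worth repeating for every affected device. A minimal sketch of how that could be scripted, assuming the kernel keeps logging these errors in the "Buffer I/O error on dev dm-N" form seen above (the function name `dm_error_devices` is made up for this example, not a Proxmox tool):

```shell
# Hypothetical helper: read journal text on stdin and print each distinct
# dm-N device that reported a "Buffer I/O error", one per line, sorted.
dm_error_devices() {
  grep -o 'Buffer I/O error on dev dm-[0-9]*' | awk '{print $NF}' | sort -u
}

# Example usage (not run here), reusing the time window from above:
#   journalctl -k --since "2024-05-07 20:39:00" --until "2024-05-07 20:54:00" \
#     | dm_error_devices \
#     | while read -r dev; do dmsetup info "/dev/$dev"; done
```

This only lists the devices named in the buffer I/O error lines. It does not interpret the device-mapper btree messages that appear alongside them.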
And the VM config:
Code:
cat /etc/pve/qemu-server/103.conf
agent: 1,fstrim_cloned_disks=1
bios: ovmf
boot: order=ide0;net0
cores: 12
cpu: host
efidisk0: local-lvm:vm-103-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:01:00,pcie=1
ide0: local-lvm:vm-103-disk-1,cache=writethrough,size=128G,ssd=1
machine: pc-q35-8.0
memory: 32768
meta: creation-qemu=8.0.2,ctime=1694973452
name: winsuck
net0: virtio=66:5F:C2:6B:9B:24,bridge=vmbr0
net1: virtio=02:FE:E6:51:4F:43,bridge=vmbr1
numa: 0
onboot: 1
ostype: win10
protection: 0
scsi0: /dev/disk/by-id/nvme-Samsung_SSD_970_EVO_Plus_500GB_S58SNM0T415357R_1,backup=0,cache=writethrough,size=488386584K,ssd=1
scsi2: /dev/disk/by-id/nvme-Samsung_SSD_970_EVO_500GB_S5H7NS0N846243V,backup=0,size=488386584K
scsihw: virtio-scsi-single
smbios1: uuid=0617f923-205a-44e5-9867-987120843fc6
sockets: 1
startup: order=2
vmgenid: cc1ec081-4570-4417-90b0-0264e3fb089f