Random Crash

FriedCheese

May 8, 2024
I have a Proxmox node that's been chugging along for about a year without a hitch. Last night it crashed, and I can't determine why. One of the VMs runs Windows 10 and hosts Plex. Another VM runs Home Assistant with the Tautulli add-on, which sends webhooks to Discord. The wife and I were watching a movie on Plex when, at about 20:44 ET, I got a notification from the Tautulli monitoring that the Plex server was down. I couldn't log into Proxmox (I tried both the mobile app and Chrome on a laptop). The VMs then became inaccessible one by one: I could still reach HA when the notification came through, but not about two minutes later. I eventually did a hard restart on the node and everything came back up just fine. This was at ~20:53.

Code:
pveversion --verbose
proxmox-ve: 8.2.0 (running kernel: 6.8.4-2-pve)
pve-manager: 8.2.2 (running version: 8.2.2/9355359cd7afbae4)
proxmox-kernel-helper: 8.1.0
pve-kernel-6.2: 8.0.5
proxmox-kernel-6.8: 6.8.4-2
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
proxmox-kernel-6.5.13-5-pve-signed: 6.5.13-5
proxmox-kernel-6.5: 6.5.13-5
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
proxmox-kernel-6.5.11-6-pve-signed: 6.5.11-6
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
proxmox-kernel-6.2.16-19-pve: 6.2.16-19
proxmox-kernel-6.2.16-12-pve: 6.2.16-12
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
intel-microcode: 3.20230808.1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.6
libpve-cluster-perl: 8.0.6
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.1
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.2.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.2-1
proxmox-backup-file-restore: 3.2.2-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.6
pve-container: 5.0.11
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.0
pve-firewall: 5.0.6
pve-firmware: 3.11-1
pve-ha-manager: 4.0.4
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve2

Code:
journalctl --since "2024-05-07 20:39:00" --until "2024-05-07 20:54:00"
May 07 20:39:15 prox pvedaemon[1438197]: Use of uninitialized value in pattern match (m//) at /usr/share/perl5/PVE/Diskmanage.pm line 284.
May 07 20:39:22 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 333460
May 07 20:39:22 prox kernel: device-mapper: block manager: btree_node validator check failed for block 333460
May 07 20:39:22 prox kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
May 07 20:39:22 prox kernel: Buffer I/O error on dev dm-9, logical block 12750464, lost async page write
May 07 20:39:22 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 333460
May 07 20:39:22 prox kernel: device-mapper: block manager: btree_node validator check failed for block 333460
May 07 20:39:22 prox kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
May 07 20:39:22 prox kernel: Buffer I/O error on dev dm-9, logical block 12750465, lost async page write
May 07 20:39:22 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 333460
May 07 20:39:22 prox kernel: device-mapper: block manager: btree_node validator check failed for block 333460
May 07 20:39:22 prox kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
May 07 20:39:22 prox kernel: Buffer I/O error on dev dm-9, logical block 12750477, lost async page write
May 07 20:39:35 prox kernel: pvestatd[1200]: segfault at 10 ip 00005d5d934572d1 sp 00007ffe443ce1e0 error 4 in perl[5d5d93331000+195000] likely on CPU 0 (core 0, socket 0)
May 07 20:39:35 prox kernel: Code: de 48 89 ef e8 e0 10 f4 ff 49 8b 45 10 48 89 85 d0 00 00 00 49 8b 07 48 63 50 60 48 8b 43 08 48 8b 04 d0 48 89 85 20 01 00 00 <48> 8b 40 10 >
May 07 20:39:35 prox systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
May 07 20:39:35 prox systemd[1]: pvestatd.service: Failed with result 'signal'.
May 07 20:39:35 prox systemd[1]: pvestatd.service: Consumed 31min 49.101s CPU time.
May 07 20:41:38 prox pvedaemon[1470396]: <root@pam> successful auth for user 'root@pam'
May 07 20:44:40 prox pvedaemon[1438197]: Use of uninitialized value in pattern match (m//) at /usr/share/perl5/PVE/Diskmanage.pm line 284.
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:40 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 172374
May 07 20:44:40 prox kernel: device-mapper: block manager: btree_node validator check failed for block 172374
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:40 prox kernel: Buffer I/O error on dev dm-7, logical block 8286278, async page read
May 07 20:44:41 prox IPCC.xs[1470396]: pam_unix(proxmox-ve-auth:auth): authentication failure; logname= uid=0 euid=0 tty= ruser= rhost=::ffff:10.22.86.2  user=root
May 07 20:44:43 prox pvedaemon[1470396]: authentication failure; rhost=::ffff:10.22.86.2 user=root@pam msg=Authentication failure
May 07 20:44:43 prox kernel: pvedaemon worke[1470396]: segfault at 8 ip 00005862ca8bf115 sp 00007fff17338950 error 6 in perl[5862ca7e4000+195000] likely on CPU 1 (core 1, sock>
May 07 20:44:43 prox kernel: Code: 00 00 48 8b 40 10 48 89 43 10 84 c9 74 46 4c 8b 00 48 8b 83 b8 00 00 00 48 8b 40 10 48 8b 50 28 48 89 55 58 4c 89 f2 4c 29 e2 <41> 83 40 08 >
May 07 20:44:43 prox pvedaemon[1215]: worker 1470396 finished
May 07 20:44:43 prox pvedaemon[1215]: starting 1 worker(s)
May 07 20:44:43 prox pvedaemon[1215]: worker 1495679 started
May 07 20:44:52 prox kernel: node_check: 10 callbacks suppressed
May 07 20:44:52 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 332780
May 07 20:44:52 prox kernel: dm_bm_validate_buffer: 10 callbacks suppressed
May 07 20:44:52 prox kernel: device-mapper: block manager: btree_node validator check failed for block 332780
May 07 20:44:52 prox kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
May 07 20:44:52 prox kernel: buffer_io_error: 9 callbacks suppressed
May 07 20:44:52 prox kernel: Buffer I/O error on dev dm-9, logical block 10434564, lost async page write
May 07 20:44:52 prox kernel: device-mapper: btree spine: node_check failed: blocknr 0 != wanted 332780
May 07 20:44:52 prox kernel: device-mapper: block manager: btree_node validator check failed for block 332780
May 07 20:44:52 prox kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
May 07 20:44:52 prox kernel: Buffer I/O error on dev dm-9, logical block 10434565, lost async page write
-- Boot f40b682d0dbd41d3bb3105b8d486aa77 --
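
With this many repeated lines, it can help to tally the "Buffer I/O error" messages per device before reading the log line by line. The following is only a rough sketch: `count_dm_io_errors` is a made-up helper name, and it assumes the journal text arrives on stdin in the same format as the output above.

```shell
# Hypothetical helper: count "Buffer I/O error" lines per dm device.
# Reads journal text on stdin, prints "<device> <count>" sorted by device name.
count_dm_io_errors() {
    # Keep only the matching fragment, e.g. "Buffer I/O error on dev dm-9"
    grep -o 'Buffer I/O error on dev dm-[0-9]*' \
        | awk '{ count[$NF]++ } END { for (dev in count) print dev, count[dev] }' \
        | sort
}
```

It could be fed from the same command as above, for example `journalctl --since "2024-05-07 20:39:00" --until "2024-05-07 20:54:00" | count_dm_io_errors`. Note that the kernel rate-limits these messages ("callbacks suppressed"), so the counts are a lower bound rather than an exact figure.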

Seeing that the errors started with dev dm-9, I checked and it's the main disk for the Windows VM.

Code:
dmsetup info /dev/dm-9
Name:              pve-vm--103--disk--1
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      252, 9
Number of targets: 1
UUID: LVM-sB6ehUpfKjdwQOs7rDWHf1ZplSSde4USIf0RFdwwllHKgl9J8l8u2l6KNMgUcpMa
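
The device-mapper name above encodes the LVM volume group and logical volume: a single hyphen separates the two, and any hyphen inside either name is doubled. As a small sketch of how that name can be turned back into `vg/lv` form (`dm_name_to_lv` is a hypothetical helper, and it assumes `@` never appears in the name, since LVM does not allow that character):

```shell
# Hypothetical helper: convert a device-mapper name such as
# "pve-vm--103--disk--1" into the LVM "vg/lv" form.
dm_name_to_lv() {
    printf '%s\n' "$1" \
        | sed -e 's/--/@/g' \
              -e 's|-|/|' \
              -e 's/@/-/g'
    # 1) park escaped hyphens as "@", 2) turn the remaining VG/LV separator
    #    into "/", 3) restore the literal hyphens
}

dm_name_to_lv "pve-vm--103--disk--1"   # prints pve/vm-103-disk-1
```

On a live system, the name for a given `dm-N` node can also be read from `/sys/block/dm-N/dm/name`.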

And the VM config:

Code:
cat /etc/pve/qemu-server/103.conf
agent: 1,fstrim_cloned_disks=1
bios: ovmf
boot: order=ide0;net0
cores: 12
cpu: host
efidisk0: local-lvm:vm-103-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:01:00,pcie=1
ide0: local-lvm:vm-103-disk-1,cache=writethrough,size=128G,ssd=1
machine: pc-q35-8.0
memory: 32768
meta: creation-qemu=8.0.2,ctime=1694973452
name: winsuck
net0: virtio=66:5F:C2:6B:9B:24,bridge=vmbr0
net1: virtio=02:FE:E6:51:4F:43,bridge=vmbr1
numa: 0
onboot: 1
ostype: win10
protection: 0
scsi0: /dev/disk/by-id/nvme-Samsung_SSD_970_EVO_Plus_500GB_S58SNM0T415357R_1,backup=0,cache=writethrough,size=488386584K,ssd=1
scsi2: /dev/disk/by-id/nvme-Samsung_SSD_970_EVO_500GB_S5H7NS0N846243V,backup=0,size=488386584K
scsihw: virtio-scsi-single
smbios1: uuid=0617f923-205a-44e5-9867-987120843fc6
sockets: 1
startup: order=2
vmgenid: cc1ec081-4570-4417-90b0-0264e3fb089f
 
