Proxmox randomly freezes or reboots

EduCracker

Member
Jul 23, 2020
Hi, good day.

We have a Proxmox system installed on a B560M DS3H V2 motherboard (11th-gen Core i9-11900, 64 GB RAM). It worked without problems for about a year, but a few months ago it started to shut down randomly, sometimes abruptly. We did a clean reinstall of Proxmox on a new NVMe disk. After that the shutdowns stopped, but the server still reboots randomly.

After several attempts and tests (Memtest86 passes, power supply replaced), I tried setting the minimum and maximum ARC size non-persistently (following the instructions at https://forum.proxmox.com/threads/disable-zfs-arc-or-limiting-it.77845/). Strangely, that worked, and the server had no reboots for about 10 days.

The weird thing is that we have no ZFS storage; we only use LVM for VM data storage and ext4 for local PVE storage.

I then made the changes persistent by adding lines to /etc/modprobe.d/zfs.conf specifying "options zfs zfs_arc_min=8059738368" and zfs_arc_max.
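To confirm whether the zfs module is loaded at all and which ARC limits are actually in effect, the values can be read back from sysfs. The following is only a minimal sketch, assuming the standard OpenZFS parameter directory /sys/module/zfs/parameters; the function name read_arc_limits is made up for illustration.

```python
from pathlib import Path

# Standard location of OpenZFS module parameters when the module is loaded.
ZFS_PARAM_DIR = Path("/sys/module/zfs/parameters")


def read_arc_limits(param_dir=ZFS_PARAM_DIR):
    """Return the current ARC limits in bytes, or None if the zfs module is not loaded.

    read_arc_limits is a hypothetical helper name, not part of any Proxmox tool.
    """
    param_dir = Path(param_dir)
    if not param_dir.is_dir():
        # No parameter directory means the zfs kernel module is not loaded.
        return None
    limits = {}
    for name in ("zfs_arc_min", "zfs_arc_max"):
        limits[name] = int((param_dir / name).read_text().strip())
    return limits


if __name__ == "__main__":
    limits = read_arc_limits()
    if limits is None:
        print("zfs module is not loaded")
    else:
        for name, value in limits.items():
            print(f"{name} = {value} bytes ({value / 2**30:.2f} GiB)")
```

If this reports that the module is not loaded, the options in /etc/modprobe.d/zfs.conf are not being applied to anything, which would suggest the quiet period after changing them was a coincidence.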

The server now runs for 4-8 days without trouble and then unexpectedly reboots again.

Can someone give me any clues about this problem, or tell me why the ZFS ARC could affect my server even though I don't use ZFS pools at all, or whether completely disabling ZFS on my server could affect anything else?

Thank you!
 
Are you sure your memory is fine? Are there no error entries in the BIOS or IPMI (BMC)? Limiting the ARC can mean that RAM is never used to the extent where you would reach one specific RAM module, the one that makes your server crash because it's faulty.
 
I am also a user of the B560M DS3H V2 motherboard, and I'm having exactly the same issue you described: random freezes every now and then. It can go a month without an issue, then crash, and then crash again three days later. It's hard to gather logs because this is a boot storage issue, so the logs aren't saved correctly and are impossible to retrieve after a reboot.
On my side, I was able to get logs because the disk stopped communicating for only a few moments, and I was able to connect to Proxmox before it crashed.

May 20 17:45:53 kami-vm kernel: nvme nvme0: I/O tag 32 (3020) opcode 0x1 (I/O Cmd) QID 8 timeout, aborting req_op:WRITE(1) size:4096
May 20 17:45:53 kami-vm kernel: nvme nvme0: I/O tag 33 (e021) opcode 0x1 (I/O Cmd) QID 8 timeout, aborting req_op:WRITE(1) size:4096
May 20 17:45:53 kami-vm kernel: nvme nvme0: I/O tag 34 (5022) opcode 0x1 (I/O Cmd) QID 8 timeout, aborting req_op:WRITE(1) size:4096
May 20 17:45:53 kami-vm kernel: nvme nvme0: I/O tag 35 (c023) opcode 0x1 (I/O Cmd) QID 8 timeout, aborting req_op:WRITE(1) size:4096
May 20 17:45:53 kami-vm kernel: nvme nvme0: Abort status: 0x0
May 20 17:45:53 kami-vm kernel: nvme nvme0: Abort status: 0x0
May 20 17:45:53 kami-vm kernel: nvme nvme0: Abort status: 0x0
May 20 17:45:53 kami-vm kernel: nvme nvme0: Abort status: 0x0
May 20 17:46:18 kami-vm kernel: nvme nvme0: I/O tag 17 (b011) opcode 0x2 (I/O Cmd) QID 8 timeout, aborting req_op:READ(0) size:4096
May 20 17:46:18 kami-vm kernel: nvme nvme0: Abort status: 0x0
May 20 17:46:23 kami-vm kernel: nvme nvme0: I/O tag 32 (3020) opcode 0x1 (I/O Cmd) QID 8 timeout, reset controller
May 20 17:46:23 kami-vm kernel: nvme nvme0: 12/0/0 default/read/poll queues
May 20 17:50:18 kami-vm kernel: nvme nvme0: I/O tag 56 (e038) opcode 0x1 (I/O Cmd) QID 10 timeout, aborting req_op:WRITE(1) size:4096
May 20 17:50:18 kami-vm kernel: nvme nvme0: I/O tag 57 (8039) opcode 0x1 (I/O Cmd) QID 10 timeout, aborting req_op:WRITE(1) size:4096
May 20 17:50:18 kami-vm kernel: nvme nvme0: I/O tag 58 (503a) opcode 0x1 (I/O Cmd) QID 10 timeout, aborting req_op:WRITE(1) size:4096
May 20 17:50:18 kami-vm kernel: nvme nvme0: I/O tag 59 (003b) opcode 0x1 (I/O Cmd) QID 10 timeout, aborting req_op:WRITE(1) size:4096
May 20 17:50:18 kami-vm kernel: nvme nvme0: Abort status: 0x0
May 20 17:50:18 kami-vm kernel: nvme nvme0: Abort status: 0x0
May 20 17:50:18 kami-vm kernel: nvme nvme0: Abort status: 0x0
May 20 17:50:18 kami-vm kernel: nvme nvme0: Abort status: 0x0
May 20 17:50:20 kami-vm kernel: nvme nvme0: I/O tag 41 (f029) opcode 0x0 (I/O Cmd) QID 10 timeout, aborting req_op:FLUSH(2) size:0
May 20 17:50:20 kami-vm kernel: nvme nvme0: Abort status: 0x0
May 20 17:50:21 kami-vm kernel: nvme nvme0: I/O tag 42 (902a) opcode 0x2 (I/O Cmd) QID 10 timeout, aborting req_op:READ(0) size:4096
May 20 17:50:21 kami-vm kernel: nvme nvme0: Abort status: 0x0
May 20 17:50:23 kami-vm kernel: nvme nvme0: I/O tag 43 (702b) opcode 0x2 (I/O Cmd) QID 10 timeout, aborting req_op:READ(0) size:4096
May 20 17:50:23 kami-vm kernel: nvme nvme0: Abort status: 0x0
May 20 17:50:23 kami-vm pvedaemon[1144]: <root@pam> successful auth for user 'root@pam'
May 20 17:50:33 kami-vm kernel: nvme nvme0: I/O tag 44 (402c) opcode 0x2 (I/O Cmd) QID 10 timeout, aborting req_op:READ(0) size:131072
May 20 17:50:33 kami-vm kernel: nvme nvme0: Abort status: 0x0
May 20 17:50:33 kami-vm kernel: nvme nvme0: I/O tag 45 (102d) opcode 0x1 (I/O Cmd) QID 10 timeout, aborting req_op:WRITE(1) size:4096
May 20 17:50:33 kami-vm kernel: nvme nvme0: Abort status: 0x0
May 20 17:50:48 kami-vm kernel: nvme nvme0: I/O tag 56 (e038) opcode 0x1 (I/O Cmd) QID 10 timeout, reset controller
May 20 17:50:48 kami-vm kernel: nvme0n1: I/O Cmd(0x2) @ LBA 191187920, 40 blocks, I/O Error (sct 0x3 / sc 0x71)
May 20 17:50:48 kami-vm kernel: I/O error, dev nvme0n1, sector 191187920 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 0
May 20 17:50:48 kami-vm kernel: nvme0n1: I/O Cmd(0x2) @ LBA 154418696, 32 blocks, I/O Error (sct 0x3 / sc 0x71)
May 20 17:50:48 kami-vm kernel: I/O error, dev nvme0n1, sector 154418696 op 0x0:(READ) flags 0x80700 phys_seg 4 prio class 0
May 20 17:50:48 kami-vm kernel: nvme nvme0: 12/0/0 default/read/poll queues

I can provide any info you need as it seems the server was able to recover this time.
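For anyone comparing logs like the ones above, it can help to count the NVMe timeouts per queue and operation rather than reading them line by line. This is a rough sketch, assuming the kernel message format shown in this post; the function name summarize_nvme_timeouts and the regular expression are my own and may need adjusting for other kernel versions.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   nvme nvme0: I/O tag 32 (3020) opcode 0x1 (I/O Cmd) QID 8 timeout, aborting req_op:WRITE(1) size:4096
#   nvme nvme0: I/O tag 32 (3020) opcode 0x1 (I/O Cmd) QID 8 timeout, reset controller
TIMEOUT_RE = re.compile(
    r"nvme (?P<ctrl>nvme\d+): I/O tag \d+ \([0-9a-f]+\) opcode 0x[0-9a-f]+ "
    r"\([^)]*\) QID (?P<qid>\d+) timeout, "
    r"(?:aborting req_op:(?P<op>[A-Z]+)\(\d+\) size:\d+|(?P<reset>reset controller))"
)


def summarize_nvme_timeouts(lines):
    """Count aborted requests and controller resets in an iterable of log lines.

    Returns (aborts, resets):
      aborts is keyed by (controller, queue id, operation),
      resets is keyed by (controller, queue id).
    Hypothetical helper for illustration only.
    """
    aborts = Counter()
    resets = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match is None:
            continue  # e.g. "Abort status" lines or unrelated messages
        ctrl = match.group("ctrl")
        qid = int(match.group("qid"))
        if match.group("reset"):
            resets[(ctrl, qid)] += 1
        else:
            aborts[(ctrl, qid, match.group("op"))] += 1
    return aborts, resets
```

The function accepts any iterable of lines, for example an open file holding saved kernel messages. A controller reset that follows a burst of aborted requests on the same queue, as in the log above, points at the disk or its link rather than at memory.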
 
