Hello Community -
We built a new Proxmox cluster (3 nodes) and started migrating VMs and CTs from an existing single-node Proxmox host to the cluster. I keep having issues with the migrated guests crashing with weird filesystem and mount errors. They run perfectly fine on the old Proxmox host, and we are seeing this issue across multiple nodes of the cluster, so I do not think it is hardware related. *NEW* installs on the cluster do not have these issues. This is happening to every migrated VM: eventually the system goes into read-only (RO) mode and crashes, generally requiring a manual fsck on the next reboot. I was hoping someone could shed some light on the issue.
Code:
[Wed Mar 31 11:33:38 2021] EXT4-fs (sda1): mounted filesystem without journal. Opts: (null)
[Wed Mar 31 11:33:38 2021] IPv6: ADDRCONF(NETDEV_UP): ens18: link is not ready
[Wed Mar 31 11:33:38 2021] e1000: ens18 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[Wed Mar 31 11:33:38 2021] IPv6: ADDRCONF(NETDEV_CHANGE): ens18: link becomes ready
[Wed Mar 31 11:33:38 2021] cgroup: new mount options do not match the existing superblock, will be ignored
[Wed Mar 31 12:49:09 2021] sd 2:0:0:0: [sda] tag#14 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[Wed Mar 31 12:49:09 2021] sd 2:0:0:0: [sda] tag#14 CDB: Write(10) 2a 00 03 1d 6c a8 00 00 10 00
[Wed Mar 31 12:49:09 2021] blk_update_request: I/O error, dev sda, sector 52260008
[Wed Mar 31 12:49:09 2021] sd 2:0:0:0: [sda] tag#13 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[Wed Mar 31 12:49:09 2021] sd 2:0:0:0: [sda] tag#13 CDB: Write(10) 2a 00 03 17 06 78 00 00 08 00
[Wed Mar 31 12:49:09 2021] blk_update_request: I/O error, dev sda, sector 51840632
[Wed Mar 31 12:49:09 2021] Aborting journal on device dm-0-8.
[Wed Mar 31 12:49:09 2021] Buffer I/O error on dev dm-0, logical block 6292175, lost async page write
[Wed Mar 31 12:49:09 2021] sd 2:0:0:0: [sda] tag#12 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[Wed Mar 31 12:49:09 2021] sd 2:0:0:0: [sda] tag#12 CDB: Write(10) 2a 00 03 17 01 60 00 00 08 00
[Wed Mar 31 12:49:09 2021] blk_update_request: I/O error, dev sda, sector 51839328
[Wed Mar 31 12:49:09 2021] Buffer I/O error on dev dm-0, logical block 6292012, lost async page write
[Wed Mar 31 12:49:09 2021] sd 2:0:0:0: [sda] tag#11 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[Wed Mar 31 12:49:09 2021] sd 2:0:0:0: [sda] tag#11 CDB: Write(10) 2a 00 02 23 e3 a0 00 00 08 00
[Wed Mar 31 12:49:09 2021] blk_update_request: I/O error, dev sda, sector 35906464
[Wed Mar 31 12:49:09 2021] EXT4-fs warning (device dm-0): ext4_end_bio:330: I/O error -5 writing to inode 1583870 (offset 0 size 0 starting block 4300404)
[Wed Mar 31 12:49:09 2021] Buffer I/O error on device dm-0, logical block 4300404
[Wed Mar 31 12:49:09 2021] sd 2:0:0:0: [sda] tag#10 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
[Wed Mar 31 12:49:09 2021] sd 2:0:0:0: [sda] tag#10 CDB: Write(10) 2a 00 00 2c 26 c8 00 00 18 00
[Wed Mar 31 12:49:09 2021] blk_update_request: I/O error, dev sda, sector 2893512
[Wed Mar 31 12:49:09 2021] EXT4-fs warning (device dm-0): ext4_end_bio:330: I/O error -5 writing to inode 1583870 (offset 0 size 0 starting block 173785)
[Wed Mar 31 12:49:09 2021] Buffer I/O error on device dm-0, logical block 173785
[Wed Mar 31 12:49:09 2021] Buffer I/O error on device dm-0, logical block 173786
[Wed Mar 31 12:49:09 2021] Buffer I/O error on device dm-0, logical block 173787
[Wed Mar 31 12:49:09 2021] EXT4-fs error (device dm-0): ext4_journal_check_start:56: Detected aborted journal
[Wed Mar 31 12:49:09 2021] EXT4-fs (dm-0): Remounting filesystem read-only
[Wed Mar 31 12:50:49 2021] ata3.00: exception Emask 0x0 SAct 0x1000000 SErr 0x0 action 0x6 frozen
[Wed Mar 31 12:50:49 2021] ata3.00: failed command: READ FPDMA QUEUED
[Wed Mar 31 12:50:49 2021] ata3.00: cmd 60/00:c0:58:69:7b/01:00:00:00:00/40 tag 24 ncq 131072 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[Wed Mar 31 12:50:49 2021] ata3.00: status: { DRDY }
[Wed Mar 31 12:50:49 2021] ata3: hard resetting link
[Wed Mar 31 12:51:02 2021] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[Wed Mar 31 12:51:02 2021] ata3.00: configured for UDMA/100
[Wed Mar 31 12:51:02 2021] ata3.00: device reported invalid CHS sector 0
[Wed Mar 31 12:51:02 2021] ata3: EH complete