Should I worry about input/output errors?

donutgonuts

New Member
Jul 11, 2021
4
0
1
34
Hi,

I've just experienced a system failure on my Proxmox host. Commands such as less, smartctl, fsck, and fdisk all failed with an Input/output error and no error code. The only commands that still worked were ls and reboot. The system recovered after a reboot, but this makes me question the reliability of my hardware. Does this error mean that I have a faulty hard drive or RAM?
I should also mention that before this incident the host went offline and I had to request a remote hard reboot of the machine. Could that have been the trigger? I've checked syslog but can't find any error messages.

My hardware:
Intel i5-10400
DDR4 16 GB 2666 MHz
SanDisk Ultra 3D NVMe 1 TB
 

bbgeek17

Active Member
Nov 20, 2020
161
31
28
www.blockbridge.com
The first suspect is the non-enterprise disk. It has no power-loss protection so an unexpected shutdown could very likely lead to file system issues.
Perhaps an auto-fsck was run on boot, perhaps it fixed things. Are there guarantees? Of course not.
It is probably a good idea to re-run all file system checks.


Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 

donutgonuts

Thanks for the reply. I spoke too soon: a few minutes ago the host crashed again, and this time the NVMe disk no longer shows up in fdisk. This time I can also see the errors in syslog:

Code:
Sep 13 22:30:02 pve postfix/pickup[1244]: 17FE9502BCC: uid=0 from=<root>
Sep 13 22:30:02 pve postfix/cleanup[20103]: 17FE9502BCC: message-id=<20210913143002.17FE9502BCC@pve.lan>
Sep 13 22:30:02 pve postfix/qmgr[1245]: 17FE9502BCC: from=<root@pve.lan>, size=533, nrcpt=1 (queue active)
Sep 13 22:30:02 pve pvemailforward[20106]: forward mail to <mail@example.com>
Sep 13 22:30:02 pve postfix/pickup[1244]: 6FBB6502BCD: uid=65534 from=<root>
Sep 13 22:30:02 pve postfix/cleanup[20103]: 6FBB6502BCD: message-id=<20210913143002.17FE9502BCC@pve.lan>
Sep 13 22:30:02 pve postfix/qmgr[1245]: 6FBB6502BCD: from=<root@pve.lan>, size=689, nrcpt=1 (queue active)
Sep 13 22:30:02 pve postfix/local[20105]: 17FE9502BCC: to=<root@pve.lan>, orig_to=<root>, relay=local, delay=0.39, delays=0.04/0.01/0/0.34, dsn=2.0.0, status=sent (delivered to command: /usr/bin/pvemailforward)
Sep 13 22:30:02 pve postfix/qmgr[1245]: 17FE9502BCC: removed
Sep 13 22:30:02 pve postfix/smtp[20143]: 6FBB6502BCD: to=<mail@example.com>, relay=none, delay=0.02, delays=0/0.01/0.01/0, dsn=5.1.0, status=bounced (Domain example.com does not accept mail (nullMX))
Sep 13 22:30:02 pve postfix/qmgr[1245]: 6FBB6502BCD: removed
Sep 13 22:30:02 pve postfix/cleanup[20103]: 76E71502BCC: message-id=<20210913143002.76E71502BCC@pve.lan>
Sep 13 22:30:51 pve kernel: [ 4606.196593] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Sep 13 22:30:51 pve kernel: [ 4606.256611] nvme 0000:02:00.0: enabling device (0000 -> 0002)
Sep 13 22:30:51 pve kernel: [ 4606.257007] nvme nvme0: Removing after probe failure status: -19
Sep 13 22:30:51 pve kernel: [ 4606.276578] blk_update_request: I/O error, dev nvme0n1, sector 1390204656 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Sep 13 22:30:51 pve kernel: [ 4606.276585] blk_update_request: I/O error, dev nvme0n1, sector 202655880 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
Sep 13 22:30:51 pve kernel: [ 4606.276591] blk_update_request: I/O error, dev nvme0n1, sector 1390070288 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Sep 13 22:30:51 pve kernel: [ 4606.276609] EXT4-fs warning (device dm-1): ext4_end_bio:315: I/O error 10 writing to inode 5771282 (offset 0 size 4096 starting block 23103249)
Sep 13 22:30:51 pve kernel: [ 4606.276618] blk_update_request: I/O error, dev nvme0n1, sector 1487591064 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0
Sep 13 22:30:51 pve kernel: [ 4606.276625] blk_update_request: I/O error, dev nvme0n1, sector 1050624 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Sep 13 22:30:51 pve kernel: [ 4606.276642] Buffer I/O error on device dm-1, logical block 23103249
Sep 13 22:30:51 pve kernel: [ 4606.276658] blk_update_request: I/O error, dev nvme0n1, sector 1390204480 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Sep 13 22:30:51 pve kernel: [ 4606.276666] blk_update_request: I/O error, dev nvme0n1, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Sep 13 22:30:51 pve kernel: [ 4606.276673] blk_update_request: I/O error, dev nvme0n1, sector 1390064520 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0
Sep 13 22:30:51 pve kernel: [ 4606.276684] blk_update_request: I/O error, dev nvme0n1, sector 1390064248 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Sep 13 22:30:51 pve kernel: [ 4606.276751] EXT4-fs warning (device dm-1): ext4_end_bio:315: I/O error 10 writing to inode 5771281 (offset 0 size 4096 starting block 23103248)
Sep 13 22:30:51 pve kernel: [ 4606.276756] Buffer I/O error on device dm-1, logical block 23103248
Sep 13 22:30:51 pve kernel: [ 4606.276793] EXT4-fs warning (device dm-1): ext4_end_bio:315: I/O error 10 writing to inode 5771280 (offset 0 size 4096 starting block 23103247)
Sep 13 22:30:51 pve kernel: [ 4606.276804] Buffer I/O error on device dm-1, logical block 23103247
Sep 13 22:30:51 pve kernel: [ 4606.276889] EXT4-fs warning (device dm-1): ext4_end_bio:315: I/O error 10 writing to inode 3670703 (offset 0 size 0 starting block 6802091)
Sep 13 22:30:51 pve kernel: [ 4606.276893] Buffer I/O error on device dm-1, logical block 6802091
Sep 13 22:30:51 pve kernel: [ 4606.276909] EXT4-fs warning (device dm-1): ext4_end_bio:315: I/O error 10 writing to inode 3670700 (offset 0 size 0 starting block 5330919)
Sep 13 22:30:51 pve kernel: [ 4606.276910] Buffer I/O error on device dm-1, logical block 6802092
Sep 13 22:30:51 pve kernel: [ 4606.276915] Buffer I/O error on device dm-1, logical block 6802093
Sep 13 22:30:51 pve kernel: [ 4606.276927] Buffer I/O error on device dm-1, logical block 5330919
Sep 13 22:30:51 pve kernel: [ 4606.276933] Buffer I/O error on device dm-1, logical block 6802094
Sep 13 22:30:51 pve kernel: [ 4606.276949] Buffer I/O error on device dm-1, logical block 6802095
Sep 13 22:30:51 pve kernel: [ 4606.277000] EXT4-fs warning (device dm-1): ext4_end_bio:315: I/O error 10 writing to inode 3670690 (offset 0 size 0 starting block 6905115)
Sep 13 22:30:51 pve kernel: [ 4606.277006] Buffer I/O error on device dm-1, logical block 6905115
Sep 13 22:30:51 pve kernel: [ 4606.277152] EXT4-fs warning (device dm-1): ext4_end_bio:315: I/O error 10 writing to inode 5771275 (offset 1572864 size 192512 starting block 100724608)

I'm searching for this, but my gut tells me it's not good...
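
For anyone triaging a similar log, the kernel lines above can be summarized with a short script. This is only a minimal sketch, assuming the syslog format shown in this post; the function name `summarize_nvme_log` and the `SAMPLE` list are illustrative and not part of any Proxmox tooling. It counts block-layer I/O errors per device and operation, and flags a controller reset where CSTS reads as all ones, which commonly means the device has stopped responding on the PCIe bus. The probe failure status -19 corresponds to ENODEV ("No such device") on Linux.

```python
import errno
import re
from collections import Counter

# Patterns modeled on the kernel messages quoted in this thread.
IO_ERROR = re.compile(
    r"blk_update_request: I/O error, dev (?P<dev>[^,\s]+), sector \d+ "
    r"op 0x[0-9a-f]+:\((?P<op>\w+)\)"
)
CTRL_DOWN = re.compile(
    r"nvme (?P<ctrl>nvme\d+): controller is down; will reset: "
    r"CSTS=(?P<csts>0x[0-9a-fA-F]+)"
)
PROBE_FAIL = re.compile(
    r"nvme (?P<ctrl>nvme\d+): Removing after probe failure status: (?P<status>-?\d+)"
)


def summarize_nvme_log(lines):
    """Return (Counter of (device, op) I/O errors, list of controller notes)."""
    io_errors = Counter()
    notes = []
    for line in lines:
        m = IO_ERROR.search(line)
        if m:
            io_errors[(m["dev"], m["op"])] += 1
            continue
        m = CTRL_DOWN.search(line)
        if m:
            if int(m["csts"], 16) == 0xFFFFFFFF:
                # An all-ones register read usually means nothing answered.
                notes.append(f"{m['ctrl']}: CSTS read as all ones, device likely not responding")
            else:
                notes.append(f"{m['ctrl']}: controller reset, CSTS={m['csts']}")
            continue
        m = PROBE_FAIL.search(line)
        if m:
            # The kernel reports a negative errno; map it back to its name.
            name = errno.errorcode.get(abs(int(m["status"])), "unknown")
            notes.append(f"{m['ctrl']}: removed after probe failure ({name})")
    return io_errors, notes


# Sample lines copied from the log in this post.
SAMPLE = [
    "Sep 13 22:30:51 pve kernel: [ 4606.196593] nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10",
    "Sep 13 22:30:51 pve kernel: [ 4606.257007] nvme nvme0: Removing after probe failure status: -19",
    "Sep 13 22:30:51 pve kernel: [ 4606.276578] blk_update_request: I/O error, dev nvme0n1, sector 1390204656 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0",
    "Sep 13 22:30:51 pve kernel: [ 4606.276585] blk_update_request: I/O error, dev nvme0n1, sector 202655880 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0",
    "Sep 13 22:30:51 pve kernel: [ 4606.276625] blk_update_request: I/O error, dev nvme0n1, sector 1050624 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0",
]

if __name__ == "__main__":
    errors, notes = summarize_nvme_log(SAMPLE)
    for (dev, op), count in sorted(errors.items()):
        print(f"{dev} {op}: {count} I/O error(s)")
    for note in notes:
        print(note)
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `summarize_nvme_log(open("/var/log/syslog"))`. A controller drop followed by a burst of read and write errors on the same device, as seen here, points at the disk or its slot disappearing rather than at individual bad sectors.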
 

donutgonuts

bbgeek17 said:
A 95% chance it's a bad disk. 5% chance that it's the PCI slot.

Looks like the M.2 controller is dead. Swapping in another M.2 SSD didn't help, and the M.2 slot is not detecting anything in the BIOS. This is the second B460 mobo I've tried, and it shows exactly the same issue as the first one. Both SSDs work fine on an old B85 mobo.

I can't help but wonder if it's Proxmox that destroyed my mobo...
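
If the host still boots from another disk, one way to confirm from the OS side whether any NVMe controller is enumerated is to look at sysfs. The following is a rough sketch, assuming a Linux host that exposes controllers under `/sys/class/nvme` with `model`, `serial`, `firmware_rev` and `state` attributes; the function name `list_nvme_controllers` is made up for this example.

```python
from pathlib import Path


def list_nvme_controllers(sysfs_root="/sys/class/nvme"):
    """Return one dict per NVMe controller found under sysfs_root."""
    root = Path(sysfs_root)
    if not root.is_dir():
        # No NVMe class directory: driver not loaded or no controller present.
        return []
    controllers = []
    for entry in sorted(root.iterdir()):
        info = {"name": entry.name}
        for attr in ("model", "serial", "firmware_rev", "state"):
            try:
                info[attr] = (entry / attr).read_text().strip()
            except OSError:
                # Attribute missing or unreadable on this kernel.
                info[attr] = None
        controllers.append(info)
    return controllers


if __name__ == "__main__":
    found = list_nvme_controllers()
    if not found:
        print("No NVMe controllers enumerated by the kernel")
    for ctrl in found:
        print(ctrl)
```

An empty result matches what the BIOS shows in this case: the slot is not presenting a device at all, which is a different failure from a disk that enumerates and then returns I/O errors.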
 
