Hardware or software issue - Disk failure

xlogic11

Member
Apr 9, 2021
I have a complete SATA disk failure that happens weeks, if not months, apart and requires a full server boot and Proxmox restart to recover. After the restart I'm sometimes good for 3 to 4 months, but at other times it happens once a week. Is this a hardware issue or a Proxmox issue? I run the boot and VM SSDs off the motherboard SATA controller and the Z2 array of hard disks off the HBA. Could it be the result of a power spike?

Dual Xeon ASRock motherboard - 64GB ECC memory
Dell H310 HBA LSI 9211-8i running in IT mode
TrueNAS running as a Proxmox virtual machine



Feb 03 22:48:00 xlogic systemd[1]: Starting Proxmox VE replication runner...
Feb 03 22:48:00 xlogic systemd[1]: pvesr.service: Succeeded.
Feb 03 22:48:00 xlogic systemd[1]: Started Proxmox VE replication runner.
Feb 03 22:49:00 xlogic systemd[1]: Starting Proxmox VE replication runner...
Feb 03 22:49:00 xlogic systemd[1]: pvesr.service: Succeeded.
Feb 03 22:49:00 xlogic systemd[1]: Started Proxmox VE replication runner.
Feb 03 22:49:33 xlogic kernel: ata10.00: exception Emask 0x0 SAct 0xf000000 SErr 0x0 action 0x6 frozen
Feb 03 22:49:33 xlogic kernel: ata10.00: failed command: WRITE FPDMA QUEUED
Feb 03 22:49:33 xlogic kernel: ata10.00: cmd 61/08:c0:90:2e:00/00:00:04:00:00/40 tag 24 ncq dma 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 03 22:49:33 xlogic kernel: ata10.00: status: { DRDY }
Feb 03 22:49:33 xlogic kernel: ata10.00: failed command: WRITE FPDMA QUEUED
Feb 03 22:49:33 xlogic kernel: ata10.00: cmd 61/02:c8:b1:13:f9/00:00:0d:00:00/40 tag 25 ncq dma 1024 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 03 22:49:33 xlogic kernel: ata10.00: status: { DRDY }
Feb 03 22:49:33 xlogic kernel: ata10.00: failed command: WRITE FPDMA QUEUED
Feb 03 22:49:33 xlogic kernel: ata10.00: cmd 61/08:d0:d0:13:f9/00:00:0d:00:00/40 tag 26 ncq dma 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 03 22:49:33 xlogic kernel: ata10.00: status: { DRDY }
Feb 03 22:49:33 xlogic kernel: ata10.00: failed command: WRITE FPDMA QUEUED
Feb 03 22:49:33 xlogic kernel: ata10.00: cmd 61/08:d8:e0:13:f9/00:00:0d:00:00/40 tag 27 ncq dma 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 03 22:49:33 xlogic kernel: ata10.00: status: { DRDY }
Feb 03 22:49:33 xlogic kernel: ata10: hard resetting link
Feb 03 22:49:38 xlogic kernel: ata10: link is slow to respond, please be patient (ready=0)
Feb 03 22:49:43 xlogic kernel: ata10: COMRESET failed (errno=-16)
Feb 03 22:49:43 xlogic kernel: ata10: hard resetting link
Feb 03 22:49:48 xlogic kernel: ata10: link is slow to respond, please be patient (ready=0)
Feb 03 22:49:53 xlogic kernel: ata10: COMRESET failed (errno=-16)
Feb 03 22:49:53 xlogic kernel: ata10: hard resetting link
Feb 03 22:49:58 xlogic kernel: ata10: link is slow to respond, please be patient (ready=0)
Feb 03 22:50:00 xlogic systemd[1]: Starting Proxmox VE replication runner...
Feb 03 22:50:00 xlogic systemd[1]: pvesr.service: Succeeded.
Feb 03 22:50:00 xlogic systemd[1]: Started Proxmox VE replication runner.
Feb 03 22:50:03 xlogic kernel: ata7.00: exception Emask 0x0 SAct 0x400000 SErr 0x0 action 0x6 frozen
Feb 03 22:50:03 xlogic kernel: ata7.00: failed command: READ FPDMA QUEUED
Feb 03 22:50:03 xlogic kernel: ata7.00: cmd 60/00:b0:80:8a:40/02:00:01:00:00/40 tag 22 ncq dma 262144 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 03 22:50:03 xlogic kernel: ata7.00: status: { DRDY }
Feb 03 22:50:03 xlogic kernel: ata7: hard resetting link
Feb 03 22:50:04 xlogic kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 03 22:50:09 xlogic kernel: ata7.00: qc timeout (cmd 0xec)
Feb 03 22:50:10 xlogic kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Feb 03 22:50:10 xlogic kernel: ata7.00: revalidation failed (errno=-5)
Feb 03 22:50:10 xlogic kernel: ata7: hard resetting link
Feb 03 22:50:10 xlogic kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Feb 03 22:50:21 xlogic kernel: ata7.00: qc timeout (cmd 0xec)
Feb 03 22:50:21 xlogic kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Feb 03 22:50:21 xlogic kernel: ata7.00: revalidation failed (errno=-5)
Feb 03 22:50:21 xlogic kernel: ata7: limiting SATA link speed to 3.0 Gbps
Feb 03 22:50:21 xlogic kernel: ata7: hard resetting link
Feb 03 22:50:22 xlogic kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Feb 03 22:50:28 xlogic kernel: ata10: COMRESET failed (errno=-16)
Feb 03 22:50:28 xlogic kernel: ata10: limiting SATA link speed to 3.0 Gbps
Feb 03 22:50:28 xlogic kernel: ata10: hard resetting link
Feb 03 22:50:33 xlogic kernel: ata10: COMRESET failed (errno=-16)
Feb 03 22:50:33 xlogic kernel: ata10: reset failed, giving up
Feb 03 22:50:33 xlogic kernel: ata10.00: disabled
Feb 03 22:50:33 xlogic kernel: ata10: EH complete
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#0 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#29 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#29 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00
Feb 03 22:50:33 xlogic kernel: blk_update_request: I/O error, dev sdm, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Feb 03 22:50:33 xlogic kernel: blk_update_request: I/O error, dev sdm, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#31 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#1 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#31 CDB: Read(10) 28 00 2a 2a 0f d0 00 00 20 00
Feb 03 22:50:33 xlogic kernel: blk_update_request: I/O error, dev sdm, sector 707399632 op 0x0:(READ) flags 0x0 phys_seg 3 prio class 0
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#1 CDB: Read(10) 28 00 00 00 08 00 00 01 00 00
Feb 03 22:50:33 xlogic kernel: blk_update_request: I/O error, dev sdm, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#2 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#2 CDB: Write(10) 2a 00 0d f9 22 20 00 00 08 00
Feb 03 22:50:33 xlogic kernel: blk_update_request: I/O error, dev sdm, sector 234431008 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#3 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#3 CDB: Write(10) 2a 00 0e 0a 2a f0 00 00 20 00
Feb 03 22:50:33 xlogic kernel: blk_update_request: I/O error, dev sdm, sector 235547376 op 0x1:(WRITE) flags 0x8800 phys_seg 2 prio class 0
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#4 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#4 CDB: Write(10) 2a 00 0e 0a 3e a0 00 00 10 00
Feb 03 22:50:33 xlogic kernel: blk_update_request: I/O error, dev sdm, sector 235552416 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#6 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#6 CDB: Write(10) 2a 00 0e 11 30 38 00 00 08 00
Feb 03 22:50:33 xlogic kernel: blk_update_request: I/O error, dev sdm, sector 236007480 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#7 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#7 CDB: Write(10) 2a 00 0e 16 a4 50 00 00 20 00
Feb 03 22:50:33 xlogic kernel: blk_update_request: I/O error, dev sdm, sector 236364880 op 0x1:(WRITE) flags 0x8800 phys_seg 4 prio class 0
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#8 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:33 xlogic kernel: sd 11:0:0:0: [sdm] tag#8 CDB: Write(10) 2a 00 12 0a 3e a8 00 00 08 00
Feb 03 22:50:33 xlogic kernel: blk_update_request: I/O error, dev sdm, sector 302661288 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Feb 03 22:50:52 xlogic kernel: ata7.00: qc timeout (cmd 0xec)
Feb 03 22:50:53 xlogic kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Feb 03 22:50:53 xlogic kernel: ata7.00: revalidation failed (errno=-5)
Feb 03 22:50:53 xlogic kernel: ata7.00: disabled
Feb 03 22:50:54 xlogic kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Feb 03 22:50:55 xlogic kernel: ata7: EH complete
Feb 03 22:50:55 xlogic kernel: scsi_io_completion_action: 165 callbacks suppressed
Feb 03 22:50:55 xlogic kernel: print_req_error: 166 callbacks suppressed
Feb 03 22:50:55 xlogic kernel: blk_update_request: I/O error, dev sdk, sector 21006976 op 0x0:(READ) flags 0x0 phys_seg 64 prio class 0
Feb 03 22:50:55 xlogic kernel: sd 8:0:0:0: [sdk] tag#12 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:55 xlogic kernel: sd 8:0:0:0: [sdk] tag#12 CDB: Write(10) 2a 00 00 de 29 d0 00 00 08 00
Feb 03 22:50:55 xlogic kernel: blk_update_request: I/O error, dev sdk, sector 14559696 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Feb 03 22:50:55 xlogic kernel: sd 8:0:0:0: [sdk] tag#8 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:55 xlogic kernel: sd 8:0:0:0: [sdk] tag#8 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
Feb 03 22:50:55 xlogic kernel: blk_update_request: I/O error, dev sdk, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Feb 03 22:50:55 xlogic kernel: sd 8:0:0:0: [sdk] tag#9 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:55 xlogic kernel: sd 8:0:0:0: [sdk] tag#9 CDB: Read(10) 28 00 00 00 00 22 00 01 00 00
Feb 03 22:50:55 xlogic kernel: blk_update_request: I/O error, dev sdk, sector 34 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Feb 03 22:50:55 xlogic kernel: sd 8:0:0:0: [sdk] tag#14 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:55 xlogic kernel: sd 8:0:0:0: [sdk] tag#14 CDB: Read(10) 28 00 00 00 80 00 00 01 00 00
Feb 03 22:50:55 xlogic kernel: blk_update_request: I/O error, dev sdk, sector 32768 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Feb 03 22:50:55 xlogic kernel: sd 11:0:0:0: [sdm] tag#3 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:55 xlogic kernel: sd 11:0:0:0: [sdm] tag#3 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
Feb 03 22:50:55 xlogic kernel: blk_update_request: I/O error, dev sdm, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Feb 03 22:50:55 xlogic kernel: sd 11:0:0:0: [sdm] tag#4 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:55 xlogic kernel: sd 11:0:0:0: [sdm] tag#4 CDB: Read(10) 28 00 00 00 08 00 00 01 00 00
Feb 03 22:50:55 xlogic kernel: blk_update_request: I/O error, dev sdm, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Feb 03 22:50:56 xlogic kernel: sd 8:0:0:0: [sdk] tag#12 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:56 xlogic kernel: sd 8:0:0:0: [sdk] tag#12 CDB: Read(10) 28 00 01 40 8a 80 00 02 00 00
Feb 03 22:50:56 xlogic kernel: blk_update_request: I/O error, dev sdk, sector 21006976 op 0x0:(READ) flags 0x0 phys_seg 64 prio class 0
Feb 03 22:50:56 xlogic kernel: sd 8:0:0:0: [sdk] tag#13 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:56 xlogic kernel: sd 8:0:0:0: [sdk] tag#13 CDB: Write(10) 2a 00 00 de 29 d0 00 00 08 00
Feb 03 22:50:56 xlogic kernel: blk_update_request: I/O error, dev sdk, sector 14559696 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Feb 03 22:50:57 xlogic kernel: sd 8:0:0:0: [sdk] tag#7 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:57 xlogic kernel: sd 8:0:0:0: [sdk] tag#7 CDB: Read(10) 28 00 01 40 8a 80 00 02 00 00
Feb 03 22:50:57 xlogic kernel: blk_update_request: I/O error, dev sdk, sector 21006976 op 0x0:(READ) flags 0x0 phys_seg 64 prio class 0
Feb 03 22:50:57 xlogic kernel: sd 8:0:0:0: [sdk] tag#14 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:50:57 xlogic kernel: sd 8:0:0:0: [sdk] tag#14 CDB: Write(10) 2a 00 00 de 29 d0 00 00 08 00
Feb 03 22:51:00 xlogic systemd[1]: Starting Proxmox VE replication runner...
Feb 03 22:51:00 xlogic kernel: scsi_io_completion_action: 7 callbacks suppressed
Feb 03 22:51:00 xlogic kernel: sd 8:0:0:0: [sdk] tag#3 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:51:00 xlogic kernel: sd 8:0:0:0: [sdk] tag#24 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:51:00 xlogic kernel: sd 8:0:0:0: [sdk] tag#24 CDB: Write(10) 2a 00 00 de 45 50 00 00 70 00
Feb 03 22:51:00 xlogic kernel: sd 8:0:0:0: [sdk] tag#3 CDB: Write(10) 2a 00 00 cb 4a 90 00 02 00 00
Feb 03 22:51:00 xlogic kernel: print_req_error: 8 callbacks suppressed
Feb 03 22:51:00 xlogic kernel: blk_update_request: I/O error, dev sdk, sector 13322896 op 0x1:(WRITE) flags 0x800 phys_seg 64 prio class 0
Feb 03 22:51:00 xlogic kernel: sd 8:0:0:0: [sdk] tag#25 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 03 22:51:00 xlogic kernel: sd 8:0:0:0: [sdk] tag#25 CDB: Write(10) 2a 00 00 5e 10 40 00 00 08 00
Feb 03 22:51:00 xlogic kernel: blk_update_request: I/O error, dev sdk, sector 6164544 op 0x1:(WRITE) flags 0x8800 phys_seg 1 prio class 0
Feb 03 22:51:00 xlogic systemd[1]: pvesr.service: Succeeded.
Feb 03 22:51:00 xlogic systemd[1]: Started Proxmox VE replication runner.
 
Bad news: this implies that both /dev/sdk and /dev/sdm are failing.
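
If you want to check that reading against a longer log, here is a minimal Python sketch that tallies failed commands per ATA port, I/O errors per block device, and which ports the kernel disabled. The function name `summarize` and the regexes are my own, written against the message formats in the log above; other kernel versions may word these lines differently, so treat it as a starting point only.

```python
import re
from collections import Counter

# libata port messages, e.g. "kernel: ata10.00: failed command: WRITE FPDMA QUEUED"
ATA_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")
# block-layer errors, e.g. "blk_update_request: I/O error, dev sdm, sector 0 ..."
BLK_RE = re.compile(r"I/O error, dev (\w+), sector \d+")


def summarize(lines):
    """Return (failed commands per port, I/O errors per device, disabled ports)."""
    failed_commands = Counter()
    io_errors = Counter()
    disabled = set()
    for line in lines:
        line = line.strip()
        m = ATA_RE.search(line)
        if m:
            port, msg = m.groups()
            if msg.startswith("failed command"):
                failed_commands[port] += 1
            elif msg == "disabled":
                disabled.add(port)
            continue
        m = BLK_RE.search(line)
        if m:
            io_errors[m.group(1)] += 1
    return failed_commands, io_errors, disabled


# A few lines copied from the log in the first post.
sample = [
    "Feb 03 22:49:33 xlogic kernel: ata10.00: failed command: WRITE FPDMA QUEUED",
    "Feb 03 22:50:03 xlogic kernel: ata7.00: failed command: READ FPDMA QUEUED",
    "Feb 03 22:50:33 xlogic kernel: ata10.00: disabled",
    "Feb 03 22:50:33 xlogic kernel: blk_update_request: I/O error, dev sdm, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0",
    "Feb 03 22:50:53 xlogic kernel: ata7.00: disabled",
    "Feb 03 22:50:55 xlogic kernel: blk_update_request: I/O error, dev sdk, sector 21006976 op 0x0:(READ) flags 0x0 phys_seg 64 prio class 0",
]

failed_commands, io_errors, disabled = summarize(sample)
print("failed commands:", dict(failed_commands))
print("I/O errors:", dict(io_errors))
print("disabled ports:", sorted(disabled))
```

To run it on a real log, pass the lines of a saved kernel log file to `summarize` instead of `sample`. In the log above, the sdm errors start right after ata10 is disabled and the sdk errors right after ata7 is disabled, so those pairings appear to be the port-to-device mapping.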

Run a full SMART test on both and replace the failing drive(s). If SMART comes back clean, you have a problem with your cabling/midplane.
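
For the SMART tests, a small sketch that builds the `smartctl` invocations for both drives. It assumes smartmontools is installed and that the disks still show up as /dev/sdk and /dev/sdm after the reboot (names can change, so check first). The helper `smart_commands` and the `DRY_RUN` switch are hypothetical names of mine. With `DRY_RUN` left on, it only prints the commands.

```python
import shutil
import subprocess

DRY_RUN = True  # set to False (as root) to actually call smartctl


def smart_commands(device):
    """Build the smartctl calls for one drive."""
    return [
        ["smartctl", "-t", "long", device],      # start the extended self-test on the drive
        ["smartctl", "-l", "selftest", device],  # self-test log, read once the test has finished
        ["smartctl", "-a", device],              # full SMART report including attributes
    ]


def run(cmd):
    """Print the command; execute it only when DRY_RUN is off and smartctl exists."""
    print(" ".join(cmd))
    if DRY_RUN or shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


# Device names taken from the log above; verify them on your system.
for dev in ("/dev/sdk", "/dev/sdm"):
    for cmd in smart_commands(dev):
        run(cmd)
```

The extended test runs on the drive itself and can take hours, so the self-test log is only meaningful after it completes.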
 
