I/O Errors when there is heavy disk activity

xlogic11

Every time the system attempts a Proxmox backup I get multiple disk errors. I sometimes have the same problem in the middle of the night when the PLEX PVE runs some of its maintenance routines. The web interface stops responding and a full server reboot is required. After that it will run for days, if not weeks, at a time.

Jan 18 17:14:01 xlogic systemd[1]: pvesr.service: Succeeded.
Jan 18 17:14:01 xlogic systemd[1]: Started Proxmox VE replication runner.
Jan 18 17:15:00 xlogic systemd[1]: Starting Proxmox VE replication runner...
Jan 18 17:15:01 xlogic systemd[1]: pvesr.service: Succeeded.
Jan 18 17:15:01 xlogic systemd[1]: Started Proxmox VE replication runner.
Jan 18 17:15:34 xlogic smartd[1132]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 76 to 77
Jan 18 17:16:00 xlogic systemd[1]: Starting Proxmox VE replication runner...
Jan 18 17:16:01 xlogic systemd[1]: pvesr.service: Succeeded.
Jan 18 17:16:01 xlogic systemd[1]: Started Proxmox VE replication runner.
Jan 18 17:16:40 xlogic kernel: ata10: link is slow to respond, please be patient (ready=0)
Jan 18 17:16:45 xlogic kernel: ata10: COMRESET failed (errno=-16)
Jan 18 17:16:50 xlogic kernel: ata10: link is slow to respond, please be patient (ready=0)
Jan 18 17:16:55 xlogic kernel: ata10: COMRESET failed (errno=-16)
Jan 18 17:17:00 xlogic kernel: ata10: link is slow to respond, please be patient (ready=0)
Jan 18 17:17:00 xlogic systemd[1]: Starting Proxmox VE replication runner...
Jan 18 17:17:01 xlogic CRON[26421]: pam_unix(cron:session): session opened for user root by (uid=0)
Jan 18 17:17:01 xlogic CRON[26422]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jan 18 17:17:01 xlogic CRON[26421]: pam_unix(cron:session): session closed for user root
Jan 18 17:17:01 xlogic systemd[1]: pvesr.service: Succeeded.
Jan 18 17:17:01 xlogic systemd[1]: Started Proxmox VE replication runner.
Jan 18 17:17:06 xlogic kernel: ata7.00: exception Emask 0x0 SAct 0x80 SErr 0x0 action 0x6 frozen
Jan 18 17:17:06 xlogic kernel: ata7.00: failed command: READ FPDMA QUEUED
Jan 18 17:17:06 xlogic kernel: ata7.00: cmd 60/00:38:80:38:2e/02:00:01:00:00/40 tag 7 ncq dma 262144 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 18 17:17:06 xlogic kernel: ata7.00: status: { DRDY }
Jan 18 17:17:06 xlogic kernel: ata7: hard resetting link
Jan 18 17:17:07 xlogic kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jan 18 17:17:12 xlogic kernel: ata7.00: qc timeout (cmd 0xec)
Jan 18 17:17:12 xlogic kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 18 17:17:12 xlogic kernel: ata7.00: revalidation failed (errno=-5)
Jan 18 17:17:12 xlogic kernel: ata7: hard resetting link
Jan 18 17:17:13 xlogic kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jan 18 17:17:16 xlogic kernel: ata8.00: exception Emask 0x0 SAct 0x2000 SErr 0x0 action 0x6 frozen
Jan 18 17:17:16 xlogic kernel: ata8.00: failed command: WRITE FPDMA QUEUED
Jan 18 17:17:16 xlogic kernel: ata8.00: cmd 61/08:68:78:b6:93/00:00:51:01:00/40 tag 13 ncq dma 4096 out
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 18 17:17:16 xlogic kernel: ata8.00: status: { DRDY }
Jan 18 17:17:16 xlogic kernel: ata8: hard resetting link
Jan 18 17:17:17 xlogic kernel: ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jan 18 17:17:22 xlogic kernel: ata8.00: qc timeout (cmd 0xec)
Jan 18 17:17:22 xlogic kernel: ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 18 17:17:22 xlogic kernel: ata8.00: revalidation failed (errno=-5)
Jan 18 17:17:22 xlogic kernel: ata8: hard resetting link
Jan 18 17:17:23 xlogic kernel: ata7.00: qc timeout (cmd 0xec)
Jan 18 17:17:23 xlogic kernel: ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Jan 18 17:17:24 xlogic kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 18 17:17:24 xlogic kernel: ata7.00: revalidation failed (errno=-5)
Jan 18 17:17:24 xlogic kernel: ata7: limiting SATA link speed to 3.0 Gbps
Jan 18 17:17:24 xlogic kernel: ata7: hard resetting link
Jan 18 17:17:24 xlogic kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Jan 18 17:17:30 xlogic kernel: ata10: COMRESET failed (errno=-16)
Jan 18 17:17:30 xlogic kernel: ata10: limiting SATA link speed to 3.0 Gbps
Jan 18 17:17:33 xlogic kernel: ata8.00: qc timeout (cmd 0xec)
Jan 18 17:17:34 xlogic kernel: ata8.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 18 17:17:34 xlogic kernel: ata8.00: revalidation failed (errno=-5)
Jan 18 17:17:34 xlogic kernel: ata8: limiting SATA link speed to 3.0 Gbps
Jan 18 17:17:34 xlogic kernel: ata8: hard resetting link
Jan 18 17:17:35 xlogic kernel: ata8: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Jan 18 17:17:35 xlogic kernel: ata10: COMRESET failed (errno=-16)
Jan 18 17:17:35 xlogic kernel: ata10: reset failed, giving up
Jan 18 17:17:35 xlogic kernel: ata10.00: disabled
Jan 18 17:17:35 xlogic kernel: sd 10:0:0:0: [sdf] tag#15 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jan 18 17:17:35 xlogic kernel: sd 10:0:0:0: [sdf] tag#15 CDB: Read(10) 28 00 00 00 00 00 00 01 00 00
Jan 18 17:17:35 xlogic kernel: blk_update_request: I/O error, dev sdf, sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0
Jan 18 17:17:35 xlogic smartd[1132]: Device: /dev/sdf [SAT], failed to read SMART Attribute Data
Jan 18 17:17:35 xlogic smartd[1132]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
Jan 18 17:17:35 xlogic smartd[1132]: Warning via /usr/share/smartmontools/smartd-runner to root: successful
Jan 18 17:17:35 xlogic smartd[1132]: Device: /dev/sdf [SAT], Read SMART Self Test Log Failed
Jan 18 17:17:35 xlogic smartd[1132]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
Jan 18 17:17:35 xlogic postfix/pickup[18491]: B017D88CF0: uid=0 from=<root>
Jan 18 17:17:35 xlogic postfix/cleanup[26534]: B017D88CF0: message-id=<20220118221735.B017D88CF0@xlogic.local>
Jan 18 17:17:35 xlogic postfix/qmgr[1493]: B017D88CF0: from=<root@xlogic.local>, size=937, nrcpt=1 (queue active)
Jan 18 17:17:35 xlogic postfix/pickup[18491]: B524388CFB: uid=0 from=<root>
Jan 18 17:17:35 xlogic postfix/cleanup[26534]: B524388CFB: message-id=<20220118221735.B524388CFB@xlogic.local>
Jan 18 17:17:35 xlogic postfix/qmgr[1493]: B524388CFB: from=<root@xlogic.local>, size=940, nrcpt=1 (queue active)
Jan 18 17:17:35 xlogic smartd[1132]: Warning via /usr/share/smartmontools/smartd-runner to root: successful
Jan 18 17:17:35 xlogic smartd[1132]: Device: /dev/sdf [SAT], Read Summary SMART Error Log failed
Jan 18 17:17:35 xlogic smartd[1132]: Sending warning via /usr/share/smartmontools/smartd-runner to root ...
Jan 18 17:17:35 xlogic postfix/pickup[18491]: BB68588D1A: uid=0 from=<root>
Jan 18 17:17:35 xlogic postfix/cleanup[26534]: BB68588D1A: message-id=<20220118221735.BB68588D1A@xlogic.local>
Jan 18 17:17:35 xlogic postfix/qmgr[1493]: BB68588D1A: from=<root@xlogic.local>, size=1013, nrcpt=1 (queue active)
Jan 18 17:17:35 xlogic smartd[1132]: Warning via /usr/share/smartmontools/smartd-runner to root: successful
.....
Jan 18 17:17:55 xlogic kernel: ata7.00: qc timeout (cmd 0xec)
Jan 18 17:17:55 xlogic kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 18 17:17:55 xlogic kernel: ata7.00: revalidation failed (errno=-5)
Jan 18 17:17:55 xlogic kernel: ata7.00: disabled
Jan 18 17:17:57 xlogic kernel: ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 320)
Jan 18 17:17:57 xlogic kernel: ata7: EH complete
Jan 18 17:17:57 xlogic kernel: sd 7:0:0:0: [sdc] tag#30 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jan 18 17:17:57 xlogic kernel: sd 7:0:0:0: [sdc] tag#30 CDB: Read(10) 28 00 01 2e 38 80 00 02 00 00
Jan 18 17:17:57 xlogic kernel: blk_update_request: I/O error, dev sdc, sector 19806336 op 0x0:(READ) flags 0x0 phys_seg 63 prio class 0
Jan 18 17:17:58 xlogic kernel: sd 7:0:0:0: [sdc] tag#15 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jan 18 17:17:58 xlogic kernel: sd 7:0:0:0: [sdc] tag#15 CDB: Read(10) 28 00 01 2e 38 80 00 02 00 00
Jan 18 17:17:58 xlogic kernel: blk_update_request: I/O error, dev sdc, sector 19806336 op 0x0:(READ) flags 0x0 phys_seg 63 prio class 0
Jan 18 17:17:59 xlogic kernel: sd 7:0:0:0: [sdc] tag#7 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jan 18 17:17:59 xlogic kernel: sd 7:0:0:0: [sdc] tag#7 CDB: Read(10) 28 00 01 2e 38 80 00 02 00 00
Jan 18 17:17:59 xlogic kernel: blk_update_request: I/O error, dev sdc, sector 19806336 op 0x0:(READ) flags 0x0 phys_seg 63 prio class 0
Jan 18 17:18:00 xlogic kernel: sd 7:0:0:0: [sdc] tag#23 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jan 18 17:18:00 xlogic kernel: sd 7:0:0:0: [sdc] tag#23 CDB: Read(10) 28 00 01 2e 38 80 00 02 00 00
Jan 18 17:18:00 xlogic kernel: blk_update_request: I/O error, dev sdc, sector 19806336 op 0x0:(READ) flags 0x0 phys_seg 63 prio class 0
Jan 18 17:18:00 xlogic systemd[1]: Starting Proxmox VE replication runner...
Jan 18 17:18:01 xlogic systemd[1]: pvesr.service: Succeeded.
Jan 18 17:18:01 xlogic systemd[1]: Started Proxmox VE replication runner.
Jan 18 17:18:01 xlogic kernel: sd 7:0:0:0: [sdc] tag#31 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jan 18 17:18:01 xlogic kernel: sd 7:0:0:0: [sdc] tag#31 CDB: Read(10) 28 00 01 2e 38 80 00 02 00 00
Jan 18 17:18:01 xlogic kernel: blk_update_request: I/O error, dev sdc, sector 19806336 op 0x0:(READ) flags 0x0 phys_seg 63 prio class 0
Jan 18 17:18:01 xlogic kernel: sd 7:0:0:0: [sdc] tag#12 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jan 18 17:18:01 xlogic kernel: sd 7:0:0:0: [sdc] tag#12 CDB: Write(10) 2a 00 01 2e 36 40 00 02 08 00
Jan 18 17:18:01 xlogic kernel: blk_update_request: I/O error, dev sdc, sector 19805760 op 0x1:(WRITE) flags 0x8800 phys_seg 64 prio class 0
Jan 18 17:18:01 xlogic kernel: sd 7:0:0:0: [sdc] tag#13 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jan 18 17:18:01 xlogic kernel: sd 7:0:0:0: [sdc] tag#13 CDB: Write(10) 2a 00 01 2e 38 48 00 00 78 00
Jan 18 17:18:01 xlogic kernel: blk_update_request: I/O error, dev sdc, sector 19806280 op 0x1:(WRITE) flags 0x8800 phys_seg 14 prio class 0
Jan 18 17:18:01 xlogic kernel: sd 7:0:0:0: [sdc] tag#3 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jan 18 17:18:01 xlogic kernel: sd 7:0:0:0: [sdc] tag#3 CDB: Write(10) 2a 00 01 2e 38 c0 00 01 80 00
Jan 18 17:18:01 xlogic kernel: blk_update_request: I/O error, dev sdc, sector 19806400 op 0x1:(WRITE) flags 0x8800 phys_seg 46 prio class 0
Jan 18 17:18:01 xlogic kernel: sd 7:0:0:0: [sdc] tag#6 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Jan 18 17:18:01 xlogic kernel: sd 7:0:0:0: [sdc] tag#6 CDB: Write(10) 2a 00 01 2e 3a 40 00 01 00 00
Jan 18 17:18:01 xlogic kernel: blk_update_request: I/O error, dev sdc, sector 19806784 op 0x1:(WRITE) flags 0x8800 phys_seg 31 prio class 0
 
Are you maybe using an SMR HDD that becomes unresponsive under heavy writes?
Did you check the SMART attributes of the disk to see whether it is healthy?
 
The web GUI was locked solid, so I was unable to check the hard disks.
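
With the GUI unavailable, the pasted syslog is the main evidence left, so it may help to tally which ATA ports and block devices are reporting trouble. Below is a minimal Python sketch of that idea. The function name `summarize` and the `TROUBLE` keyword list are my own assumptions for illustration, not anything shipped with Proxmox, and the keywords are simply taken from the messages in the log above.

```python
import re
from collections import Counter

# Kernel libata lines, e.g. "kernel: ata7.00: failed command: ..." or
# "kernel: ata10: COMRESET failed (errno=-16)". Group 1 is the port name.
ATA_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")

# Block-layer errors, e.g. "I/O error, dev sdc, sector 19806336 ...".
BLK_RE = re.compile(r"I/O error, dev (\w+), sector \d+")

# Assumed list of substrings that mark an ATA message as a problem.
TROUBLE = ("failed", "timeout", "disabled", "giving up", "slow to respond")


def summarize(lines):
    """Count problem messages per ATA port and I/O errors per block device."""
    ata_events = Counter()
    blk_errors = Counter()
    for line in lines:
        ata = ATA_RE.search(line)
        if ata and any(word in ata.group(2) for word in TROUBLE):
            ata_events[ata.group(1)] += 1
        blk = BLK_RE.search(line)
        if blk:
            blk_errors[blk.group(1)] += 1
    return ata_events, blk_errors


# Three lines copied from the log above, used as a small demonstration.
sample = [
    "Jan 18 17:16:45 xlogic kernel: ata10: COMRESET failed (errno=-16)",
    "Jan 18 17:17:06 xlogic kernel: ata7.00: failed command: READ FPDMA QUEUED",
    "Jan 18 17:17:35 xlogic kernel: blk_update_request: I/O error, dev sdf, "
    "sector 0 op 0x0:(READ) flags 0x0 phys_seg 32 prio class 0",
]
ata_counts, blk_counts = summarize(sample)
print(dict(ata_counts), dict(blk_counts))
```

Passing the lines of a saved syslog file to `summarize` would show at a glance whether the errors are spread across several ports (ata7, ata8 and ata10 in the excerpt above) or concentrated on a single disk.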

I did think of the SMR problem but both of my hard disks are CMR - WD40EFRX and WD80EFAX.

One possible issue is that my backup storage is provided by TrueNAS running as a VM, and the storage type is CIFS. Should I use a different storage type to connect to the TrueNAS disk for backups?
 
