Hi,
My PVE host hangs from time to time, roughly once a month.
It happened again tonight, and the only fix I have found is to reboot the server.
From the error messages it looks like I need to replace one or more drives,
but after running self-tests on them they seem healthy.
I would like to know what could cause the I/O hang and how I can solve it.
Thanks for your help.
Description: PowerEdge R610
BIOS Version: 6.6.0
Lifecycle Controller Firmware: 1.7.5.4
RAID card: LSI SAS2008-IT
4 x 146 GB drives, PVE installed on ZFS RAID10
pve-manager/6.0-4/2a719255 (running kernel: 5.0.15-1-pve)
In the attachment you can see what the screen looked like before rebooting.
--------------------------------
print_req_error: I/O error, dev sda, sector XXXXXXXXX flags 701
INFO: task z_wr_iss:XXX blocked for more than 120 seconds
Tainted: P IO 5.0.15-1-pve #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
--------------------------------
Looking at the kernel logs after the reboot:
--------------------------------
root@pve:~# cat /var/log/kern.log.1 | grep sda
Feb 4 15:51:17 pve kernel: [1996460.343206] sd 2:0:0:0: [sda] tag#1247 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
Feb 4 15:51:17 pve kernel: [1996460.343211] sd 2:0:0:0: [sda] tag#1247 CDB: Write(10) 2a 00 00 f3 00 78 00 01 30 00
Feb 4 15:51:17 pve kernel: [1996460.343213] print_req_error: I/O error, dev sda, sector 15925368 flags 701
Feb 6 03:43:02 pve kernel: [2125558.892570] sd 2:0:0:0: [sda] tag#2887 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
Feb 6 03:43:02 pve kernel: [2125558.892575] sd 2:0:0:0: [sda] tag#2887 CDB: Write(10) 2a 00 08 49 8f 98 00 00 40 00
Feb 6 03:43:02 pve kernel: [2125558.892577] print_req_error: I/O error, dev sda, sector 139038616 flags 701
Feb 7 09:55:06 pve kernel: [2234278.018079] sd 2:0:0:0: [sda] tag#623 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
Feb 7 09:55:06 pve kernel: [2234278.018087] sd 2:0:0:0: [sda] tag#623 CDB: Write(10) 2a 00 0a 9c 59 00 00 00 30 00
Feb 7 09:55:06 pve kernel: [2234278.018089] print_req_error: I/O error, dev sda, sector 178018560 flags 701
Feb 7 11:28:01 pve kernel: [2239853.255288] sd 2:0:0:0: [sda] tag#727 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
Feb 7 11:28:01 pve kernel: [2239853.255337] sd 2:0:0:0: [sda] tag#727 CDB: Write(10) 2a 00 02 09 d2 40 00 01 00 00
Feb 7 11:28:01 pve kernel: [2239853.255356] print_req_error: I/O error, dev sda, sector 34198080 flags 701
--------------------------------
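As a sanity check on the log lines above, the CDB bytes can be decoded by hand. This is only a minimal sketch, assuming the standard SCSI Write(10) layout (byte 0 is the opcode, bytes 2-5 are the big-endian LBA, bytes 7-8 are the transfer length in blocks). The function names `cdb_lba` and `cdb_blocks` are made up for illustration.

```shell
#!/usr/bin/env bash
# Decode a Write(10) CDB as printed by the kernel, e.g.
#   "2a 00 00 f3 00 78 00 01 30 00"
# Assumed layout: bytes 2-5 = LBA (big-endian), bytes 7-8 = block count.

cdb_lba() {
    # Split the space-separated hex bytes into $1..$10 (byte N is $((N+1))).
    set -- $1
    # Bytes 2-5 are the 3rd to 6th fields.
    echo $(( 0x$3$4$5$6 ))
}

cdb_blocks() {
    set -- $1
    # Bytes 7-8 are the 8th and 9th fields.
    echo $(( 0x$8$9 ))
}

cdb_lba    "2a 00 00 f3 00 78 00 01 30 00"   # prints 15925368
cdb_blocks "2a 00 00 f3 00 78 00 01 30 00"   # prints 304
```

The decoded LBA of the first CDB, 15925368, is the same sector that the following print_req_error line reports, and 304 blocks of 512 bytes is 155648 bytes. So the CDB and the I/O error line describe the same failed write.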
root@pve:/etc# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:05:24 with 0 errors on Sun Nov 10 00:29:26 2019
config:

        NAME                              STATE     READ WRITE CKSUM
        rpool                             ONLINE       0     0     0
          mirror-0                        ONLINE       0     0     0
            scsi-35000c5000c6c5c87-part3  ONLINE       0     0     0
            scsi-35000c5000ad66403-part3  ONLINE       0     0     0
          mirror-1                        ONLINE       0     0     0
            scsi-35000c5000eeb5dcb        ONLINE       0     0     0
            scsi-35000c5000ad664b3        ONLINE       0     0     0

errors: No known data errors
# fdisk -l
Disk /dev/sda: 136.8 GiB, 146815733760 bytes, 286749480 sectors
Disk model: ST9146802SS
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 35B15105-5C87-46D5-9078-ABFED8B227AD
Device       Start       End   Sectors   Size Type
/dev/sda1       34      2047      2014  1007K BIOS boot
/dev/sda2     2048   1050623   1048576   512M EFI System
/dev/sda3  1050624 286749446 285698823 136.2G Solaris /usr & Apple ZFS
root@pve:~# smartctl -l selftest /dev/sda
smartctl 7.0 2018-12-30 r4883 [x86_64-linux-5.0.15-1-pve] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log
Num  Test              Status     segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                  number   (hours)
# 1  Background long   Completed        -     50761              - [-   -    -]
# 2  Background short  Completed        -     50759              - [-   -    -]
# 3  Background short  Completed        -     46771              - [-   -    -]
# 4  Background short  Completed        -     40208              - [-   -    -]
# 5  Background short  Completed        -     40181              - [-   -    -]
# 6  Background long   Completed        -         0              - [-   -    -]
# 7  Background short  Completed        -         0              - [-   -    -]
Long (extended) Self-test duration: 2070 seconds [34.5 minutes]
root@pve:/etc# cat /var/log/kern.log.1
Feb 3 09:28:31 pve kernel: [1887098.503308] perf: interrupt took too long (17941 > 17835), lowering kernel.perf_event_max_sample_rate to 11000
Feb 4 15:51:17 pve kernel: [1996460.343190] mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
Feb 4 15:51:17 pve kernel: [1996460.343206] sd 2:0:0:0: [sda] tag#1247 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
Feb 4 15:51:17 pve kernel: [1996460.343211] sd 2:0:0:0: [sda] tag#1247 CDB: Write(10) 2a 00 00 f3 00 78 00 01 30 00
Feb 4 15:51:17 pve kernel: [1996460.343213] print_req_error: I/O error, dev sda, sector 15925368 flags 701
Feb 4 15:51:17 pve kernel: [1996460.343258] zio pool=rpool vdev=/dev/disk/by-id/scsi-35000c5000c6c5c87-part3 error=5 type=2 offset=7615868928 size=155648 flags=40080c80
Feb 6 03:02:47 pve kernel: [2123144.590392] device tap103i0 entered promiscuous mode
Feb 6 03:43:02 pve kernel: [2125558.892553] mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
Feb 6 03:43:02 pve kernel: [2125558.892570] sd 2:0:0:0: [sda] tag#2887 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
Feb 6 03:43:02 pve kernel: [2125558.892575] sd 2:0:0:0: [sda] tag#2887 CDB: Write(10) 2a 00 08 49 8f 98 00 00 40 00
Feb 6 03:43:02 pve kernel: [2125558.892577] print_req_error: I/O error, dev sda, sector 139038616 flags 701
Feb 6 03:43:02 pve kernel: [2125558.892623] zio pool=rpool vdev=/dev/disk/by-id/scsi-35000c5000c6c5c87-part3 error=5 type=2 offset=70649851904 size=32768 flags=40080c80
Feb 7 09:55:06 pve kernel: [2234278.018031] mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
Feb 7 09:55:06 pve kernel: [2234278.018079] sd 2:0:0:0: [sda] tag#623 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
Feb 7 09:55:06 pve kernel: [2234278.018087] sd 2:0:0:0: [sda] tag#623 CDB: Write(10) 2a 00 0a 9c 59 00 00 00 30 00
Feb 7 09:55:06 pve kernel: [2234278.018089] print_req_error: I/O error, dev sda, sector 178018560 flags 701
Feb 7 09:55:06 pve kernel: [2234278.018137] zio pool=rpool vdev=/dev/disk/by-id/scsi-35000c5000c6c5c87-part3 error=5 type=2 offset=90607583232 size=24576 flags=40080c80
Feb 7 11:28:01 pve kernel: [2239853.255228] mpt2sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
Feb 7 11:28:01 pve kernel: [2239853.255288] sd 2:0:0:0: [sda] tag#727 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK
Feb 7 11:28:01 pve kernel: [2239853.255337] sd 2:0:0:0: [sda] tag#727 CDB: Write(10) 2a 00 02 09 d2 40 00 01 00 00
Feb 7 11:28:01 pve kernel: [2239853.255356] print_req_error: I/O error, dev sda, sector 34198080 flags 701
Feb 7 11:28:01 pve kernel: [2239853.255391] zio pool=rpool vdev=/dev/disk/by-id/scsi-35000c5000c6c5c87-part3 error=5 type=2 offset=16971497472 size=131072 flags=40080c80
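For what it is worth, the numbers in the full log are consistent with one another. The sketch below makes two assumptions: that the mpt2sas log_info word is packed as bus type (top 4 bits), originator (next 4 bits), code (8 bits) and sub_code (low 16 bits), and that the zio offset is counted in bytes from the start of the sda3 partition (start sector 1050624, 512-byte sectors, both taken from the fdisk output). The function names are hypothetical helpers, not part of any tool.

```shell
#!/usr/bin/env bash
# Split an mpt2sas log_info word into its fields.
# Assumed packing: [31:28] bus type, [27:24] originator, [23:16] code, [15:0] sub_code.
decode_log_info() {
    local v=$(( $1 ))
    printf 'bus_type=0x%x originator=0x%x code=0x%02x sub_code=0x%04x\n' \
        $(( (v >> 28) & 0xF )) $(( (v >> 24) & 0xF )) \
        $(( (v >> 16) & 0xFF )) $(( v & 0xFFFF ))
}

# Convert a whole-disk sector number into a byte offset inside a partition.
#   $1 = sector on the disk, $2 = partition start sector, $3 = sector size (default 512)
sector_to_vdev_offset() {
    local sector=$1 part_start=$2 sector_size=${3:-512}
    echo $(( (sector - part_start) * sector_size ))
}

decode_log_info 0x31120303
# prints bus_type=0x3 originator=0x1 code=0x12 sub_code=0x0303

sector_to_vdev_offset 15925368 1050624
# prints 7615868928
```

The first result agrees with the originator/code/sub_code fields the driver already prints, with originator 0x1 shown as PL. The second agrees with offset=7615868928 in the zio line of Feb 4, so the zio errors refer to the same writes on sda that the SCSI layer reports as failed.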