Need help with I/O wait problem

hugosxm · Oct 12, 2015

Hi everybody, i know that this is not specific to proxmox but i am always finding good help here so i try anyway if the staff is ok

I have a proxmox running on a machine with 6 disk in raid10 + 1 spare
the disk were connected to a lsi 9211-8i and i always had strange logs so i decided to change the controller sas/sata cable, so i bought 2 lsi original cable then swapped with the new one

After that i loosed one disk... so i lauched a mdadm recovery by re-adding the disk then i loosed two during the recovery :S

so i decided to turn the server off, put every drive directly on the mother board and give a try like that : recovery done; no error...

My lsi was with the P20 firmware so i downgraded it to P18 that seems more reliable...

i did extended smarttest on each drive and everything is OK

so i was going to put back the disk on the HBA to give a try with the new firmware BUT :

since this morning the server is showing high level of io wait, but it is running nothing heavy ! almost idle !

i found that in dmesg output :

Code:

[188489.228510] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)[188489.315586] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)
[188489.395651] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)
[188731.962971] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[188731.963042] ata2.00: failed command: FLUSH CACHE EXT
[188731.963101] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 19
         res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[188731.963216] ata2.00: status: { DRDY }
[188731.963287] ata2: hard resetting link
[188732.898874] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[188732.899461] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150619/psargs-359)
[188732.899535] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT1._GTF] (Node ffff88040e0bfe88), AE_NOT_FOUND (20150619/psparse-536)
[188732.900264] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150619/psargs-359)
[188732.900368] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT1._GTF] (Node ffff88040e0bfe88), AE_NOT_FOUND (20150619/psparse-536)
[188732.900619] ata2.00: configured for UDMA/133
[188732.900741] ata2.00: retrying FLUSH 0xea Emask 0x4
[188732.900897] ata2.00: device reported invalid CHS sector 0
[188732.901022] ata2: EH complete
[189089.461664] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)
[189089.571193] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)
[189089.651375] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)
[189689.718649] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)
[189689.798596] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)
[189689.877976] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)
[190289.943671] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)
[190290.023620] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)

we can see that the controler give the same strange logs than before swapping the cable

put the most scary part is the device reset, i have maybe 20 time this king of entry in the log for today

now see this line in "atop" output

Code:

DSK |          sdb |               | busy     99% |  read       6 |              |  write     40 | KiB/r      4 |               | KiB/w     20 | MBr/s   0.00  |              | MBw/s   0.08  | avq     6.49 |               | avio  215 ms |

/dev/sdb is at is maximum, everybody is waiting for it, and the server is really slow !

now take a look at the smart :

Code:

SMART Attributes Data Structure revision number: 16Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       2
  3 Spin_Up_Time            0x0027   172   171   021    Pre-fail  Always       -       2375
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       72
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7707
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       72
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       59
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       188
194 Temperature_Celsius     0x0022   119   106   000    Old_age   Always       -       24
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   199   000    Old_age   Always       -       13450
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0


SMART Error Log Version: 1
No Errors Logged


SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      7696         -
# 2  Extended offline    Interrupted (host reset)      10%      7674         -
# 3  Short offline       Completed without error       00%      7671         -
# 4  Short offline       Completed without error       00%      7648         -
# 5  Short offline       Completed without error       00%      7624         -
# 6  Short offline       Completed without error       00%      7601         -
# 7  Extended offline    Completed without error       00%      7579         -
# 8  Short offline       Completed without error       00%      7553         -
# 9  Short offline       Completed without error       00%      7529         -
#10  Short offline       Completed without error       00%      7505         -
#11  Short offline       Completed without error       00%      7483         -
#12  Short offline       Completed without error       00%      7459         -
#13  Short offline       Completed without error       00%      7435         -
#14  Short offline       Completed without error       00%      7411         -
#15  Short offline       Completed without error       00%      7387         -
#16  Short offline       Completed without error       00%      7363         -
#17  Short offline       Completed without error       00%      7363         -


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

nothing really scary here... just the device reset :S

mdadm not crying :

Code:

Personalities : [raid1] [raid10] md1 : active raid1 sde1[6](S) sdd1[4] sdg1[3] sda1[0] sdf1[5] sdc1[2] sdb1[1]
      204736 blocks [6/6] [UUUUUU]
      
md10 : active raid10 sde2[6](S) sdd2[9] sdg2[8] sda2[7] sdf2[5] sdc2[2] sdb2[1]
      2929276416 blocks super 1.2 512K chunks 2 near-copies [6/6] [UUUUUU]
      
unused devices: <none>

So what i have to do ?

can it be a sata cable problem ?
do i have to try to put back the drives on the HBA ?
maybe i can FAIL the drive manually in mdadm to restore io performance ?
do i just have to stop thinking and change this fu****g drive with a new one ?

one more strange thin : pve-root lvm seems to be loaded too : ( atop command )

Code:

ve-root |               | busy     82% |  read       6 |              |  write     34 | KiB/r      4 |               | KiB/w      4 | MBr/s   0.00  |              | MBw/s   0.01  | avq     2.22 |               | avio  206 ms |
LVM | 100--disk--1 |               | busy     77% |  read       1 |              |  write      6 | KiB/r      4 |               | KiB/w      6 | MBr/s   0.00  |              | MBw/s   0.00  | avq     1.00 |               | avio 1106 ms

please give advice

have a nice night or day

Hugo

EDIT : 5 minutes after writing this post :

[202440.493712] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen[202440.493792] ata2.00: failed command: FLUSH CACHE EXT
[202440.493853] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 18
res 40/00:01:09:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[202440.493979] ata2.00: status: { DRDY }
[202440.494043] ata2: hard resetting link
[202440.981683] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[202440.982264] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150619/psargs-359)
[202440.982340] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT1._GTF] (Node ffff88040e0bfe88), AE_NOT_FOUND (20150619/psparse-536)
[202440.983107] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150619/psargs-359)
[202440.983200] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT1._GTF] (Node ffff88040e0bfe88), AE_NOT_FOUND (20150619/psparse-536)
[202440.983422] ata2.00: configured for UDMA/133
[202440.983541] ata2.00: retrying FLUSH 0xea Emask 0x4
[202440.983710] ata2: EH complete
[202894.779638] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)
[202894.859945] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)
[202894.939835] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)
[202920.475286] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[202920.475381] ata2.00: failed command: FLUSH CACHE EXT
[202920.475474] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 30
res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[202920.475960] ata2.00: status: { DRDY }
[202920.476303] ata2: hard resetting link
[202920.963216] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[202920.963829] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150619/psargs-359)
[202920.963939] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT1._GTF] (Node ffff88040e0bfe88), AE_NOT_FOUND (20150619/psparse-536)
[202920.964658] ACPI Error: [DSSP] Namespace lookup failure, AE_NOT_FOUND (20150619/psargs-359)
[202920.964786] ACPI Error: Method parse/execution failed [\_SB_.PCI0.SAT0.SPT1._GTF] (Node ffff88040e0bfe88), AE_NOT_FOUND (20150619/psparse-536)
[202920.965041] ata2.00: configured for UDMA/133
[202920.965341] ata2.00: retrying FLUSH 0xea Emask 0x4
[202920.965885] ata2.00: device reported invalid CHS sector 0
[202920.966282] ata2: EH complete
root@proxmox /home/admin #

EDIT 2 :

i tested the drive with hdparm :

Code:

root@proxmox /home/admin # hdparm -tT /dev/sdb

/dev/sdb:

Broadcast message from nut@proxmox (somewhere) (Tue Oct 13 13:51:49 2015):

UPS ups@localhost is unavailable

 Timing cached reads:   27226 MB in  2.00 seconds = 13626.76 MB/sec
 Timing buffered disk reads:   4 MB in  6.65 seconds = 615.65 kB/sec
root@proxmox /home/admin # hdparm -tT /dev/sdb

/dev/sdb:
 Timing cached reads:   27052 MB in  2.00 seconds = 13539.38 MB/sec
 Timing buffered disk reads: 148 MB in  4.60 seconds =  32.18 MB/sec
root@proxmox /home/admin # hdparm -tT /dev/sdb

/dev/sdb:
 Timing cached reads:     2 MB in 51.76 seconds =  39.57 kB/sec
 Timing buffered disk reads:  94 MB in  6.97 seconds =  13.49 MB/sec

seems really like a faulty cable on port on the motherboard, cache is fast then slow then fast, like a bad connection

I will fault the drive to restore performance...

udo · Oct 13, 2015

hugosxm said:
Hi everybody, i know that this is not specific to proxmox but i am always finding good help here so i try anyway if the staff is ok

...

Code:

... [188731.962971] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [188731.963042] ata2.00: failed command: FLUSH CACHE EXT ...

Hi Hugo,
looks for me, that the lsi-driver give the error because it's can't flush the cache fast enough because sdb is too busy to write the data.

I would first try to use another bay for sdb (switch with you spare-drive?) to use another port/cable...

If this don't help change the disk.

Udo

hugosxm · Oct 13, 2015

OK thanks I will try swap but the drive are directly connected to motherboard, Maybe there is a bad cable...

I opened a RMA at Western digital just in case

erwis · Nov 21, 2015

Hi Hugo and Udo,

I have a similar problem on two different Proxmox 4 hosts, one is HP DL380 G6 and second Supermicro. Both are using 9211-8i compatible controllers in HT mode. Supermicro is running ZFS, HP not.
No disk's are yet attached to the controller on HP (Second cage). Will be used later for ZFS.

Dmesg from HP:
[63021.406964] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)
[63021.480822] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)
[63621.399219] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)

Dmesg from Supermicro:

[30667.181273] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)
[30667.274037] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)
[30667.367200] mpt2sas0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)

I'm pretty sure that the disks and cables are OK. With Proxmox 3.x I had no such problems.

Bug in the kernel or driver?

Eryk

erwis · Nov 21, 2015

Hi,

I did:
apt-get --purge remove sas2ircu

And the errors are gone.

Eryk

erwis · Nov 21, 2015

Should be "Both are using 9211-8i compatible controllers in IT mode".
Sorry;-(

Search

Search

Need help with I/O wait problem

hugosxm

Member

udo

Distinguished Member

hugosxm

Member

erwis

Renowned Member

erwis

Renowned Member

erwis

Renowned Member