Dell R710 I/O is slow: Proxmox or hardware issue?

Inglebard

Renowned Member
May 20, 2016
Hi,

We are seeing I/O issues on a Dell R710. At first this seemed to happen randomly about once every 6 months, then every 2-3 months, and now about once a month. Each time it requires a hard reboot of the server.

The VMs are slow, and Proxmox itself is slow or unresponsive.
The server is a Dell R710 running RAID 10 on a PERC 6/i Integrated controller.

Read and write throughput is below 1 MiB/s.

Here is an example where everything is fine:
INFO: 0% (514.7 MiB of 300.0 GiB) in 3s, read: 171.6 MiB/s, write: 29.6 MiB/s
INFO: 1% (3.0 GiB of 300.0 GiB) in 1m 35s, read: 27.9 MiB/s, write: 27.5 MiB/s
INFO: 2% (6.0 GiB of 300.0 GiB) in 3m 24s, read: 28.3 MiB/s, write: 27.7 MiB/s
INFO: 3% (9.0 GiB of 300.0 GiB) in 5m 21s, read: 26.2 MiB/s, write: 25.9 MiB/s
INFO: 4% (12.0 GiB of 300.0 GiB) in 7m 18s, read: 26.3 MiB/s, write: 26.1 MiB/s
INFO: 5% (15.0 GiB of 300.0 GiB) in 8m 59s, read: 30.8 MiB/s, write: 30.4 MiB/s
INFO: 6% (18.1 GiB of 300.0 GiB) in 9m 49s, read: 61.5 MiB/s, write: 61.3 MiB/s

And here is one when the I/O issue occurs:
INFO: 0% (2.6 MiB of 300.0 GiB) in 3s, read: 896.0 KiB/s, write: 133.3 KiB/s
INFO: 1% (3.0 GiB of 300.0 GiB) in 1h 15m 40s, read: 692.9 KiB/s, write: 589.4 KiB/s
INFO: 2% (6.0 GiB of 300.0 GiB) in 2h 29m 30s, read: 710.1 KiB/s, write: 695.7 KiB/s
INFO: 3% (9.0 GiB of 300.0 GiB) in 3h 50m 57s, read: 643.7 KiB/s, write: 634.5 KiB/s
INFO: 4% (12.0 GiB of 300.0 GiB) in 5h 8m 10s, read: 678.9 KiB/s, write: 674.4 KiB/s
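As a sanity check on the numbers above, here is a minimal sketch that recomputes the average rate between two progress lines. The function name interval_rate_kib is made up for illustration, and the sizes are the rounded values from the log, so the results only approximately match the rates the tool printed.

```sh
#!/bin/sh
# Average rate in KiB/s between two progress lines.
# Arguments: previous MiB done, previous elapsed seconds,
#            current MiB done,  current elapsed seconds.
# (interval_rate_kib is a hypothetical helper, not part of any tool.)
interval_rate_kib() {
    awk -v a="$1" -v t0="$2" -v b="$3" -v t1="$4" \
        'BEGIN { printf "%.1f\n", (b - a) * 1024 / (t1 - t0) }'
}

# Slow run: 2.6 MiB at 3 s, then 3.0 GiB (3072 MiB) at 1h 15m 40s (4540 s).
interval_rate_kib 2.6 3 3072 4540

# Healthy run: 514.7 MiB at 3 s, then 3.0 GiB at 1m 35s (95 s).
interval_rate_kib 514.7 3 3072 95
```

The slow run works out to roughly 0.7 MiB/s and the healthy run to roughly 28 MiB/s, which is in line with the logged rates, so the slowdown is about a factor of 40 rather than a reporting glitch.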

Here is the output of top while the issue is occurring:
top - 09:49:52 up 16 days, 23:47, 1 user, load average: 2.23, 2.53, 2.32
Tasks: 370 total, 1 running, 369 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.9 us, 0.7 sy, 0.0 ni, 90.1 id, 4.2 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 96660.7 total, 53299.8 free, 34248.4 used, 9112.6 buff/cache
MiB Swap: 8192.0 total, 7577.0 free, 615.0 used. 61490.0 avail Mem


I/O is so slow that I can hardly reach the web UI or read the system journal. It is not a network issue.


Based on a Zabbix graph (CPU iowait time, avg1), iowait goes from 0.1 to 4.0. That is a large relative increase, but the absolute value does not seem very high.
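One caveat when reading that graph: the aggregate iowait figure is averaged over all CPU threads, so a few percent on a many-core host can still mean that some tasks are stuck waiting on disk. To cross-check Zabbix directly on the host, a rough sketch like the following samples /proc/stat twice and prints the iowait share. The function name iowait_pct and the 5-second default interval are arbitrary choices for illustration.

```sh
#!/bin/sh
# Print the iowait share (percent) between two aggregate "cpu ..." lines
# taken from /proc/stat. Fields 2-9 are user, nice, system, idle, iowait,
# irq, softirq, steal; iowait is field 6.
# (iowait_pct is a hypothetical helper name.)
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9; i++) total += $i
            if (NR == 1) { t0 = total; w0 = $6 } else { t1 = total; w1 = $6 }
        }
        END {
            if (t1 > t0) printf "%.1f\n", 100 * (w1 - w0) / (t1 - t0)
            else print "0.0"
        }'
}

# Sample twice on a Linux host and report the result.
if [ -r /proc/stat ]; then
    a=$(grep '^cpu ' /proc/stat)
    sleep "${INTERVAL:-5}"
    b=$(grep '^cpu ' /proc/stat)
    echo "iowait over ${INTERVAL:-5} s: $(iowait_pct "$a" "$b")%"
fi
```

If the sysstat package is installed, iostat -x gives a per-device view, which is usually more telling than the CPU-wide percentage.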

It seems to begin at 00:00.

Three scripts are executed at 00:00, but they should not be "dangerous".

Code:
#!/bin/sh
# RAID status summary from the megaclisas-status helper
raid=$(/usr/sbin/megaclisas-status)
# data_percent of the pve/data LV, with leading whitespace stripped
datapercent=$(/usr/sbin/lvs pve/data -o data_percent --noheading | /usr/bin/sed -e 's/^[[:space:]]*//')

# Push both values to Zabbix
/usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k system.raid.disk.status -o "$raid"
/usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k system.lvm.data.percent -o "$datapercent"


Code:
#!/bin/sh
# Query the controller once and reuse the output for all four drives
pdlist=$(/usr/sbin/megacli -PDList -aAll)

for i in 1 2 3 4; do
    # Pick the i-th "S.M.A.R.T alert" line (one per physical drive)
    smart=$(printf '%s\n' "$pdlist" | grep "Drive has flagged a S.M.A.R.T alert" | sed "${i}q;d")
    # Quote the key so the shell does not treat [N] as a glob pattern
    /usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k "system.disk.smartmegacli[$i]" -o "$smart"
done

Code:
#!/bin/sh
# Query the controller once and reuse the output for all four drives
pdlist=$(/usr/sbin/megacli -PDList -aAll)

for i in 1 2 3 4; do
    # Take the i-th Temperature line and keep only the Celsius number
    temp=$(printf '%s\n' "$pdlist" | grep Temperature | sed "${i}q;d" | grep -o -P "\d+C" | grep -o -P "\d+")
    # Quote the key so the shell does not treat [N] as a glob pattern
    /usr/bin/zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -k "system.disk.temperature[$i]" -o "$temp"
done
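To see whether these midnight jobs are involved at all, one option is to record how long each one takes and cap its run time, so a hung controller query cannot pile up. The following is only a sketch: run_timed is a made-up helper, the script path in the usage comment is a placeholder that does not come from this thread, and it relies on timeout(1) from GNU coreutils.

```sh
#!/bin/sh
# Run a command with a time limit and print "<exit status> <elapsed seconds>".
# timeout(1) exits with status 124 when the limit was hit.
# (run_timed is a hypothetical helper name.)
run_timed() {
    limit=$1
    shift
    start=$(date +%s)
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    echo "$status $(( $(date +%s) - start ))"
}

# Hypothetical cron usage: log status and duration of one monitoring script.
# /usr/local/bin/zabbix-smart.sh is a placeholder path.
# logger -t midnight-check "smart: $(run_timed 60 /usr/local/bin/zabbix-smart.sh)"
```

A status of 124, or durations that climb from run to run, would suggest the controller queries themselves are stalling, which points more towards the RAID controller or disks than towards Proxmox.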

If someone can confirm whether or not this is a hardware issue, I would be glad to hear it.
 

Attachments

  • io_issue.jpg
I don't have such hardware around, so I have not run into this issue, but one thing that quite often helps with similar problems is installing the latest available firmware updates for all components of the system (AFAIK Dell provides updates for quite a long time, and installing them is fairly convenient).

I hope this helps!
 
Anything in the logs or dmesg?

Also make sure that you're running the latest PVE version.

If you're already on version 7, you might also consider installing the pve-kernel-5.15 meta package; maybe the newer kernel version fixes the issue.
 
Actually, I am not able to connect, so I cannot look at dmesg. Disk read/write runs at about 250 kbps.
I have to hard reboot from iDRAC.

Edit: I use 6.4-13.
 
So log_output.txt corresponds to journalctl -o short-precise -k -b -1; I think it should be similar to dmesg for the previous boot.

Based on Zabbix, the issue started at 04:20.
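For anyone going through a kernel log of this size, a small filter can narrow it down to lines that commonly accompany storage stalls. This is a sketch only: filter_storage_lines is a made-up name, and the keyword list is a guess at useful patterns rather than an exhaustive or authoritative set.

```sh
#!/bin/sh
# Keep only kernel log lines that often show up around storage stalls:
# hung-task warnings, block-layer I/O errors, and MegaRAID driver messages.
# (filter_storage_lines is a hypothetical helper; the keywords are assumptions.)
filter_storage_lines() {
    grep -i -E 'blocked for more than|hung_task|I/O error|megaraid|megasas|timed out|reset'
}

# Usage against the previous boot, with the same journalctl call as above:
# journalctl -o short-precise -k -b -1 | filter_storage_lines
```

Anything that matches shortly before 04:20 would be the first place to look.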
 

Attachments

  • log output.txt
  • zabbix_report.png
