Proxmox 'soft lockups' on megaraid controller

marotori

Member
Jun 17, 2009
Hi All,

I have a Proxmox system that has been installed onto a RAID 10 array running on an LSI MegaRAID controller.

Now... it seems that there are some issues: the system hangs when the drives are being driven hard. Not a crash as such - the system seems to get into a state where it is waiting for an I/O response... that never comes.

I recall it being mentioned somewhere that the Proxmox kernel is Red Hat based? Digging around the net has shown that this issue also occurs on Red Hat kernels generally.

So....

A little digging and I found this article:

http://docs.redhat.com/docs/en-US/R.../6/html/Performance_Tuning_Guide/ch06s04.html

This is very good reading!

The long and short of it is that I have done some configuration that seems to help on my systems.

I simply run the following:

Code:
# raid controller tuning
echo cfq > /sys/block/sda/queue/scheduler
echo 512 > /sys/block/sda/queue/nr_requests
echo 16 > /sys/block/sda/queue/iosched/slice_idle
echo 64 > /sys/block/sda/queue/iosched/quantum
echo 1 > /sys/block/sda/queue/iosched/group_idle
# note: dirty_bytes and dirty_ratio are mutually exclusive - writing one
# zeroes the other, so the dirty_ratio line below overrides dirty_bytes
echo 500000000 > /proc/sys/vm/dirty_bytes
echo 30 > /proc/sys/vm/dirty_background_ratio
echo 60 > /proc/sys/vm/dirty_ratio
/sbin/blockdev --setra 32768 /dev/sda
/sbin/blockdev --setra 32768 /dev/mapper/pve-data
/sbin/blockdev --setra 32768 /dev/mapper/pve-root

in /etc/rc.local
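To confirm the values actually took effect after a reboot, you can just read them back. A minimal check script (the device names sda, pve-data, pve-root are from my setup - adjust to yours):

```shell
#!/bin/sh
# Read back the tunables to confirm rc.local applied them at boot.
# Only prints entries that exist on this machine.
for f in /sys/block/sda/queue/scheduler \
         /sys/block/sda/queue/nr_requests \
         /sys/block/sda/queue/iosched/slice_idle \
         /proc/sys/vm/dirty_background_ratio \
         /proc/sys/vm/dirty_ratio; do
    [ -r "$f" ] && printf '%s = %s\n' "$f" "$(cat "$f")"
done
# current readahead in 512-byte sectors (32768 = 16 MiB)
/sbin/blockdev --getra /dev/sda 2>/dev/null || true
```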

The place to start playing is really these values:

echo 16 > /sys/block/sda/queue/iosched/slice_idle (default 8)
echo 64 > /sys/block/sda/queue/iosched/quantum
echo 1 > /sys/block/sda/queue/iosched/group_idle

Changing the slice_idle setting to 16 seems to have really helped when under load.
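If you want to reproduce the load while experimenting with these values, a crude sequential write with dd is enough to drive the array hard. The file path and size here are just examples, not anything from my setup - bump count for a longer test:

```shell
#!/bin/sh
# Crude sequential write load for tuning tests; conv=fdatasync forces the
# data out to the array before dd exits, so the whole write is timed.
dd if=/dev/zero of=/var/tmp/ioload.bin bs=1M count=256 conv=fdatasync
rm -f /var/tmp/ioload.bin
```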


No doubt this is specific to the MegaRAID controllers I have... but you never know. May help all round!

Rob
 

softlion

Active Member
Dec 10, 2009
Thanks for the tip.

I'm experiencing soft lockups sometimes in a Debian VM, and more frequently in Windows VMs, especially under load.
I'm using mdadm with a soft RAID 1 array, and I think your settings will help a lot.
I've applied them; I'll let you know if they work in my case.

Thanks.
 

e100

Renowned Member
Nov 6, 2010
Columbus, Ohio
ulbuilder.wordpress.com
In searching to see what exactly slice_idle does, I came across this article, which describes similar issues in Veritas Volume Manager:
http://www.symantec.com/business/support/index?page=content&id=TECH68863

They recommended setting slice_idle to 0 if using CFQ scheduler or switching to the deadline scheduler.
This does seem to confirm that this is not limited to megaraid.

I have been using deadline scheduler with any system that has a hardware RAID controller and BBU.
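If you want deadline on every disk without per-device entries, the old-school alternative is the elevator= kernel parameter. A sketch of the Debian/Proxmox grub convention (edit /etc/default/grub, then run update-grub and reboot):

```shell
# /etc/default/grub - sets the default I/O scheduler for all block devices
# at boot; "quiet" is the stock Debian default, elevator=deadline is added
GRUB_CMDLINE_LINUX_DEFAULT="quiet elevator=deadline"
```

Per-disk sysfs entries still win if you set both, since they are applied after boot.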

Rather than editing rc.local you can also install sysfsutils:
Code:
apt-get install sysfsutils

Then put your changes in /etc/sysfs.conf (this is how I set a disk to use deadline):
Code:
block/sdb/queue/scheduler = deadline
block/sdb/queue/iosched/front_merges = 0
block/sdb/queue/iosched/read_expire = 150
block/sdb/queue/iosched/write_expire = 1500

Reboot, or apply the changes immediately:
Code:
/etc/init.d/sysfsutils restart
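Either way, you can check which scheduler is actually active for a disk - the kernel lists the available schedulers and puts the active one in brackets:

```shell
#!/bin/sh
# Prints something like "noop [deadline] cfq" - the bracketed entry is active.
# sdb is just the disk from the post above; use your own device name.
cat /sys/block/sdb/queue/scheduler 2>/dev/null || true
```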
 

ybcnyc

New Member
Mar 12, 2012
Thanks for the tip. I have an LSI 8888ELP and just started with Proxmox; I haven't seen anything yet as I'm just testing the box now. What's the exact message of the lockup in syslog? Also, did you check the disks' health?
 

softlion

Active Member
Dec 10, 2009
Hi again.
These settings have now been applied to the mdadm array (soft RAID) for a full week, and the VMs haven't experienced any pauses.
It seems to work nicely!

They should be applied by the default Proxmox setup!

> what's the exact message of the lock up in syslog

Well... there is none on the host.
On a Windows guest there is no message either, but all running TCP services stop responding for 2 to 5 seconds.
On a Debian guest there is also no message. On one Debian guest the network sometimes stays down after the 'pause'; a reboot is then required.

This is why it is so hard to detect and fix.
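For reference, when the host kernel *does* log a soft lockup, the message has a fairly fixed shape, so it's easy to grep for. The log paths below assume a standard Debian/Proxmox install:

```shell
#!/bin/sh
# A host-side soft lockup normally looks like:
#   BUG: soft lockup - CPU#2 stuck for 22s! [kvm:2153]
# Check the kernel ring buffer and the persisted logs:
dmesg | grep -i 'soft lockup' || true
grep -i 'soft lockup' /var/log/syslog /var/log/kern.log 2>/dev/null || true
```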

 

softlion

Active Member
Dec 10, 2009
I've applied the settings to both physical disks in the mdadm array (sda and sdb in my config), and to all PVE mapped volumes (/dev/mapper/pve*).
 
