Finally stable! - LSI Megaraid - Proxmox 2

marotori

Hi All,

Anyone following my various posts on this forum will have noted that I have been having a lot of problems recently!

Poor disk performance, LVM snapshots not working, random disk I/O lockups, etc.

I have now finally got it working, and my machines are backing up with no problems! Disks are working, etc.

The bottom line: the current Proxmox kernel has a very buggy LSI driver. I have managed to reproduce the same issues across a range of machines (Supermicro, Dell, OEM). The common factor is always the LSI card.

Symptoms and issues are always related to high I/O.

1. Load the disk with lots of writes: disk I/O grinds to a halt, and there is no option but a hard reboot.
2. Create an LVM snapshot: 50/50 whether it works.
3. Remove an LVM snapshot: 50/50 whether it works. A failure normally results in the /var/lib/vz/ volume hanging, and then a hard reboot follows.

All of this is rather painful when running live systems!

The solution is rather simple: change the LSI driver, and the issues go away!

I hope that the working driver is integrated soon :eek:

For those who want to upgrade the driver (and I highly recommend it if you use LSI cards), the following procedure applies.

--

  1. Download the megaraid driver for your card:
    http://www.lsi.com/downloads/Public/MegaRAID Common Files/Debian5.0.x_05_30.zip

    Unzip this file. Inside it you will find another file.. probably called: megaraid_sas-v00.00.05.30-src.tgz
    This is the one you want!

    Extract the files in this tgz file to: /usr/local/src/

    (you should end up with a folder... /usr/local/src/megaraid_sas-v00.00.05.30)
  2. Now.. go into the folder and do the following:

    Rename Makefile to Makefile.orig (mv Makefile Makefile.orig)
    Copy Makefile.standalone to Makefile (cp Makefile.standalone Makefile)
  3. Next step - install development tools

    apt-get install build-essential pve-headers-2.6.32-10-pve
    (Note: the headers version may be different if you are on a different kernel.)
  4. Now - compile time:

    cd to : /usr/local/src/megaraid_sas-v00.00.05.30 (you may well still be there!)

    and run the following to compile the module:

    make -C /usr/src/linux-headers-2.6.32-10-pve/ M=$PWD modules

    Note! The header location should be changed to suit your kernel.
  5. Time to replace the driver

    Backup this file: /lib/modules/2.6.32-10-pve/kernel/drivers/scsi/megaraid/megaraid_sas.ko (simply rename)

    Copy the new driver in: cp /usr/local/src/megaraid_sas-v00.00.05.30/megaraid_sas.ko /lib/modules/2.6.32-10-pve/kernel/drivers/scsi/megaraid/megaraid_sas.ko (again.. note the potential different kernel version in the lib path)
  6. We now need to make sure this is loaded at boot time. To do this we update the initial ram disk.

    mv /boot/initrd.img-2.6.32-10-pve /boot/initrd.img-2.6.32-10-pve.bak (note different kernel versions)

    Re-create this initial ram disk: update-initramfs -c -k 2.6.32-10-pve

    Apply the changes: update-grub
  7. While you are at it.. edit /etc/rc.local and add the following:

    echo "975" > /sys/block/sda/queue/nr_requests
    #echo "975" > /sys/block/sda/device/queue_depth (this should work but is disabled in the proxmox kernel??)

    /sbin/blockdev --setra 1024 /dev/sda
    /sbin/blockdev --setra 1024 /dev/mapper/pve-data
    /sbin/blockdev --setra 1024 /dev/mapper/pve-root
  8. Reboot!
All going well.. you should now be on a safe version of the driver.
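
Since almost every path in steps 3 to 6 contains the kernel version, here is a minimal dry-run sketch that prints those commands for a given kernel instead of running them. The function name `print_driver_steps` and the `.orig` backup suffix are my own assumptions for illustration; the paths are taken from the steps above and may not match your system.

```shell
#!/bin/sh
# Dry run: print the commands from steps 3-6 for a given kernel version.
# Nothing is executed here; review the output, then run the lines by hand as root.
# print_driver_steps is a hypothetical helper, not part of Proxmox or the LSI package.

print_driver_steps() {
    kver="$1"
    # Source folder from step 1; override with a second argument if yours differs.
    src="${2:-/usr/local/src/megaraid_sas-v00.00.05.30}"
    moddir="/lib/modules/$kver/kernel/drivers/scsi/megaraid"

    echo "apt-get install build-essential pve-headers-$kver"
    echo "cd $src"
    echo "make -C /usr/src/linux-headers-$kver/ M=\$PWD modules"
    # The .orig suffix is an assumption; the post only says "simply rename".
    echo "mv $moddir/megaraid_sas.ko $moddir/megaraid_sas.ko.orig"
    echo "cp $src/megaraid_sas.ko $moddir/megaraid_sas.ko"
    echo "mv /boot/initrd.img-$kver /boot/initrd.img-$kver.bak"
    echo "update-initramfs -c -k $kver"
    echo "update-grub"
}

# Default to the running kernel when no version is given on the command line.
print_driver_steps "${1:-$(uname -r)}"
```

Run it with no argument to use the running kernel, or pass a version such as 2.6.32-10-pve. Steps 1, 2 and 7 are left out because they do not depend on the kernel version.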

Since doing this on all my 'test' servers I have not killed them once. For the last 12 hours I have been running some live sites on 2 servers with the changes, and it is all working. Even backups are working with no problems again :p

Note: As of kernel 2.6.32-10-pve this module is definitely needed. It may not be required for later kernels, depending on whether the Proxmox dev team pick up what I am posting and update the driver.
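
To judge whether a given kernel already ships a newer driver, it helps to look at the version string. A small sketch, assuming the usual `modinfo` output layout with a `version:` field and that the module exposes `/sys/module/megaraid_sas/version` once loaded; `module_version` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the "version:" field from modinfo-style output read on stdin.
# "srcversion:" lines are skipped because the first field must match exactly.
module_version() {
    awk '$1 == "version:" { print $2; exit }'
}

# Version of the module file installed for the running kernel, if modinfo is available.
if command -v modinfo >/dev/null 2>&1; then
    modinfo megaraid_sas 2>/dev/null | module_version
fi

# Version of the driver that is actually loaded right now, if any.
if [ -r /sys/module/megaraid_sas/version ]; then
    cat /sys/module/megaraid_sas/version
fi
```

If the two values differ after a rebuild, the old module is most likely still in the initial ram disk and the machine has not yet booted with the new one.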

Enjoy!
 
Will the Proxmox team merge this driver into the official pve kernel? I don't like to compile anything; it means extra work on each server, and maintaining it across future releases...
 
I just checked the source. GPL :)

/*
* Linux MegaRAID driver for SAS based RAID controllers
*
* Copyright (c) 2009-2011 LSI Corporation.
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License
* as published by the Free Software Foundation; either version 2
* of the License, or (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
* FILE: megaraid_sas_base.c
*
* Authors: LSI Corporation
* Sreenivas Bagalkote
* Sumant Patro
* Bo Yang
*
* Send feedback to: <megaraidlinux@lsi.com>
*
* Mail to: LSI Corporation, 1621 Barber Lane, Milpitas, CA 95035
* ATTN: Linuxraid
*/

MODULE_LICENSE("GPL");
MODULE_VERSION(MEGASAS_VERSION);
MODULE_AUTHOR("megaraidlinux@lsi.com");
MODULE_DESCRIPTION("LSI MegaRAID SAS Driver");
 
I would still prefer the upstream version. Let's see if they include it. If not, we can think about adding it ourselves.
 
Well...

24 hours on and LVM snapshot removal fails again!

It seems that an LVM snapshot can be removed easily if you create and remove it right away, but leave it running for a long time (e.g. during my backups) and it crashes!

Looking around the net, this appears to be a kernel/udev issue.

Anyone got any magic fixes?

Rob
 
I have the same problem, but with an LSI SAS1064ET PCI-Express Fusion-MPT-SAS.

The backups of vz containers fail 25% of the time; the load goes up and the LVM commands freeze. The command:
Code:
dmsetup udevcookies
will show a large list of cookies.

If you run
Code:
dmsetup udevcomplete_all
it will clear that list, but that does not fix the hang.
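
A rough way to watch for this before the load explodes is to count the outstanding cookies. This is only a sketch: it assumes each cookie row in the `dmsetup udevcookies` output starts with a 0x-prefixed hex value and that the header row does not, and `count_cookies` is a hypothetical helper name.

```shell
#!/bin/sh
# Count outstanding udev cookies from `dmsetup udevcookies` output on stdin.
# Assumption: cookie rows begin with "0x"; the header line does not.
count_cookies() {
    awk '/^0x/ { n++ } END { print n + 0 }'
}

# Only query dmsetup when it is installed; errors (e.g. not root) are ignored.
if command -v dmsetup >/dev/null 2>&1; then
    n=$(dmsetup udevcookies 2>/dev/null | count_cookies)
    echo "outstanding udev cookies: $n"
fi
```

A count that keeps growing while a backup is running would be a hint that the snapshot removal is about to hang.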

Load will reach around 6000.02

The only fix is to completely force a restart.

This problem stops me from migrating the old vz containers (Proxmox 1.9) to the new platform with Proxmox 2.0.
 
My fix is to never ever ever use vzdump!

I now use r1soft hcp to do manual snapshots. But would love a proper solution :)


 
My fix is to never ever ever use vzdump!

that's right, the problem is with lvm snapshot removal.
hcp snapshot works like a charm.

the only problem is that it cannot make a snapshot of an unmounted volume, and you cannot mount the vms' lvm disks.
that's why, instead of lvm, i use directory storage: i do a hcp snapshot of the whole logical volume containing the raw disk files, do the tar backup, remove the hcp snapshot, and done :)
i tested this several times in the last few days without ANY problems.

i also wrote a script for this, it's a bit ugly because i'm a newbie but it works.
at least until the lvremove bug is fixed, who knows, maybe 4ever.
 
ok but PLEASE DON'T LAUGH, as i said i'm a newbie.

first install the kernel headers, then

#dpkg -i r1soft-hotcopy-amd64-3.18.2.deb
#hcp-setup --get-module

backup.sh:
it assumes a vm has all its disk files in the same volume; backups older than 2 days are removed; the result (output of the commands) is mailed.
Code:
#!/bin/sh

SEP="-------------------------------------------------------------------------------"
REPORT=""
BACKUPDIR="/var/backup/hcpbackup"
SNAPDIR="/var/snap"
IMGVOL="/dev/vgvirt/lvvirt"

# -20 to 19
NICE=19
# 0 to 7
IONICE=7
# -1 or -9
COMPRESS=-1

for VMID in "$@"
do

    VMID=`echo "$VMID"|grep -P '\d+' -o`
    if ! [ -f /etc/pve/qemu-server/$VMID.conf ]
    then
        continue
    fi

    if ! [ -d $BACKUPDIR/$VMID ]
    then
        mkdir -p $BACKUPDIR/$VMID
    fi

    DATE="$(date +%F_%H-%M-%S)"
    TODAY="$(date +%F)"
    LOGFILE="$BACKUPDIR/$VMID/"$DATE"_$VMID.log"
    CONFFILE="$BACKUPDIR/$VMID/"$DATE"_$VMID.conf"

    cat /etc/pve/qemu-server/$VMID.conf > $CONFFILE

    #exec > $LOGFILE 2>&1
    exec > $LOGFILE

    #echo $SEP
    echo "\nBackup log for vmid $VMID, created: $(date)"

    echo $SEP
    cat /etc/pve/qemu-server/$VMID.conf

    # don't know if suspend is needed, i do
    ERR=`/usr/sbin/qm suspend $VMID 2>&1`
    if [ "$ERR" ]
    then
        echo $SEP
        echo "qm suspend $VMID: $ERR"
    fi

    echo $SEP
    /usr/sbin/hcp -m $SNAPDIR $IMGVOL

    ERR=`/usr/sbin/qm resume $VMID 2>&1`
    if [ "$ERR" ]
    then
        echo $SEP
        echo "qm resume $VMID: $ERR"
    fi

    /usr/sbin/qm set $VMID -lock backup

    echo $SEP
    for FILE in `ls $SNAPDIR/images/$VMID`
    do
        GZIP=$COMPRESS nice -n$NICE ionice -n$IONICE tar -cvzf $BACKUPDIR/$VMID/"$DATE"_"$VMID"_$FILE.tar.gz -C $SNAPDIR/images/$VMID/ $FILE
    done

    echo $SEP
    /usr/sbin/hcp -l

    echo $SEP
    HCPDEV=`/usr/sbin/hcp -s $IMGVOL | grep /dev/hcp`
    /usr/sbin/hcp -r $HCPDEV

    /usr/sbin/qm unlock $VMID

    echo $SEP
    for BACKUP in `ls $BACKUPDIR/$VMID | grep -P '^\d{4}-\d{2}-\d{2}' | grep -v $TODAY | grep -v $(date -d "$TODAY -1 day" +%F)`
    do
        rm -v $BACKUPDIR/$VMID/$BACKUP
    done

    REPORT="$REPORT"`cat $LOGFILE`"\n\n\n"

done

MAILADDR="addr@domain.tld"
SUBJECT="Backup report at $(hostname -f)"
MESSAGE="Backup job finished at $(hostname -f) on $(date) with the following results:\n\n$REPORT"

echo "$MESSAGE" | mail -s "$SUBJECT" $MAILADDR
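
The cleanup loop in the script parses `ls` output through several greps. As an alternative sketch (not the method used above), the same "keep today and yesterday" rule can be written as a function that only lists the candidates, so it can be checked before anything is deleted. `list_expired` is a hypothetical name, and the date-prefixed file names are assumed to follow the pattern the script produces.

```shell
#!/bin/sh
# List files in DIR whose name starts with a YYYY-MM-DD date other than the
# two dates to keep. Prints one path per line; deletes nothing.
list_expired() {
    dir="$1"; keep1="$2"; keep2="$3"
    for path in "$dir"/*; do
        [ -f "$path" ] || continue          # skip directories and an empty glob
        name=${path##*/}                    # strip the directory part
        case "$name" in
            "$keep1"*|"$keep2"*) continue ;;                       # recent, keep
            [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]*) echo "$path" ;;
        esac                                # anything without a date prefix is ignored
    done
}

# Example call (GNU date assumed, as in the script above):
#   list_expired /var/backup/hcpbackup/101 "$(date +%F)" "$(date -d '-1 day' +%F)"
```

Once the list looks right, the paths can be handed to `rm -v` one by one in a loop.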

/etc/cron.d/backup
Code:
15 22 * * *           root /etc/backup/backup.sh 101 102 108 109 111 112

a tip:
you can zero out the free space inside a vm to get much smaller archives:
Code:
dd if=/dev/zero of=zero.small.file bs=1024 count=102400
dd if=/dev/zero of=zero.file bs=1024
rm zero.small.file
sync ; sleep 60 ; sync
rm zero.file

in a windows vm you can use sdelete:

sdelete.exe -z c:
 
Hi All,

Anyone following my various posts on this forum will have noted that I have been having a lot of problems recently!
....

Registered just to say thank you. Following your instructions I've built the driver for the MegaRAID SAS 9341-4i for the current kernel, 4.15.18-18-pve.
Sources from: broadcom.com/products/storage/raid-controllers/megaraid-sas-9341-4i#downloads


In case anyone wants to update the driver, a zip is attached.

I did this because there have been firmware updates since the driver version included in the kernel, and I was worried about consistency.
 

Attachments

  • megaraid_sas.zip
    704.3 KB · Views: 3
