HW Raid performance query: LSI 3008

fortechitsolutions

Hi,

I wonder if anyone has experience with this setup and can comment.

I've just spent some time reviewing a pair of Lenovo servers which have this HW RAID controller: 2 x identical nodes in a small Proxmox cluster, running the latest Proxmox 5.x.

There is no problem with the controller being recognized and usable. There are 2 x RAID1 mirrors present:
2 x 500GB SAS drives for the main Proxmox install
2 x 2TB SATA drives for the /localraid mirror, which provides extra VM storage; it is formatted as ext4 and configured in Proxmox as plain 'directory' storage.
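
For reference, the /localraid directory storage was set up roughly along these lines (a sketch from memory; the storage ID "localraid" is just what I'm calling it here):

Code:
# format the 2TB mirror, mount it, and register it as Proxmox directory storage
mkfs.ext4 /dev/sdb1
mkdir -p /localraid
echo '/dev/sdb1 /localraid ext4 defaults 0 2' >> /etc/fstab
mount /localraid
pvesm add dir localraid --path /localraid --content images,rootdir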

Via CLI tools, we see

Code:
Listed in the output from lspci and in dmesg:

01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3008 [Fury] (rev 02)

and

ServeRAID M1215 SAS/SATA Controller

and checking megaclisas-status for the general setup / health of the RAID volumes:

root@pve252:/etc/apt/sources.list.d# megaclisas-status

-- Controller information --

-- ID | H/W Model       | RAM    | Temp | BBU    | Firmware
c0    | ServeRAID M1215 | 0MB    | 80C  | Absent | FW: 24.16.0-0082



-- Array information --
-- ID | Type   |    Size |  Strpsz | Flags | DskCache |   Status |  OS Path | CacheCade |InProgress
c0u0  | RAID-1 |    465G |   64 KB | RA,WT | Disabled |  Optimal | /dev/sda | None      |None
c0u1  | RAID-1 |   1817G |   64 KB | RA,WT | Disabled |  Optimal | /dev/sdb | None      |None



-- Disk information --

-- ID  | Type | Drive Model                                   | Size     | Status          | Speed    | Temp | Slot ID  | LSI ID
c0u0p0 | HDD  | 9XF47RH8ST9500620NS 00AJ137 00AJ140IBM LE2B   | 464.7 Gb | Online, Spun Up | 6.0Gb/s  | 30C  | [62:2]   | 8
c0u0p1 | HDD  | 9XF47QYJST9500620NS 00AJ137 00AJ140IBM LE2B   | 464.7 Gb | Online, Spun Up | 6.0Gb/s  | 30C  | [62:0]   | 9
c0u1p0 | HDD  | LENOVO-XST2000NX0433 LD4AW460GL89LD4ALD4ALD4A | 1.817 TB | Online, Spun Up | 12.0Gb/s | 34C  | [62:1]   | 10
c0u1p1 | HDD  | LENOVO-XST2000NX0433 LD48W460AM3ALD48LD48LD48 | 1.817 TB | Online, Spun Up | 12.0Gb/s | 34C  | [62:3]   | 11
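
For anyone wanting the raw detail behind that summary, the same info (current write / read-ahead policy, disk cache policy, controller cache size, BBU presence) can be pulled straight from megacli, roughly:

Code:
# dump virtual-drive properties for all arrays on all adapters
megacli -LDInfo -LAll -aAll
# dump controller details (cache memory size, BBU, firmware)
megacli -AdpAllInfo -aAll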


So, when I do some basic performance tests with pveperf,

first on /localraid
then on the gig-ether NFS-mounted Synology storage pool

we see:

Code:
root@pve252:/localraid# pveperf /localraid

CPU BOGOMIPS:      134392.00
REGEX/SECOND:      2372072
HD SIZE:           1831.49 GB (/dev/sdb1)
BUFFERED READS:    53.70 MB/sec
AVERAGE SEEK TIME: 15.07 ms
FSYNCS/SECOND:     25.61
DNS EXT:           79.67 ms
DNS INT:           1.47 ms (prox.local)


root@pve252:/localraid# pveperf /mnt/pve/nfs-prod-nic2
CPU BOGOMIPS:      134392.00
REGEX/SECOND:      2248736
HD SIZE:           8553.00 GB (192.168.11.250:/volume3/PROD)
FSYNCS/SECOND:     1342.24
DNS EXT:           79.33 ms
DNS INT:           1.54 ms (prox.local)
root@pve252:/localraid#

i.e., we have dreadful fsyncs per second on the local RAID controller, while the gig-ether NFS Synology has much better performance. Woot.
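
As a cross-check on pveperf's FSYNCS/SECOND figure, a fio run that issues an fsync after every 4k write should land in the same ballpark. This is only a rough sketch (not exactly what pveperf does internally), but the reported write IOPS roughly corresponds to fsyncs per second:

Code:
apt-get install fio
fio --name=fsync-test --directory=/localraid --size=256M --bs=4k \
    --rw=write --ioengine=psync --fsync=1 --runtime=30 --time_based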

I tweaked the config of the local RAID slightly:

Code:
root@pve252:/localraid# megacli -LDSetProp EnDskCache -LAll -aAll

Set Disk Cache Policy to Enabled on Adapter 0, VD 0 (target id: 0) success
Set Disk Cache Policy to Enabled on Adapter 0, VD 1 (target id: 1) success

and after this the fsyncs per second jumped to an awe-inspiring ~230. Better than the original 26, but still pretty dreadful.
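
The only other knobs I'm aware of on this class of controller are the virtual-drive write and read-ahead policies. I have not confirmed the M1215 will accept these (it reports 0MB RAM, so there may simply be no cache to write back to), but for the record the commands would be roughly:

Code:
# attempt write-back and allow caching despite the absent BBU
# (may be rejected or a no-op on a cache-less card)
megacli -LDSetProp WB -LAll -aAll
megacli -LDSetProp CachedBadBBU -LAll -aAll
# verify what actually took effect
megacli -LDInfo -LAll -aAll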

I am curious if anyone else has banged their head against this problem before, and whether there is a known-good workaround to make things less bad with a controller like this. Clearly, by design, this controller has no battery and no proper controller cache; it is, I believe, an entry-level RAID controller. But such utterly dreadful performance seems like something is actually wrong, not just 'mediocre'.


Plan B for this thing is:
(a) migrate all VMs from Host2 onto Host1 (there are 2 nodes in the cluster here)
(b) install a new mid-range RAID controller which has a BBU and cache
(c) blow away the old config on the host, set up the new RAID / controller
(d) migrate everything to this host, upgrade the other Proxmox node in a similar manner, then once finished re-balance VMs across the 2 x Proxmox nodes.
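
For the evacuation steps (a)/(d), the bulk move would be along these lines (a sketch only; the node name "pve251" is a placeholder for the other cluster member):

Code:
# --online live-migrates running VMs (drop it, or stop the VM first, for offline moves);
# --with-local-disks is needed when disks live on the local /localraid directory storage
for vmid in $(qm list | awk 'NR>1 {print $1}'); do
    qm migrate "$vmid" pve251 --online --with-local-disks
done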

Any comments or feedback are greatly appreciated.

Thanks!

Tim
 
