HW Raid performance query: LSI 3008

fortechitsolutions

Hi,

I wonder if anyone has experience with this and can comment.

I've just spent some time reviewing a pair of Lenovo servers which have this HW RAID controller: 2 x identical nodes in a small Proxmox cluster, running Proxmox 5.latest.

There is no problem with the controller being recognized and usable. There are 2 x RAID1 mirrors present:
2 x 500GB SAS drives for the main Proxmox install
2 x 2TB SATA drives for a /localraid mirror, which is extra VM storage: formatted as EXT4 and set up in Proxmox as a 'directory' storage (roughly as sketched below).
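
For reference, a directory storage like that is typically registered with pvesm along these lines (a minimal sketch; the storage name and content types here are illustrative, not copied from the actual config):

Code:
# assumes the RAID1 mirror is already formatted EXT4 and mounted at /localraid
pvesm add dir localraid --path /localraid --content images,rootdir

# confirm Proxmox sees the new storage
pvesm status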

Via CLI tools, we see

Code:
Listed in output from lspci and in dmesg:

01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3008 [Fury] (rev 02)

and

ServeRAID M1215 SAS/SATA Controller

and checking megaclisas-status for the general setup / health of the RAID volumes:

root@pve252:/etc/apt/sources.list.d# megaclisas-status

-- Controller information --

-- ID | H/W Model       | RAM    | Temp | BBU    | Firmware
c0    | ServeRAID M1215 | 0MB    | 80C  | Absent | FW: 24.16.0-0082



-- Array information --
-- ID | Type   |    Size |  Strpsz | Flags | DskCache |   Status |  OS Path | CacheCade |InProgress
c0u0  | RAID-1 |    465G |   64 KB | RA,WT | Disabled |  Optimal | /dev/sda | None      |None
c0u1  | RAID-1 |   1817G |   64 KB | RA,WT | Disabled |  Optimal | /dev/sdb | None      |None



-- Disk information --

-- ID  | Type | Drive Model                                   | Size     | Status          | Speed    | Temp | Slot ID  | LSI ID
c0u0p0 | HDD  | 9XF47RH8ST9500620NS 00AJ137 00AJ140IBM LE2B   | 464.7 Gb | Online, Spun Up | 6.0Gb/s  | 30C  | [62:2]   | 8
c0u0p1 | HDD  | 9XF47QYJST9500620NS 00AJ137 00AJ140IBM LE2B   | 464.7 Gb | Online, Spun Up | 6.0Gb/s  | 30C  | [62:0]   | 9
c0u1p0 | HDD  | LENOVO-XST2000NX0433 LD4AW460GL89LD4ALD4ALD4A | 1.817 TB | Online, Spun Up | 12.0Gb/s | 34C  | [62:1]   | 10
c0u1p1 | HDD  | LENOVO-XST2000NX0433 LD48W460AM3ALD48LD48LD48 | 1.817 TB | Online, Spun Up | 12.0Gb/s | 34C  | [62:3]   | 11


So. When I do some basic performance tests with pveperf,


first on /localraid
then on the gigabit-Ethernet NFS-mounted Synology storage pool

we see

Code:
root@pve252:/localraid# pveperf /localraid

CPU BOGOMIPS:      134392.00
REGEX/SECOND:      2372072
HD SIZE:           1831.49 GB (/dev/sdb1)
BUFFERED READS:    53.70 MB/sec
AVERAGE SEEK TIME: 15.07 ms
FSYNCS/SECOND:     25.61
DNS EXT:           79.67 ms
DNS INT:           1.47 ms (prox.local)


root@pve252:/localraid# pveperf /mnt/pve/nfs-prod-nic2
CPU BOGOMIPS:      134392.00
REGEX/SECOND:      2248736
HD SIZE:           8553.00 GB (192.168.11.250:/volume3/PROD)
FSYNCS/SECOND:     1342.24
DNS EXT:           79.33 ms
DNS INT:           1.54 ms (prox.local)
root@pve252:/localraid#

i.e., we have dreadful fsyncs per second on the local RAID controller, while the gigabit-Ethernet NFS Synology has much better performance. Woot.
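
As a sanity check independent of pveperf, a small fio run that fsyncs after every write should give a comparable picture of synchronous write IOPS on both storages (sketch only; the job name and file size are arbitrary):

Code:
# 4k synchronous writes, fsync after each one, against the slow local directory storage
fio --name=fsync-test --directory=/localraid --ioengine=sync \
    --rw=write --bs=4k --size=256M --fsync=1 --numjobs=1

# same test against the NFS mount for comparison
fio --name=fsync-test --directory=/mnt/pve/nfs-prod-nic2 --ioengine=sync \
    --rw=write --bs=4k --size=256M --fsync=1 --numjobs=1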

I tweaked the config of the local RAID slightly,

Code:
root@pve252:/localraid# megacli -LDSetProp EnDskCache -LAll -aAll

Set Disk Cache Policy to Enabled on Adapter 0, VD 0 (target id: 0) success
Set Disk Cache Policy to Enabled on Adapter 0, VD 1 (target id: 1) success

and after this the fsyncs per second jumped to an awe-inspiring ~230. Better than 26, but still pretty dreadful.
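
For what it's worth, the remaining cache-policy knobs can be inspected and poked with megacli along these lines (sketch only; on a cache-less controller like this the write-back settings will most likely be refused or simply have no effect):

Code:
# show the current write/read cache policy and physical-disk cache setting per logical drive
megacli -LDGetProp -Cache -LAll -aAll
megacli -LDGetProp -DskCache -LAll -aAll

# attempt write-back + adaptive read-ahead; with no onboard RAM/BBU the firmware
# will likely keep the LDs in write-through regardless
megacli -LDSetProp WB -LAll -aAll
megacli -LDSetProp ADRA -LAll -aAll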

I am curious whether anyone else has banged their head against this problem before, and whether there is a known-good workaround to make things less bad with a controller like this. Clearly, by design, this controller has no battery and no proper controller cache; it is, I believe, an entry-level RAID controller. But such utterly dreadful performance seems like something is actually wrong, not just 'mediocre'.
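
The missing cache and BBU can also be confirmed straight from the adapter info (sketch; the grep pattern is just a convenience):

Code:
# the "Memory Size" and "BBU" lines show whether any controller cache / battery is present
megacli -AdpAllInfo -a0 | grep -i -E 'memory size|bbu'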


Plan B for this thing is:
(a) migrate all VMs from Host2 onto Host1 (there are 2 nodes in the cluster here), roughly as sketched below;
(b) install a new RAID controller, a mid-range unit which has BBU and cache;
(c) blow away the old config on the host, set up the new RAID array / controller;
(d) migrate everything back to this host, upgrade the other Proxmox node in a similar manner, then once finished re-balance VMs across the 2 x Proxmox nodes.
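
For step (a), the VM moves themselves would just be the usual migration, something like the following (sketch; the VM IDs and target node name 'pve251' are placeholders):

Code:
# live-migrate each VM off this node before rebuilding it
# add --with-local-disks for VMs whose disks sit on the local directory storage
for id in 101 102 103; do
    qm migrate $id pve251 --online
done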

Any comments or feedback are greatly appreciated.

Thanks!

Tim