Hi,
I wonder if anyone has experience with this and can comment.
I've just spent some time reviewing a pair of Lenovo servers that have this HW RAID controller: 2 identical nodes in a small Proxmox cluster, on Proxmox 5.latest.
The controller is recognized and usable without any problem. There are 2 RAID1 mirrors present:
2 x 500 GB SAS drives for the main Proxmox install
2 x 2 TB SATA drives for the /localraid mirror, which is for extra VM storage; it is simply set up as a 'directory' storage in Proxmox and formatted as ext4.
Via CLI tools, we see
Code:
Listed in the output from lspci and in dmesg:
01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3008 [Fury] (rev 02)
and
ServeRAID M1215 SAS/SATA Controller
and checking megaclisas-status for the general setup and health of the RAID volumes:
root@pve252:/etc/apt/sources.list.d# megaclisas-status
-- Controller information --
-- ID | H/W Model | RAM | Temp | BBU | Firmware
c0 | ServeRAID M1215 | 0MB | 80C | Absent | FW: 24.16.0-0082
-- Array information --
-- ID | Type | Size | Strpsz | Flags | DskCache | Status | OS Path | CacheCade |InProgress
c0u0 | RAID-1 | 465G | 64 KB | RA,WT | Disabled | Optimal | /dev/sda | None |None
c0u1 | RAID-1 | 1817G | 64 KB | RA,WT | Disabled | Optimal | /dev/sdb | None |None
-- Disk information --
-- ID | Type | Drive Model | Size | Status | Speed | Temp | Slot ID | LSI ID
c0u0p0 | HDD | 9XF47RH8ST9500620NS 00AJ137 00AJ140IBM LE2B | 464.7 Gb | Online, Spun Up | 6.0Gb/s | 30C | [62:2] | 8
c0u0p1 | HDD | 9XF47QYJST9500620NS 00AJ137 00AJ140IBM LE2B | 464.7 Gb | Online, Spun Up | 6.0Gb/s | 30C | [62:0] | 9
c0u1p0 | HDD | LENOVO-XST2000NX0433 LD4AW460GL89LD4ALD4ALD4A | 1.817 TB | Online, Spun Up | 12.0Gb/s | 34C | [62:1] | 10
c0u1p1 | HDD | LENOVO-XST2000NX0433 LD48W460AM3ALD48LD48LD48 | 1.817 TB | Online, Spun Up | 12.0Gb/s | 34C | [62:3] | 11
So, when I do some basic performance tests with pveperf,
first on /localraid,
then on the gigabit-Ethernet NFS-mounted Synology storage pool,
we see:
Code:
root@pve252:/localraid# pveperf /localraid
CPU BOGOMIPS: 134392.00
REGEX/SECOND: 2372072
HD SIZE: 1831.49 GB (/dev/sdb1)
BUFFERED READS: 53.70 MB/sec
AVERAGE SEEK TIME: 15.07 ms
FSYNCS/SECOND: 25.61
DNS EXT: 79.67 ms
DNS INT: 1.47 ms (prox.local)
root@pve252:/localraid# pveperf /mnt/pve/nfs-prod-nic2
CPU BOGOMIPS: 134392.00
REGEX/SECOND: 2248736
HD SIZE: 8553.00 GB (192.168.11.250:/volume3/PROD)
FSYNCS/SECOND: 1342.24
DNS EXT: 79.33 ms
DNS INT: 1.54 ms (prox.local)
root@pve252:/localraid#
i.e., we have dreadful fsyncs per second on the local RAID controller, while the Synology over gigabit-Ethernet NFS has much better performance. Woot.
I tweaked the config of the local RAID slightly, enabling the disk cache:
Code:
root@pve252:/localraid# megacli -LDSetProp EnDskCache -LAll -aAll
Set Disk Cache Policy to Enabled on Adapter 0, VD 0 (target id: 0) success
Set Disk Cache Policy to Enabled on Adapter 0, VD 1 (target id: 1) success
and after this the fsyncs per second jumped to an awe-inspiring 230. Better than 26, but still pretty dreadful.
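For reference, MegaCli can report which cache policies actually took effect. A sketch (flag spelling can vary slightly between MegaCli builds, so treat these as assumptions to verify against your version's help output):

```shell
# Show the controller-level (virtual drive) cache policy, e.g. WriteThrough/WriteBack
megacli -LDGetProp -Cache -LAll -aALL

# Show the physical disk cache setting (the property toggled with EnDskCache above)
megacli -LDGetProp -DskCache -LAll -aALL
```

If the virtual drives still report WriteThrough (the WT flag visible in the megaclisas-status output), the fsync path is still hitting the platters on every commit, which would explain numbers in this range.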
I am curious whether anyone else has banged their head against this problem before. Is there a known-good workaround to make things less bad with a controller like this? By design this controller has no battery and no proper controller cache; it is, I believe, an entry-level RAID controller. But such utterly dreadful performance suggests something is actually wrong, not just 'mediocre' hardware.
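One workaround sometimes used on cache-less controllers is to force write-back mode. This is a sketch assuming standard MegaCli syntax, and it is explicitly unsafe without a BBU: anything buffered in the controller is lost on power failure, so only consider it with a trusted UPS:

```shell
# Switch virtual drives from write-through (WT) to write-back (WB).
# Without a BBU the controller refuses plain WB, so ForcedWB keeps
# write-back enabled even with the BBU absent or failed.
# WARNING: risks data loss on power failure; use only with a UPS.
megacli -LDSetProp ForcedWB -Immediate -LAll -aAll
```

Note that megaclisas-status reports this controller with 0 MB of RAM, so even forced write-back may gain little; there is essentially no cache to write back from, which points toward your Plan B (a controller with real cache and a BBU) as the durable fix.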
Plan B for this thing is: (a) migrate all VMs from Host2 onto Host1 (there are 2 nodes in the cluster); (b) install a new mid-range RAID controller that has a BBU and cache; (c) blow away the old config on the host and set up the new RAID controller; (d) migrate everything back to this host, upgrade the other Proxmox node in a similar manner, then once finished re-balance the VMs across the 2 Proxmox nodes.
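The migration in step (a) can be done per-VM with qm migrate. A sketch; the VM IDs and the target node name "host1" are placeholders, and --with-local-disks (which copies local storage over the network during migration) assumes a recent enough Proxmox 5.x:

```shell
# Move each VM off this node to host1.
# --online live-migrates running VMs; --with-local-disks also moves
# disks that live on local storage such as /localraid.
# List your actual VM IDs first with: qm list
for vmid in 100 101 102; do
    qm migrate "$vmid" host1 --online --with-local-disks
done
```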
Any comments or feedback are greatly appreciated.
Thanks!
Tim