Bad disk I/O performance with LSI SAS2008 controller

ezanolin

Mar 18, 2024
Good day, I am running into terrible disk performance issues after upgrading from Proxmox VE 6 to 7. Here are my system details:

Code:
~# pveversion
pve-manager/7.4-17/513c62be (running kernel: 5.15.131-1-pve)

Code:
~# pveperf
CPU BOGOMIPS:      127671.60
REGEX/SECOND:      1500551
HD SIZE:           45.53 GB (/dev/mapper/ssdvg-system)
BUFFERED READS:    39.29 MB/sec
AVERAGE SEEK TIME: 0.42 ms
FSYNCS/SECOND:     3.38
DNS EXT:           43.54 ms

Code:
~# lspci | grep SAS
03:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 02)

Code:
~# sas2ircu 0 DISPLAY
Device is a Hard disk
  Enclosure #                             : 1
  Slot #                                  : 0
  SAS Address                             : 4433221-1-0700-0000
  State                                   : Optimal (OPT)
  Size (in MB)/(in sectors)               : 953869/1953525167
  Manufacturer                            : ATA
  Model Number                            : Samsung SSD 860
  Firmware Revision                       : 1B6Q
  Serial No                               : S4CZNF0M734723M
  GUID                                    : 5002538e49784aa3
  Protocol                                : SATA
  Drive Type                              : SATA_SSD

Device is a Hard disk
  Enclosure #                             : 1
  Slot #                                  : 1
  SAS Address                             : 4433221-1-0600-0000
  State                                   : Optimal (OPT)
  Size (in MB)/(in sectors)               : 953869/1953525167
  Manufacturer                            : ATA
  Model Number                            : Samsung SSD 860
  Firmware Revision                       : 1B6Q
  Serial No                               : S4CZNF0M734705R
  GUID                                    : 5002538e49784a82
  Protocol                                : SATA
  Drive Type                              : SATA_SSD


The server is a Dell R710 with dual X5650 CPUs and 64 GB of memory. ssdvg is an LVM volume group on a hardware RAID mirror of 2 Samsung 860 SSDs on a SAS2008 controller. Disk operations are extremely slow, and when I perform heavy disk operations like dd'ing images on the hypervisor, VMs can stop responding due to blocked I/O. Besides the disk throughput being very much on the low end, FSYNCS/SECOND is horrifically bad. The server's current I/O workload is VERY light, about 2M/s read and < 200K/s write, and CPU is under control too at a flat 7% across the day. Everything is using LVM and nothing fancy like ZFS.
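
For a rough cross-check of the FSYNCS/SECOND figure that does not depend on pveperf, a minimal Python sketch along these lines could be used. It is not pveperf's actual method: the function name `fsyncs_per_second`, the 4 KiB block size, and the one-second duration are assumptions chosen for illustration, so the numbers are only comparable with other runs of the same script.

```python
import os
import sys
import tempfile
import time


def fsyncs_per_second(directory, duration=1.0, block=b"\0" * 4096):
    """Count write+fsync cycles completed per second on a file in `directory`."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        count = 0
        start = time.monotonic()
        deadline = start + duration
        while time.monotonic() < deadline:
            os.lseek(fd, 0, os.SEEK_SET)  # rewrite the same block every cycle
            os.write(fd, block)
            os.fsync(fd)                  # block until the device reports the write as durable
            count += 1
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    # Pass a directory on the volume under test; falls back to the temp dir.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print(f"{fsyncs_per_second(target):.2f} fsyncs/second in {target}")
```

The target directory has to live on the volume being investigated (here, something on ssdvg), otherwise the script measures a different device or a tmpfs.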

This issue seems to have been introduced by upgrading from 6 to 7. Any advice on what is going on here?
 
AFAIK, the LSI SAS2008 in HW RAID mode can't TRIM disks, so it's not a recommended configuration.
IMO, your disks are slow because of this, and will wear out faster than expected.
You can try booting the previous kernel to confirm.
 
I have booted into kernels 5.4.203-1 and 5.3.18-2 and performance is still terrible. Would this controller really tank the fsyncs/s like this? Are these controllers really that bad? Perhaps the performance issue was always there and is not due to the upgrade, but as I mentioned I have very little I/O load, and I would imagine that in most use cases this kind of performance would be considered broken. These are pretty common / popular Dell RAID controllers and I would have expected to see more people complaining if this was "normal".
 
It does indeed seem probable that this is not an upgrade issue. I found this post https://blog.erben.sk/2019/01/10/enabling-writecache-on-lsi-raid-adapters/ which outlines how to use the obscure lsiutil binary to enable write caching on the RAID disks. After doing so and rebooting, the performance is hugely improved.
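
To see what cache mode the kernel believes each SCSI disk is using, one option is to read the `cache_type` attribute under `/sys/class/scsi_disk`. The sketch below is only an assumption-laden illustration: the helper name `scsi_cache_types` is made up for this example, and on a hardware RAID volume the value reflects what the controller reports for the volume, which may not match the write cache setting of the member disks that lsiutil changes.

```python
import glob
import os


def scsi_cache_types(base="/sys/class/scsi_disk"):
    """Map each SCSI disk address (H:C:T:L) to its reported cache_type string."""
    result = {}
    for path in sorted(glob.glob(os.path.join(base, "*", "cache_type"))):
        address = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as fh:
                result[address] = fh.read().strip()
        except OSError:
            # Attribute exists but cannot be read (permissions, device gone).
            result[address] = "unreadable"
    return result


if __name__ == "__main__":
    # Prints nothing on systems without SCSI disks.
    for address, mode in scsi_cache_types().items():
        print(f"{address}: {mode}")
```

Typical values are "write through" and "write back". If the reported mode does not change after toggling the setting in lsiutil, that alone does not prove the change failed; the pveperf numbers remain the more direct evidence.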

Code:
~# pveperf
CPU BOGOMIPS:      127673.28
REGEX/SECOND:      1518369
HD SIZE:           45.53 GB (/dev/mapper/ssdvg-system)
BUFFERED READS:    167.68 MB/sec
AVERAGE SEEK TIME: 0.29 ms
FSYNCS/SECOND:     2272.15
DNS EXT:           42.49 ms


Not amazing read speed, but I'll take it. The fsyncs/s are massively improved.
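
To put numbers on the improvement, the two pveperf runs from this thread can be compared directly. The following is a small sketch, with `parse_pveperf` being a hypothetical helper written for this example rather than anything shipped with Proxmox; it takes the first number after each label and ignores everything else.

```python
BEFORE = """\
BUFFERED READS:    39.29 MB/sec
AVERAGE SEEK TIME: 0.42 ms
FSYNCS/SECOND:     3.38
"""

AFTER = """\
BUFFERED READS:    167.68 MB/sec
AVERAGE SEEK TIME: 0.29 ms
FSYNCS/SECOND:     2272.15
"""


def parse_pveperf(text):
    """Return {label: first numeric value} for each 'LABEL: value ...' line."""
    values = {}
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # no label/value pair on this line
        try:
            values[label.strip()] = float(rest.split()[0])
        except ValueError:
            pass  # value is not numeric (e.g. a shell prompt line)
    return values


if __name__ == "__main__":
    before, after = parse_pveperf(BEFORE), parse_pveperf(AFTER)
    for key in ("BUFFERED READS", "FSYNCS/SECOND"):
        print(f"{key}: {after[key] / before[key]:.1f}x")
```

With the figures above, buffered reads improve by a factor of about 4.3 and fsyncs per second by a factor of about 672.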

Thank you.
 
These are pretty common / popular Dell RAID controllers and I would have expected to see more people complaining if this was "normal"
Because those setups use the controller's HW cache (that's why the disks' own cache is disabled), plus many spinning disks and/or SSDs provided or recommended by the vendor, which are datacenter SSDs.
Consumer SSDs like your Samsung 860 aren't designed for HW RAID. Even with the disk cache enabled, the missing TRIM command will slow the SSDs down and wear them out faster than expected.
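
Whether the kernel can pass TRIM/discard through to a given block device can be checked from sysfs. The sketch below reads `queue/discard_max_bytes` for every device under `/sys/block`; a value of 0 generally means the kernel will not issue discards to that device. The helper name `discard_support` is an assumption for this example, and `lsblk --discard` shows similar information without any scripting.

```python
import glob
import os


def discard_support(base="/sys/block"):
    """Map each block device name to True if it advertises discard support."""
    result = {}
    pattern = os.path.join(base, "*", "queue", "discard_max_bytes")
    for path in sorted(glob.glob(pattern)):
        # path is <base>/<device>/queue/discard_max_bytes
        device = os.path.basename(os.path.dirname(os.path.dirname(path)))
        try:
            with open(path) as fh:
                result[device] = int(fh.read().strip()) > 0
        except (OSError, ValueError):
            result[device] = False  # treat unreadable or malformed values as unsupported
    return result


if __name__ == "__main__":
    for device, supported in discard_support().items():
        print(f"{device}: {'discard supported' if supported else 'no discard'}")
```

If the RAID volume shows no discard support here, that would be consistent with the point above about the controller not passing TRIM to the member SSDs.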
 
