Really slow sequential write with 8 x 10kRPM SAS disks on HW raid

mailinglists (Renowned Member, Mar 14, 2012)
Hi guys,

on an HP DL580 G5 with a P800 controller (BBWC), HW RAID 10 built from 8 x 10kRPM SAS disks, running Proxmox VE 4.3 (4.3-9/f7c6f0cd, kernel 4.4.21-1-pve) with ext4 on thin LVM (default install), we get really poor disk write speeds on the host (as well as in VMs).

Here are a few simple tests showing write speeds of around 70 MB/s (!?) inside the host (not from a VM):
Code:
fio --filename=brisi --sync=1 --rw=write --bs=10M --numjobs=1 --iodepth=1 --size=2000MB --name=test
...
Run status group 0 (all jobs):
  WRITE: io=2000.0MB, aggrb=69567KB/s, minb=69567KB/s, maxb=69567KB/s, mint=29439msec, maxt=29439msec
Disk stats (read/write):
    dm-0: ios=0/2668, merge=0/0, ticks=0/209376, in_queue=209904, util=87.09%, aggrios=126/2423, aggrmerge=0/261, aggrticks=272/210032, aggrin_queue=210300, aggrutil=87.07%
  cciss!c0d0: ios=126/2423, merge=0/261, ticks=272/210032, in_queue=210300, util=87.07%

dd if=/dev/zero of=brisi bs=100M count=30 oflag=dsync
30+0 records in
30+0 records out
3145728000 bytes (3.1 GB) copied, 36.3609 s, 86.5 MB/s

pveperf 
CPU BOGOMIPS:      115201.14
REGEX/SECOND:      913465
HD SIZE:           32.36 GB (/dev/dm-0)
BUFFERED READS:    360.37 MB/sec
AVERAGE SEEK TIME: 4.55 ms
FSYNCS/SECOND:     2600.34
DNS EXT:           34.94 ms
DNS INT:           1.62 ms
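Both tests above force synchronous writes (fio with --sync=1, dd with oflag=dsync), so every block waits for a flush before the next one starts. A quick sketch (file names and sizes are illustrative, not from the original commands) to see how much of the slowdown is sync overhead versus raw throughput:

```shell
# oflag=dsync flushes after every block; conv=fdatasync writes through the
# page cache and flushes once at the end -- closer to sustained sequential
# throughput. Compare the MB/s figures in the two summary lines.
dd if=/dev/zero of=brisi-dsync bs=10M count=20 oflag=dsync 2>&1 | tail -n 1
dd if=/dev/zero of=brisi-batch bs=10M count=20 conv=fdatasync 2>&1 | tail -n 1
rm -f brisi-dsync brisi-batch
```

If the fdatasync number is much higher, the bottleneck is the cost of each flush rather than the disks' streaming speed.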

I was expecting sequential write speeds of around 400 MB/s and reads of around 800 MB/s.

The HW controller is configured for write-back caching.
Code:
=> controller slot=8 show detail
Smart Array P800 in Slot 8
  Bus Interface: PCI
  Slot: 8
  Serial Number: X
  Cache Serial Number: X
  RAID 6 (ADG) Status: Enabled
  Controller Status: OK
  Hardware Revision: E
  Firmware Version: 7.22
  Rebuild Priority: Medium
  Expand Priority: Medium
  Surface Scan Delay: 15 secs
  Surface Scan Mode: Idle
  Queue Depth: Automatic
  Monitor and Performance Delay: 60  min
  Elevator Sort: Enabled
  Degraded Performance Optimization: Disabled
  Inconsistency Repair Policy: Disabled
  Wait for Cache Room: Disabled
  Surface Analysis Inconsistency Notification: Disabled
  Post Prompt Timeout: 0 secs
  Cache Board Present: True
  Cache Status: OK
  Cache Ratio: 25% Read / 75% Write
  Drive Write Cache: Disabled
  Total Cache Size: 512 MB
  Total Cache Memory Available: 456 MB
  No-Battery Write Cache: Disabled
  Cache Backup Power Source: Batteries
  Battery/Capacitor Count: 2
  Battery/Capacitor Status: OK
  SATA NCQ Supported: True

I have no idea what else to check. I have disabled barriers in the filesystem:
Code:
cat /etc/fstab  | grep ext4
/dev/pve/root / ext4 errors=remount-ro,barrier=0 0 1
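As a sanity check (not from the original post), the live mount options can be read from /proc/mounts to confirm the barrier setting actually took effect; depending on kernel version, ext4 may display it as barrier=0 or nobarrier:

```shell
# Show the live mount options for the root filesystem; with barriers disabled,
# ext4 typically lists "barrier=0" (or "nobarrier") among the options,
# depending on kernel version.
grep ' / ' /proc/mounts
```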

Any ideas would be greatly appreciated.
 
Read speed seems acceptable:
Code:
# /sbin/sysctl -w vm.drop_caches=3
# dd if=brisi of=/dev/null bs=10M
300+0 records in
300+0 records out
3145728000 bytes (3.1 GB) copied, 7.87702 s, 399 MB/s
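One detail worth adding to the read test above: dirty pages should be flushed with sync before dropping caches, since drop_caches only evicts clean pages — otherwise part of the read may still be served from cache. A minimal sketch:

```shell
# Flush dirty pages first, then drop page cache, dentries and inodes.
# drop_caches only evicts *clean* pages, hence the sync. Requires root.
sync
sysctl -w vm.drop_caches=3 2>/dev/null || echo "need root for drop_caches"
```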
 
Why would you do that considering OP's controller contains a working BBU?
A BBU (or supercapacitor) attached to the controller protects only the controller's cache, NOT the hard drive's (or SSD's) own cache! Once the controller has sent data to a disk, it has no way of knowing whether the data is still sitting in the disk's cache or has already been written to the platters. If a disk uses its own cache for writes, it reports the data as "written" the moment it lands in the disk cache.

And that is exactly why some RAID controllers automatically turn off the disks' write cache: to avoid data corruption in case of power loss.
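On plain SATA/SAS setups the drive's own volatile write cache can be queried directly; a sketch with a hypothetical device path (on a Smart Array the physical disks are hidden behind the controller, which is why the P800 output above reports "Drive Write Cache: Disabled" instead):

```shell
# Query a drive's own volatile write cache state. /dev/sda is a hypothetical
# path; behind a cciss/Smart Array controller this does not apply, and the
# controller setting is authoritative.
hdparm -W /dev/sda 2>/dev/null || echo "hdparm not available / not applicable"
```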
 
Hi guys. Thank you for your replies.

As you have already figured out, I have write cache enabled on the BBU-backed RAID card, and I will keep the write cache on the drives themselves disabled, even though we have a parallel UPS cluster with diesel generator backup.
Code:
  Cache Status: OK -> Means cache on RAID card.
  Cache Ratio: 25% Read / 75% Write
  Drive Write Cache: Disabled -> Means cache on disks themselves.

I tested the install a bit more, and what I found is that if I write directly to a partition, I get around 170 MB/s average sequential write speed. As soon as I put ext3 or ext4 on it, it slows down to around 80 MB/s. I will continue to investigate. All ideas and suggestions are welcome.

Code:
# mount | grep data1
/dev/cciss/c0d0p4 on /data1 type ext3 (rw,relatime,data=ordered)
# dd if=/dev/zero of=/data1/brisi bs=100MB count=10 oflag=dsync
10+0 records in
10+0 records out
1000000000 bytes (1.0 GB) copied, 13.7555 s, 72.7 MB/s
# umount /data1
# dd if=/dev/zero of=/dev/cciss/c0d0p4  bs=100MB count=10 oflag=dsync
10+0 records in
10+0 records out
1000000000 bytes (1.0 GB) copied, 5.9946 s, 167 MB/s
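One hypothesis worth ruling out for the partition-vs-filesystem gap (an addition, not a confirmed cause): partition misalignment with the array's stripe size. The kernel exposes an alignment offset in sysfs; the exact path for cciss devices is an assumption and may differ by kernel version:

```shell
# A nonzero alignment_offset means the partition start is not aligned with
# the underlying device geometry. The cciss sysfs naming here is a guess.
cat /sys/block/cciss!c0d0/cciss!c0d0p4/alignment_offset 2>/dev/null \
  || echo "sysfs path differs on this system"
```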
 
During my tests I will also do that and report back. If you want me to run other tests, just ask. Because I have other production-related work, it might not happen instantly. :)
 
Uf... Well... I looked at the RAID setup first, before debugging Linux any further, and... I'm kind of ashamed to say it, but the server was set up as RAID 6, not RAID 10 as I thought.


Results are as expected for RAID 6 with 8 x 300 GB SAS disks.
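A rough back-of-envelope comparison (assuming ~100 MB/s per 10k SAS spindle, an assumed figure, not measured here) shows why the two layouts differ so much under the synced tests used above:

```shell
# RAID 10 over 8 disks: data striped across 4 mirror pairs -> roughly 4x a
# single disk for sequential writes. RAID 6 can stream to 6 data disks in the
# best case, but every small/synced write pays a parity read-modify-write
# penalty -- which is exactly what fio --sync=1 and dd oflag=dsync measure.
echo "RAID10 (8 disks, ~100 MB/s each): ~$((4 * 100)) MB/s sequential write"
echo "RAID6  (8 disks): much lower for synced writes due to parity updates"
```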
Sorry for wasting your time guys. :-(
 
