Really slow sequential write with 8 x 10kRPM SAS disks on HW raid

mailinglists

Renowned Member
Mar 14, 2012
Hi guys,

On an HP DL580 G5 with a P800 BBWC HW RAID controller in a RAID 10 setup using 8 x 10kRPM SAS disks, running 4.3 (4.3-9/f7c6f0cd, kernel 4.4.21-1-pve) with ext4 on thin LVM (default install), we get really poor disk write speeds on the host (as well as in VMs).

Here are a few simple tests which show write speeds of around 70 MB/s !?! inside the host (not from a VM):
Code:
fio --filename=brisi --sync=1 --rw=write --bs=10M --numjobs=1 --iodepth=1 --size=2000MB --name=test
...
Run status group 0 (all jobs):
  WRITE: io=2000.0MB, aggrb=69567KB/s, minb=69567KB/s, maxb=69567KB/s, mint=29439msec, maxt=29439msec
Disk stats (read/write):
    dm-0: ios=0/2668, merge=0/0, ticks=0/209376, in_queue=209904, util=87.09%, aggrios=126/2423, aggrmerge=0/261, aggrticks=272/210032, aggrin_queue=210300, aggrutil=87.07%
  cciss!c0d0: ios=126/2423, merge=0/261, ticks=272/210032, in_queue=210300, util=87.07%

dd if=/dev/zero of=brisi bs=100M count=30 oflag=dsync
30+0 records in
30+0 records out
3145728000 bytes (3.1 GB) copied, 36.3609 s, 86.5 MB/s

pveperf 
CPU BOGOMIPS:      115201.14
REGEX/SECOND:      913465
HD SIZE:           32.36 GB (/dev/dm-0)
BUFFERED READS:    360.37 MB/sec
AVERAGE SEEK TIME: 4.55 ms
FSYNCS/SECOND:     2600.34
DNS EXT:           34.94 ms
DNS INT:           1.62 ms

I was expecting sequential write speeds of around 400 MB/s and reads of around 800 MB/s.
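That expectation matches what simple striping math predicts (a back-of-envelope sketch; the ~100 MB/s per-drive streaming rate is an assumption, not a measured value):

```shell
# Rough RAID 10 throughput estimate for an 8-drive array; per_disk is
# an assumed streaming rate for a 10k SAS drive, not a measurement.
disks=8
per_disk=100   # MB/s, assumed
echo "RAID10 seq write ~ $(( disks / 2 * per_disk )) MB/s"   # mirrors halve write bandwidth
echo "RAID10 seq read  ~ $(( disks * per_disk )) MB/s"       # reads can hit all spindles
```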

The HW controller is configured for write-back caching.
Code:
=> controller slot=8 show detail
Smart Array P800 in Slot 8
  Bus Interface: PCI
  Slot: 8
  Serial Number: X
  Cache Serial Number: X
  RAID 6 (ADG) Status: Enabled
  Controller Status: OK
  Hardware Revision: E
  Firmware Version: 7.22
  Rebuild Priority: Medium
  Expand Priority: Medium
  Surface Scan Delay: 15 secs
  Surface Scan Mode: Idle
  Queue Depth: Automatic
  Monitor and Performance Delay: 60  min
  Elevator Sort: Enabled
  Degraded Performance Optimization: Disabled
  Inconsistency Repair Policy: Disabled
  Wait for Cache Room: Disabled
  Surface Analysis Inconsistency Notification: Disabled
  Post Prompt Timeout: 0 secs
  Cache Board Present: True
  Cache Status: OK
  Cache Ratio: 25% Read / 75% Write
  Drive Write Cache: Disabled
  Total Cache Size: 512 MB
  Total Cache Memory Available: 456 MB
  No-Battery Write Cache: Disabled
  Cache Backup Power Source: Batteries
  Battery/Capacitor Count: 2
  Battery/Capacitor Status: OK
  SATA NCQ Supported: True

I have no idea what else to check. I have disabled barriers in the FS:
Code:
cat /etc/fstab  | grep ext4
/dev/pve/root / ext4 errors=remount-ro,barrier=0 0 1
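fstab only states intent; whether barrier=0 actually took effect is visible in the kernel's own view of the mounts (a quick check, assuming the standard /proc layout):

```shell
# Show the root mount as the kernel sees it; the options field should
# contain "barrier=0" (or "nobarrier") if the setting took effect.
grep ' / ' /proc/mounts
```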

Any ideas would be greatly appreciated.
 
Read speed seems acceptable:
Code:
# /sbin/sysctl -w vm.drop_caches=3
# dd if=brisi of=/dev/null bs=10M
300+0 records in
300+0 records out
3145728000 bytes (3.1 GB) copied, 7.87702 s, 399 MB/s
 
Why would you do that considering OP's controller contains a working BBU?
A BBU (or supercapacitor) attached to the controller protects only the controller cache, NOT the hard-drive (or SSD) cache! Once the controller has sent data to a disk, it has no way of knowing whether the data is still in the disk's cache or already written to the platters. If a disk uses its own cache even for write ops, it reports data as "written" the moment it lands in the disk cache.

And that is exactly why some RAID controllers automatically turn the disks' write cache off: to avoid data corruption in case of power loss.
 
Hi guys. Thank you for your replies.

As you have already figured out, I have the write cache enabled on the BBU-backed RAID card, and I will keep the write cache on the drives themselves disabled, even though we have a parallel UPS cluster backed by a diesel generator.
Code:
  Cache Status: OK -> the cache on the RAID card.
  Cache Ratio: 25% Read / 75% Write
  Drive Write Cache: Disabled -> the cache on the disks themselves.

I did some more testing on this install, and what I found is that if I write directly to a partition, I get around 170 MB/s average sequential write speed. As soon as I put ext3 or ext4 on it, it drops to around 80 MB/s. I will continue to investigate. All ideas and suggestions are welcome.

Code:
# mount | grep data1
/dev/cciss/c0d0p4 on /data1 type ext3 (rw,relatime,data=ordered)
# dd if=/dev/zero of=/data1/brisi bs=100MB count=10 oflag=dsync
10+0 records in
10+0 records out
1000000000 bytes (1.0 GB) copied, 13.7555 s, 72.7 MB/s
# umount /data1
# dd if=/dev/zero of=/dev/cciss/c0d0p4  bs=100MB count=10 oflag=dsync
10+0 records in
10+0 records out
1000000000 bytes (1.0 GB) copied, 5.9946 s, 167 MB/s
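One way to narrow down whether per-block syncing is what the filesystem is amplifying would be to compare dsync (a flush after every output block) against a single fsync at the end (a sketch; the temp file and 64 MB size are arbitrary choices):

```shell
# oflag=dsync forces a flush per output block; conv=fsync flushes once
# at the end. A large gap between the two points at journal/flush
# overhead rather than raw device speed.
tmp=$(mktemp)
dd if=/dev/zero of="$tmp" bs=1M count=64 oflag=dsync 2>&1 | tail -n1
dd if=/dev/zero of="$tmp" bs=1M count=64 conv=fsync 2>&1 | tail -n1
rm -f "$tmp"
```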
 
During my tests I will also try that and report back. If you want me to run other tests, just ask. Since I also have production-related work to do, it might not happen instantly. :-)
 
Uf... well... I looked at the RAID setup first before debugging Linux any further, and... I'm kinda ashamed to say it, but the server had a RAID 6 setup, not RAID 10 as I thought.

The results are as expected for RAID 6 with 8 x 300 GB SAS disks.
Sorry for wasting your time, guys. :-(
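For the record, the numbers line up with the RAID 6 read-modify-write penalty (a back-of-envelope sketch; the 450 MB/s aggregate figure is an assumption): each sub-stripe write costs roughly six disk I/Os (read data + P + Q, then write data + P + Q), so synced writes that never fill a whole stripe land near raw/6.

```shell
# Hypothetical aggregate streaming rate of the 8-disk set; each
# read-modify-write burns ~6 I/Os per logical write on RAID 6.
raw=450   # MB/s, assumed
echo "full-stripe write   ~ ${raw} MB/s"
echo "read-modify-write   ~ $(( raw / 6 )) MB/s"   # in the ballpark of the observed ~70-80 MB/s
```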
 