DL380 Gen8 ZFS striped mirror of 4x SSDs: poor performance

nlubello
New Member · Oct 12, 2020
Hi all,
I've been following this forum for a couple of years now, picking up a lot of great knowledge from the community and solving all the small problems I've encountered with Proxmox VE.

Now it's my turn to submit a new issue, about our new installation on an HP DL380 Gen8 server with the following specification:
Chassis
0 x HP|DL380p G8|2U|16*SFF-HS|SAS|HSP

Processor(s)
2 x Intel Xeon E5-2670 V2 2.50GHz Ten (10) Core CPU

Heatsink(s)
2 x HP ProLiant DL380p Gen8 High Power 130W+ Heatsink

Fan(s)
2 x HP (667855-B21) ProLiant DL380p, DL380e Gen8, DL385p Gen8 Fan (662520-001)

Memory (RAM)
8 x 16GB - DDR3 1866MHz (PC3-14900R, 2Rx4)

RAID
1 x HP|P822 FH 2GB FBWC|FBOM

Hard Drive Caddy(s) & Blank(s)
8 x HP ProLiant Gen8, Gen9 SFF Hot-Swap Caddy

8 x HP ProLiant Gen8, Gen9, Gen10 SFF Hot-Swap Blank

Hard Drives
1x SSD on the P420i: Kingston A400, 120 GB
4x SSD on the P822: Kingston SEDC450, 480 GB

Network Connectivity (FLOM)
1 x HP 331FLR Quad Port - 1GbE RJ45 FLR Ethernet

Full-Height Expansion Card(s)
1 x Qlogic QLE3044 Quad Port - 1GbE RJ45 Full Height PCIe-x4 Ethernet

Power Supply(s)
2 x HP Common Slot HS PSU 1200W High Eff.

The idea was to use the 120 GB SSD as the system disk and the array of 4x SSDs in a RAID 10 (striped mirror) configuration as the VM datastore.
Since we have the P822 RAID card, the easiest approach would be to build the RAID array on the controller and use it directly from PVE, but we would like to give ZFS a try since, according to many topics on the net, it can offer advantages in terms of reliability and performance.

So we created single-disk RAID 0 volumes on the P822 card to let Proxmox see each drive individually (no SMART info, unfortunately), created a ZFS pool named VM-ZFS, and started our tests with one Ubuntu VM and one Windows 10 VM, all disks using write-back cache.
root@pve2:/# ssacli ctrl slot=2 show config

Smart Array P822 in Slot 2 (sn: PDVTF0CRHAX01H)



Internal Drive Cage at Port 5I, Box 2, OK



Internal Drive Cage at Port 6I, Box 0, OK


Port Name: 1E

Port Name: 2E

Port Name: 3E

Port Name: 4E

Port Name: 5I

Port Name: 6I

Array A (Solid State SATA, Unused Space: 0 MB)

logicaldrive 1 (447.10 GB, RAID 0, OK)

physicaldrive 5I:2:1 (port 5I:box 2:bay 1, SATA SSD, 480 GB, OK)


Array B (Solid State SATA, Unused Space: 0 MB)

logicaldrive 2 (447.10 GB, RAID 0, OK)

physicaldrive 5I:2:2 (port 5I:box 2:bay 2, SATA SSD, 480 GB, OK)


Array C (Solid State SATA, Unused Space: 0 MB)

logicaldrive 3 (447.10 GB, RAID 0, OK)

physicaldrive 5I:2:3 (port 5I:box 2:bay 3, SATA SSD, 480 GB, OK)


Array D (Solid State SATA, Unused Space: 0 MB)

logicaldrive 4 (447.10 GB, RAID 0, OK)

physicaldrive 5I:2:4 (port 5I:box 2:bay 4, SATA SSD, 480 GB, OK)

SEP (Vendor ID PMCSIERA, Model SRCv24x6G) 380 (WWID: 50014380292FD51F)
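
For completeness, the single-drive logical volumes and the pool were built roughly along these lines. This is a minimal sketch: the /dev/sdX device names and the ashift=12 value are placeholders/assumptions (stable /dev/disk/by-id paths are preferable in practice).

# one RAID 0 logical drive per physical SSD on the P822 (slot 2)
ssacli ctrl slot=2 create type=ld drives=5I:2:1 raid=0
ssacli ctrl slot=2 create type=ld drives=5I:2:2 raid=0
ssacli ctrl slot=2 create type=ld drives=5I:2:3 raid=0
ssacli ctrl slot=2 create type=ld drives=5I:2:4 raid=0

# ZFS striped mirror (RAID 10 equivalent) over the four resulting block devices
zpool create -o ashift=12 VM-ZFS mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde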

On the Ubuntu VM we started monitoring performance with the fio command-line tool:
root@ubuntu:~# fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=read --size=500m --io_size=5g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting
TEST: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
TEST: Laying out IO file (1 file / 500MiB)

TEST: (groupid=0, jobs=1): err= 0: pid=1311: Sat Nov 21 12:48:51 2020
read: IOPS=6066, BW=6066MiB/s (6361MB/s)(5120MiB/844msec)
slat (usec): min=36, max=4307, avg=133.65, stdev=82.54
clat (usec): min=545, max=20800, avg=4914.40, stdev=1221.92
lat (usec): min=735, max=21788, avg=5052.24, stdev=1250.03
clat percentiles (usec):
| 1.00th=[ 1598], 5.00th=[ 3490], 10.00th=[ 4178], 20.00th=[ 4490],
| 30.00th=[ 4621], 40.00th=[ 4686], 50.00th=[ 4686], 60.00th=[ 4817],
| 70.00th=[ 5211], 80.00th=[ 5538], 90.00th=[ 5735], 95.00th=[ 5932],
| 99.00th=[ 9372], 99.50th=[10814], 99.90th=[18482], 99.95th=[19792],
| 99.99th=[20841]
bw ( MiB/s): min= 5798, max= 5798, per=95.58%, avg=5798.00, stdev= 0.00, samples=1
iops : min= 5798, max= 5798, avg=5798.00, stdev= 0.00, samples=1
lat (usec) : 750=0.12%, 1000=0.25%
lat (msec) : 2=1.07%, 4=6.41%, 10=91.50%, 20=0.61%, 50=0.04%
cpu : usr=11.74%, sys=85.05%, ctx=75, majf=0, minf=8202
IO depths : 1=0.2%, 2=0.4%, 4=0.9%, 8=1.7%, 16=3.4%, 32=93.3%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=99.8%, 8=0.0%, 16=0.0%, 32=0.2%, 64=0.0%, >=64=0.0%
issued rwt: total=5120,0,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
READ: bw=6066MiB/s (6361MB/s), 6066MiB/s-6066MiB/s (6361MB/s-6361MB/s), io=5120MiB (5369MB), run=844-844msec

Disk stats (read/write):
sda: ios=4718/0, merge=0/0, ticks=4260/0, in_queue=3652, util=85.39%
root@ubuntu:~# fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=write --size=500m --io_size=5g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting
TEST: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=32
fio-3.1
Starting 1 process
Jobs: 1 (f=1)
TEST: (groupid=0, jobs=1): err= 0: pid=1314: Sat Nov 21 12:50:45 2020
write: IOPS=4604, BW=4604MiB/s (4828MB/s)(5120MiB/1112msec)
slat (usec): min=56, max=862, avg=199.01, stdev=46.97
clat (usec): min=404, max=9098, avg=6535.65, stdev=1266.94
lat (usec): min=583, max=9334, avg=6737.31, stdev=1285.84
clat percentiles (usec):
| 1.00th=[ 1418], 5.00th=[ 5211], 10.00th=[ 5932], 20.00th=[ 6063],
| 30.00th=[ 6128], 40.00th=[ 6194], 50.00th=[ 6259], 60.00th=[ 6325],
| 70.00th=[ 6521], 80.00th=[ 7963], 90.00th=[ 8356], 95.00th=[ 8586],
| 99.00th=[ 8717], 99.50th=[ 8848], 99.90th=[ 8979], 99.95th=[ 9110],
| 99.99th=[ 9110]
bw ( MiB/s): min= 4098, max= 4980, per=98.58%, avg=4539.00, stdev=623.67, samples=2
iops : min= 4098, max= 4980, avg=4539.00, stdev=623.67, samples=2
lat (usec) : 500=0.08%, 750=0.21%, 1000=0.25%
lat (msec) : 2=1.04%, 4=2.15%, 10=96.27%
cpu : usr=68.77%, sys=30.96%, ctx=44, majf=0, minf=13
IO depths : 1=0.2%, 2=0.4%, 4=0.9%, 8=1.7%, 16=3.4%, 32=93.3%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=99.8%, 8=0.0%, 16=0.0%, 32=0.2%, 64=0.0%, >=64=0.0%
issued rwt: total=0,5120,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
WRITE: bw=4604MiB/s (4828MB/s), 4604MiB/s-4604MiB/s (4828MB/s-4828MB/s), io=5120MiB (5369MB), run=1112-1112msec

Disk stats (read/write):
sda: ios=0/5006, merge=0/0, ticks=0/3788, in_queue=2992, util=87.28%
root@ubuntu:~# fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=randread --size=500m --io_size=5g --blocksize=4k --ioengine=libaio --fsync=1 --iodepth=1 --direct=1 --numjobs=1 --runtime=60 --group_reporting
TEST: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [r(1)][11.7%][r=19.0MiB/s,w=0KiB/s][r=4876,w=0 IOPS][eta 00m:53s]
Jobs: 1 (f=1): [r(1)][21.7%][r=19.7MiB/s,w=0KiB/s][r=5033,w=0 IOPS][eta 00m:47s]
Jobs: 1 (f=1): [r(1)][31.7%][r=23.2MiB/s,w=0KiB/s][r=5942,w=0 IOPS][eta 00m:41s]
Jobs: 1 (f=1): [r(1)][41.7%][r=22.4MiB/s,w=0KiB/s][r=5735,w=0 IOPS][eta 00m:35s]
Jobs: 1 (f=1): [r(1)][51.7%][r=17.5MiB/s,w=0KiB/s][r=4487,w=0 IOPS][eta 00m:29s]
Jobs: 1 (f=1): [r(1)][61.7%][r=33.1MiB/s,w=0KiB/s][r=8478,w=0 IOPS][eta 00m:23s]
Jobs: 1 (f=1): [r(1)][71.7%][r=17.5MiB/s,w=0KiB/s][r=4484,w=0 IOPS][eta 00m:17s]
Jobs: 1 (f=1): [r(1)][81.7%][r=17.3MiB/s,w=0KiB/s][r=4419,w=0 IOPS][eta 00m:11s]
Jobs: 1 (f=1): [r(1)][91.7%][r=21.4MiB/s,w=0KiB/s][r=5469,w=0 IOPS][eta 00m:05s]
Jobs: 1 (f=1): [r(1)][100.0%][r=17.4MiB/s,w=0KiB/s][r=4449,w=0 IOPS][eta 00m:00s]
TEST: (groupid=0, jobs=1): err= 0: pid=1318: Sat Nov 21 12:52:08 2020
read: IOPS=5789, BW=22.6MiB/s (23.7MB/s)(1357MiB/60001msec)
slat (usec): min=18, max=543, avg=40.05, stdev=12.66
clat (usec): min=4, max=5360, avg=117.61, stdev=50.93
lat (usec): min=59, max=5409, avg=160.72, stdev=59.17
clat percentiles (usec):
| 1.00th=[ 50], 5.00th=[ 54], 10.00th=[ 58], 20.00th=[ 63],
| 30.00th=[ 92], 40.00th=[ 103], 50.00th=[ 116], 60.00th=[ 133],
| 70.00th=[ 161], 80.00th=[ 165], 90.00th=[ 169], 95.00th=[ 176],
| 99.00th=[ 186], 99.50th=[ 194], 99.90th=[ 494], 99.95th=[ 586],
| 99.99th=[ 701]
bw ( KiB/s): min=16592, max=46336, per=100.00%, avg=23203.89, stdev=8263.27, samples=119
iops : min= 4148, max=11584, avg=5800.97, stdev=2065.82, samples=119
lat (usec) : 10=0.01%, 20=0.01%, 50=1.30%, 100=31.81%, 250=66.74%
lat (usec) : 500=0.05%, 750=0.09%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%
cpu : usr=15.32%, sys=32.82%, ctx=347387, majf=0, minf=11
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=347383,0,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
READ: bw=22.6MiB/s (23.7MB/s), 22.6MiB/s-22.6MiB/s (23.7MB/s-23.7MB/s), io=1357MiB (1423MB), run=60001-60001msec

Disk stats (read/write):
sda: ios=346906/3, merge=0/1, ticks=37988/0, in_queue=4, util=0.01%

And here is where we first hit an issue with our configuration: we didn't expect crazy numbers, but neither did we expect a result slower than a single disk!

We performed the same random read test on the PVE host, with very different results:
root@pve2:/VM-ZFS# fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=randread --size=500m --io_size=10g --blocksize=4k --ioengine=libaio --fsync=1 --iodepth=1 --direct=1 --numjobs=1 --runtime=60 --group_reporting
TEST: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [r(1)][31.8%][r=511MiB/s][r=131k IOPS][eta 00m:15s]
Jobs: 1 (f=1): [r(1)][61.9%][r=509MiB/s][r=130k IOPS][eta 00m:08s]
Jobs: 1 (f=1): [r(1)][90.5%][r=510MiB/s][r=131k IOPS][eta 00m:02s]
Jobs: 1 (f=1): [r(1)][100.0%][r=508MiB/s][r=130k IOPS][eta 00m:00s]
TEST: (groupid=0, jobs=1): err= 0: pid=16467: Sat Nov 21 13:06:31 2020
read: IOPS=125k, BW=489MiB/s (513MB/s)(10.0GiB/20930msec)
slat (usec): min=4, max=719, avg= 6.09, stdev= 7.91
clat (nsec): min=850, max=401984, avg=949.30, stdev=864.11
lat (usec): min=5, max=727, avg= 7.19, stdev= 8.07
clat percentiles (nsec):
| 1.00th=[ 892], 5.00th=[ 900], 10.00th=[ 908], 20.00th=[ 908],
| 30.00th=[ 916], 40.00th=[ 916], 50.00th=[ 924], 60.00th=[ 924],
| 70.00th=[ 932], 80.00th=[ 940], 90.00th=[ 948], 95.00th=[ 964],
| 99.00th=[ 1912], 99.50th=[ 1944], 99.90th=[ 2672], 99.95th=[ 2864],
| 99.99th=[ 6368]
bw ( KiB/s): min=72104, max=524296, per=99.92%, avg=500596.68, stdev=79430.22, samples=41
iops : min=18026, max=131074, avg=125149.17, stdev=19857.55, samples=41
lat (nsec) : 1000=97.19%
lat (usec) : 2=2.58%, 4=0.20%, 10=0.03%, 20=0.01%, 50=0.01%
lat (usec) : 250=0.01%, 500=0.01%
cpu : usr=26.71%, sys=72.48%, ctx=389, majf=0, minf=17
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=2621440,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
READ: bw=489MiB/s (513MB/s), 489MiB/s-489MiB/s (513MB/s-513MB/s), io=10.0GiB (10.7GB), run=20930-20930msec

(Second message with the details, due to the 15k-character limit per message.)
 
We dug into the ZFS configuration using all the posts we could find that seemed related to this issue (ZFS performance is a trending topic); we capped the ARC at 8 GB of RAM (see the note after the list below) and set these additional options:

zfs set compression=on VM_ZFS
zfs set sync=disabled VM_ZFS
zfs set primarycache=all VM_ZFS
zfs set atime=off VM_ZFS
zfs set checksum=off VM_ZFS
zfs set dedup=off VM_ZFS
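
For reference, an 8 GB ARC cap is normally applied via the zfs_arc_max module parameter rather than a pool property; a minimal sketch (the value is 8 GiB in bytes):

# /etc/modprobe.d/zfs.conf -- limit the ARC to 8 GiB
options zfs zfs_arc_max=8589934592

# apply immediately without a reboot (the modprobe setting takes effect after update-initramfs -u and a reboot)
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max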

At this point the 4k random read performance in the VM had improved slightly, but not very consistently; that may also be because we added a second VirtIO hard drive to run the test on.
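
As a side note, a second VirtIO disk on the VM-ZFS storage with write-back cache can be attached from the host roughly like this; a sketch, where the VM ID 100 and the 32 GB size are placeholders:

# add a new 32 GB VirtIO disk from the VM-ZFS storage to VM 100, using write-back cache
qm set 100 --virtio1 VM-ZFS:32,cache=writeback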
root@ubuntu:/mnt/virtio1# fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=randread --size=500m --io_size=10g --blocksize=4k --ioengine=libaio --fsync=1 --iodepth=1 --direct=1 --numjobs=1 --runtime=60 --group_reporting
TEST: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.1
Starting 1 process
Jobs: 1 (f=1): [r(1)][11.5%][r=48.6MiB/s,w=0KiB/s][r=12.4k,w=0 IOPS][eta 00m:54s]
Jobs: 1 (f=1): [r(1)][19.7%][r=52.6MiB/s,w=0KiB/s][r=13.5k,w=0 IOPS][eta 00m:49s]
Jobs: 1 (f=1): [r(1)][29.5%][r=51.8MiB/s,w=0KiB/s][r=13.2k,w=0 IOPS][eta 00m:43s]
Jobs: 1 (f=1): [r(1)][37.7%][r=52.6MiB/s,w=0KiB/s][r=13.5k,w=0 IOPS][eta 00m:38s]
Jobs: 1 (f=1): [r(1)][45.9%][r=46.5MiB/s,w=0KiB/s][r=11.9k,w=0 IOPS][eta 00m:33s]
Jobs: 1 (f=1): [r(1)][54.1%][r=41.8MiB/s,w=0KiB/s][r=10.7k,w=0 IOPS][eta 00m:28s]
Jobs: 1 (f=1): [r(1)][63.9%][r=25.6MiB/s,w=0KiB/s][r=6548,w=0 IOPS][eta 00m:22s]
Jobs: 1 (f=1): [r(1)][72.1%][r=40.5MiB/s,w=0KiB/s][r=10.4k,w=0 IOPS][eta 00m:17s]
Jobs: 1 (f=1): [r(1)][82.0%][r=23.7MiB/s,w=0KiB/s][r=6064,w=0 IOPS][eta 00m:11s]
Jobs: 1 (f=1): [r(1)][91.8%][r=48.0MiB/s,w=0KiB/s][r=12.3k,w=0 IOPS][eta 00m:05s]
Jobs: 1 (f=1): [r(1)][100.0%][r=51.9MiB/s,w=0KiB/s][r=13.3k,w=0 IOPS][eta 00m:00s]
TEST: (groupid=0, jobs=1): err= 0: pid=1382: Sat Nov 21 13:20:18 2020
read: IOPS=10.8k, BW=42.3MiB/s (44.4MB/s)(2540MiB/60001msec)
slat (usec): min=17, max=5641, avg=24.32, stdev=11.75
clat (usec): min=3, max=10833, avg=58.26, stdev=33.52
lat (usec): min=49, max=10874, avg=84.61, stdev=40.02
clat percentiles (usec):
| 1.00th=[ 42], 5.00th=[ 45], 10.00th=[ 45], 20.00th=[ 46],
| 30.00th=[ 46], 40.00th=[ 46], 50.00th=[ 48], 60.00th=[ 50],
| 70.00th=[ 52], 80.00th=[ 70], 90.00th=[ 101], 95.00th=[ 105],
| 99.00th=[ 116], 99.50th=[ 120], 99.90th=[ 131], 99.95th=[ 433],
| 99.99th=[ 529]
bw ( KiB/s): min=23936, max=54848, per=99.80%, avg=43262.66, stdev=11606.74, samples=119
iops : min= 5984, max=13712, avg=10815.68, stdev=2901.70, samples=119
lat (usec) : 4=0.01%, 10=0.01%, 20=0.01%, 50=62.94%, 100=25.02%
lat (usec) : 250=11.97%, 500=0.03%, 750=0.02%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%
cpu : usr=18.38%, sys=37.83%, ctx=650164, majf=0, minf=8
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwt: total=650252,0,0, short=0,0,0, dropped=0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
READ: bw=42.3MiB/s (44.4MB/s), 42.3MiB/s-42.3MiB/s (44.4MB/s-44.4MB/s), io=2540MiB (2663MB), run=60001-60001msec

Disk stats (read/write):
vda: ios=648958/0, merge=0/0, ticks=32820/0, in_queue=0, util=0.00%

We also ran some tests using hardware RAID 10 on the P822, but performance was 15-20% slower.
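
For that test, the RAID 10 logical drive on the P822 would be created along these lines; a sketch, assuming the four single-disk logical drives have been deleted first:

# build a single RAID 1+0 logical drive from the four SSDs
ssacli ctrl slot=2 create type=ld drives=5I:2:1,5I:2:2,5I:2:3,5I:2:4 raid=1+0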
Just as a reference, I'll attach a screenshot of the I/O performance under Windows.


With all this said, we now have two main questions:
1) What is causing this huge performance difference in the VM (4k random read) compared to the PVE host?
2) With hardware RAID 10 and the given test, is this the maximum we can get, or are we missing a step?


Many thanks in advance to anyone willing to comment.
 
Hello all, is no one interested in this topic?
Hey @nlubello

It's most likely the controller card that is the issue.

Stick with LSI, or dedicated 8-port Broadcom 9300 SAS HBA controllers.

This is a common issue with HP and Dell controllers that switch between RAID and HBA mode: they have their own caches and other features that interfere with ZFS managing the drives directly.
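
If you want to check whether those cache layers are in play before swapping hardware, ssacli can show them and, where the firmware allows it, disable them; a rough sketch (exact option names can vary between controller and firmware generations):

# show controller cache and mode details for the P822
ssacli ctrl slot=2 show detail

# disable the array accelerator (controller cache) on each single-disk logical drive
ssacli ctrl slot=2 logicaldrive 1 modify arrayaccelerator=disable

# disable the physical drive write cache at the controller level
ssacli ctrl slot=2 modify drivewritecache=disable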

There are many threads on here that already cover this issue, which is probably why you are not getting replies to your post: it covers the same ground.

Hope the above helps.

""Cheers
G
 
