Hi there
On my SSD servers, the SSD performance is really bad. I test the performance with the following command (I'm aware that dd is not the best tool to test performance):
Code:
dd if=/dev/zero of=test.img bs=4k count=262133 conv=fdatasync
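Since dd writing zeroes can be misleading (on ZFS, zeroes may simply be compressed away if compression is enabled), I could also run a synchronous write test along these lines; the parameters are just my attempt to roughly match the fio read tests below, and the size/filename are placeholders:
Code:
# 4k sync writes, similar shape to the fio read tests below
fio --ioengine=libaio --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --size=1G --name=sync_write --filename=test-write.img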
System 1 (SSD + ZFS & RAID1)
SuperMicro E300-8D
32 GB RAM
1x Samsung SSD 860 EVO 250GB (SATA)
1x Samsung SSD 970 EVO Plus 250GB (NVMe)
ZFS Software RAID 1
Code:
(zrh1)root@vms1:~# dd if=/dev/zero of=test.img bs=4k count=262133 conv=fdatasync
262133+0 records in
262133+0 records out
1073696768 bytes (1.1 GB, 1.0 GiB) copied, 19.6898 s, 54.5 MB/s
Code:
(zrh1)root@vms1:~# pveperf
CPU BOGOMIPS: 35201.68
REGEX/SECOND: 215189
HD SIZE: 211.53 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND: 2820.62
DNS EXT: 44.44 ms
DNS INT: 52.88 ms (mydomain.link)
Code:
(zrh1)root@vms1:~# fio --ioengine=libaio --direct=1 --sync=1 --rw=read --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name seq_read --filename=test.img
seq_read: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=6014KiB/s][r=1503 IOPS][eta 00m:00s]
seq_read: (groupid=0, jobs=1): err= 0: pid=24093: Fri Dec 20 13:33:53 2019
read: IOPS=1561, BW=6247KiB/s (6397kB/s)(366MiB/60001msec)
slat (usec): min=488, max=51385, avg=625.70, stdev=211.97
clat (usec): min=2, max=168, avg= 8.17, stdev= 9.65
lat (usec): min=514, max=51423, avg=635.28, stdev=212.48
clat percentiles (nsec):
| 1.00th=[ 2992], 5.00th=[ 3024], 10.00th=[ 3056], 20.00th=[ 3088],
| 30.00th=[ 3120], 40.00th=[ 3248], 50.00th=[ 3664], 60.00th=[ 3664],
| 70.00th=[ 3856], 80.00th=[ 6048], 90.00th=[27264], 95.00th=[27520],
| 99.00th=[28544], 99.50th=[29056], 99.90th=[42240], 99.95th=[69120],
| 99.99th=[85504]
bw ( KiB/s): min= 3080, max= 6752, per=99.98%, avg=6244.92, stdev=435.70, samples=119
iops : min= 770, max= 1688, avg=1561.20, stdev=108.91, samples=119
lat (usec) : 4=73.19%, 10=7.46%, 20=0.01%, 50=19.28%, 100=0.05%
lat (usec) : 250=0.01%
cpu : usr=2.39%, sys=97.31%, ctx=309, majf=8, minf=12
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=93702,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=6247KiB/s (6397kB/s), 6247KiB/s-6247KiB/s (6397kB/s-6397kB/s), io=366MiB (384MB), run=60001-60001msec
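If it helps, I can also post the pool configuration of System 1. The Proxmox installer created the pool as rpool, so something like this should show the settings that matter (the property list is just my guess at what is relevant):
Code:
# pool health / layout, plus the ZFS properties most likely to affect these numbers
zpool status rpool
zfs get compression,sync,recordsize,atime,logbias rpool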
System 2 (SSD + MDRAID 6 & LVM)
SuperMicro 5018D-FN8T
64 GB RAM
4x Samsung SSD 860 EVO 500GB (SATA)
1x Samsung SSD 970 EVO 500GB (NVMe)
Linux Software RAID 6 with Hot Spare
Code:
(zrh2)root@vms1:~# dd if=/dev/zero of=test.img bs=4k count=262133 conv=fdatasync
262133+0 records in
262133+0 records out
1073696768 bytes (1.1 GB, 1.0 GiB) copied, 11.2465 s, 95.5 MB/s
Code:
(zrh2)root@vms1:~# pveperf
CPU BOGOMIPS: 35202.88
REGEX/SECOND: 1939532
HD SIZE: 45.58 GB (/dev/mapper/vg0-host--root)
BUFFERED READS: 939.91 MB/sec
AVERAGE SEEK TIME: 0.14 ms
FSYNCS/SECOND: 143.51
DNS EXT: 26.50 ms
DNS INT: 31.67 ms (mydomain.link)
Code:
(zrh2)root@vms1:~# fio --ioengine=libaio --direct=1 --sync=1 --rw=read --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name seq_read --filename=test.img
seq_read: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=93.6MiB/s][r=23.0k IOPS][eta 00m:00s]
seq_read: (groupid=0, jobs=1): err= 0: pid=1902: Fri Dec 20 13:31:53 2019
read: IOPS=25.2k, BW=98.6MiB/s (103MB/s)(5913MiB/60001msec)
slat (usec): min=5, max=145, avg= 8.02, stdev= 3.66
clat (nsec): min=1182, max=73419k, avg=30545.47, stdev=122502.62
lat (usec): min=25, max=73451, avg=38.70, stdev=122.86
clat percentiles (usec):
| 1.00th=[ 21], 5.00th=[ 21], 10.00th=[ 21], 20.00th=[ 22],
| 30.00th=[ 22], 40.00th=[ 22], 50.00th=[ 22], 60.00th=[ 23],
| 70.00th=[ 23], 80.00th=[ 24], 90.00th=[ 27], 95.00th=[ 98],
| 99.00th=[ 200], 99.50th=[ 235], 99.90th=[ 330], 99.95th=[ 979],
| 99.99th=[ 3294]
bw ( KiB/s): min=54328, max=109736, per=100.00%, avg=100935.13, stdev=7821.72, samples=119
iops : min=13582, max=27434, avg=25233.78, stdev=1955.43, samples=119
lat (usec) : 2=0.01%, 4=0.01%, 10=0.02%, 20=0.15%, 50=93.68%
lat (usec) : 100=1.95%, 250=3.88%, 500=0.24%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.04%, 10=0.01%, 50=0.01%, 100=0.01%
cpu : usr=7.97%, sys=27.25%, ctx=1513659, majf=0, minf=13
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=1513794,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=98.6MiB/s (103MB/s), 98.6MiB/s-98.6MiB/s (103MB/s-103MB/s), io=5913MiB (6201MB), run=60001-60001msec
Disk stats (read/write):
dm-0: ios=1510636/964, merge=0/0, ticks=45956/17180, in_queue=63136, util=99.28%, aggrios=1513847/2791, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
md0: ios=1513847/2791, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=303220/1937, aggrmerge=1085/7327, aggrticks=9787/2117, aggrin_queue=583, aggrutil=50.33%
nvme0n1: ios=6/0, merge=0/0, ticks=85/0, in_queue=72, util=0.08%
sdd: ios=378995/2546, merge=1448/8759, ticks=13169/3238, in_queue=688, util=50.33%
sdc: ios=378757/2470, merge=1162/9324, ticks=11282/1132, in_queue=64, util=48.38%
sdb: ios=379101/2424, merge=1345/9288, ticks=12530/3667, in_queue=1716, util=49.15%
sda: ios=379242/2246, merge=1471/9268, ticks=11872/2548, in_queue=376, util=48.92%
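For System 2 I can also check whether the array is healthy or perhaps still resyncing, e.g. with the following (md0 as shown in the disk stats above):
Code:
# array state, sync/resync progress and member disks
cat /proc/mdstat
mdadm --detail /dev/md0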
System 3 (HDD + ZFS + RAID10)
SuperMicro 5017C-MTF
32 GB RAM
4x WDC WD1005FBYZ-01YCBB2
ZFS Software RAID 10
Code:
(fra1)root@vms2:~# dd if=/dev/zero of=test.img bs=4k count=262133 conv=fdatasync
262133+0 records in
262133+0 records out
1073696768 bytes (1.1 GB, 1.0 GiB) copied, 3.25517 s, 330 MB/s
Code:
(fra1)root@vms2:~# pveperf
CPU BOGOMIPS: 52800.16
REGEX/SECOND: 2101031
HD SIZE: 1760.22 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND: 569.48
DNS EXT: 11.82 ms
DNS INT: 20.43 ms (mydomain.link)
Code:
(fra1)root@vms2:~# fio --ioengine=libaio --direct=1 --sync=1 --rw=read --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name seq_read --filename=test.img
seq_read: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=159MiB/s][r=40.7k IOPS][eta 00m:00s]
seq_read: (groupid=0, jobs=1): err= 0: pid=2949: Fri Dec 20 13:42:00 2019
read: IOPS=40.4k, BW=158MiB/s (166MB/s)(9472MiB/60001msec)
slat (usec): min=20, max=291, avg=23.30, stdev= 3.37
clat (nsec): min=602, max=33623, avg=731.08, stdev=378.63
lat (usec): min=21, max=293, avg=24.16, stdev= 3.54
clat percentiles (nsec):
| 1.00th=[ 644], 5.00th=[ 652], 10.00th=[ 660], 20.00th=[ 668],
| 30.00th=[ 676], 40.00th=[ 684], 50.00th=[ 692], 60.00th=[ 692],
| 70.00th=[ 700], 80.00th=[ 708], 90.00th=[ 756], 95.00th=[ 948],
| 99.00th=[ 1288], 99.50th=[ 1400], 99.90th=[ 8384], 99.95th=[ 8640],
| 99.99th=[11328]
bw ( KiB/s): min=151392, max=165504, per=99.98%, avg=161630.95, stdev=2146.14, samples=119
iops : min=37848, max=41376, avg=40407.71, stdev=536.53, samples=119
lat (nsec) : 750=89.81%, 1000=6.33%
lat (usec) : 2=3.61%, 4=0.06%, 10=0.19%, 20=0.01%, 50=0.01%
cpu : usr=7.10%, sys=92.91%, ctx=295, majf=8, minf=10
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=2424920,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=158MiB/s (166MB/s), 158MiB/s-158MiB/s (166MB/s-166MB/s), io=9472MiB (9932MB), run=60001-60001msec
The (really old) server with only HDDs is much faster than both SSD systems. I know that RAID 1 / RAID 6 performance is not as good as RAID 10, but 54 / 95 MB/s is really poor for SSD RAIDs in my opinion; a single 860 EVO alone is rated at up to ~520 MB/s sequential writes.
All servers are almost idle. The servers were installed with the Proxmox installer using the defaults.
Any ideas what the root cause could be?
Thanks and best regards
Patrick