I am seeing a huge difference in write performance between two Proxmox hosts in the same cluster.
They are both running the same version of Proxmox (8.2.5) and have identical hardware: HP DL380 G8 with 2x Xeon E5-2670, 128GB RAM, and 2x 1TB HDD in RAID1.
I have run fio on both hosts with the same parameters, and the results are very different: latencies in microseconds on the first host and in milliseconds on the second.
Host 1:
Bash:
/var/lib/vz/dump# fio --rw=write --ioengine=sync --fdatasync=1 --directory=write-test --size=100m --bs=2300 --name=kube-storage-test
Bash:
kube-storage-test: (g=0): rw=write, bs=(R) 2300B-2300B, (W) 2300B-2300B, (T) 2300B-2300B, ioengine=sync, iodepth=1
fio-3.33
Starting 1 process
kube-storage-test: Laying out IO file (1 file / 100MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=15.9MiB/s][w=7231 IOPS][eta 00m:00s]
kube-storage-test: (groupid=0, jobs=1): err= 0: pid=3066826: Mon Oct 21 12:49:25 2024
write: IOPS=7570, BW=16.6MiB/s (17.4MB/s)(100.0MiB/6022msec); 0 zone resets
clat (usec): min=3, max=599, avg= 9.16, stdev= 5.99
lat (usec): min=4, max=599, avg= 9.52, stdev= 6.13
clat percentiles (usec):
| 1.00th=[ 4], 5.00th=[ 5], 10.00th=[ 5], 20.00th=[ 5],
| 30.00th=[ 7], 40.00th=[ 9], 50.00th=[ 9], 60.00th=[ 10],
| 70.00th=[ 10], 80.00th=[ 13], 90.00th=[ 15], 95.00th=[ 16],
| 99.00th=[ 22], 99.50th=[ 26], 99.90th=[ 50], 99.95th=[ 61],
| 99.99th=[ 269]
bw ( KiB/s): min=15722, max=17946, per=100.00%, avg=17018.50, stdev=747.14, samples=12
iops : min= 7000, max= 7990, avg=7577.17, stdev=332.64, samples=12
lat (usec) : 4=2.74%, 10=68.21%, 20=27.20%, 50=1.76%, 100=0.07%
lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%
fsync/fdatasync/sync_file_range:
sync (usec): min=39, max=1051, avg=119.50, stdev=65.44
sync percentiles (usec):
| 1.00th=[ 42], 5.00th=[ 42], 10.00th=[ 43], 20.00th=[ 45],
| 30.00th=[ 53], 40.00th=[ 60], 50.00th=[ 153], 60.00th=[ 159],
| 70.00th=[ 169], 80.00th=[ 180], 90.00th=[ 192], 95.00th=[ 204],
| 99.00th=[ 231], 99.50th=[ 241], 99.90th=[ 330], 99.95th=[ 449],
| 99.99th=[ 570]
cpu : usr=5.41%, sys=35.18%, ctx=96172, majf=0, minf=39
IO depths : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,45590,0,0 short=45590,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=16.6MiB/s (17.4MB/s), 16.6MiB/s-16.6MiB/s (17.4MB/s-17.4MB/s), io=100.0MiB (105MB), run=6022-6022msec
Disk stats (read/write):
dm-1: ios=0/144862, merge=0/0, ticks=0/4904, in_queue=4904, util=57.56%, aggrios=4/96983, aggrmerge=0/51267, aggrticks=0/3550, aggrin_queue=3551, aggrutil=58.74%
sda: ios=4/96983, merge=0/51267, ticks=0/3550, in_queue=3551, util=58.74%
Host 2:
Bash:
/var/lib/vz/dump# fio --rw=write --ioengine=sync --fdatasync=1 --directory=write-test --size=100m --bs=2300 --name=kube-storage-test
Bash:
Starting 1 process
kube-storage-test: Laying out IO file (1 file / 100MiB)
Jobs: 1 (f=1): [W(1)][99.9%][w=69KiB/s][w=31 IOPS][eta 00m:01s]
kube-storage-test: (groupid=0, jobs=1): err= 0: pid=2769123: Mon Oct 21 13:06:33 2024
write: IOPS=45, BW=102KiB/s (105kB/s)(100.0MiB/1002846msec); 0 zone resets
clat (usec): min=8, max=55195, avg=31.18, stdev=314.75
lat (usec): min=9, max=55195, avg=32.07, stdev=314.76
clat percentiles (usec):
| 1.00th=[ 12], 5.00th=[ 14], 10.00th=[ 16], 20.00th=[ 19],
| 30.00th=[ 21], 40.00th=[ 23], 50.00th=[ 25], 60.00th=[ 28],
| 70.00th=[ 32], 80.00th=[ 37], 90.00th=[ 44], 95.00th=[ 51],
| 99.00th=[ 65], 99.50th=[ 72], 99.90th=[ 90], 99.95th=[ 277],
| 99.99th=[ 9896]
bw ( KiB/s): min= 26, max= 148, per=98.91%, avg=101.72, stdev=17.74, samples=2005
iops : min= 12, max= 66, avg=45.47, stdev= 7.88, samples=2005
lat (usec) : 10=0.04%, 20=28.79%, 50=65.89%, 100=5.21%, 250=0.02%
lat (usec) : 500=0.03%
lat (msec) : 10=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
fsync/fdatasync/sync_file_range:
sync (msec): min=4, max=299, avg=21.96, stdev=16.54
sync percentiles (msec):
| 1.00th=[ 7], 5.00th=[ 8], 10.00th=[ 9], 20.00th=[ 10],
| 30.00th=[ 11], 40.00th=[ 18], 50.00th=[ 21], 60.00th=[ 22],
| 70.00th=[ 24], 80.00th=[ 29], 90.00th=[ 43], 95.00th=[ 55],
| 99.00th=[ 79], 99.50th=[ 94], 99.90th=[ 155], 99.95th=[ 182],
| 99.99th=[ 249]
cpu : usr=0.08%, sys=0.45%, ctx=96953, majf=0, minf=43
IO depths : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,45590,0,0 short=45590,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=102KiB/s (105kB/s), 102KiB/s-102KiB/s (105kB/s-105kB/s), io=100.0MiB (105MB), run=1002846-1002846msec
Disk stats (read/write):
dm-1: ios=0/158286, merge=0/0, ticks=0/1787031, in_queue=1787031, util=99.49%, aggrios=716/132705, aggrmerge=0/54666, aggrticks=6171/1981263, aggrin_queue=1987434, aggrutil=95.09%
sda: ios=716/132705, merge=0/54666, ticks=6171/1981263, in_queue=1987434, util=95.09%
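That is the full run; the average fdatasync latency on host 2 is roughly two orders of magnitude worse than on host 1 (about 22 ms vs about 120 µs). If device-level numbers would help, I can run something like this on both hosts while the fio job is active (iostat from the sysstat package) and compare the write latency and utilisation columns:
Bash:
# run alongside the fio job; compare write latency (w_await) and utilisation (%util) for sda on both hosts
iostat -x 1 sda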
Any idea why this might be happening? I have checked the RAID controller and it is in good health.
The VMs are running fine on both hosts, but their disks are stored on a Ceph cluster, not on these local drives.
I want to use the local disks to store the etcd data of a Kubernetes cluster, preferably on an SSD (not yet installed).
But this difference in write performance makes me doubt whether an SSD would be worth it, if the problem is not the disks but something else.
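For completeness, this is the kind of controller check I can run on both hosts (HPE Smart Array CLI; the package is ssacli, or hpssacli on older tool versions), in case the write cache or battery/capacitor status differs between the two controllers:
Bash:
# overall controller and cache/battery status
ssacli ctrl all show status
# full details, including cache ratio and per-drive write cache settings
ssacli ctrl all show config detail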