Hello @all,
we are running a Proxmox cluster with five nodes. Three of them are used for Ceph, providing two pools, one on HDDs, the other on SSDs. The other two nodes are used for virtualization with QEMU.
We have redundant 10 GbE storage networks and redundant 10 GbE Ceph networks.
The nodes are equipped with dual CPUs and between 96 and 128 GB of RAM. The three Ceph nodes are completely identical.
We have read a lot of the Proxmox docs and this forum and spent hours googling, but we haven't found a solution for our performance troubles yet.
We are using the latest Proxmox:
Code:
# pveversion -v
proxmox-ve: 5.4-1 (running kernel: 4.15.18-14-pve)
pve-manager: 5.4-6 (running version: 5.4-6/aa7856c5)
pve-kernel-4.15: 5.4-2
pve-kernel-4.15.18-14-pve: 4.15.18-39
pve-kernel-4.15.18-11-pve: 4.15.18-34
pve-kernel-4.15.17-1-pve: 4.15.17-9
ceph: 12.2.12-pve1
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-10
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-52
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-13
libpve-storage-perl: 5.0-43
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-37
pve-container: 2.0-39
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-21
pve-firmware: 2.0-6
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-2
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-51
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2
We ran rados benchmarks from our virtualization host against our Ceph HDD pool and got the following results.
Write:
Code:
# rados -p pub.hdd.bench bench -b 4M 60 write -t 16 --no-cleanup
[...]
Total time run: 60.571563
Total writes made: 1715
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 113.254
Stddev Bandwidth: 40.2683
Max bandwidth (MB/sec): 176
Min bandwidth (MB/sec): 0
Average IOPS: 28
Stddev IOPS: 10
Max IOPS: 44
Min IOPS: 0
Average Latency(s): 0.564394
Stddev Latency(s): 0.343622
Max latency(s): 2.84305
Min latency(s): 0.0969665
Read:
Code:
# rados -p pub.hdd.bench bench 60 seq -t 16 --no-cleanup
[...]
Total time run: 17.727840
Total reads made: 1715
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 386.962
Average IOPS: 96
Stddev IOPS: 21
Max IOPS: 135
Min IOPS: 48
Average Latency(s): 0.163484
Max latency(s): 1.54406
Min latency(s): 0.0274371
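The fio test inside the guest (further below) uses 4k random I/O, so the 4M rados bench above is not a direct reference. For a closer host-side comparison we are thinking about repeating the bench with a 4k block size, roughly like this (same pool as above, runtime just an example):
Code:
# 4k writes into the same pool, leaving the objects for the read test
rados -p pub.hdd.bench bench 60 write -b 4096 -t 16 --no-cleanup
# 4k random reads of the objects written above
rados -p pub.hdd.bench bench 60 rand -t 16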
The maximum latency is a little high, but that shall not be the focus of this conversation.
The OSD tree is in sync on all nodes:
Code:
# ceph osd tree
ID CLASS WEIGHT   TYPE NAME                  STATUS REWEIGHT PRI-AFF
-1       35.36691 root default
-3       11.78897     host pub-ceph-node-01
 0   hdd  5.45789         osd.0                  up  1.00000 1.00000
 1   hdd  5.45789         osd.1                  up  1.00000 1.00000
 8   ssd  0.87320         osd.8                  up  1.00000 1.00000
-5       11.78897     host pub-ceph-node-02
 2   hdd  5.45789         osd.2                  up  1.00000 1.00000
 3   hdd  5.45789         osd.3                  up  1.00000 1.00000
 7   ssd  0.87320         osd.7                  up  1.00000 1.00000
-7       11.78897     host pub-ceph-node-03
 4   hdd  5.45789         osd.4                  up  1.00000 1.00000
 5   hdd  5.45789         osd.5                  up  1.00000 1.00000
 6   ssd  0.87320         osd.6                  up  1.00000 1.00000
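To rule out a mis-mapped pool, we could also cross-check which CRUSH rule the HDD pool uses and which device class that rule selects; a sketch (assuming the Ceph pool carries the same name as the storage, pub.hdd.vm):
Code:
# which crush rule does the pool use?
ceph osd pool get pub.hdd.vm crush_rule
# dump all rules to see which device class each one selects
ceph osd crush rule dump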
On our first virtualization server we have eight Linux guests and two Windows guests. The QEMU guest agent is activated on all guests. All guest disks are created as VirtIO drives and are stored on our HDD pool.
A Linux guest configuration looks like this:
Code:
# qm config 402
agent: 1
balloon: 0
boot: cdn
bootdisk: virtio0
cores: 2
ide2: none,media=cdrom
memory: 16384
name: hbm-srv-02
net0: virtio=52:54:00:6a:24:0a,bridge=vmbr0
net1: virtio=A2:64:0E:18:02:27,bridge=vmbr1
numa: 0
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=c1587fd0-0b8a-4a84-9d4a-b9b1b919d3c5
sockets: 2
virtio0: pub.hdd.vm:vm-402-disk-0,cache=writeback,iothread=1,size=30G
virtio1: pub.hdd.vm:vm-402-disk-1,cache=writeback,iothread=1,size=500G
vmgenid: bbdb6d92-959f-41fc-951e-442c4cdf3626
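For reference, the full KVM command line that Proxmox generates for this guest can be dumped to confirm that the iothread and writeback settings really end up on the -drive arguments (output left out here):
Code:
# print the kvm command Proxmox would use to start VM 402
qm showcmd 402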
Running a fio benchmark on the guest with the configuration above, while there was almost no traffic from the other guests, gives the following results:
Code:
# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/var/fio.tmp --bs=4k --iodepth=64 --size=8G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [m(1)] [99.7% done] [56044KB/19020KB/0KB /s] [14.2K/4755/0 iops] [eta 00m:03s]
test: (groupid=0, jobs=1): err= 0: pid=17921: Sun Jun 23 21:21:35 2019
read : io=6142.3MB, bw=6373.6KB/s, iops=1593, runt=986843msec
write: io=2049.8MB, bw=2126.1KB/s, iops=531, runt=986843msec
cpu : usr=1.45%, sys=4.28%, ctx=1218785, majf=0, minf=9
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=1572409/w=524743/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: io=6142.3MB, aggrb=6373KB/s, minb=6373KB/s, maxb=6373KB/s, mint=986843msec, maxt=986843msec
WRITE: io=2049.8MB, aggrb=2126KB/s, minb=2126KB/s, maxb=2126KB/s, mint=986843msec, maxt=986843msec
Disk stats (read/write):
vdb: ios=1572293/525175, merge=0/16, ticks=62668876/392852, in_queue=65241904, util=100.00%
This looks like we are losing quite a bit of disk performance. But why?
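One test we are considering, to tell whether the performance is lost on the Ceph/RBD side or in the virtualization layer, is running the same 4k workload directly against a throwaway RBD image from the host using fio's rbd engine. Just a sketch; it assumes fio was built with rbd support, that the client.admin keyring is available on the host, and the image name fio-test is made up:
Code:
# throwaway 8 GB test image, so the live VM disks stay untouched
rbd create pub.hdd.vm/fio-test --size 8192
# same 4k random read/write mix as inside the guest, but via librbd on the host
fio --ioengine=rbd --clientname=admin --pool=pub.hdd.vm --rbdname=fio-test \
    --name=rbd-test --bs=4k --iodepth=64 --rw=randrw --rwmixread=75 \
    --size=8G --direct=1 --randrepeat=1 --gtod_reduce=1
# clean up afterwards
rbd rm pub.hdd.vm/fio-test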
We tried switching to SCSI disk access in the guests, but that didn't improve anything compared to VirtIO.
We have activated the extra I/O thread for each disk and set the caching strategy to writeback for best performance.
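For completeness, this is roughly how the current disk options translate to qm set (VM 402 as in the config above); the second command is what a SCSI test would additionally need, since, as far as we understand, iothread on scsi disks only takes effect with the virtio-scsi-single controller:
Code:
# VirtIO block disk with writeback cache and a dedicated I/O thread
qm set 402 --virtio0 pub.hdd.vm:vm-402-disk-0,cache=writeback,iothread=1
# for a SCSI-based test the controller type would also have to change
qm set 402 --scsihw virtio-scsi-single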
What else can we do to improve the disk performance?
How much of the bandwidth seen on the host should one expect inside a guest?
Why is the %util value at 100% during the fio test? Is this a hint at the source of the problem?
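The 100% figure is the util value fio reports for vdb in its disk stats above. To watch it live inside the guest while the test runs, something like this should do (assuming the sysstat package is installed):
Code:
# extended per-device statistics for the data disk, refreshed every second
iostat -x vdb 1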
Any help or ideas are welcome.
Best regards,
Mario Minati