We are building a new PVE cluster with some of our existing servers and are experiencing some troubling disk read/write values.
Our cluster design is as follows:
- 3 Servers (will be upgraded to 7 nodes later)
- Node1: 24 CPUs (2 × 6-core Intel Xeon), 128 GB RAM
- Node2: 24 CPUs (2 × 6-core Intel Xeon), 128 GB RAM
- Node3: 48 CPUs (2 × 12-core Intel Xeon), 256 GB RAM
- OS installed on 256 GB SSDs
- dedicated 1Gbit/s network interfaces for
- "coro-sync"
- "VM wan"
- "node management"
- dedicated 10Gbit/s network interfaces for
- ceph (bond active-backup)
- cephfs (bond active-backup)
- 2 x 2TB SSDs (crush rule "ssd", no RAID; see the rule sketch after this list)
- 2 x 2TB HDDs (crush rule "hdd", no RAID)
- 1 OSD per disk
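For context, the "ssd" and "hdd" crush rules referenced above are plain device-class replicated rules. A minimal sketch of how such rules are typically created, assuming root "default" and failure domain "host" (our exact invocation may differ):

ceph osd crush rule create-replicated ssd default host ssd   # replicated rule restricted to SSD OSDs
ceph osd crush rule create-replicated hdd default host hdd   # replicated rule restricted to HDD OSDs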
For all measurements we used the following fio job:

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75 --filename=/dev/sdb
The columns in our results table are:
- PVE MH = reference system on an Intel NUC with an M.2 SSD
- PVE-CEPH = PVE cluster as described above, VM disk on Ceph
- PVE-Local = PVE cluster, VM disk on a local SSD (non-Ceph)
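For the VM columns the same fio job was run inside the guests; pointed at the guest's virtual disk it looks roughly like this (the device name is illustrative and depends on the chosen bus, e.g. /dev/sda for SCSI or /dev/vda for virtio-blk):

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75 --filename=/dev/sda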
Our interpretation of these values is that performance on the local disks of each host is much higher than inside a virtual machine.
The HBA and disks are capable of ~230 MB/s read and ~75 MB/s write (see row "native", column "PVE-CEPH").
However, when we create a disk inside a virtual machine, we see a drop to ~13 MB/s read and ~4 MB/s write at best (see the row with "iothread + writeback cache" enabled). Other options for the virtual disk show even worse IOPS.
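For reference, the "iothread + writeback cache" case corresponds to a VM disk configured roughly like this in /etc/pve/qemu-server/<vmid>.conf (the storage name "ceph-ssd" and the VM ID are just examples):

scsihw: virtio-scsi-single
scsi0: ceph-ssd:vm-100-disk-0,cache=writeback,iothread=1,size=32G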
To test Ceph outside of a virtual machine, we tried the following:
ceph osd pool create scbench 100 100 ssd
rados bench 60 write -b 4M -t 16 -p scbench --no-cleanup # result: 150MB/s
rados bench 60 seq -t 16 -p scbench # result: ~1,300 MB/s
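To narrow the problem down between the RADOS layer and the RBD/QEMU layer, an RBD image could also be benchmarked directly from one of the hosts; a sketch using a hypothetical test image in the benchmark pool (image name is just an example):

rbd pool init scbench                                        # tag the pool for RBD use if not already done
rbd create --size 10G scbench/bench-img                      # hypothetical test image
rbd bench --io-type write --io-size 4096 --io-threads 16 --io-total 1G --io-pattern rand scbench/bench-img
fio --ioengine=rbd --clientname=admin --pool=scbench --rbdname=bench-img --direct=1 --bs=4k --iodepth=64 --rw=randrw --rwmixread=75 --name=rbd-test --size=4G   # needs fio built with rbd support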
We are quite puzzled by these results and would be grateful for any insight into why performance is rather poor inside our VMs while it is good on the nodes and the Ceph network itself.
