Hello everyone,
I have been trying to configure Ceph storage for my VMs for some time now, but I am experiencing significant performance issues and cannot seem to resolve them.
I need storage for the VMs that can withstand the loss of a physical node and even the failure of two disks. For example, with two physical storage servers, I want the VMs to keep working even if the first server is down and two disks on the second one have failed. Essentially a sort of RAID6 for the storage, plus mirroring to a second storage device.
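For context, my understanding is that the "RAID6" part maps to a Ceph erasure-code profile with two coding chunks (m=2). Proxmox generated my actual profile (pve_ec_storage_raid6, shown further down), but a hand-made sketch of the idea would look roughly like this, with placeholder names and a placeholder k value:
Code:
# Hypothetical sketch only (names and k are placeholders, not my real setup):
# k=4 data chunks + m=2 coding chunks tolerates losing two chunks, like RAID6.
ceph osd erasure-code-profile set ec-k4m2-example k=4 m=2 crush-failure-domain=osd
# Create an EC data pool from that profile and allow RBD overwrites on it.
ceph osd pool create ec-example-data 32 32 erasure ec-k4m2-example
ceph osd pool set ec-example-data allow_ec_overwrites true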
I have created a test environment with one compute server, where I create the VMs, and two storage servers with HDDs, which I use for the Ceph pool. Unfortunately, I can only use HDDs at this stage. The network is currently 1Gbit, but I already plan to upgrade it to 10Gbit. I have also put the public network and the cluster network on the same subnet, since it is a test environment with little traffic.
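When the 10Gbit upgrade happens, I expect splitting the two networks to look roughly like this in ceph.conf (the second subnet below is purely a placeholder, not a real one from my setup):
Code:
[global]
    # public/client traffic stays on the existing network
    public_network  = 10.53.53.0/24
    # hypothetical dedicated subnet for replication/backfill traffic
    cluster_network = 10.54.54.0/24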
I am having problems especially with Windows VMs, which are almost unusable. I ran tests with CrystalDiskMark and, for example, random I/O runs at around 20 MB/s for reads and 2-3 MB/s for writes. The same VM, when stored on local LVM storage backed by the SSDs of the host it runs on, goes up to 300 MB/s.
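For anyone who wants to compare numbers, the results above come from CrystalDiskMark inside the guest; a roughly equivalent random-write test from the Linux side would be something like the fio run below (the file path, size and runtime are just placeholders):
Code:
# Hypothetical 4 KiB random-write test, roughly comparable to CrystalDiskMark's RND4K run:
fio --name=randwrite-test --filename=/mnt/testfile --size=1G \
    --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 \
    --ioengine=libaio --direct=1 --runtime=60 --time_based --group_reporting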
Below are some specifications about my configuration. I hope you can help me understand what I am doing wrong.
Thank you very much in advance!
cat /etc/pve/storage.cfg
Code:
rbd: storage_raid6
content images,rootdir
data-pool storage_raid6-data
krbd 1
pool storage_raid6-metadata
ceph osd tree
Code:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 70.95377 root default
-3 32.74789 host chs-beta-bstorage01
2 hdd 5.45798 osd.2 up 0.85004 1.00000
3 hdd 5.45798 osd.3 up 1.00000 1.00000
5 hdd 5.45798 osd.5 up 1.00000 1.00000
7 hdd 5.45798 osd.7 up 1.00000 1.00000
12 hdd 5.45799 osd.12 up 1.00000 1.00000
13 hdd 5.45799 osd.13 up 1.00000 1.00000
-5 38.20587 host chs-beta-bstorage02
1 hdd 5.45798 osd.1 up 1.00000 1.00000
4 hdd 5.45798 osd.4 up 1.00000 1.00000
6 hdd 5.45798 osd.6 up 0.85004 1.00000
8 hdd 5.45798 osd.8 up 0.85004 1.00000
9 hdd 5.45798 osd.9 up 1.00000 1.00000
10 hdd 5.45799 osd.10 up 0.85004 1.00000
11 hdd 5.45799 osd.11 up 1.00000 1.00000
cat /etc/pve/ceph.conf
Code:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.53.53.80/24
fsid = 37f258a3-6b02-471f-a2d9-8f66d8d29444
mon_allow_pool_delete = true
mon_host = 10.53.53.80 10.53.53.71 10.53.53.70 10.53.53.79
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.53.53.80/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[client.crash]
keyring = /etc/pve/ceph/$cluster.$name.keyring
[mon.chs-beta-bstorage01]
public_addr = 10.53.53.70
[mon.chs-beta-bstorage02]
public_addr = 10.53.53.71
[mon.chs-prmx4]
public_addr = 10.53.53.80
[mon.chs-prmx5]
public_addr = 10.53.53.79
iperf between the two storage servers
Code:
[ 5] 0.00-1.00 sec 115 MBytes 965 Mbits/sec
cat /etc/network/interfaces
Code:
auto vmbr0
iface vmbr0 inet static
address 10.53.53.79/24
gateway 10.53.53.1
bridge-ports eno1np0
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 2-4094
mtu 9000
cat /etc/pve/qemu-server/153.conf
Code:
agent: 1
bios: ovmf
boot: order=scsi0;ide2;ide0;net0
cores: 8
cpu: host
efidisk0: storage_raid6:vm-153-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
ide0: local:iso/virtio-win-0.1.271.iso,media=cdrom,size=709474K
ide2: local:iso/windows_server_2025.iso,media=cdrom,size=5819534K
machine: pc-q35-9.2+pve1
memory: 16384
meta: creation-qemu=9.2.0,ctime=1757080916
name: windows-opt-raid
net0: virtio=BC:24:11:6F:16:40,bridge=vmbr0,mtu=9000
numa: 0
ostype: win11
scsi0: storage_raid6:vm-153-disk-1,aio=threads,discard=on,iothread=1,size=50G
scsihw: virtio-scsi-single
smbios1: uuid=bcf089df-ee56-4b1d-9034-1033af908a00
sockets: 1
vmgenid: f1bad49a-9464-48f5-bc90-52dd41c48895
ceph osd pool ls detail
Code:
pool 6 'storage_raid6-data' erasure profile pve_ec_storage_raid6 size 6 min_size 5 crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 3952 lfor 0/3184/3182 flags hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384 application rbd
pool 7 'storage_raid6-metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 3954 flags hashpspool stripe_width 0 application rbd read_balance_score 1.62
ceph osd pool get storage_raid6-data all
Code:
size: 6
min_size: 5
pg_num: 32
pgp_num: 32
crush_rule: storage_raid6-data
hashpspool: true
allow_ec_overwrites: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
erasure_code_profile: pve_ec_storage_raid6
fast_read: 0
pg_autoscale_mode: on
eio: false
bulk: false
ceph osd crush rule dump
Code:
"rule_id": 1,
"rule_name": "storage_raid6-data",
"type": 3,
"steps": [
{
"op": "set_chooseleaf_tries",
"num": 5
},
{
"op": "set_choose_tries",
"num": 100
},
{
"op": "take",
"item": -1,
"item_name": "default"
},
{
"op": "chooseleaf_indep",
"num": 0,
"type": "osd"
},
{
"op": "emit"
}
]
}
]