Windows Guest - Slow disk performance in RBD Pool

I recently built a dev cluster to test Ceph performance. Using a Windows Server 2019 guest with CrystalDiskMark, I am getting very slow speeds in both read and write testing.

Reads: 140 MB/s vs 4,000 MB/s testing on a disk attached to NFS storage.
Writes: 90 MB/s vs 1,643 MB/s
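
To rule out CrystalDiskMark itself, a roughly equivalent sequential test can be run inside the guest with fio (a sketch; the drive letter, file name, and sizes are placeholders to adapt, and the colon in the Windows path has to be escaped for fio):

fio --name=seqwrite --filename=D\:\fio.test --ioengine=windowsaio --direct=1 --rw=write --bs=4M --iodepth=16 --size=4G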


ceph.conf

[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.10.27.33/27
fsid = 59b958ee-5d9c-48e0-a59b-ee1f7c6ea1bc
mon_allow_pool_delete = true
mon_host = 192.168.2.95 192.168.2.79 192.168.2.69
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 192.168.2.95/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.putsproxp01]
host = putsproxp01
mds_standby_for_name = pve

[mds.putsproxp04]
host = putsproxp04
mds_standby_for_name = pve

[mds.putsproxp05]
host = putsproxp05
mds_standby_for_name = pve

[mon.putsproxp01]
public_addr = 192.168.2.95

[mon.putsproxp04]
public_addr = 192.168.2.79

[mon.putsproxp05]
public_addr = 192.168.2.69

The cluster network is attached to 10 Gbps ports; iperf shows 10 Gbps of bandwidth between all 5 nodes. The public network is on 1 Gbps ports, which may be the cause, but even after changing the network addresses and restarting the services I did not see a difference.
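
Worth noting: with replicated pools the client (public) network carries all guest I/O, so 1 Gbps caps client throughput at roughly 110 MB/s, which lines up with the numbers above. As a sketch of the relevant [global] settings, assuming a second, hypothetical 10 GbE subnet 10.10.28.0/27 were available for the public side:

[global]
# existing 10 GbE back-end network for OSD replication
cluster_network = 10.10.27.33/27
# hypothetical second 10 GbE subnet for client/public traffic
public_network = 10.10.28.0/27

Changing public_network alone is not enough, though: mon_host and each monitor's public_addr must point into the new subnet, which in practice usually means destroying and re-creating the monitors one at a time.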

All the OSDs are 7200 RPM HDDs. The DB is kept on the disk, but the WAL is configured on a high-performance NVMe SSD.
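
Note that with BlueStore, if only the WAL is placed on the NVMe, the RocksDB database (and its compactions) still lives on the HDD. Moving the whole DB to the NVMe is usually the bigger win, and the WAL then follows the DB device automatically. A sketch of recreating an OSD that way (device names are placeholders):

pveceph osd create /dev/sdb --db_dev /dev/nvme0n1

The equivalent with plain Ceph tooling would be ceph-volume lvm create --data /dev/sdb --block.db /dev/nvme0n1p1.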

Rados Bench:

root@putsproxp04:~# rados bench -p vmpool01 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_putsproxp04_1869644
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        35        19   75.9961        76    0.990869    0.536613
    2      16        56        40   79.9941        84    0.152236    0.529228
    3      16        84        68   90.6595       112    0.148355    0.589819
    4      16       120       104   103.991       144    0.487202    0.555359
    5      16       143       127    101.59        92    0.153748    0.559597
    6      16       166       150   99.9903        92    0.169611    0.546414
    7      16       198       182    103.99       128    0.151676    0.597447
    8      16       233       217   108.489       140    0.339944    0.575887
    9      16       257       241   107.101        96     1.18466    0.569925
   10      16       282       266   106.389       100     1.50691     0.57012
Total time run:         10.5944
Total writes made:      282
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     106.471
Stddev Bandwidth:       23.6418
Max bandwidth (MB/sec): 144
Min bandwidth (MB/sec): 76
Average IOPS:           26
Stddev IOPS:            5.91044
Max IOPS:               36
Min IOPS:               19
Average Latency(s):     0.586548
Stddev Latency(s):      0.453353
Max latency(s):         3.15577
Min latency(s):         0.128107
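
rados bench with the default 4 MB objects mostly measures sequential bandwidth, while a guest OS does a lot of small I/O. A small-block run (4 KB writes via -b) would expose the IOPS side of the pool, where pure-HDD OSDs typically drop to a few hundred IOPS:

rados bench -p vmpool01 10 write -b 4096 --no-cleanup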

root@putsproxp04:~# rados bench -p vmpool01 10 seq
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        48        32   127.967       128    0.885354    0.323922
    2      16        75        59   117.979       108     0.45419    0.434617
    3      16       110        94   125.313       140    0.652542    0.455451
    4      16       144       128    127.98       136    0.769006     0.45517
    5      16       182       166    132.78       152    0.186844    0.452258
    6      16       211       195   129.982       116    0.268633     0.43869
    7      16       243       227   129.697       128    0.567921    0.439819
    8      16       269       253   126.484       104    0.206097    0.458726
    9       3       282       279   123.985       104     1.11506    0.489577
Total time run:       9.11017
Total reads made:     282
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   123.818
Average IOPS:         30
Stddev IOPS:          4.272
Max IOPS:             38
Min IOPS:             26
Average Latency(s):   0.495371
Max latency(s):       2.49144
Min latency(s):       0.0194348
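
Since the guest talks to an RBD image rather than raw RADOS objects, rbd bench against the actual disk image is closer to what CrystalDiskMark exercises (vm-100-disk-0 is a placeholder image name):

rbd bench --io-type write --io-size 4M --io-threads 16 --io-total 1G vmpool01/vm-100-disk-0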

pveversion -v

pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-8
pve-kernel-5.13: 7.1-6
pve-kernel-5.13.19-3-pve: 5.13.19-7
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph: 16.2.7
ceph-fuse: 16.2.7
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-2
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-1
openvswitch-switch: 2.15.0+ds1-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-5
pve-cluster: 7.1-3
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-4
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-1
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1
 
With this configuration in a three-node cluster you cannot expect more performance than from a single HDD. With size = 3, every client write is replicated to three OSDs, so the HDDs have to absorb three times the client bandwidth plus the RocksDB metadata traffic. Reading at 123 MB/s is as expected, and so is writing at 90 MB/s, since writes involve additional RocksDB transactions (which are very slow on HDD).
Haha, yes, the mechanical hard disks are faster than my NVMe drives. I use three NVMe drives in a single-node Ceph cluster, with four OSDs per NVMe, and I get 30 MB/s of write bandwidth in rados bench. Here is my thread:
https://forum.proxmox.com/threads/c...problems-fast-reading-and-slow-writing.122023