RBD mirroring slow in Proxmox

tawh

Active Member
Mar 26, 2019
Hello all,

I have two Proxmox clusters, A and B (both updated to the latest version).
Cluster A: 3 hosts; 2 of them each have a single 10T disk plus a 256GB SSD for the OS, using bluestore and bcache. The third host only has minimal hardware and exists purely for cluster quorum; it is not intended to host any data or VMs.
Cluster B: 1 host, with a single 10T disk and a 256GB SSD for the OS, bluestore and bcache.

I configured Ceph on both clusters:
Cluster A: 2 OSDs, 3 mons, 3 mgrs.
I also configured rbd-mirror between the two Ceph clusters.
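For reference, the mirroring itself is set up per image with journaling, roughly along these lines (pool/image names and the peer name below are only placeholders, not the exact commands I ran):
Code:
# on both clusters: enable mirroring on the pool in per-image mode
rbd mirror pool enable rbd image

# on each cluster: register the other cluster as a peer (user/cluster names are examples)
rbd mirror pool peer add rbd client.rbd-mirror-peer@remote

# per image: make sure the journaling feature is on, then enable mirroring
rbd feature enable rbd/vm-100-disk-0 journaling
rbd mirror image enable rbd/vm-100-disk-0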

All hosts are in the same subnet on a 1Gbps LAN.

Problem:
When I force a resync of the image from primary to secondary (bootstrapping),
I get about 600~700 Mbps of network utilization, which can be taken as the practical ceiling for disk speed and network bandwidth here.
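(The forced resync was triggered on the receiving cluster with something like the following; the pool/image name is a placeholder.)
Code:
# run against the non-primary cluster; schedules a full re-copy of the image
rbd mirror image resync rbd/vm-100-disk-0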

After the resync, I copy a 4GB file into the image (the image is actually a Windows OS, so I start the VM and copy the file from outside).
Inside Windows, the copy speed is around 60 MBytes per second (~480Mbps).

However, the replay is very slow and the behaviour is strange.
(a) Without any tuning, the primary host sends ~300Mbps for about 1 second, then the network sits idle for around 10 seconds, and this pattern repeats until the replay finishes.
Code:
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 172.16.0.10/24
         fsid = e5ab22c7-6876-4b68-9f43-d67edd4175c2
         mon_allow_pool_delete = true
         mon_host =  172.16.0.10
         osd_pool_default_min_size = 2
         osd_pool_default_size = 1
         public_network = 172.16.0.10/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring
         rbd_default_features = 125

(b) With the following "optimization" configuration, the network utilization sits at a steady ~32Mbps until the replay ends (see the note below on where each of these options actually takes effect).
Code:
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 172.16.0.10/24
         fsid = e5ab22c7-6876-4b68-9f43-d67edd4175c2
         mon_allow_pool_delete = true
         mon_host =  172.16.0.10
         osd_pool_default_min_size = 2
         osd_pool_default_size = 1
         public_network = 172.16.0.10/24
         rbd_journal_max_payload_bytes = 524288
         rbd_mirror_journal_max_fetch_bytes = 1048576
         rbd_mirror_image_state_check_interval = 5
         rbd_mirror_pool_replayers_refresh_interval = 5
         rbd_mirror_sync_point_update_age = 5

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring
         rbd_default_features = 125
In both scenarios it took around 15 minutes to complete the replay.
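One note on the options in configuration (b): as far as I understand it, rbd_journal_max_payload_bytes acts on the librbd client that writes the journal (the primary side), while rbd_mirror_journal_max_fetch_bytes acts on the rbd-mirror daemon doing the replay (the secondary side). So each option has to end up in the ceph.conf that the respective process reads, and the process has to be restarted before it takes effect, roughly:
Code:
# on the secondary: restart the rbd-mirror daemon (instance id is a placeholder)
systemctl restart ceph-rbd-mirror@<id>.service

# on the primary: the VM (librbd client) only picks up the new journal
# payload size after the image is reopened, e.g. a VM restart or migration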

How can I use the full bandwidth for replaying (i.e. reach the speed seen during bootstrapping)?
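For what it's worth, this is how I watched the replay on the secondary (pool/image names are placeholders):
Code:
# per-image state; shows "up+replaying" and how far the replay lags behind
rbd mirror image status rbd/vm-100-disk-0

# summary for the whole pool
rbd mirror pool status rbd --verbose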
 
What I want to achieve is to use the full bandwidth for replaying (i.e. like the speed during bootstrapping)
Journal-based mirroring needs to replay the writes exactly as they happened on the primary image in order to be crash-consistent. This is very likely why the transfer takes its time. Snapshot-based mirroring was only introduced recently in Octopus and could speed things up, but it's not yet available on Proxmox VE.
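Just for reference (it cannot be used on Proxmox VE yet), snapshot-based mirroring in Octopus is enabled per image and driven by a snapshot schedule, roughly like this (pool/image name and interval are placeholders):
Code:
rbd mirror pool enable rbd image
rbd mirror image enable rbd/vm-100-disk-0 snapshot
rbd mirror snapshot schedule add --pool rbd --image vm-100-disk-0 5m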
 
Journal-based mirroring needs to replay the writes exactly as they happened on the primary image in order to be crash-consistent. This is very likely why the transfer takes its time. Snapshot-based mirroring was only introduced recently in Octopus and could speed things up, but it's not yet available on Proxmox VE.

Thanks for your reply.

I understand the "replay" behaviour. But the fact of the matter is that I can write to the primary Ceph storage at about 480Mbps, while the secondary Ceph only replays at ~30Mbps. I also configured mirroring in both directions so I can test the reverse, and the result is the same. I have been researching this problem on the Internet for around 3 to 4 months with no progress at all.

If that is simply how rbd mirroring performs, I don't see a use case for that kind of synchronization. o_O Has any user built a practical rbd mirror environment?
 
After the resync, I copy a 4GB file into the image (the image is actually a Windows OS, so I start the VM and copy the file from outside).
Inside Windows, the copy speed is around 60 MBytes per second (~480Mbps).
If you refer to this speed, then please bear in mind that there are a couple of layers in between until the Ceph storage is reached. What performance does a rados bench show?
https://proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark
 
If you refer to this speed, then please bear in mind that there are a couple of layers in between until the Ceph storage is reached. What performance does a rados bench show?
https://proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark

Code:
rados bench 60 write -b 4M -t 16 --no-cleanup
Cluster A:
Code:
Total time run:         60.3526
Total writes made:      2798
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     185.443
Stddev Bandwidth:       66.8916
Max bandwidth (MB/sec): 352
Min bandwidth (MB/sec): 64
Average IOPS:           46
Stddev IOPS:            16.7229
Max IOPS:               88
Min IOPS:               16
Average Latency(s):     0.345065
Stddev Latency(s):      0.138755
Max latency(s):         0.959484
Min latency(s):         0.0310857

Cluster B:
Code:
Total time run:         60.6366
Total writes made:      1977
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     130.416
Stddev Bandwidth:       88.9065
Max bandwidth (MB/sec): 492
Min bandwidth (MB/sec): 12
Average IOPS:           32
Stddev IOPS:            22.2266
Max IOPS:               123
Min IOPS:               3
Average Latency(s):     0.490735
Stddev Latency(s):      0.299194
Max latency(s):         1.53251
Min latency(s):         0.02516
 
There is a big deviation (stddev) on those clusters. This will bring down performance. It is more or less expected given the size and hardware of the clusters.

For journal-based mirroring, while the VM writes data onto Ceph, the same data is also read back from it. At best this cuts the performance in half again.
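Since the replay also reads the journal back from the cluster, a read benchmark can be interesting as well. Assuming the objects from a previous write run with --no-cleanup are still there, something like this (pool name is a placeholder):
Code:
# sequential read benchmark against the objects left by the write benchmark
rados bench 60 seq -t 16 -p rbd

# remove the benchmark objects afterwards
rados -p rbd cleanup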
 
There is a big deviation (stddev) on those clusters. This will bring down performance. It is more or less expected given the size and hardware of the clusters.

For journal-based mirroring, while the VM writes data onto Ceph, the same data is also read back from it. At best this cuts the performance in half again.

So does such deviation bring the replay speed of the mirror down to about 1/16 of the bootstrapping speed? From the network utilization graph, the bandwidth used for replaying was very stable at around ~30Mbps.

By the way, with the best or optimal configuration, what speed can be expected for both bootstrapping and replaying? Is there any real-life configuration I could use as a reference?

Thanks.
 
So does such deviation bring the replay speed of the mirror down to about 1/16 of the bootstrapping speed? From the network utilization graph, the bandwidth used for replaying was very stable at around ~30Mbps.
I suppose the difference is the write pattern. Running along the journal might just introduce many small writes.
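One way to see how the journal is drained is to look at it directly while the replay runs, roughly (pool/image names are placeholders):
Code:
# basic journal metadata (object size, splay width, journal pool)
rbd journal info --pool rbd --image vm-100-disk-0

# registered journal clients and their commit positions,
# i.e. how far the mirror replay lags behind the master
rbd journal status --pool rbd --image vm-100-disk-0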

By the way, with the best or optimal configuration, what speed can be expected for both bootstrapping and replaying? Is there any real-life configuration I could use as a reference?
This may be best answered by other users here in the forum.
 
Can any member share their experience with rbd mirror?
If rbd mirror is not practical in terms of performance, are there any block-level real-time replication tools that can be used with Proxmox?

Thanks.
 
Can any member share their experience with rbd mirror?
If rbd mirror is not practical in terms of performance, are there any block-level real-time replication tools that can be used with Proxmox?

Thanks.

I am trying to play with rbd-mirror as well, following the howto on the wiki:

https://pve.proxmox.com/wiki/Ceph_RBD_Mirroring

The primary cluster is composed of seven nodes, each with four 2TB bluestore OSDs with SSD cache.

The secondary cluster has three nodes with a similar configuration. Both clusters share a dedicated 10Gb/s storage network.

The mirrored image is a Debian buster installation, with native (without mirroring) write performance of ~400MB/s for a few GB of zeroes. With mirroring enabled, performance decreases to ~120MB/s, but this was expected. The replay speed is very slow, as @tawh reported; after the write operation (6GB of zeros with dd) completes, I can demote the primary image, but I have to wait several minutes for the secondary to finish replaying before I can promote it.
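The failover sequence I am testing is roughly the following (pool/image names are placeholders); the wait happens between the demote and the promote:
Code:
# on the primary cluster, once writes have stopped
rbd mirror image demote rbd/vm-100-disk-0

# wait until the secondary has caught up, then on the secondary
rbd mirror image status rbd/vm-100-disk-0
rbd mirror image promote rbd/vm-100-disk-0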
 
...
The replay speed is very slow, as @tawh reported; after the write operation (6GB of zeros with dd) completes, I can demote the primary image, but I have to wait several minutes for the secondary to finish replaying before I can promote it.

After some research, I came across this post on the ceph-users mailing list:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-August/028898.html

«If you are trying to optimize for 128KiB writes, you might need to tweak
the "rbd_journal_max_payload_bytes" setting since it currently is defaulted
to split journal write events into a maximum of 16KiB payload [1] in order
to optimize the worst-case memory usage of the rbd-mirror daemon for
environments w/ hundreds or thousands of replicated images.»

Other posts mention tuning "rbd_mirror_journal_max_fetch_bytes":

https://lists.ceph.io/hyperkitty/li...SKEB5X3N4S4/#IHMGNFLWCD5E5R4W5S2BSSKEB5X3N4S4

Anyway, as Jason Dillaman says in the first post, it seems that the default values,

"rbd_journal_max_payload_bytes": "16384"
"rbd_mirror_journal_max_fetch_bytes": "32768"

are chosen on purpose for the aforementioned scenario (hundreds or thousands of replicated images).
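If someone wants to check or override those defaults without editing ceph.conf everywhere, something like the following should work on recent releases (the values are only examples, not a recommendation):
Code:
# show the description and built-in default of each option
ceph config help rbd_journal_max_payload_bytes
ceph config help rbd_mirror_journal_max_fetch_bytes

# override them centrally in the monitor config database (values in bytes)
ceph config set client rbd_journal_max_payload_bytes 131072
ceph config set client rbd_mirror_journal_max_fetch_bytes 262144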

rob
 
After some research, I came across this post on the ceph-users mailing list:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-August/028898.html

«If you are trying to optimize for 128KiB writes, you might need to tweak
the "rbd_journal_max_payload_bytes" setting since it currently is defaulted
to split journal write events into a maximum of 16KiB payload [1] in order
to optimize the worst-case memory usage of the rbd-mirror daemon for
environments w/ hundreds or thousands of replicated images.»

Other posts mention tuning "rbd_mirror_journal_max_fetch_bytes":

https://lists.ceph.io/hyperkitty/li...SKEB5X3N4S4/#IHMGNFLWCD5E5R4W5S2BSSKEB5X3N4S4

Anyway, as Jason Dillaman says in the first post, it seems that the default values,

"rbd_journal_max_payload_bytes": "16384"
"rbd_mirror_journal_max_fetch_bytes": "32768"

are chosen on purpose for the aforementioned scenario (hundreds or thousands of replicated images).

rob

Wow, I had forgotten about this thread since nobody replied for several weeks. Thanks a lot for bringing it up again.
I also tried playing with those parameters, but it didn't help. In the end I gave up on rbd-mirror and used LINSTOR instead.