Hello all,
I have two Proxmox clusters, A and B (both updated to the latest version).
Cluster A: 3 hosts. Two of them each have a single 10 TB data disk plus a 256 GB SSD as the OS disk, using BlueStore with bcache. The third host has only minimal hardware and is there purely for cluster quorum; it is not intended to host any data or VMs.
Cluster B: 1 host, with a single 10 TB data disk plus a 256 GB SSD as the OS disk, also BlueStore with bcache.
I configured Ceph on both clusters; Cluster A has 2 OSDs, 3 MONs, and 3 MGRs.
I also configured rbd-mirror between the two Ceph clusters.
All hosts are on the same subnet in a 1 Gbps LAN.
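For reference, mirroring was set up roughly along these lines (commands from memory, so treat them as a sketch; pool and image names are placeholders):
Code:
# enable the journaling feature on the image (required for journal-based mirroring)
rbd feature enable <pool>/<image> journaling
# enable per-image mirroring on the pool, then on the image
rbd mirror pool enable <pool> image
rbd mirror image enable <pool>/<image>
# register the peer cluster (done on both sides)
rbd mirror pool peer add <pool> client.<user>@<remote-cluster>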
Problem:
When I force a resync of an image from primary to secondary (bootstrapping), I get about 600~700 Mbps of network utilization, which can serve as the practical benchmark for disk speed and network bandwidth.
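(The resync is forced on the secondary cluster with something like the following; the image spec is a placeholder:)
Code:
# run against the secondary cluster; triggers a full re-sync of the image
rbd mirror image resync <pool>/<image>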
After the resync, I copy a 4 GB file into the image (the image is actually a Windows OS, so I start the VM and copy the file in from outside).
Inside Windows, the copy speed is around 60 MB/s (~480 Mbps).
However, the journal replay is very slow, and the behavior is strange.
(a) Without any optimization, the primary host sends at ~300 Mbps for about 1 second, then the network sits idle for around 10 seconds, and this pattern repeats until the replay finishes. The ceph.conf at this point:
Code:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 172.16.0.10/24
fsid = e5ab22c7-6876-4b68-9f43-d67edd4175c2
mon_allow_pool_delete = true
mon_host = 172.16.0.10
osd_pool_default_min_size = 2
osd_pool_default_size = 1
public_network = 172.16.0.10/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
rbd_default_features = 125
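For what it's worth, I watched the replay progress with the commands below (a sketch; pool and image names are placeholders). The entries_behind_master field in the image status gives a rough idea of the backlog:
Code:
# shows replay state and how far the secondary lags the primary journal
rbd mirror image status <pool>/<image>
# shows the journal's active/minimum sets and registered client positions
rbd journal status --pool <pool> --image <image>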
(b) With the "optimization" settings below added, the network runs at a steady 32 Mbps until the end of the replay:
Code:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 172.16.0.10/24
fsid = e5ab22c7-6876-4b68-9f43-d67edd4175c2
mon_allow_pool_delete = true
mon_host = 172.16.0.10
osd_pool_default_min_size = 2
osd_pool_default_size = 1
public_network = 172.16.0.10/24
# settings added in an attempt to speed up journal replay:
rbd_journal_max_payload_bytes = 524288
rbd_mirror_journal_max_fetch_bytes = 1048576
rbd_mirror_image_state_check_interval = 5
rbd_mirror_pool_replayers_refresh_interval = 5
rbd_mirror_sync_point_update_age = 5
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
rbd_default_features = 125
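One thing I am unsure about (an assumption on my part): the rbd_mirror_* options presumably only take effect for the rbd-mirror daemon on the secondary site, while rbd_journal_max_payload_bytes applies where the journal is written on the primary. If that is right, the tuning could also be applied centrally, e.g. as below (the values are just guesses at larger sizes, and the daemon may need a restart to pick them up):
Code:
# on the secondary site, for the rbd-mirror daemon (guessed larger fetch size)
ceph config set client rbd_mirror_journal_max_fetch_bytes 33554432
# on the primary site, for the client writing the journal (guessed value)
ceph config set client rbd_journal_max_payload_bytes 8388608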
In both scenarios, the replay took around 15 minutes to complete. How can I utilize the full bandwidth during replay (i.e. reach the speed seen during bootstrapping)?