OSD rebalance at 1Gb/s over 10Gb/s network?

guerro

Hi, I'm trying to build a hyper-converged 3-node cluster on Proxmox with 4 OSDs per node, but I'm having some issues with the OSDs...
The first issue is the rebalance speed: I've noticed that, even over a 10 Gbps network, Ceph rebalances my pool at 1 Gbps at most.
[Screenshot: Ceph rebalance throughput]
but iperf3 confirms that the link is effectively 10 Gbps (it's a little slower right now since the cluster is rebalancing, otherwise it's stable at basically 9.90 Gbit/s):
[Screenshot: iperf3 results]
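For reference, the test was roughly along these lines (the exact flags don't matter much; 10.10.30.31 is just one of the cluster-network addresses as an example):
Code:
# on one node, start the iperf3 server
iperf3 -s

# on another node, run a 30-second test against it
iperf3 -c 10.10.30.31 -t 30

# optionally repeat with several parallel streams
iperf3 -c 10.10.30.31 -t 30 -P 4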

All the OSDs are Samsung enterprise 1.92 TB SATA SSDs with a theoretical speed of 4160 Mbps read and 3880 Mbps write.
Each node has 128GB RAM and 2x Xeon Gold 6150 18C/36T.
Proxmox version is 8.0.4.

Any type of help is really really appreciated!
 
Please post the contents of:
/etc/network/interfaces
/etc/pve/ceph.conf
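E.g. straight from a shell on one of the nodes:
Code:
cat /etc/network/interfaces
cat /etc/pve/ceph.conf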
Hi Alex,
Here's the interfaces file:
Code:
auto lo
iface lo inet loopback

iface eno3 inet manual

auto eno1
iface eno1 inet static
        address 10.10.20.32/24
#Ceph Public

auto eno2
iface eno2 inet static
        address 10.10.30.32/24
#Ceph Cluster

auto eno4
iface eno4 inet static
        address 10.10.10.32/24
#Proxmox Cluster

auto vmbr0
iface vmbr0 inet static
        address 192.168.10.32/24
        gateway 192.168.10.254
        bridge-ports eno3
        bridge-stp off
        bridge-fd 0

auto vmbr500
iface vmbr500 inet manual
        bridge-ports none
        bridge-stp off
        bridge-fd 0
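For what it's worth, the negotiated link speed of each NIC can also be double-checked with ethtool (eno2 is the Ceph cluster interface here):
Code:
# should report Speed: 10000Mb/s on the 10 Gbps ports
ethtool eno2 | grep -i speed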
and here's the ceph.conf file:
Code:
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 10.10.30.31/24
         fsid = 4f64b622-9fc0-4f7d-9b09-765a385c2023
         mon_allow_pool_delete = true
         mon_host = 10.10.20.31 10.10.20.32 10.10.20.33
         ms_bind_ipv4 = true
         ms_bind_ipv6 = false
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public_network = 10.10.20.31/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.pve1]
         public_addr = 10.10.20.31

[mon.pve2]
         public_addr = 10.10.20.32

[mon.pve3]
         public_addr = 10.10.20.33
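
In case it's useful, the addresses the Ceph daemons actually bound to can be double-checked from a node shell with the standard Ceph CLI, e.g.:
Code:
# monitors should be listening on the 10.10.20.x public network
ceph mon dump

# each OSD line lists its public address and its 10.10.30.x cluster address
ceph osd dump | grep '^osd\.'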

Really appreciate your time, thank you.
 
Thanks! That was just to validate the config (eliminate the low-hanging fruit).

There are a few things you need to understand about rebalancing.
1. Rebalancing affects only PGs that are missing OSD partners or need to be moved.
2. Rebalancing will use all affected OSDs IN THE BEGINNING, subject to tunable limits. As rebalancing progresses, fewer PGs will be affected and rebalancing "performance" will begin to drop. Toward the end, the rebalancing MB/s can appear to slow to a crawl.
3. You can adjust the rebalancing tunables, but there is a risk of unintended consequences, e.g. if you speed up rebalancing it will affect filesystem/client performance.
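You can watch that taper off yourself with the standard status commands, for example:
Code:
# overall cluster state, including the current recovery/rebalance rate
ceph -s

# per-pool client and recovery I/O, refreshed every 2 seconds
watch -n 2 ceph osd pool stats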

You have 3 nodes with 4 OSDs each. This automatically means your rebalances won't be very fast because of the low number of concurrent transactions; Ceph really benefits from scaling the number of nodes and OSDs a LOT. If this is a concern and you REALLY want to make rebuilds faster, I'll give you a few tunables. I want to stress that ADJUSTING THESE CAN HAVE NEGATIVE CONSEQUENCES. I'd make changes in small steps and see how they impact cluster behavior/performance before adjusting further.

Adjust these at your own risk.

osd_deep_scrub_stride - the default is 512 KiB; you can up that to 1024 or 2048 KiB.
osd_max_backfills - the default is 1 (although it may be higher on SSD/NVMe-class devices); you can up that to 8 or 16.
osd_max_scrubs - the default is 3; reduce it to 2.
osd_recovery_max_active - the default is 10 for SSD; you can up that to 16 or 20.
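If you do change any of them, runtime changes can be applied with ceph config set (the values below are only an illustration; start small and observe):
Code:
# raise backfill/recovery concurrency for all OSDs, one step at a time
ceph config set osd osd_max_backfills 2
ceph config set osd osd_recovery_max_active 16

# check what is currently configured
ceph config get osd osd_max_backfills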

Edit: it's probably simpler to just disable scrubs while rebalancing.
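Something along these lines with the cluster-wide flags:
Code:
# pause scrubbing while the rebalance runs
ceph osd set noscrub
ceph osd set nodeep-scrub

# re-enable it once the cluster is healthy again
ceph osd unset noscrub
ceph osd unset nodeep-scrub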

There are more tunables you can adjust; I'd suggest going through https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/ to see what's available, and read up on what they do before touching them.
 
Hi Alex,
Thank you so much for the kind and exhaustive explanation. We'll try scaling up our cluster and see for ourselves how the performance changes, but since we don't need extreme recovery speed we'll stick to the current setup for now.
Have a nice day ahead!
 
