Ceph design (Number of OSDs per NVMe Disk), rebalance problems

Balancer still reports:
{
    "active": true,
    "last_optimize_duration": "0:00:00.020016",
    "last_optimize_started": "Fri Jun 16 14:21:11 2023",
    "mode": "upmap",
    "optimize_result": "Optimization plan created successfully",
    "plans": []
}

ceph osd get-require-min-compat-client reports jewel

and ceph versions
{
    "mon": {
        "ceph version 17.2.5 (e04241aa9b639588fa6c864845287d2824cb6b55) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.5 (e04241aa9b639588fa6c864845287d2824cb6b55) quincy (stable)": 2
    },
    "osd": {
        "ceph version 17.2.5 (e04241aa9b639588fa6c864845287d2824cb6b55) quincy (stable)": 140
    },
    "mds": {},
    "overall": {
        "ceph version 17.2.5 (e04241aa9b639588fa6c864845287d2824cb6b55) quincy (stable)": 145
    }
}

Should I try and set ceph osd set-require-min-compat-client luminous?
 
The reason I ask is that I'm worried it might go crazy again and start rebalancing, killing all bandwidth again. It is Friday :)
Should I tune mclock settings in any way before doing this?
 
Should I try and set ceph osd set-require-min-compat-client luminous?
Yes.
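Before raising it, a quick sanity check (not strictly required, just reassuring) is to look at which feature releases the currently connected clients report, so nothing older than luminous gets locked out:

# Show the feature releases of all currently connected daemons and clients
ceph features

# If no client older than luminous shows up, raise the requirement
ceph osd set-require-min-compat-client luminous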

The reason I ask is that I'm worried it might go crazy again and start rebalancing, killing all bandwidth again. It is Friday :)
The balancer won't cause a lot of load and will move PGs (replicas thereof) slowly between the OSDs.

Changing the pg_num of the pool is what can cause a lot of load ;)
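If you want to see what the balancer would actually do before letting it run, you can build and inspect a plan by hand (a sketch; the plan name "myplan" is just an example):

ceph balancer eval                  # score of the current PG distribution
ceph balancer optimize myplan       # create a plan manually
ceph balancer show myplan           # list the pg-upmap changes it would apply
ceph balancer rm myplan             # discard it if you don't want to run it now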
 
Tried ceph osd set-require-min-compat-client luminous, but the number of backfills increased very quickly along with bandwidth usage, so I had to pause it by setting the global nobackfill flag.
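For reference, pausing and later resuming the data movement uses the standard cluster flags (norebalance added here as an optional extra, it wasn't mentioned above):

ceph osd set nobackfill       # pause backfill
ceph osd set norebalance      # optionally also stop rebalancing
# ... tune settings or wait for a quieter moment ...
ceph osd unset norebalance
ceph osd unset nobackfill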

I ended up tuning the mClock settings as posted earlier; again, this calmed things down and the rebalance was able to complete. The balance is now much better, a 10-12% difference compared to nearly 50% before.
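The exact mClock values from the earlier post aren't quoted here; as a rough sketch, in Quincy the mClock scheduler can also be steered via its built-in profiles rather than individual limits:

# Prioritise client I/O over recovery/backfill while the cluster is busy
ceph config set osd osd_mclock_profile high_client_ops

# Or prioritise recovery/backfill when you want a rebalance to finish faster
ceph config set osd osd_mclock_profile high_recovery_ops

# Check what an OSD is currently using
ceph config show osd.0 | grep mclock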

I'm a bit worried, though, that we'll have to do this manual mClock tuning every so often to prevent the cluster from going down.
 