I would appreciate your advice on the following problem.
I need to implement datacenter-level fault tolerance in a hyperconverged Proxmox VE cluster (pve-manager/8.1.4/ec5affc9e41f1d79, kernel 6.5.13-1-pve) running Ceph Reef 18.2.1.
To test future changes, I created a virtual test bench in VirtualBox that closely mimics my cluster in production.
I reproduced the exact number of servers in the cluster (17) and fully recreated the network configuration and the IP addresses on the network interfaces. I could not match the real disk sizes due to the limitations of my work computer, so the real 4 TB disks are simulated with 4 GB virtual disks.
Following the official documentation on the Ceph project website and various Internet sources, I executed the following commands.
Bash:
# Create two datacenter buckets and place them under the default root
ceph osd crush add-bucket dc1 datacenter
ceph osd crush add-bucket dc2 datacenter
ceph osd crush move dc1 root=default
ceph osd crush move dc2 root=default
# Move the host buckets into their datacenters
ceph osd crush move pn1 datacenter=dc1
...
ceph osd crush move pn2 datacenter=dc2
...
# Create a replicated rule with "datacenter" as the failure domain
# and switch the rbd pool to it
ceph osd crush rule create-replicated StarDCreplicated default datacenter hdd
ceph osd pool set rbd crush_rule StarDCreplicated
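If I read the CRUSH documentation correctly, the create-replicated call above should compile to a rule along these lines (a sketch of the decompiled form; the actual rule id may differ):

```
rule StarDCreplicated {
    id 2
    type replicated
    step take default class hdd
    step chooseleaf firstn 0 type datacenter
    step emit
}
```

As far as I understand, "chooseleaf firstn 0 type datacenter" picks one OSD under each distinct datacenter bucket, so with only dc1 and dc2 it can satisfy at most two of the pool's three replicas.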
As a result, the Ceph cluster reports HEALTH_OK, but the data rebalancing never completes: pgs: 1100/3306 objects misplaced (33.273%)
The "ceph balancer status" command outputs: "optimize_result": "Too many objects (0.332728 > 0.050000) are misplaced; try again later",
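The misplaced fraction looks suspiciously like exactly one copy per object: with ~1.10k objects in a size-3 pool there are 3306 replicas in total, and 1100 of them are misplaced. A quick cross-check of the percentage (plain awk, numbers taken from the status output below):

```shell
# 1100 misplaced replicas out of 3306 total copies (size-3 pool)
awk 'BEGIN { printf "%.3f%%\n", 1100 / 3306 * 100 }'
# prints 33.273%, matching ceph -s
```

One misplaced replica per object would be consistent with the rule only being able to place two of the three replicas across the two datacenters; and at 33% misplaced the balancer refuses to act, since that is far above its 5% threshold.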
Since the testbed was created precisely so I could experiment and learn from my mistakes, I tried to fix the situation by adding more disks. It didn't help.
I tried restarting the servers one by one. That didn't help either.
I also tried a different rule:
Bash:
# Try a "simple" rule with the same datacenter choose type
ceph osd crush rule create-simple StarDCsimple default datacenter
ceph osd pool set rbd crush_rule StarDCsimple
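For comparison, if I understand create-simple correctly, this second rule should decompile to essentially the same placement logic (sketch; the id will differ), which would explain why switching rules changed nothing:

```
rule StarDCsimple {
    id 3
    type replicated
    step take default
    step chooseleaf firstn 0 type datacenter
    step emit
}
```

Both rules choose leaves across datacenter buckets; the only visible difference is the missing hdd device-class filter.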
But the cluster still makes no attempt to finish rebalancing the data.
My virtual testbed looks like this:
Code:
  cluster:
    id:     dfce5bc5-428f-4ede-af8d-2d801e84578e
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum pn1,pn2,pn3 (age 2h)
    mgr: pn2(active, since 2h), standbys: pn1, pn3
    osd: 34 osds: 34 up (since 2h), 34 in (since 2h); 128 remapped pgs

  data:
    pools:   2 pools, 129 pgs
    objects: 1.10k objects, 4.2 GiB
    usage:   14 GiB used, 116 GiB / 131 GiB avail
    pgs:     1100/3306 objects misplaced (33.273%)
             128 active+clean+remapped
             1   active+clean
pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 19 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr read_balance_score 33.33
pool 2 'rbd' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 2056 lfor 0/866/1096 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd read_balance_score 2.13
Code:
ID   CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
 -1         0.12711  root default
-33         0.07912      datacenter dc1
 -3         0.01556          host pn1
  0    hdd  0.00389              osd.0        up   1.00000  1.00000
  1    hdd  0.00389              osd.1        up   1.00000  1.00000
  2    hdd  0.00389              osd.2        up   1.00000  1.00000
  3    hdd  0.00389              osd.3        up   1.00000  1.00000
-23         0.00130          host pn13
 25    hdd  0.00130              osd.25       up   1.00000  1.00000
-25         0.00389          host pn16
 26    hdd  0.00389              osd.26       up   1.00000  1.00000
-27         0.00389          host pn17
 27    hdd  0.00389              osd.27       up   1.00000  1.00000
-29         0.00389          host pn18
 28    hdd  0.00389              osd.28       up   1.00000  1.00000
-31         0.00389          host pn19
 29    hdd  0.00389              osd.29       up   1.00000  1.00000
 -7         0.01556          host pn3
  8    hdd  0.00389              osd.8        up   1.00000  1.00000
  9    hdd  0.00389              osd.9        up   1.00000  1.00000
 10    hdd  0.00389              osd.10       up   1.00000  1.00000
 11    hdd  0.00389              osd.11       up   1.00000  1.00000
 -9         0.01556          host pn4
 12    hdd  0.00389              osd.12       up   1.00000  1.00000
 13    hdd  0.00389              osd.13       up   1.00000  1.00000
 14    hdd  0.00389              osd.14       up   1.00000  1.00000
 15    hdd  0.00389              osd.15       up   1.00000  1.00000
-11         0.01556          host pn5
 16    hdd  0.00389              osd.16       up   1.00000  1.00000
 17    hdd  0.00389              osd.17       up   1.00000  1.00000
 18    hdd  0.00389              osd.18       up   1.00000  1.00000
 19    hdd  0.00389              osd.19       up   1.00000  1.00000
-34         0.04799      datacenter dc2
-17         0.00778          host pn10
 22    hdd  0.00389              osd.22       up   1.00000  1.00000
 32    hdd  0.00389              osd.32       up   1.00000  1.00000
-19         0.00778          host pn11
 23    hdd  0.00389              osd.23       up   1.00000  1.00000
 33    hdd  0.00389              osd.33       up   1.00000  1.00000
-21         0.00389          host pn12
 24    hdd  0.00389              osd.24       up   1.00000  1.00000
 -5         0.01556          host pn2
  4    hdd  0.00389              osd.4        up   1.00000  1.00000
  5    hdd  0.00389              osd.5        up   1.00000  1.00000
  6    hdd  0.00389              osd.6        up   1.00000  1.00000
  7    hdd  0.00389              osd.7        up   1.00000  1.00000
-13         0.00908          host pn6
 20    hdd  0.00130              osd.20       up   1.00000  1.00000
 30    hdd  0.00389              osd.30       up   1.00000  1.00000
 31    hdd  0.00389              osd.31       up   1.00000  1.00000
-15         0.00389          host pn9
 21    hdd  0.00389              osd.21       up   1.00000  1.00000
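One more thing that stands out in the tree: the two datacenter buckets carry quite different CRUSH weights (0.07912 for dc1 vs 0.04799 for dc2). Quantifying the split (plain awk, weights taken from the tree above):

```shell
# Capacity share of each datacenter bucket by CRUSH weight
awk 'BEGIN {
    dc1 = 0.07912; dc2 = 0.04799
    printf "dc1: %.1f%%  dc2: %.1f%%\n", dc1 / (dc1 + dc2) * 100, dc2 / (dc1 + dc2) * 100
}'
# prints dc1: 62.2%  dc2: 37.8%
```

So dc1 holds about 62% of the total weight. I am not sure whether this imbalance is related to the stuck rebalancing, but I assume it would limit usable capacity once replicas are pinned per datacenter.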