Hi!
We're validating a stretched cluster design as follows:
- Datacenter 1: 3 PVE nodes (Dell R650) with 5 NVMe drives each (1 OSD per disk)
- Datacenter 2: 3 PVE nodes (Dell R650) with 5 NVMe drives each (1 OSD per disk)
- Datacenter 3: 1 virtual PVE node as witness
So far so good: stretch mode works well with the following (stretch-cluster-specific) configuration:
Code:
ceph osd crush add-bucket dc1 datacenter
ceph osd crush add-bucket dc2 datacenter
ceph osd crush move dc1 root=default
ceph osd crush move dc2 root=default
ceph osd crush move dc1pve1 datacenter=dc1
ceph osd crush move dc1pve2 datacenter=dc1
ceph osd crush move dc1pve3 datacenter=dc1
ceph osd crush move dc2pve1 datacenter=dc2
ceph osd crush move dc2pve2 datacenter=dc2
ceph osd crush move dc2pve3 datacenter=dc2
ceph mon set_location dc1pve1 datacenter=dc1
ceph mon set_location dc1pve2 datacenter=dc1
ceph mon set_location dc1pve3 datacenter=dc1
ceph mon set_location dc2pve1 datacenter=dc2
ceph mon set_location dc2pve2 datacenter=dc2
ceph mon set_location dc2pve3 datacenter=dc2
ceph mon set_location dc3pve1 datacenter=dc3
ceph mon set election_strategy connectivity
ceph mon set_location dc3pve1 datacenter=dc3
ceph mon enable_stretch_mode dc3pve1 stretch_rule datacenter
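(To double-check the result, the election strategy, monitor locations, and bucket layout can be reviewed with standard commands, e.g.:)
Code:
# confirm election strategy and per-mon locations, plus the CRUSH bucket layout
ceph mon dump | grep -Ei 'election_strategy|location'
ceph osd crush tree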
The following CRUSH rule:
Code:
rule stretch_rule {
  id 2
  type replicated
  step take default
  step choose firstn 0 type datacenter
  step chooseleaf firstn 2 type host
  step emit
}
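(For anyone wanting to reproduce the placement, the rule can be sanity-checked offline with crushtool; the rule id and replica count below match the setup above:)
Code:
# export the current CRUSH map and test rule id 2 with 4 replicas
ceph osd getcrushmap -o crushmap.bin
crushtool -i crushmap.bin --test --rule 2 --num-rep 4 --show-mappings | head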
A pool with 4/2 replication, 128 PGs, and stretch_rule as its CRUSH rule. Coupled with a proper HA group, losing a whole datacenter restarts all VMs in the other datacenter, which is exactly what we needed.
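For reference, the pool and HA group were set up roughly like this (from memory; the pool and group names are just illustrative):
Code:
# 4/2 replicated pool using the stretch rule
ceph osd pool create stretched_rbd 128 128 replicated stretch_rule
ceph osd pool set stretched_rbd size 4
ceph osd pool set stretched_rbd min_size 2
ceph osd pool application enable stretched_rbd rbd
# HA group preferring DC1 nodes but allowed to fail over to DC2 (not restricted)
ha-manager groupadd prefer-dc1 -nodes "dc1pve1:2,dc1pve2:2,dc1pve3:2,dc2pve1:1,dc2pve2:1,dc2pve3:1"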
Now I'd like to add two more pools with datacenter affinity, using 3/2 CRUSH rules so that a VM's data stays on OSDs in its own datacenter. These are meant for applications that are natively HA, such as web servers, Active Directory, and so on. I tried to add the following CRUSH rules:
Code:
rule dc1_rule {
    id 3
    type replicated
    step take dc1
    step chooseleaf firstn 3 type host
    step emit
}
rule dc2_rule {
    id 4
    type replicated
    step take dc2
    step chooseleaf firstn 3 type host
    step emit
}
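Based on those rules, the two pools were created roughly like this (from memory; the pool names are just what I called them):
Code:
# one pool pinned to each datacenter via its CRUSH rule
ceph osd pool create dc1_rbd 64 64 replicated dc1_rule
ceph osd pool set dc1_rbd size 3
ceph osd pool set dc1_rbd min_size 2
ceph osd pool application enable dc1_rbd rbd
ceph osd pool create dc2_rbd 64 64 replicated dc2_rule
ceph osd pool set dc2_rbd size 3
ceph osd pool set dc2_rbd min_size 2
ceph osd pool application enable dc2_rbd rbd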
Unfortunately, Ceph health reports the 128 PGs of those two new pools (3/2, 64 PGs each) stuck in clean+peered, and they never become active:
Code:
root@dc1pve1:~# ceph health detail
HEALTH_WARN Reduced data availability: 128 pgs inactive
[WRN] PG_AVAILABILITY: Reduced data availability: 128 pgs inactive
    pg 25.26 is stuck inactive for 3h, current state clean+peered, last acting [4,9,2]
    pg 25.28 is stuck inactive for 3h, current state clean+peered, last acting [3,5,9]
    pg 25.29 is stuck inactive for 3h, current state clean+peered, last acting [1,9,7]
    pg 25.2a is stuck inactive for 3h, current state clean+peered, last acting [0,11,5]
    pg 25.2b is stuck inactive for 3h, current state clean+peered, last acting [10,5,2]
    pg 25.2c is stuck inactive for 3h, current state clean+peered, last acting [2,6,9]
    pg 25.2d is stuck inactive for 3h, current state clean+peered, last acting [10,5,3]
    pg 25.2e is stuck inactive for 3h, current state clean+peered, last acting [7,11,2]
    pg 25.2f is stuck inactive for 3h, current state clean+peered, last acting [2,10,5]
    pg 25.30 is stuck inactive for 3h, current state clean+peered, last acting [10,0,5]
    pg 25.31 is stuck inactive for 3h, current state clean+peered, last acting [6,11,0]
    pg 25.32 is stuck inactive for 3h, current state clean+peered, last acting [5,0,10]
    pg 25.33 is stuck inactive for 3h, current state clean+peered, last acting [9,4,0]
    pg 25.34 is stuck inactive for 3h, current state clean+peered, last acting [9,7,1]
    pg 25.35 is stuck inactive for 3h, current state clean+peered, last acting [4,9,3]
    pg 25.36 is stuck inactive for 3h, current state clean+peered, last acting [0,11,6]
    pg 25.37 is stuck inactive for 3h, current state clean+peered, last acting [5,11,2]
    pg 25.38 is stuck inactive for 3h, current state clean+peered, last acting [8,2,7]
    pg 25.39 is stuck inactive for 3h, current state clean+peered, last acting [4,0,9]
    pg 25.3a is stuck inactive for 3h, current state clean+peered, last acting [1,8,4]
    pg 25.3b is stuck inactive for 3h, current state clean+peered, last acting [9,5,1]
    pg 25.3c is stuck inactive for 3h, current state clean+peered, last acting [6,3,11]
    pg 25.3d is stuck inactive for 3h, current state clean+peered, last acting [11,5,3]
    pg 25.3e is stuck inactive for 3h, current state clean+peered, last acting [6,3,8]
    pg 25.3f is stuck inactive for 3h, current state clean+peered, last acting [7,0,11]
    pg 26.24 is stuck inactive for 3h, current state clean+peered, last acting [13,19,17]
    pg 26.25 is stuck inactive for 3h, current state clean+peered, last acting [13,20,18]
    pg 26.28 is stuck inactive for 3h, current state clean+peered, last acting [20,13,18]
    pg 26.29 is stuck inactive for 3h, current state clean+peered, last acting [22,15,13]
    pg 26.2a is stuck inactive for 3h, current state clean+peered, last acting [19,12,15]
    pg 26.2b is stuck inactive for 3h, current state clean+peered, last acting [16,23,20]
    pg 26.2c is stuck inactive for 3h, current state clean+peered, last acting [14,19,17]
    pg 26.2d is stuck inactive for 3h, current state clean+peered, last acting [19,15,14]
    pg 26.2e is stuck inactive for 3h, current state clean+peered, last acting [21,18,14]
    pg 26.2f is stuck inactive for 3h, current state clean+peered, last acting [12,15,21]
    pg 26.30 is stuck inactive for 3h, current state clean+peered, last acting [17,21,14]
    pg 26.31 is stuck inactive for 3h, current state clean+peered, last acting [15,14,19]
    pg 26.32 is stuck inactive for 3h, current state clean+peered, last acting [12,17,21]
    pg 26.33 is stuck inactive for 3h, current state clean+peered, last acting [17,20,13]
    pg 26.34 is stuck inactive for 3h, current state clean+peered, last acting [12,19,15]
    pg 26.35 is stuck inactive for 3h, current state clean+peered, last acting [21,13,16]
    pg 26.36 is stuck inactive for 3h, current state clean+peered, last acting [13,19,18]
    pg 26.37 is stuck inactive for 3h, current state clean+peered, last acting [19,17,13]
    pg 26.38 is stuck inactive for 3h, current state clean+peered, last acting [13,15,19]
    pg 26.39 is stuck inactive for 3h, current state clean+peered, last acting [16,13,21]
    pg 26.3a is stuck inactive for 3h, current state clean+peered, last acting [14,20,17]
    pg 26.3b is stuck inactive for 3h, current state clean+peered, last acting [20,15,12]
    pg 26.3c is stuck inactive for 3h, current state clean+peered, last acting [16,23,22]
    pg 26.3d is stuck inactive for 3h, current state clean+peered, last acting [23,21,18]
    pg 26.3e is stuck inactive for 3h, current state clean+peered, last acting [20,17,14]
    pg 26.3f is stuck inactive for 3h, current state clean+peered, last acting [12,22,15]
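(If it helps, I can also post the full peering details for one of these PGs, e.g.:)
Code:
ceph pg 25.26 query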
Here's the OSD tree:
Code:
root@dc1pve1:~# ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
 -1         2.34445  root default                                       
-15         1.17223      datacenter dc1                               
 -3         0.39075          host dc1pve1                           
  0    nvme  0.09769              osd.0             up   1.00000  1.00000
  1    nvme  0.09769              osd.1             up   1.00000  1.00000
  2    nvme  0.09769              osd.2             up   1.00000  1.00000
  3    nvme  0.09769              osd.3             up   1.00000  1.00000
 -5         0.39075          host dc1pve2                           
  4    nvme  0.09769              osd.4             up   1.00000  1.00000
  5    nvme  0.09769              osd.5             up   1.00000  1.00000
  6    nvme  0.09769              osd.6             up   1.00000  1.00000
  7    nvme  0.09769              osd.7             up   1.00000  1.00000
 -7         0.39075          host dc1pve3                           
  8    nvme  0.09769              osd.8             up   1.00000  1.00000
  9    nvme  0.09769              osd.9             up   1.00000  1.00000
 10    nvme  0.09769              osd.10            up   1.00000  1.00000
 11    nvme  0.09769              osd.11            up   1.00000  1.00000
-16         1.17223      datacenter dc2                               
 -9         0.39075          host dc2pve1                           
 12    nvme  0.09769              osd.12            up   1.00000  1.00000
 13    nvme  0.09769              osd.13            up   1.00000  1.00000
 14    nvme  0.09769              osd.14            up   1.00000  1.00000
 23    nvme  0.09769              osd.23            up   1.00000  1.00000
-11         0.39075          host dc2pve2                           
 15    nvme  0.09769              osd.15            up   1.00000  1.00000
 16    nvme  0.09769              osd.16            up   1.00000  1.00000
 17    nvme  0.09769              osd.17            up   1.00000  1.00000
 18    nvme  0.09769              osd.18            up   1.00000  1.00000
-13         0.39075          host dc2pve3                           
 19    nvme  0.09769              osd.19            up   1.00000  1.00000
 20    nvme  0.09769              osd.20            up   1.00000  1.00000
 21    nvme  0.09769              osd.21            up   1.00000  1.00000
 22    nvme  0.09769              osd.22            up   1.00000  1.00000
Maybe a clue, though I can't tell whether it's relevant: I can see blacklisted connections, and the IP corresponds to the witness PVE node:
Code:
root@dc1pve1:~# ceph osd blacklist ls
192.168.114.2:0/3316608452 2024-11-27T18:19:13.728470+0100
192.168.114.2:0/5968008 2024-11-27T18:19:13.728470+0100
192.168.114.2:0/1051452434 2024-11-27T18:19:13.728470+0100
192.168.114.2:6817/18177 2024-11-27T18:19:13.728470+0100
192.168.114.2:0/2390757461 2024-11-27T18:19:13.728470+0100
192.168.114.2:6816/18177 2024-11-27T18:19:13.728470+0100
192.168.114.2:0/2298991455 2024-11-27T18:18:16.144679+0100
192.168.114.2:0/3816873361 2024-11-27T18:18:16.144679+0100
192.168.114.2:0/3896422733 2024-11-27T18:18:16.144679+0100
192.168.114.2:0/1685705789 2024-11-27T18:18:16.144679+0100
192.168.114.2:6817/1068 2024-11-27T18:18:16.144679+0100
192.168.114.2:6816/1068 2024-11-27T18:18:16.144679+0100
I tried playing with the PG counts (512 for the stretched pool, 256 for the DC pools), but nothing changed.
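(With the same illustrative pool names as above, that was roughly:)
Code:
ceph osd pool set stretched_rbd pg_num 512
ceph osd pool set dc1_rbd pg_num 256
ceph osd pool set dc2_rbd pg_num 256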
Does anyone see what I'm missing?
Thanks!
			