ceph stretch cluster

przemekk · New Member · Apr 4, 2024
We created a stretch cluster with 6 nodes, with 4 disks in each.

The following rule is working only with 3 or 4 replicas:

rule StretchRuleReadRandom {
    id 2
    type replicated
    step take default
    step choose firstn 0 type datacenter
    step chooseleaf firstn 2 type host
    step emit
}

This rule is not working:

rule W1 {
    id 3
    type replicated
    step take W1
    step chooseleaf firstn 0 type host
    step emit
}

ID   CLASS  WEIGHT    TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
 -1         11.71912  root default
-19          5.85956      datacenter W1
 -3          1.95319          host ceph-w1-01
  0    ssd   0.48830              osd.0             up   1.00000  1.00000
  1    ssd   0.48830              osd.1             up   1.00000  1.00000
  2    ssd   0.48830              osd.2             up   1.00000  1.00000
 23    ssd   0.48830              osd.23            up   1.00000  1.00000
 -7          1.95319          host ceph-w1-02
  3    ssd   0.48830              osd.3             up   1.00000  1.00000
  4    ssd   0.48830              osd.4             up   1.00000  1.00000
  5    ssd   0.48830              osd.5             up   1.00000  1.00000
  6    ssd   0.48830              osd.6             up   1.00000  1.00000
-10          1.95319          host ceph-w1-03
  7    ssd   0.48830              osd.7             up   1.00000  1.00000
  8    ssd   0.48830              osd.8             up   1.00000  1.00000
  9    ssd   0.48830              osd.9             up   1.00000  1.00000
 10    ssd   0.48830              osd.10            up   1.00000  1.00000
-20          5.85956      datacenter W2
-13          1.95319          host ceph-w2-01
 11    ssd   0.48830              osd.11            up   1.00000  1.00000
 12    ssd   0.48830              osd.12            up   1.00000  1.00000
 13    ssd   0.48830              osd.13            up   1.00000  1.00000
 14    ssd   0.48830              osd.14            up   1.00000  1.00000
-16          1.95319          host ceph-w2-02
 15    ssd   0.48830              osd.15            up   1.00000  1.00000
 16    ssd   0.48830              osd.16            up   1.00000  1.00000
 17    ssd   0.48830              osd.17            up   1.00000  1.00000
 18    ssd   0.48830              osd.18            up   1.00000  1.00000
-25          1.95319          host ceph-w2-03
 19    ssd   0.48830              osd.19            up   1.00000  1.00000
 20    ssd   0.48830              osd.20            up   1.00000  1.00000
 21    ssd   0.48830              osd.21            up   1.00000  1.00000
 22    ssd   0.48830              osd.22            up   1.00000  1.00000
 
I assume you're asking why rule W1 isn't working? In short, Ceph doesn't know about locality.

Long answer:
  • Every pool has a size and a min_size.
    The size tells Ceph how many copies of each object there need to be; Ceph always strives for this number of copies. The min_size is the number of copies at which Ceph still allows IO. Meaning, when the actual count of copies drops below min_size, it will freeze that particular PG.
  • A Ceph client calculates on the fly where a particular object is (or should be), i.e. in which PG it is located.
    The client then talks to the primary OSD (the one responsible for that PG). Since there is always only one primary, its location is arbitrary; it may well be in the other DC.
  • The CRUSH rules define how Ceph distributes objects and their copies.
    Each rule works according to a hierarchy of failure domains (these need to be defined). Your rule StretchRuleReadRandom tells Ceph to choose as many DCs as there are copies and then place 2 copies in each DC on different hosts.
  • In a stretched cluster with only two locations, only a pool with size=4, min_size=2 can work, as either side needs at least two copies to still facilitate IO when the other location is down (see the commands after this list).
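A minimal sketch of how that looks on the CLI, assuming the pool is called mypool (the name is just a placeholder):

# check the current replication settings
ceph osd pool get mypool size
ceph osd pool get mypool min_size

# two copies per datacenter: IO keeps running as long as two copies are reachable
ceph osd pool set mypool size 4
ceph osd pool set mypool min_size 2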
Aside from that, you need a third location for quorum (for both the PVE cluster and the Ceph monitors). Otherwise both sides lose quorum when one side is down.
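To see what a rule would actually map to before putting data on it, you can simulate it offline with crushtool. A rough sketch, assuming the rule IDs 2 and 3 from the rules you posted:

# export and decompile the current CRUSH map
ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt

# simulate the placements each rule produces for 4 replicas
crushtool -i crush.bin --test --rule 2 --num-rep 4 --show-mappings
crushtool -i crush.bin --test --rule 3 --num-rep 4 --show-mappings

# only print inputs where the rule could not place all replicas
crushtool -i crush.bin --test --rule 3 --num-rep 4 --show-bad-mappings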
 
