Ceph total size increases when one node is down

carles89

Hello,

I'm testing a 3-node cluster with Ceph, and I noticed that when one node is down, the total size of the Ceph storage shown in the GUI increases. Here's the config for each node:

pve01
2 x 32GB Boot disks with ZFS mirror
2 x 50GB OSDs

pve02
2 x 32GB Boot disks with ZFS mirror
2 x 50GB OSDs

pve03
2 x 32GB Boot disks with ZFS mirror
2 x 50GB OSDs
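
Just to sanity-check the raw figures (this is only my own arithmetic; the boot disks are not part of Ceph): the six 50 GB OSDs add up to the ~300 GiB of raw space that Ceph reports below.

Bash:
# 3 nodes x 2 OSDs x 50 GB each, boot disks excluded
echo "$(( 3 * 2 * 50 )) GB raw"   # 300, matching the 300 GiB RAW STORAGE in "ceph df" below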

I've created a Ceph pool called "ceph", using size 3/2.
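
In case it helps, these are the commands to confirm the pool's replication settings from the CLI (not pasted from my session, just the standard Ceph commands; the expected values follow from the 3/2 setting above):

Bash:
ceph osd pool get ceph size       # should report: size: 3
ceph osd pool get ceph min_size   # should report: min_size: 2
ceph osd pool ls detail           # full pool settings (crush rule, pg_num, ...)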

The storage size shown in the GUI is 96.16 GB when the pool is healthy, but it increases to 144.24 GB when one node is down. MAX AVAIL (ceph df) also increases.
I don't understand this behaviour. I assume it shows 96.16 GB as the total space because the pool keeps 3 copies, but why does it increase when one node is down?
Usage also increases, even though no new data was written.
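
One thing I tried (purely my own assumption, I don't know if this is really how the GUI computes it): if the total shown for the storage were the pool's STORED + MAX AVAIL from ceph df, converted from GiB to GB, the numbers would roughly line up in both states:

Bash:
# Assumption: GUI total ≈ (STORED + MAX AVAIL) of pool "ceph", GiB converted to GB
echo "healthy:  $(( (15 + 74)  * 1073741824 / 1000000000 )) GB"   # ~95 GB  vs 96.16 GB shown
echo "degraded: $(( (23 + 112) * 1073741824 / 1000000000 )) GB"   # ~144 GB vs 144.24 GB shown

So the jump seems to track the change in MAX AVAIL rather than any new writes.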

Healthy pool
[screenshot]

Code:
root@pve01:~# ceph -s
  cluster:
    id:     44c1909c-1de4-4933-a606-ceea387e4a03
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum pve01,pve02,pve03 (age 2d)
    mgr: pve02(active, since 2d), standbys: pve01, pve03
    mds: 1/1 daemons up, 2 standby
    osd: 6 osds: 6 up (since 2d), 6 in (since 2d)
 
  data:
    volumes: 1/1 healthy
    pools:   4 pools, 73 pgs
    objects: 4.46k objects, 17 GiB
    usage:   52 GiB used, 248 GiB / 300 GiB avail
    pgs:     73 active+clean
 
  io:
    client:   4.7 KiB/s wr, 0 op/s rd, 0 op/s wr

Bash:
root@pve01:~# ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL    USED  RAW USED  %RAW USED
ssd    300 GiB  248 GiB  52 GiB    52 GiB      17.44
TOTAL  300 GiB  248 GiB  52 GiB    52 GiB      17.44
 
--- POOLS ---
POOL               ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr                1    1  577 KiB        2  1.7 MiB      0     74 GiB
ceph                2   32   15 GiB    4.29k   46 GiB  16.96     74 GiB
ISO_Ceph_data       3   32  598 MiB      150  1.8 GiB   0.78     74 GiB
ISO_Ceph_metadata   4    8  120 KiB       22  441 KiB      0     74 GiB

One node down
[screenshot]

Bash:
root@pve01:~# ceph df
--- RAW STORAGE ---
CLASS     SIZE    AVAIL    USED  RAW USED  %RAW USED
ssd    300 GiB  248 GiB  52 GiB    52 GiB      17.44
TOTAL  300 GiB  248 GiB  52 GiB    52 GiB      17.44
 
--- POOLS ---
POOL               ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr                1    1  865 KiB        2  1.7 MiB      0    112 GiB
ceph                2   32   23 GiB    4.29k   46 GiB  16.96    112 GiB
ISO_Ceph_data       3   32  898 MiB      150  1.8 GiB   0.78    112 GiB
ISO_Ceph_metadata   4    8  180 KiB       22  441 KiB      0    112 GiB

Bash:
root@pve01:~# ceph -s
  cluster:
    id:     44c1909c-1de4-4933-a606-ceea387e4a03
    health: HEALTH_WARN
            1/3 mons down, quorum pve01,pve02
            2 osds down
            1 host (2 osds) down
            Degraded data redundancy: 4460/13380 objects degraded (33.333%), 71 pgs degraded
 
  services:
    mon: 3 daemons, quorum pve01,pve02 (age 58s), out of quorum: pve03
    mgr: pve02(active, since 2d), standbys: pve01
    mds: 1/1 daemons up, 2 standby
    osd: 6 osds: 4 up (since 44s), 6 in (since 2d)
 
  data:
    volumes: 1/1 healthy
    pools:   4 pools, 73 pgs
    objects: 4.46k objects, 17 GiB
    usage:   52 GiB used, 248 GiB / 300 GiB avail
    pgs:     4460/13380 objects degraded (33.333%)
             71 active+undersized+degraded
             2  active+undersized
 
  io:
    client:   11 KiB/s wr, 0 op/s rd, 1 op/s wr


In the Ceph panel of the GUI, the raw space gets corrected a few minutes after the node goes down:

Healthy cluster
[screenshot]


One node down
[screenshot]
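
To see exactly when the down OSDs stop counting towards the raw totals, these are the commands I would watch (my understanding, which may be wrong, is that this happens once the down OSDs are marked out, after mon_osd_down_out_interval, 600 s by default):

Bash:
ceph osd tree                                   # up/down and in/out state per OSD
ceph osd df tree                                # per-OSD capacity that feeds RAW STORAGE
ceph config get mon mon_osd_down_out_interval   # delay before a down OSD is marked out (default 600)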

Could anyone shed some light on this?

Thank you!
 