Ceph HEALTH_WARN: Degraded data redundancy: 512 pgs undersized

cmonty14

Hi,
I have configured Ceph on a 3-node-cluster.
Then I created OSDs as follows:
Node 1: 3x 1TB HDD
Node 2: 3x 8TB HDD
Node 3: 4x 8TB HDD
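(For reference, on Proxmox the OSDs are typically created one per disk with pveceph; the device path below is only a placeholder.)
Code:
pveceph createosd /dev/sdb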

This results in following OSD tree:
root@ld4257:~# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 54.20874 root default
-3 3.27118 host ld4257
0 hdd 1.09039 osd.0 up 1.00000 1.00000
1 hdd 1.09039 osd.1 up 1.00000 1.00000
2 hdd 1.09039 osd.2 up 1.00000 1.00000
-7 21.83038 host ld4464
3 hdd 7.27679 osd.3 up 1.00000 1.00000
4 hdd 7.27679 osd.4 up 1.00000 1.00000
5 hdd 7.27679 osd.5 up 1.00000 1.00000
-5 29.10718 host ld4465
6 hdd 7.27679 osd.6 up 1.00000 1.00000
7 hdd 7.27679 osd.7 up 1.00000 1.00000
8 hdd 7.27679 osd.8 up 1.00000 1.00000
9 hdd 7.27679 osd.9 up 1.00000 1.00000


After this I created a pool with a PG total of 256, based on the calculation done here
(size=3, OSDs=10, data=100%, target=100).
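(For reference, the calculator's formula is roughly target PGs per OSD × number of OSDs × %data ÷ size, i.e. 100 × 10 × 1.0 / 3 ≈ 333, rounded to the power of two 256. The values actually set on the pool can be checked as below; the pool name is a placeholder.)
Code:
ceph osd pool get <poolname> pg_num
ceph osd pool get <poolname> size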

Ceph health gives me a warning:
root@ld4257:~# ceph -s
cluster:
id: fda2f219-7355-4c46-b300-8a65b3834761
health: HEALTH_WARN
Degraded data redundancy: 12 pgs undersized
clock skew detected on mon.ld4464, mon.ld4465

services:
mon: 3 daemons, quorum ld4257,ld4464,ld4465
mgr: ld4257(active), standbys: ld4465, ld4464
osd: 10 osds: 10 up, 10 in

data:
pools: 1 pools, 256 pgs
objects: 0 objects, 0 bytes
usage: 10566 MB used, 55499 GB / 55509 GB avail
pgs: 244 active+clean
12 active+undersized


And this is the health detail:
root@ld4257:~# ceph health detail
HEALTH_WARN Degraded data redundancy: 12 pgs undersized; clock skew detected on mon.ld4464, mon.ld4465
PG_DEGRADED Degraded data redundancy: 12 pgs undersized
pg 2.1d is stuck undersized for 115.728186, current state active+undersized, last acting [3,7]
pg 2.22 is stuck undersized for 115.737825, current state active+undersized, last acting [6,3]
pg 2.29 is stuck undersized for 115.736686, current state active+undersized, last acting [6,5]
pg 2.31 is stuck undersized for 115.738920, current state active+undersized, last acting [9,5]
pg 2.38 is stuck undersized for 115.728054, current state active+undersized, last acting [3,6]
pg 2.57 is stuck undersized for 115.727351, current state active+undersized, last acting [4,6]
pg 2.65 is stuck undersized for 115.727032, current state active+undersized, last acting [3,6]
pg 2.76 is stuck undersized for 115.727156, current state active+undersized, last acting [4,6]
pg 2.90 is stuck undersized for 115.738454, current state active+undersized, last acting [7,3]
pg 2.cc is stuck undersized for 115.728976, current state active+undersized, last acting [3,6]
pg 2.df is stuck undersized for 115.741311, current state active+undersized, last acting [8,5]
pg 2.e2 is stuck undersized for 115.741280, current state active+undersized, last acting [7,3]
MON_CLOCK_SKEW clock skew detected on mon.ld4464, mon.ld4465
mon.ld4464 addr 192.168.100.12:6789/0 clock skew 0.0825909s > max 0.05s (latency 0.00569053s)
mon.ld4465 addr 192.168.100.13:6789/0 clock skew 0.0824161s > max 0.05s (latency 0.00140001s)


I'm wondering why there are fewer PGs on the larger disks 6-9 compared to disks 0-3:
root@ld4257:~# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS
0 hdd 1.09039 1.00000 1116G 1056M 1115G 0.09 4.97 78
1 hdd 1.09039 1.00000 1116G 1056M 1115G 0.09 4.97 73
2 hdd 1.09039 1.00000 1116G 1056M 1115G 0.09 4.97 93
3 hdd 7.27679 1.00000 7451G 1056M 7450G 0.01 0.74 76
4 hdd 7.27679 1.00000 7451G 1056M 7450G 0.01 0.74 102
5 hdd 7.27679 1.00000 7451G 1056M 7450G 0.01 0.74 78
6 hdd 7.27679 1.00000 7451G 1056M 7450G 0.01 0.74 65
7 hdd 7.27679 1.00000 7451G 1056M 7450G 0.01 0.74 60
8 hdd 7.27679 1.00000 7451G 1056M 7450G 0.01 0.74 67
9 hdd 7.27679 1.00000 7451G 1056M 7450G 0.01 0.74 64
TOTAL 55509G 10566M 55499G 0.02
MIN/MAX VAR: 0.74/4.97 STDDEV: 0.04



How can I fix this?
 
What is the data usage on your OSDs?
Code:
ceph osd df

Size=3 means all PGs need to be replicated 3 times across 3 nodes. But your node 1 has much less HDD capacity than the others.
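A quick way to see this is to check where one of the stuck PGs maps (PG id taken from the health detail above); only two OSDs show up in the acting set, e.g.:
Code:
ceph pg map 2.1d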

And first, fix the clock skew: check that all nodes use the same NTP server and that their time is synchronized.
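For example, to check the NTP peers and sync state on each node (assuming ntpd is in use):
Code:
ntpq -p
timedatectl status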
 
I posted the output of ceph osd df in my initial post.
There's no data stored.
 
After stopping the ntp service, I ran
ntpd -gq
on every node.
Then I started ntp again.
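The monitors' view of the skew can then be verified with:
Code:
ceph time-sync-status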
 
I have modified the CRUSH map and created 2 different buckets for the 2 different HDD types.
This means one bucket for all HDDs of size 1TB and one bucket for all HDDs of size 8TB.
root@ld4257:~# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-11 0 root ssd
-12 0 host ld4257-ssd
-13 0 host ld4464-ssd
-14 0 host ld4465-ssd
-10 43.66196 root hdd_strgbox
-27 0 host ld4257-hdd_strgbox
-28 21.83098 host ld4464-hdd_strgbox
3 hdd 7.27699 osd.3 up 1.00000 1.00000
4 hdd 7.27699 osd.4 up 1.00000 1.00000
5 hdd 7.27699 osd.5 up 1.00000 1.00000
-29 21.83098 host ld4465-hdd_strgbox
6 hdd 7.27699 osd.6 up 1.00000 1.00000
7 hdd 7.27699 osd.7 up 1.00000 1.00000
8 hdd 7.27699 osd.8 up 1.00000 1.00000
-9 3.26999 root hdd
-15 3.26999 host ld4257-hdd
0 hdd 1.09000 osd.0 up 1.00000 1.00000
1 hdd 1.09000 osd.1 up 1.00000 1.00000
2 hdd 1.09000 osd.2 up 1.00000 1.00000
-16 0 host ld4464-hdd
-17 0 host ld4465-hdd
-1 0 root default
-3 0 host ld4257
-7 0 host ld4464
-5 0 host ld4465
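For reference, replicated rules pinned to those new roots can be created like this (the rule names match the ones used for the pools below; this is only a sketch of one way to do it, the rules may just as well have been added by editing the decompiled CRUSH map):
Code:
ceph osd crush rule create-replicated hdd_rule hdd host
ceph osd crush rule create-replicated hddstrgbox_rule hdd_strgbox host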


Then I created the relevant pools:
pveceph createpool hdd -crush_rule hdd_rule -pg_num 256 -size 2
pveceph createpool hddstrgbox -crush_rule hddstrgbox_rule -pg_num 512 -size 2


If I use a higher pg_num, the error message is always:
mon_command failed - pg_num 1024 size 3 would mean 3584 total pgs, which exceeds max 2000 (mon_max_pg_per_osd 200 * num_in_osds 10)
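The 3584 in that message is the total number of PG instances across all pools, existing plus requested: the hdd pool already accounts for 256 × 2 = 512, and the requested pool would add 1024 × 3 = 3072, which together exceed mon_max_pg_per_osd (200) × num_in_osds (10) = 2000. The currently effective limit can be checked on the monitor node, e.g.:
Code:
ceph daemon mon.ld4257 config get mon_max_pg_per_osd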

And as a consequence the Health Status reports this:
root@ld4257:~# ceph -s
cluster:
id: fda2f219-7355-4c46-b300-8a65b3834761
health: HEALTH_WARN
Reduced data availability: 512 pgs inactive
Degraded data redundancy: 512 pgs undersized

services:
mon: 3 daemons, quorum ld4257,ld4464,ld4465
mgr: ld4257(active), standbys: ld4465, ld4464
osd: 10 osds: 10 up, 10 in

data:
pools: 2 pools, 512 pgs
objects: 0 objects, 0 bytes
usage: 10765 MB used, 55499 GB / 55509 GB avail
pgs: 100.000% pgs not active
512 undersized+peered
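For reference, which CRUSH rule a pool actually uses, and what that rule selects, can be checked like this (names as created above):
Code:
ceph osd pool get hddstrgbox crush_rule
ceph osd crush rule dump hddstrgbox_rule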


What must be considered to overcome this warning?

THX
 
