[SOLVED] HEALTH_WARN clock skew detected on mon.pve3 after last updates

MLOrion

I have a 3-node cluster.
After migrating all VMs from pve3 to pve2 and applying the latest updates, I got this error:

Code:
HEALTH_WARN clock skew detected on mon.pve3; Degraded data redundancy: 111714/335142 objects degraded (33.333%), 33 pgs degraded, 33 pgs undersized; 235 slow ops, oldest one blocked for 774 sec, mon.pve3 has slow ops

The time is correct on all 3 nodes and time sync works.
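
For completeness, sync status can be double-checked on each node with something like this (just a suggestion, assuming chrony, the Proxmox VE default, is the active time service):

Code:
timedatectl          # "System clock synchronized: yes" means NTP sync is active
chronyc tracking     # shows the current offset against the reference NTP source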

Code:
root@pve3:~# ceph -s
  cluster:
    id:     71c8b8a8-e63a-4f6f-a887-3bf9eb7c9448
    health: HEALTH_WARN
            clock skew detected on mon.pve3
            Degraded data redundancy: 111714/335142 objects degraded (33.333%), 33 pgs degraded, 33 pgs undersized
            297 slow ops, oldest one blocked for 994 sec, mon.pve3 has slow ops
 
  services:
    mon: 3 daemons, quorum pve1,pve2,pve3 (age 16m)
    mgr: pve2(active, since 43m), standbys: pve3, pve1
    osd: 6 osds: 4 up (since 71m), 4 in (since 61m)
 
  data:
    pools:   2 pools, 33 pgs
    objects: 111.71k objects, 436 GiB
    usage:   868 GiB used, 6.4 TiB / 7.3 TiB avail
    pgs:     111714/335142 objects degraded (33.333%)
             33 active+undersized+degraded
 
  io:
    client:   16 KiB/s rd, 85 KiB/s wr, 0 op/s rd, 13 op/s wr

Code:
root@pve3:~# ping 10.10..100.201
ping: 10.10..100.201: Name or service not known
root@pve3:~# ping 10.10.100.201
PING 10.10.100.201 (10.10.100.201) 56(84) bytes of data.
64 bytes from 10.10.100.201: icmp_seq=1 ttl=64 time=0.157 ms
64 bytes from 10.10.100.201: icmp_seq=2 ttl=64 time=0.218 ms
^C
--- 10.10.100.201 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1048ms
rtt min/avg/max/mdev = 0.157/0.187/0.218/0.030 ms
root@pve3:~# ping 10.10.100.202
PING 10.10.100.202 (10.10.100.202) 56(84) bytes of data.
64 bytes from 10.10.100.202: icmp_seq=1 ttl=64 time=0.138 ms
64 bytes from 10.10.100.202: icmp_seq=2 ttl=64 time=0.198 ms
^C
--- 10.10.100.202 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1038ms
rtt min/avg/max/mdev = 0.138/0.168/0.198/0.030 ms
root@pve3:~# ping 10.10.100.203
PING 10.10.100.203 (10.10.100.203) 56(84) bytes of data.
64 bytes from 10.10.100.203: icmp_seq=1 ttl=64 time=0.027 ms
64 bytes from 10.10.100.203: icmp_seq=2 ttl=64 time=0.010 ms
^C

On node 3 the CRUSH map is empty.

[Attachment: 2024-09-04 09_53_38-pve1 - Proxmox Virtual Environment – Mozilla Firefox.png]

[Attachment: 2024-09-04 09_51_46-pve1 - Proxmox Virtual Environment – Mozilla Firefox.png]
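
To cross-check the CRUSH map outside the GUI, commands like these should show whether the hosts and OSDs are still in the hierarchy (just a suggested check, not output from this cluster):

Code:
ceph osd crush tree     # prints the CRUSH hierarchy (root / hosts / OSDs)
ceph osd tree           # same hierarchy plus up/down and in/out state per OSD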

How do I get this fixed?
Until I applied the updates, the system was rock solid.
 
Some more input:

Code:
root@pve3:~# systemctl restart ceph-mon@pve3.service
root@pve3:~# ceph osd lspools
1 .mgr
2 cephpool1
root@pve3:~# ceph health detail
HEALTH_WARN clock skew detected on mon.pve3; Degraded data redundancy: 111714/335142 objects degraded (33.333%), 33 pgs degraded, 33 pgs undersized
[WRN] MON_CLOCK_SKEW: clock skew detected on mon.pve3
    mon.pve3 clock skew 28.3758s > max 0.05s (latency 0.00161601s)
[WRN] PG_DEGRADED: Degraded data redundancy: 111714/335142 objects degraded (33.333%), 33 pgs degraded, 33 pgs undersized
    pg 1.0 is stuck undersized for 76m, current state active+undersized+degraded, last acting [3,5]
    pg 2.0 is stuck undersized for 76m, current state active+undersized+degraded, last acting [3,5]
    pg 2.1 is stuck undersized for 76m, current state active+undersized+degraded, last acting [5,3]
    pg 2.2 is stuck undersized for 76m, current state active+undersized+degraded, last acting [3,5]
    pg 2.3 is stuck undersized for 76m, current state active+undersized+degraded, last acting [3,5]
    pg 2.4 is stuck undersized for 76m, current state active+undersized+degraded, last acting [4,2]
    pg 2.5 is stuck undersized for 76m, current state active+undersized+degraded, last acting [4,2]
    pg 2.6 is stuck undersized for 76m, current state active+undersized+degraded, last acting [4,3]
    pg 2.7 is stuck undersized for 76m, current state active+undersized+degraded, last acting [3,4]
    pg 2.8 is stuck undersized for 76m, current state active+undersized+degraded, last acting [3,5]
    pg 2.9 is stuck undersized for 76m, current state active+undersized+degraded, last acting [3,4]
    pg 2.a is stuck undersized for 76m, current state active+undersized+degraded, last acting [5,3]
    pg 2.b is stuck undersized for 76m, current state active+undersized+degraded, last acting [5,3]
    pg 2.c is stuck undersized for 76m, current state active+undersized+degraded, last acting [5,2]
    pg 2.d is stuck undersized for 76m, current state active+undersized+degraded, last acting [5,3]
    pg 2.e is stuck undersized for 76m, current state active+undersized+degraded, last acting [4,2]
    pg 2.f is stuck undersized for 76m, current state active+undersized+degraded, last acting [4,3]
    pg 2.10 is stuck undersized for 76m, current state active+undersized+degraded, last acting [4,2]
    pg 2.11 is stuck undersized for 76m, current state active+undersized+degraded, last acting [3,5]
    pg 2.12 is stuck undersized for 76m, current state active+undersized+degraded, last acting [5,2]
    pg 2.13 is stuck undersized for 76m, current state active+undersized+degraded, last acting [5,3]
    pg 2.14 is stuck undersized for 76m, current state active+undersized+degraded, last acting [2,4]
    pg 2.15 is stuck undersized for 76m, current state active+undersized+degraded, last acting [4,3]
    pg 2.16 is stuck undersized for 76m, current state active+undersized+degraded, last acting [2,5]
    pg 2.17 is stuck undersized for 76m, current state active+undersized+degraded, last acting [3,5]
    pg 2.18 is stuck undersized for 76m, current state active+undersized+degraded, last acting [5,2]
    pg 2.19 is stuck undersized for 76m, current state active+undersized+degraded, last acting [3,4]
    pg 2.1a is stuck undersized for 76m, current state active+undersized+degraded, last acting [3,4]
    pg 2.1b is stuck undersized for 76m, current state active+undersized+degraded, last acting [5,2]
    pg 2.1c is stuck undersized for 76m, current state active+undersized+degraded, last acting [4,2]
    pg 2.1d is stuck undersized for 76m, current state active+undersized+degraded, last acting [4,3]
    pg 2.1e is stuck undersized for 76m, current state active+undersized+degraded, last acting [5,3]
    pg 2.1f is stuck undersized for 76m, current state active+undersized+degraded, last acting [3,4]
root@pve3:~#
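
If the skew persists after the mon restart, something like this could be used to force a clock step and re-check (a sketch only, assuming chrony is the time daemon):

Code:
chronyc makestep                          # step the clock immediately instead of slewing slowly
systemctl restart ceph-mon@pve3.service   # restart the monitor so it reports its clock again
ceph time-sync-status                     # re-check the skew the monitors are measuring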

Code:
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Wed Jul 31 10:08:26 CEST 2024 on pts/0
root@pve1:~# ceph osd tree
ID  CLASS  WEIGHT    TYPE NAME      STATUS  REWEIGHT  PRI-AFF
-1         10.91638  root default                           
-7          3.63879      host pve1                           
 4    ssd   1.81940          osd.4      up   1.00000  1.00000
 5    ssd   1.81940          osd.5      up   1.00000  1.00000
-5          3.63879      host pve2                           
 2    ssd   1.81940          osd.2      up   1.00000  1.00000
 3    ssd   1.81940          osd.3      up   1.00000  1.00000
-3          3.63879      host pve3                           
 0    ssd   1.81940          osd.0    down         0  1.00000
 1    ssd   1.81940          osd.1    down         0  1.00000
root@pve1:~#
 
OK, I did pveceph purge, and everything went back to normal; the OSDs are rebuilding now.

Code:
root@pve1:~# ceph -s
  cluster:
    id:     71c8b8a8-e63a-4f6f-a887-3bf9eb7c9448
    health: HEALTH_WARN
            1/3 mons down, quorum pve1,pve2
            Degraded data redundancy: 69878/335214 objects degraded (20.846%), 22 pgs degraded, 22 pgs undersized
            102413 slow ops, oldest one blocked for 84625 sec, mon.pve3 has slow ops
 
  services:
    mon: 3 daemons, quorum pve1,pve2 (age 4m), out of quorum: pve3
    mgr: pve2(active, since 31h), standbys: pve1
    osd: 6 osds: 6 up (since 3m), 6 in (since 3m); 31 remapped pgs
 
  data:
    pools:   2 pools, 33 pgs
    objects: 111.74k objects, 436 GiB
    usage:   1.0 TiB used, 9.9 TiB / 11 TiB avail
    pgs:     3.030% pgs not active
             69878/335214 objects degraded (20.846%)
             35141/335214 objects misplaced (10.483%)
             20 active+undersized+degraded+remapped+backfill_wait
             9  active+remapped+backfill_wait
             2  active+undersized+degraded+remapped+backfilling
             1  peering
             1  active+clean
 
  io:
    client:   26 KiB/s wr, 0 op/s rd, 5 op/s wr
    recovery: 331 MiB/s, 84 objects/s
 
root@pve1:~#
 
mon.pve3 clock skew 28.3758s > max 0.05s (latency 0.00161601s)
That is quite a bit off and may be the reason why the OSDs on node 3 were down. Is NTP (chrony) working, and is chrony allowed to access the (default) NTP servers?
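
Assuming chrony, something like this shows whether the configured NTP servers are actually reachable from the node (just a suggested check):

Code:
chronyc sources -v      # lists the configured NTP sources and their reachability
chronyc activity        # shows how many sources are currently online / offline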

I see that each node in this 3-node cluster has 2 OSDs. Before you build a production cluster, keep in mind that a 3-node cluster is an edge case that needs careful consideration.
Given that you store 3 replicas by default, there will be one replica per node.

If a node fails completely, you have a situation like you had now: data redundancy is reduced, but the PGs will still be "active" as there are 2 replicas available. Once the node is back/replaced/fixed, Ceph can get back to 3 replicas.

If only one OSD in a node fails, Ceph will try to recover the data. It can still do that while following the failure domain of host (only one replica per host): the remaining OSDs in the same node will be used to recover the lost data.
With only 2 OSDs per node and one of them failing, the remaining OSD will receive all the data. So unless you are well below 50% usage, that remaining OSD will run almost or completely full.

Therefore, I recommend having at least 4 OSDs per node, so that the loss of one OSD is less likely to fill up the remaining OSDs during recovery.
If you have more than 3 nodes in the cluster, Ceph can also recover the data onto other nodes, reducing the impact on the node that lost the OSD.
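
To see how the pool is actually configured (replica count, minimum replicas for I/O, and the CRUSH rule that sets the failure domain), something along these lines works; cephpool1 is the pool name from your output:

Code:
ceph osd pool get cephpool1 size        # number of replicas the pool wants (3 by default)
ceph osd pool get cephpool1 min_size    # replicas required for client I/O to continue (2 by default)
ceph osd pool get cephpool1 crush_rule  # which CRUSH rule (and thus failure domain) the pool uses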
 
Hi Aaron,
NTP is working fine and the server is reachable from the nodes. There are other systems in the same subnet using this time source.
This is a test system and we are still learning how to "handle" PVE, since we are rolling the system out to our bulker fleet.
We just finished MV Moonbeam today with the migration from ESX to PVE!

zero issues ;-)

Getting back to what you said about Ceph and having only 2 OSDs per node: that sheds some light on things now.

This system is a blueprint for a 3-node cluster (maybe 4 nodes) with 12 OSDs per node and a 100 Gbit LACP-based network for Ceph, to replace our current VMware system running on a Dell VRTX with 3 nodes.

I'm really astounded by Ceph and its self-healing.

[Attachment: 2024-09-05 16_55_33-pve1 - Proxmox Virtual Environment – Mozilla Firefox.png]
 
This system is a blueprint for a 3-node cluster (maybe 4 nodes) with 12 OSDs per node and a 100 Gbit LACP-based network for Ceph, to replace our current VMware system running on a Dell VRTX with 3 nodes.
Sounds good. Keep in mind that you should allocate at least 1 CPU core per Ceph service (OSD, MON, MGR, MDS). With 12 OSDs per server, that means 12 real cores just for them. Luckily, getting CPUs with many cores is not a problem anymore. Just keep it in mind when you decide on the hardware, as the Proxmox VE host will also need a core or two, plus of course whatever you throw on the cluster in terms of VM workload.

Smaller but more numerous OSDs make it easier for Ceph to recover, but you have to weigh that against the additional CPU and memory they will consume and find a balance.

I'm really astounded by Ceph and its self-healing.
Hehe, yes, it can feel like magic watching how it recovers.

If you did a pveceph purge on node 3, you probably want to clean up the MON on it. It will still be mentioned in the `ceph.conf` file: in the global section, remove its IP from the mon_host line, and remove the full section for the MON itself: [mon.pve3]
Then remove its local directory on pve3 and disable the systemd service:
Code:
rm -rf /var/lib/ceph/mon/ceph-pve3
systemctl disable ceph-mon@pve3.service

After this, you should be able to recreate the MON.
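
On Proxmox VE the monitor can then be recreated from the node itself, for example (a sketch; run on pve3 once the old config entries and the data directory are gone):

Code:
pveceph mon create      # creates a new MON on this node, starts it and re-adds it to ceph.conf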
 
I did that, but now I'm getting this in the GUI:

[Attachment: 2024-09-05 21_36_17-Window.png]

Code:
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 10.10.100.201/24
        fsid = 71c8b8a8-e63a-4f6f-a887-3bf9eb7c9448
        mon_allow_pool_delete = true
        mon_host = 10.10.100.201 10.10.100.202
        ms_bind_ipv4 = true
        ms_bind_ipv6 = false
        osd_pool_default_min_size = 2
        osd_pool_default_size = 3
        public_network = 10.10.100.201/24

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
        keyring = /etc/pve/ceph/$cluster.$name.keyring

[mds]
        keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.pve1]
        host = pve1
        mds_standby_for_name = pve

[mds.pve2]
        host = pve2
        mds_standby_for_name = pve

[mds.pve3]
        host = pve3
        mds_standby_for_name = pve

[mon.pve1]
        public_addr = 10.10.100.201

[mon.pve2]
        public_addr = 10.10.100.202

Code:
root@pve3:~# ls var/lib/ceph/mon/ceph-pve3
ls: cannot access 'var/lib/ceph/mon/ceph-pve3': No such file or directory
root@pve3:~#

Code:
1/3 mons down, quorum pve1,pve2
mon.pve3 (rank 2) addr [v2:10.10.100.203:3300/0,v1:10.10.100.203:6789/0] is down (out of quorum)
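
My guess is that the monitor map still contains the old mon.pve3 entry even though its config section and data directory are gone. If so, a possible cleanup (not yet tried, just an assumption) would be to remove the stale entry from the monmap and then recreate the monitor as described above:

Code:
ceph mon remove pve3    # run on a healthy node: drops the stale mon.pve3 entry from the monitor map
pveceph mon create      # run on pve3 afterwards to recreate the monitor cleanly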
 
