[SOLVED] ceph clock skew issue - no way out?

Knuuut

Member
Jun 7, 2018
Hi,

there are plenty of posts about clock skew issues within this forum. I'm affected too.

I've tried several approaches to keep 4 nodes with identical hardware permanently in sync, without success.

Following this post https://forum.proxmox.com/threads/proxmoxve-ceph-clock-issue.20684/#post-105441 even made things worse, so I switched back to systemd-timesyncd (timedatectl), where clock skew incidents are less frequent than with ntpd.

With ntpd, I've seen a high jitter value (>200ms) on every node.
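
To put a number on this per peer, the jitter column of ntpq -pn can be filtered. This is only a sketch: high_jitter is a made-up helper name, the 200 ms threshold is simply the value mentioned above, and it assumes the usual ntpq peer table with two header lines and jitter as the last column.

```shell
#!/bin/sh
# high_jitter: hypothetical helper, not part of ntp.
# Reads `ntpq -pn` output on stdin and prints every peer whose jitter
# (last column, in milliseconds) is above the given threshold.
high_jitter() {
    # NR > 2 skips the column header and the "====" separator line.
    awk -v max="$1" 'NR > 2 && ($NF + 0) > (max + 0) { print $1, $NF }'
}

# Usage on a node running ntpd (prints nothing if every peer is below 200 ms):
# ntpq -pn | high_jitter 200
```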

With timedatectl, there is no way to get the jitter value afaik.

Maybe a change of the Linux clocksource from tsc to hpet would be a solution?

Any help would be appreciated.

Cheers Knuuut
 
Use a time source that is on the local ceph network and on hardware (not virtual).
 
I'm using the local ntp servers from my Datacenter-Provider.

Again, with the same NTP servers and ntpd on the nodes, things got worse, and I can't reproduce this behavior on other hardware.

So, my guess is an unstable clocksource (tsc) on all 4 nodes...?

Does anybody have experience with switching the clocksource from tsc to hpet?
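
Before switching, it may help to see what the kernel is currently using. A minimal sketch, assuming the standard sysfs location for clocksource information; show_clocksource is a hypothetical helper name, and the optional directory argument exists only so the function can be pointed at another path.

```shell
#!/bin/sh
# show_clocksource: hypothetical helper.
# Prints the active and the available kernel clocksources.
# $1 = sysfs directory (assumed default: the first clocksource device)
show_clocksource() {
    dir="${1:-/sys/devices/system/clocksource/clocksource0}"
    if [ -r "$dir/current_clocksource" ]; then
        printf 'current: %s\n' "$(cat "$dir/current_clocksource")"
        printf 'available: %s\n' "$(cat "$dir/available_clocksource")"
    else
        printf 'no clocksource info under %s\n' "$dir"
        return 1
    fi
}

# Usage on a node:
# show_clocksource
```

Writing one of the available names into current_clocksource as root switches the clocksource at runtime, and the kernel parameter clocksource=hpet makes the choice persistent across reboots. Whether hpet actually helps on this hardware is untested here.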

Cheers Knuuut
 
We install ntp on each node, then edit /etc/ntp.conf to use our internet router as the NTP server (the router is pfSense running on physical hardware).
 
Try this.

Code:
echo "NTP=10.1.1.11 10.1.1.12 10.1.1.13" >> /etc/systemd/timesyncd.conf

timedatectl set-ntp true
systemctl restart systemd-timesyncd

systemctl status systemd-timesyncd

date

hwclock -w

Replace "10.1.1.11 10.1.1.12 10.1.1.13" with your own ntp server ip addresses
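
Since the echo with >> adds another NTP= line every time it is run, a drop-in file may be easier to keep clean. This is a swapped-in alternative, not what the post above does: a rough sketch that assumes the installed systemd reads drop-ins from /etc/systemd/timesyncd.conf.d/. The helper name write_timesyncd_dropin and the file name local-ntp.conf are invented for illustration.

```shell
#!/bin/sh
# write_timesyncd_dropin: hypothetical helper, not part of systemd.
# $1 = drop-in directory, remaining arguments = NTP server addresses.
# Overwrites the drop-in each time, so repeated runs do not pile up entries.
write_timesyncd_dropin() {
    dir="$1"
    shift
    mkdir -p "$dir" || return 1
    # "$*" joins the remaining arguments with single spaces.
    printf '[Time]\nNTP=%s\n' "$*" > "$dir/local-ntp.conf"
}

# Usage as root (addresses are the placeholders from the post above):
# write_timesyncd_dropin /etc/systemd/timesyncd.conf.d 10.1.1.11 10.1.1.12 10.1.1.13
# systemctl restart systemd-timesyncd
```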
 
Try this.

Code:
echo "Servers=10.1.1.11 10.1.1.12 10.1.1.13" >> /etc/systemd/timesyncd.conf

timedatectl set-ntp true
systemctl restart systemd-timesyncd

systemctl status systemd-timesyncd

date

hwclock -w
Replace "10.1.1.11 10.1.1.12 10.1.1.13" with your own ntp server ip addresses

I think you mean "NTP=" instead of "Servers="

Anyway, that's essentially my current configuration.

We install ntp on each node, then edit /etc/ntp.conf to use our internet router as the NTP server (the router is pfSense running on physical hardware).

I've already tried this with the NTP servers inside my datacenter, and also with the Debian pool servers. This configuration never reached "health ok".
 
As I wrote before, I can't reproduce this on other (older) hardware.

So my focus is on the current (new) hardware:

Intel S2600STB mainboards with dual Xeons
Intel X520-DA2 dual 10Gb SFP+ nics
LSI 9341-4i

No exotic components at all

Any ideas anybody?
 
I had the same problem on my cluster for a few hours. My firewall was blocking port 123 (NTP), because I keep all my servers in a management subnet with minimal access to the internet. I simply used the default time server, updated the time, and the issue was resolved.
 
There is no firewall issue here: both ntpq -pn (when running ntpd) and systemctl status systemd-timesyncd.service give me positive output.
 
Finally, I solved this issue by myself.

What I did:

Set
Code:
NTP=0.debian.pool.ntp.org 1.debian.pool.ntp.org 2.debian.pool.ntp.org 3.debian.pool.ntp.org
in /etc/systemd/timesyncd.conf on every node.

The important part, though, was running
Code:
hwclock -w
several times on every node, which writes the system time to the hardware clock (RTC).

No more clock skew issues since Friday.
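
For anyone wanting to check whether the hardware clock has drifted away from the system time again, here is a rough sketch. It assumes the RTC is kept in UTC and exposed at /sys/class/rtc/rtc0/since_epoch; rtc_offset_seconds is a hypothetical helper name. The result is in whole seconds, so it only shows gross differences, not the sub-second skew Ceph complains about.

```shell
#!/bin/sh
# rtc_offset_seconds: hypothetical helper.
# Prints system time minus RTC time in whole seconds.
# $1 = file holding the RTC time as seconds since the epoch
#      (assumed default: sysfs entry of the first RTC device, RTC in UTC)
# $2 = system time as seconds since the epoch (default: now)
rtc_offset_seconds() {
    rtc_file="${1:-/sys/class/rtc/rtc0/since_epoch}"
    now="${2:-$(date +%s)}"
    [ -r "$rtc_file" ] || return 1
    rtc=$(cat "$rtc_file")
    echo $((now - rtc))
}

# Usage on a node (a value far from 0 suggests running hwclock -w again):
# rtc_offset_seconds
```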
 
