[SOLVED] ceph clock skew issue - no way out?

Knuuut · Sep 13, 2018

Hi,

there are plenty of posts about clock skew issues within this forum. I'm affected too.

So, I've tried different actions to get 4 Nodes with identical hardware permanently in sync with no success.

Even following this post https://forum.proxmox.com/threads/proxmoxve-ceph-clock-issue.20684/#post-105441 made things worse, so I switched back to timedatectl (systemd-timesyncd), where clock skew incidents are less frequent than with ntpd.

With ntpd, I've seen a high jitter value (>200ms) on every node.

With timedatectl, there is no way to get the jitter value afaik.

Maybe a change of the Linux clocksource from tsc to hpet would be a solution?

Any help would be appreciated.

Cheers Knuuut

Alwin · Sep 13, 2018

Use a time source that is on the local ceph network and on hardware (not virtual).

Knuuut · Sep 13, 2018

I'm using the local ntp servers from my Datacenter-Provider.

Again, with the same ntp servers and with ntpd on the nodes, things got worse, and I can't reproduce this behavior on other hardware.

So, my guess is an unstable clocksource (tsc) on all 4 nodes...?

Does anybody have experience with switching the clocksource from tsc to hpet?

Cheers Knuuut
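Before switching from tsc to hpet, it may help to see what the kernel currently uses and what it offers. The following is a minimal sketch that reads the standard sysfs clocksource files; the function name `show_clocksource` and its optional directory argument are only illustrative and not something from this thread.

```shell
# Print the active and the available kernel clocksources.
# $1 (optional): directory holding the clocksource files; defaults to the
# usual sysfs location. The parameter exists only to make the sketch testable.
show_clocksource() {
    cs_dir="${1:-/sys/devices/system/clocksource/clocksource0}"
    printf 'current: %s\n' "$(cat "$cs_dir/current_clocksource")"
    printf 'available: %s\n' "$(cat "$cs_dir/available_clocksource")"
}
```

Calling `show_clocksource` without arguments on a node prints both values. If the kernel has already flagged tsc as unstable, `dmesg | grep -i clocksource` should show it. For a temporary test, root can write `hpet` into `current_clocksource`; the `clocksource=hpet` kernel parameter would make it persistent. Whether that helps on this hardware is an open question.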

RobFantini · Sep 14, 2018

We install ntp on each node, then edit /etc/ntp.conf to use our internet router as the ntp server [the router is pfSense, which is running on hardware].

judexzhu · Sep 14, 2018

Try this.

Code:

echo "NTP=10.1.1.11 10.1.1.12 10.1.1.13" >> /etc/systemd/timesyncd.conf

timedatectl set-ntp true
systemctl restart systemd-timesyncd

systemctl status systemd-timesyncd

date

hwclock -w

Replace "10.1.1.11 10.1.1.12 10.1.1.13" with your own ntp server ip addresses
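One caveat with the `>>` append above: the `NTP=` line only takes effect if it ends up inside the `[Time]` section, which is normally the case with the stock file but not guaranteed after manual edits. A possible alternative is a drop-in file under /etc/systemd/timesyncd.conf.d/. The sketch below assumes that approach; the function name and the file name `10-ntp.conf` are hypothetical.

```shell
# Write a timesyncd drop-in that sets the NTP servers.
# $1: space-separated list of NTP servers
# $2 (optional): drop-in directory; defaults to the systemd location.
write_timesyncd_dropin() {
    conf_dir="${2:-/etc/systemd/timesyncd.conf.d}"
    mkdir -p "$conf_dir"
    # 10-ntp.conf is an arbitrary name chosen for this example.
    printf '[Time]\nNTP=%s\n' "$1" > "$conf_dir/10-ntp.conf"
}
```

Usage would look like `write_timesyncd_dropin "10.1.1.11 10.1.1.12 10.1.1.13"`, followed by `systemctl restart systemd-timesyncd` as in the post above.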

Knuuut · Sep 14, 2018

judexzhu said:
Try this.

Code:

echo "Servers=10.1.1.11 10.1.1.12 10.1.1.13" >> /etc/systemd/timesyncd.conf
timedatectl set-ntp true
systemctl restart systemd-timesyncd
systemctl status systemd-timesyncd
date
hwclock -w

Replace "10.1.1.11 10.1.1.12 10.1.1.13" with your own ntp server ip addresses

I think you mean "NTP=" instead of "Servers="

Anyway, that's like my current configuration.

RobFantini said:
We install ntp on each node, then edit /etc/ntp.conf to use our internet router as the ntp server [the router is pfSense, which is running on hardware].

I've already tried this with the ntp servers inside my datacenter, and also with the Debian pool servers. This configuration never reached "health ok".

RobFantini · Sep 14, 2018

I think that systemd ntp is buggy, or there are world wide operator errors.

search around.

RobFantini · Sep 14, 2018

Also, ntp just works, like dns and dhcp. If there are still issues after 'apt install ntp' and configuring it, look at the network.

Knuuut · Sep 14, 2018

As I wrote before, I can't reproduce this on other (older) hardware.

So my focus is on the current (new) hardware:

Intel S2600STB mainboards with dual Xeons
Intel X520-DA2 dual 10Gb SFP+ nics
LSI 9341-4i

No exotic components at all

Any ideas anybody?

mcdowellster · Sep 14, 2018

I had the same problem on my cluster for a few hours. My firewall was blocking port 123 (NTP), as I place all my servers in a management subnet with minimal access to the internet. Honestly, I used the default time server, updated the time, and the issue was resolved.

Knuuut · Sep 14, 2018

There is no firewall issue, because both ntpq -pn (when running ntpd) and systemctl status systemd-timesyncd.service give me positive output.
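Reachability alone does not say much about sync quality; the last column of `ntpq -pn` is the jitter in milliseconds, which is where the >200ms values mentioned earlier show up. As a rough sketch, a small filter like the following could list only the peers above a chosen limit. The function name `high_jitter_peers` is made up for this example.

```shell
# List peers whose jitter (last column of `ntpq -pn`, in ms) exceeds a limit.
# $1: jitter limit in milliseconds; the ntpq output is read from stdin.
high_jitter_peers() {
    # NR > 2 skips the two header lines that ntpq prints.
    awk -v limit="$1" 'NR > 2 && $NF + 0 > limit + 0 { print $1, $NF }'
}
```

It would be used as `ntpq -pn | high_jitter_peers 200` on each node; no output means no peer is above the limit.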

Knuuut · Sep 17, 2018

Finally, I solved this issue by myself.

What I did:

Set

Code:

NTP=0.debian.pool.ntp.org 1.debian.pool.ntp.org 2.debian.pool.ntp.org 3.debian.pool.ntp.org

in /etc/systemd/timesyncd.conf on every node.

But important was this:

Code:

hwclock -w

several times on every node.

No more clock skew issues since Friday.

lucaferr · Sep 17, 2018

I had the same issues and solved them using "chrony". My 5-node production cluster running Proxmox VE 5.2 and Ceph 12.2 has now run for 9 months with no more "clock skew" problems since I installed chrony. Please see: https://forum.proxmox.com/threads/proxmox-ceph-issues-with-ntp.37919/#post-197460
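With chrony, the current offset of the system clock can be read from `chronyc tracking`. The sketch below pulls out the "System time" line and converts it to signed milliseconds (negative meaning the local clock is slow). The function name `chrony_offset_ms` is hypothetical, and the comparison against Ceph's default allowed drift of 0.05 s (`mon_clock_drift_allowed`) is my own suggestion rather than anything from the linked post.

```shell
# Convert the "System time" line of `chronyc tracking` (read from stdin)
# into a signed offset in milliseconds; negative means the clock is slow.
chrony_offset_ms() {
    awk -F': ' '/^System time/ {
        split($2, f, " ")          # f[1] = seconds, f[3] = "fast" or "slow"
        ms = f[1] * 1000
        if (f[3] == "slow") ms = -ms
        printf "%.3f\n", ms
    }'
}
```

Running `chronyc tracking | chrony_offset_ms` on every node gives a quick number to compare against the 50 ms that Ceph tolerates by default.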
