[SOLVED] ceph clock skew issue - no way out?

Knuuut · Sep 13, 2018

Hi,

There are plenty of posts about clock skew issues in this forum, and I'm affected too.

I've tried several approaches to keep four nodes with identical hardware permanently in sync, without success.

Even following this post https://forum.proxmox.com/threads/proxmoxve-ceph-clock-issue.20684/#post-105441 made things worse, so I switched back to systemd-timesyncd (managed with timedatectl), which produces fewer clock skew incidents than ntpd.

With ntpd, I saw high jitter values (>200 ms) on every node.
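For reference, a minimal sketch (not from the original post) of pulling the jitter column out of `ntpq -pn` output and listing the peers above a threshold. It assumes the classic ten-column layout (remote, refid, st, t, when, poll, reach, delay, offset, jitter) with jitter in milliseconds and the two header lines first; the function name `high_jitter_peers` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read `ntpq -pn` output on stdin and print every peer
# whose jitter (10th column, in milliseconds) exceeds a threshold.
# Assumes the two header lines (column names and "=====") come first.
high_jitter_peers() {
  threshold_ms="${1:-200}"   # default mirrors the >200 ms mentioned above
  awk -v t="$threshold_ms" '
    NR > 2 && NF >= 10 && ($10 + 0) > (t + 0) { print $1, $10 }
  '
}
```

Usage: `ntpq -pn | high_jitter_peers 200`. The first character of the remote column is ntpq's tally code (`*` for the selected system peer, `+` for a candidate), so it is kept in the output.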

With timedatectl, as far as I know, there is no way to get the jitter value.
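As far as I know, newer systemd releases (239 and later) added `timedatectl timesync-status`, which does print offset, delay and jitter for systemd-timesyncd; the systemd shipped with Debian Stretch / PVE 5.x is older and lacks it. A sketch, assuming that output layout (a `Jitter: <value>` line), of extracting the value; `timesyncd_jitter` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: extract the "Jitter:" value from the output of
# `timedatectl timesync-status` (systemd >= 239 assumed), read on stdin.
timesyncd_jitter() {
  # Split on a colon plus any following spaces; field 2 is the value.
  awk -F': *' '$1 ~ /^ *Jitter$/ { print $2 }'
}
```

Usage: `timedatectl timesync-status | timesyncd_jitter`.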

Could switching the Linux clocksource from tsc to hpet be a solution?
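Before switching, it may help to check which clocksources the kernel actually offers. A sketch, assuming the standard sysfs directory `/sys/devices/system/clocksource/clocksource0/`; the directory is a parameter so the functions can be tried against a copy, and the helper names are hypothetical. Writing a name into `current_clocksource` as root changes the source at runtime only; to make it permanent, the kernel parameter `clocksource=hpet` would go on the kernel command line (e.g. `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `update-grub`). `dmesg | grep -i clocksource` also shows whether the kernel ever marked the TSC as unstable.

```shell
#!/bin/sh
# Default sysfs directory describing the active clocksource.
CS_SYSFS=/sys/devices/system/clocksource/clocksource0

# Hypothetical helper: print the current and available clocksources.
clocksource_info() {
  dir="${1:-$CS_SYSFS}"
  printf 'current: %s\n' "$(cat "$dir/current_clocksource")"
  printf 'available: %s\n' "$(cat "$dir/available_clocksource")"
}

# Hypothetical helper: succeed only if the named clocksource is available.
has_clocksource() {
  dir="${1:-$CS_SYSFS}"
  grep -qw -- "$2" "$dir/available_clocksource"
}

# Switching at runtime (root only, not persistent across reboots):
#   has_clocksource "$CS_SYSFS" hpet && echo hpet > "$CS_SYSFS/current_clocksource"
```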

Any help would be appreciated.

Cheers Knuuut

Alwin · Sep 13, 2018

Use a time source that sits on the local Ceph network and runs on physical hardware (not in a VM).

Knuuut · Sep 13, 2018

I'm using the local NTP servers of my datacenter provider.

Again: with the same NTP servers but ntpd on the nodes, things got worse, and I can't reproduce this behavior on other hardware.

So my guess is an unstable clocksource (tsc) on all four nodes?

Does anybody have experience with switching the clocksource from tsc to hpet?

Cheers Knuuut

RobFantini · Sep 14, 2018

We install ntp on each node, then edit /etc/ntp.conf to use our internet router as the NTP server (the router runs pfSense on physical hardware).
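A sketch of that setup, assuming Debian's `ntp` package and a local hardware time source at the hypothetical address 192.0.2.1 (replace it with your router). The helper only prints a minimal config, so it can be reviewed before it replaces `/etc/ntp.conf`; the restrict lines are modeled on Debian's stock file, and `make_ntp_conf` is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal ntp.conf that syncs from one local
# server (e.g. a pfSense router) instead of the Debian pool.
make_ntp_conf() {
  server="$1"
  cat <<EOF
driftfile /var/lib/ntp/ntp.drift
# iburst: send a burst of packets at startup for a faster first sync
server $server iburst
# serve time, but deny remote configuration, peering and status queries
restrict -4 default kod notrap nomodify nopeer noquery limited
restrict -6 default kod notrap nomodify nopeer noquery limited
restrict 127.0.0.1
restrict ::1
EOF
}
```

Usage: `make_ntp_conf 192.0.2.1 > /etc/ntp.conf && systemctl restart ntp`; once the router has been selected, `ntpq -pn` should list it with a leading `*`.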

judexzhu · Sep 14, 2018

Try this.

Code:

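# Point systemd-timesyncd at your NTP servers (appended to the end of the file)
echo "NTP=10.1.1.11 10.1.1.12 10.1.1.13" >> /etc/systemd/timesyncd.conf

# Enable NTP sync and restart the daemon so it picks up the new servers
timedatectl set-ntp true
systemctl restart systemd-timesyncd

# Check that the service is running and has synchronized
systemctl status systemd-timesyncd

date

# Write the (now synchronized) system time to the hardware clock
hwclock -w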

Replace "10.1.1.11 10.1.1.12 10.1.1.13" with the IP addresses of your own NTP servers.
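One caveat with the `echo ... >>` line: running it a second time appends another `NTP=` line. A hedged alternative is an idempotent edit, sketched below under the assumption of Debian's stock `timesyncd.conf` (a `[Time]` section containing a commented `#NTP=` line) and GNU sed; `set_timesyncd_ntp` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: set NTP= in a timesyncd.conf without duplicating it.
# Usage: set_timesyncd_ntp FILE SERVER [SERVER...]
set_timesyncd_ntp() {
  conf="$1"; shift
  servers="$*"
  if grep -qE '^#?NTP=' "$conf"; then
    # Replace the existing (possibly commented-out) NTP= line in place.
    sed -i -E "s|^#?NTP=.*|NTP=$servers|" "$conf"
  else
    # No NTP= line at all: append one (assumes [Time] is the last section).
    printf 'NTP=%s\n' "$servers" >> "$conf"
  fi
}
```

Usage: `set_timesyncd_ntp /etc/systemd/timesyncd.conf 10.1.1.11 10.1.1.12 10.1.1.13 && systemctl restart systemd-timesyncd`. The `#FallbackNTP=` line is left untouched because the pattern only matches lines starting with `NTP=` or `#NTP=`.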

Knuuut · Sep 14, 2018

judexzhu said:
Try this.

Code:

echo "Servers=10.1.1.11 10.1.1.12 10.1.1.13" >> /etc/systemd/timesyncd.conf
timedatectl set-ntp true
systemctl restart systemd-timesyncd
systemctl status systemd-timesyncd
date
hwclock -w

Replace "10.1.1.11 10.1.1.12 10.1.1.13" with the IP addresses of your own NTP servers.

I think you mean "NTP=" instead of "Servers=".

Anyway, that's essentially my current configuration.

RobFantini said:
We install ntp on each node, then edit /etc/ntp.conf to use our internet router as the NTP server (the router runs pfSense on physical hardware).

I've already tried this with the NTP servers inside my datacenter, and also with the Debian pool servers. That configuration never reached HEALTH_OK.

RobFantini · Sep 14, 2018

I think systemd's NTP client (systemd-timesyncd) is buggy, or there are a lot of operator errors worldwide.

Search around.

RobFantini · Sep 14, 2018

Also, ntp just works, like DNS and DHCP. If there are issues after 'apt install ntp' and configuring it, look at the network.

Knuuut · Sep 14, 2018

As I wrote before, I can't reproduce this on other (older) hardware.

So my focus is on the current (new) hardware:

Intel S2600STB mainboards with dual Xeons
Intel X520-DA2 dual 10Gb SFP+ NICs
LSI 9341-4i

No exotic components at all.

Any ideas anybody?

mcdowellster · Sep 14, 2018

I had the same problem on my cluster for a few hours. My firewall was blocking port 123 (NTP), because I place all my servers in a management subnet with minimal internet access. Honestly, I just used the default time server, updated the time, and the issue was resolved.

Knuuut · Sep 14, 2018

It's not a firewall issue: both ntpq -pn (when running ntpd) and systemctl status systemd-timesyncd.service show successful synchronization.
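For anyone else hitting this, the `reach` column of `ntpq -pn` is a quick way to tell a firewall problem apart from a clock problem: it is an octal shift register of the last eight polls, so `377` means all eight were answered and `0` means none were (for example, UDP port 123 blocked). A sketch that classifies each peer, assuming the usual ten-column layout with reach in column 7; `peer_reachability` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: classify peers from `ntpq -pn` output (stdin) by the
# reach column (7th field, octal bitmask of the last eight polls).
peer_reachability() {
  awk 'NR > 2 && NF >= 10 {
    if ($7 == "377")    state = "ok"          # last 8 polls answered
    else if ($7 == "0") state = "no-replies"  # no poll answered at all
    else                state = "partial"     # some polls lost, or still starting
    print $1, state
  }'
}
```

Right after a restart, reach climbs through 1, 3, 7, 17 and so on up to 377, so "partial" is expected during the first minutes.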

Knuuut · Sep 17, 2018

Finally, I solved this issue by myself.

What I did:

Set

Code:

NTP=0.debian.pool.ntp.org 1.debian.pool.ntp.org 2.debian.pool.ntp.org 3.debian.pool.ntp.org

in /etc/systemd/timesyncd.conf on every node.

But the important part was this:

Code:

hwclock -w

several times on every node, to write the synchronized system time to the hardware clock.

No more clock skew issues since Friday.
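To check that the nodes stay in line after such a change, one rough approach is to collect each node's epoch time and compare the spread with Ceph's monitor tolerance, `mon_clock_drift_allowed` (0.05 s by default; `ceph health detail` reports violations as clock skew warnings). A sketch with hypothetical helper and node names; because the timestamps are collected one after another over SSH, the measured spread includes connection latency and overstates the real skew.

```shell
#!/bin/sh
# Hypothetical helper: read "node epoch_seconds" lines on stdin and print the
# spread (max - min) between the fastest and slowest clock, in seconds.
max_skew() {
  awk '
    NR == 1 || $2 < min { min = $2 }
    NR == 1 || $2 > max { max = $2 }
    END { if (NR > 0) printf "%.3f\n", max - min }
  '
}

# Succeeds if a skew (seconds) exceeds a limit; 0.05 s matches Ceph's
# default mon_clock_drift_allowed.
skew_exceeds() {
  awk -v s="$1" -v t="${2:-0.05}" 'BEGIN { exit !(s + 0 > t + 0) }'
}

# Example (node names are hypothetical):
#   for n in node1 node2 node3 node4; do
#     echo "$n $(ssh root@$n date +%s.%N)"
#   done | max_skew
```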

lucaferr · Sep 17, 2018

I had the same issues and solved them with chrony. My five-node production cluster running Proxmox VE 5.2 and Ceph 12.2 has been running for nine months now, with no more "clock skew" problems since I installed chrony. Please see: https://forum.proxmox.com/threads/proxmox-ceph-issues-with-ntp.37919/#post-197460
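For comparison, on Debian the chrony route starts with `apt install chrony`, and `chronyc tracking` / `chronyc sources -v` show the daemon's state. The helper below (a hypothetical name) pulls the current system clock offset out of `chronyc tracking`, assuming its usual `System time : <seconds> seconds fast|slow of NTP time` line.

```shell
#!/bin/sh
# Hypothetical helper: print the signed system clock offset (seconds) from
# `chronyc tracking` output on stdin: "fast" => clock ahead (+),
# "slow" => clock behind (-).
chrony_offset() {
  awk '/^System time/ { sign = ($6 == "slow") ? "-" : "+"; print sign $4 }'
}
```

Usage: `chronyc tracking | chrony_offset`.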
