PVE 4.1, systemd-timesyncd and CEPH (clock skew)

wosp

On a 3-node cluster we have had an issue since we started using PVE 4.1-22 (clean install, not an upgrade). The PVE cluster is also the CEPH storage cluster (same nodes). systemd-timesyncd seems to be less stable/accurate than NTP, so the CEPH cluster reports "clock skew" every couple of hours. We noticed this because we run a script every minute that e-mails us whenever the health is not "HEALTH_OK". So I installed NTP on all nodes and ran:

# timedatectl set-ntp false
# systemctl stop systemd-timesyncd
# systemctl stop systemd-timedated
# systemctl disable systemd-timesyncd
# systemctl disable systemd-timedated

This works well as long as systemd-timesyncd is not running and the node isn't rebooted. When the node is rebooted, systemd-timesyncd is started despite "systemctl disable systemd-timesyncd", and after some hours we get "clock skews" again (until we stop systemd-timesyncd manually and let NTP do its job).
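The every-minute health check mentioned above is not shown in the thread. A minimal sketch of such a check, assuming it only compares the output of `ceph health` against HEALTH_OK and sends mail through a working `mail` command; the function names and the default recipient are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch of an every-minute Ceph health check (function names are hypothetical).

# Succeeds only when the `ceph health` summary starts with HEALTH_OK.
health_is_ok() {
    [[ "$1" == HEALTH_OK* ]]
}

# Queries the cluster and mails the status text when it is not HEALTH_OK.
# Assumes a working `mail` command; the recipient defaults to root.
check_ceph_health() {
    local recipient=${1:-root} status
    status=$(ceph health 2>&1)
    if ! health_is_ok "$status"; then
        # The subject carries the first word of the status, e.g. HEALTH_WARN.
        printf '%s\n' "$status" \
            | mail -s "Ceph health on $(hostname): ${status%% *}" "$recipient"
    fi
}
```

A script that sources these functions and calls `check_ceph_health` could then be run once a minute from cron.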

"systemctl status systemd-timesyncd" after reboot:

● systemd-timesyncd.service - Network Time Synchronization
Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; disabled)
Active: active (running) since Fri 2016-04-22 11:37:13 CEST; 1min 55s ago
Docs: man:systemd-timesyncd.service(8)
Main PID: 767 (systemd-timesyn)
Status: "Using Time Server 146.185.139.19:123 (2.debian.pool.ntp.org)."
CGroup: /system.slice/systemd-timesyncd.service
└─767 /lib/systemd/systemd-timesyncd

"systemctl status systemd-timesyncd" after I stopped systemd-timesyncd manually:

● systemd-timesyncd.service - Network Time Synchronization
Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; disabled)
Active: inactive (dead) since Fri 2016-04-22 11:40:35 CEST; 6s ago
Docs: man:systemd-timesyncd.service(8)
Process: 767 ExecStart=/lib/systemd/systemd-timesyncd (code=exited, status=0/SUCCESS)
Main PID: 767 (code=exited, status=0/SUCCESS)
Status: "Idle."

Any ideas? Is systemd-timesyncd a dependency of, or started by, a PVE daemon?
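After a reboot it is easy to end up with two NTP clients adjusting the clock at the same time, as the status output above shows. A small sketch that flags this, assuming the unit names systemd-timesyncd.service, ntp.service and chrony.service (adjust the list to what is actually installed; the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Flags more than one active NTP client. The unit names are assumptions.
TIME_UNITS=(systemd-timesyncd.service ntp.service chrony.service)

check_single_time_daemon() {
    local unit
    local -a active=()
    for unit in "${TIME_UNITS[@]}"; do
        # is-active --quiet prints nothing; the exit status says active or not.
        if systemctl is-active --quiet "$unit"; then
            active+=("$unit")
        fi
    done
    if (( ${#active[@]} > 1 )); then
        echo "WARNING: several time daemons active: ${active[*]}" >&2
        return 1
    fi
    return 0
}
```

Running it once after boot (or from the same cron job as a health check) would have caught the situation above before Ceph started reporting clock skew.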
 
Yes, systemd-timesyncd is "Want"ed by pve-cluster.service. Since pve-cluster.service is also ordered "After" systemd-timesyncd.service, this means that, if possible, systemd-timesyncd will be started before the cluster filesystem is started. If you use another NTP daemon to synchronize the time, you can override this locally: copy /lib/systemd/system/pve-cluster.service to /etc/systemd/system/pve-cluster.service and remove the "Wants=systemd-timesyncd.service" line. Run "systemctl daemon-reload" afterwards to reload the units.
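systemd can answer the "who starts it" question directly: `systemctl list-dependencies --reverse systemd-timesyncd.service` lists the units that pull it in. To check the unit files on disk instead (for example before and after an update), a minimal sketch that scans a directory for explicit Wants= entries; the function name is hypothetical, and it deliberately ignores the *.wants/ symlinks that `systemctl enable` creates:

```shell
#!/usr/bin/env bash
# Print the names of *.service files in DIR that list UNIT in a Wants= line.
# Only explicit Wants= lines are checked, not *.wants/ symlinks.
units_wanting() {
    local dir=$1 unit=$2 f
    for f in "$dir"/*.service; do
        [[ -f "$f" ]] || continue
        # Wants= takes a space-separated list, so compare each word exactly.
        if awk -v u="$unit" '
            /^Wants=/ { sub(/^Wants=/, ""); for (i = 1; i <= NF; i++) if ($i == u) found = 1 }
            END { exit !found }
        ' "$f"; then
            basename -- "$f"
        fi
    done
}
```

For example, `units_wanting /lib/systemd/system systemd-timesyncd.service` should list pve-cluster.service on an affected node.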

Warning: you might miss future changes to the pve-cluster service file provided by Proxmox, so check with "systemd-delta -t overridden" after updates and redo the above steps if the diff shows more than the single removed "Wants=" line:
Code:
# systemd-delta -t overridden
[OVERRIDDEN] /etc/systemd/system/pve-cluster.service → /lib/systemd/system/pve-cluster.service

--- /lib/systemd/system/pve-cluster.service     2016-01-07 11:04:50.000000000 +0100
+++ /etc/systemd/system/pve-cluster.service     2016-04-22 12:16:16.199232820 +0200
@@ -2,7 +2,6 @@
Description=The Proxmox VE cluster filesystem
ConditionFileIsExecutable=/usr/bin/pmxcfs
Wants=corosync.service
-Wants=systemd-timesyncd.service
Wants=rrdcached.service
Before=corosync.service
Before=ceph.service


1 overridden configuration files found.
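The copy-and-edit steps can be scripted so they are easy to redo after an update. A minimal sketch; the function name is hypothetical, the paths are parameters, and lines are compared literally so dots in unit names need no escaping. A full copy is used rather than a drop-in because, per the systemd.unit documentation, drop-ins can add dependencies but cannot remove them:

```shell
#!/usr/bin/env bash
# override_wants SRC DST OLD_UNIT [NEW_UNIT]
# Writes a copy of SRC to DST in which the exact line "Wants=OLD_UNIT" is
# dropped, or replaced by "Wants=NEW_UNIT" when NEW_UNIT is given.
# SRC and DST must be different files.
override_wants() {
    local src=$1 dst=$2 old="Wants=$3" new=${4:+Wants=$4}
    awk -v old="$old" -v new="$new" '
        $0 == old { if (new != "") print new; next }  # drop or replace the line
        { print }                                     # keep everything else
    ' "$src" > "$dst"
}
```

For the variant used later in this thread, `override_wants /lib/systemd/system/pve-cluster.service /etc/systemd/system/pve-cluster.service systemd-timesyncd.service ntp.service` followed by `systemctl daemon-reload` would produce the same override; omit the last argument to just remove the line.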
 
Thanks! Just for the record: the override goes in /etc/systemd/system/pve-cluster.service, not /etc/lib/systemd/system/pve-cluster.service. :)

What I did was the following:

# cp /lib/systemd/system/pve-cluster.service /etc/systemd/system/pve-cluster.service
# nano /etc/systemd/system/pve-cluster.service

Changed:

Wants=systemd-timesyncd.service

To:

Wants=ntp.service

# systemctl daemon-reload

Now when I run "systemd-delta -t overridden" (which I will do after every update), I see:

Code:
[OVERRIDDEN] /etc/systemd/system/pve-cluster.service → /lib/systemd/system/pve-cluster.service

--- /lib/systemd/system/pve-cluster.service  2015-12-04 13:20:40.000000000 +0100
+++ /etc/systemd/system/pve-cluster.service  2016-04-22 12:58:48.553393521 +0200
@@ -2,7 +2,7 @@
 Description=The Proxmox VE cluster filesystem
 ConditionFileIsExecutable=/usr/bin/pmxcfs
 Wants=corosync.service
-Wants=systemd-timesyncd.service
+Wants=ntp.service
 Wants=rrdcached.service
 Before=corosync.service
 Before=ceph.service


1 overridden configuration files found.

And after reboot it worked. :) Thanks again!
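The post-update check with systemd-delta can also be scripted. A minimal sketch, assuming the output format shown above (changed lines start with + or -); the function name and the allow-list arguments are hypothetical:

```shell
#!/usr/bin/env bash
# check_override_diff ALLOWED_LINE...
# Reads unified-diff output (e.g. from systemd-delta) on stdin and reports
# every added/removed line that is not in the allow-list.
# Returns 0 when nothing unexpected was found, 1 otherwise.
check_override_diff() {
    local line allowed ok unexpected=0
    while IFS= read -r line; do
        case "$line" in
            '---'*|'+++'*) continue ;;   # file header lines, not changes
            [-+]*) ;;                    # an added or removed line: check it
            *) continue ;;               # context, hunk headers, summary text
        esac
        ok=0
        for allowed in "$@"; do
            if [[ "$line" == "$allowed" ]]; then ok=1; break; fi
        done
        if (( ok == 0 )); then
            printf 'unexpected change: %s\n' "$line"
            unexpected=1
        fi
    done
    return "$unexpected"
}
```

Usage: `systemd-delta -t overridden | check_override_diff '-Wants=systemd-timesyncd.service' '+Wants=ntp.service'`. A non-zero exit means the shipped unit changed in some other way and the local copy should be redone. Changes in any other overridden unit will be reported as unexpected too.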
 
I don't think the systemd timesync service is very accurate either. When I live-migrate a running FreeBSD VM, it complains that the clock went backwards. Sometimes this causes processes to crash.
 
In the pve-cluster.service unit,
Wants=systemd-timesyncd.service

could be replaced by

Wants=time-sync.target


This allows any time-sync service unit (ntpd, systemd-timesyncd, ...) that is part of time-sync.target to be used.
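A sketch of the relevant [Unit] lines in a local copy of pve-cluster.service with that change applied. The added After=time-sync.target line is my assumption: the systemd.special documentation describes time-sync.target as the target that services needing synchronized time should be ordered after. Whether a particular NTP daemon actually orders itself before time-sync.target depends on how it is packaged, so check that on your system.

```ini
# Local copy: /etc/systemd/system/pve-cluster.service (excerpt of [Unit]).
# Only the time-related lines differ from the shipped file.
[Unit]
Description=The Proxmox VE cluster filesystem
ConditionFileIsExecutable=/usr/bin/pmxcfs
Wants=corosync.service
# Was: Wants=systemd-timesyncd.service
Wants=time-sync.target
# Assumption: order after the target so the clock is synchronized first.
After=time-sync.target
Wants=rrdcached.service
Before=corosync.service
Before=ceph.service
```

As with the other overrides, run "systemctl daemon-reload" after editing.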


Ceph will add this to its unit files soon:

http://tracker.ceph.com/issues/15419
 
It seems it is still not being given the attention it deserves. It has been a few months since anyone commented on the bug. =(
 
After the cluster has been up and running for a while, the messages below start to appear in the syslog and the cluster stops functioning. How can I fix this?

Code:
proxmox01 systemd-timesyncd[642]: Using NTP server 200.189.40.8:123 (2.debian.pool.ntp.org).
Oct 01 02:14:19 proxmox01 systemd-timesyncd[642]: Timed out waiting for reply from 200.189.40.8:123 (2.debian.pool.ntp.org).
Oct 01 02:14:19 proxmox01 systemd-timesyncd[642]: Using NTP server [2600:3c02::13:221]:123 (2.debian.pool.ntp.org).
Oct 01 02:14:19 proxmox01 systemd-timesyncd[642]: Using NTP server [2a01:4f8:162:51e2::2]:123 (2.debian.pool.ntp.org).
Oct 01 02:14:19 proxmox01 systemd-timesyncd[642]: Using NTP server [2001:12ff:0:7::193]:123 (2.debian.pool.ntp.org).
Oct 01 02:14:19 proxmox01 systemd-timesyncd[642]: Using NTP server [2001:440:1880:5555::2]:123 (2.debian.pool.ntp.org).
Oct 01 02:14:19 proxmox01 systemd-timesyncd[642]: Using NTP server 200.160.0.8:123 (3.debian.pool.ntp.org).
Oct 01 02:14:29 proxmox01 systemd-timesyncd[642]: Timed out waiting for reply from 200.160.0.8:123 (3.debian.pool.ntp.org).
Oct 01 02:14:29 proxmox01 systemd-timesyncd[642]: Using NTP server 200.192.232.8:123 (3.debian.pool.ntp.org).
Oct 01 02:14:39 proxmox01 systemd-timesyncd[642]: Timed out waiting for reply from 200.192.232.8:123 (3.debian.pool.ntp.org).

thanks!!!!
 
This problem doesn't seem to be related to the topic. It looks like you simply have no connection to 2 NTP servers anymore. According to Whois, both NTP servers belong to the same company, so the problem is probably on their side, in the transit between you and them, or with your internet connection.
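One way to check that diagnosis is to count the timeouts per server in the systemd-timesyncd log. A minimal sketch, assuming the message format shown in the log above; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Count "Timed out waiting for reply" messages per NTP server address in
# systemd-timesyncd log lines read from stdin, most frequent first.
count_ntp_timeouts() {
    # Keep only the address:port after "from", drop every other line.
    sed -n 's/.*Timed out waiting for reply from \([^ ]*\) .*/\1/p' \
        | sort | uniq -c | sort -rn
}
```

Usage: `journalctl -u systemd-timesyncd --since today | count_ntp_timeouts`. If only a few addresses keep timing out, the problem is likely those servers or the path to them; if every server times out, outbound NTP traffic (UDP port 123) may be blocked.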
 
thanks