NTP service does not start on some nodes

VictorSTS · May 26, 2021

Hello,

We usually setup plain old "ntp" service in our Proxmox nodes and sync time of the VM from them. The configuration is the default that comes with the package. Today I got a couple warnings regarding clock skew on 2 of 3 nodes of a cluster. Checking the time on them, I saw that ntp was not running, but it was up and running in the third node. This cluster runs on Proxmox 6.4-5.

Tried to start ntp as usual:

systemctl restart ntp

Service stayed up for like 20 seconds:

systemctl status ntp
● ntp.service - Network Time Service
   Loaded: loaded (/lib/systemd/system/ntp.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2021-05-26 17:55:42 CEST; 1s ago
     Docs: man:ntpd(8)
  Process: 8326 ExecStart=/usr/lib/ntp/ntp-systemd-wrapper (code=exited, status=0/SUCCESS)
 Main PID: 8339 (ntpd)
    Tasks: 2 (limit: 4915)
   Memory: 2.2M
   CGroup: /system.slice/ntp.service
           └─8339 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 110:117

Then it exited:

systemctl status ntp
● ntp.service - Network Time Service
   Loaded: loaded (/lib/systemd/system/ntp.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Wed 2021-05-26 17:55:56 CEST; 33s ago
     Docs: man:ntpd(8)
  Process: 8326 ExecStart=/usr/lib/ntp/ntp-systemd-wrapper (code=exited, status=0/SUCCESS)
 Main PID: 8339 (code=exited, status=0/SUCCESS)

While it was running, it adjusted the time on both servers and I could query ntp with "ntpq -p".

Checked /var/log/syslog and saw that for some reason systemd thinks that something is not ok and stops ntp (Proxmox node IPs removed from log):

Code:

May 26 17:59:35 maigmo02 systemd[1]: Starting Network Time Service...
May 26 17:59:35 maigmo02 ntpd[560155]: ntpd 4.2.8p12@1.3728-o (1): Starting
May 26 17:59:35 maigmo02 ntpd[560155]: Command line: /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 110:117
May 26 17:59:35 maigmo02 systemd[1]: Started Network Time Service.
May 26 17:59:35 maigmo02 ntpd[560168]: proto: precision = 0.050 usec (-24)
May 26 17:59:35 maigmo02 ntpd[560168]: leapsecond file ('/usr/share/zoneinfo/leap-seconds.list'): good hash signature
May 26 17:59:35 maigmo02 ntpd[560168]: leapsecond file ('/usr/share/zoneinfo/leap-seconds.list'): loaded, expire=2021-12-28T00:00:00Z last=2017-01-01T00:00:00Z ofs=37
May 26 17:59:35 maigmo02 ntpd[560168]: Listen and drop on 0 v6wildcard [::]:123
May 26 17:59:35 maigmo02 ntpd[560168]: Listen and drop on 1 v4wildcard 0.0.0.0:123
May 26 17:59:35 maigmo02 ntpd[560168]: Listen normally on 2 lo 127.0.0.1:123
May 26 17:59:35 maigmo02 ntpd[560168]: Listen normally on 3 vmbr0 :123
May 26 17:59:35 maigmo02 ntpd[560168]: Listen normally on 4 vmbr110 :123
May 26 17:59:35 maigmo02 ntpd[560168]: Listen normally on 5 vmbr111 :123
May 26 17:59:35 maigmo02 ntpd[560168]: Listen normally on 6 vmbr31 :123
May 26 17:59:35 maigmo02 ntpd[560168]: Listen normally on 7 vmbr112 :123
May 26 17:59:35 maigmo02 ntpd[560168]: Listen normally on 8 vmbr113 :123
May 26 17:59:35 maigmo02 ntpd[560168]: Listen normally on 9 lo [::1]:123
May 26 17:59:35 maigmo02 ntpd[560168]: Listen normally on 10 vmbr0 []:123
May 26 17:59:35 maigmo02 ntpd[560168]: Listen normally on 11 vmbr1 []:123
May 26 17:59:35 maigmo02 ntpd[560168]: Listen normally on 12 vmbr110 []:123
May 26 17:59:35 maigmo02 ntpd[560168]: Listen normally on 13 vmbr111 []:123
May 26 17:59:35 maigmo02 ntpd[560168]: Listen normally on 14 vmbr31 []:123
May 26 17:59:35 maigmo02 ntpd[560168]: Listen normally on 15 vmbr112 []:123
May 26 17:59:35 maigmo02 ntpd[560168]: Listen normally on 16 vmbr113 []:123
May 26 17:59:35 maigmo02 ntpd[560168]: Listening on routing socket on fd #33 for interface updates
May 26 17:59:35 maigmo02 ntpd[560168]: kernel reports TIME_ERROR: 0x6041: Clock Unsynchronized
May 26 17:59:35 maigmo02 ntpd[560168]: kernel reports TIME_ERROR: 0x6041: Clock Unsynchronized
May 26 17:59:35 maigmo02 kernel: [1277187.992065] audit: type=1400 audit(1622044775.617:23): apparmor="DENIED" operation="connect" profile="/usr/sbin/ntpd" name="/run/dbus/system_bus_socket" pid=560168 comm="ntpd" requested_mask="wr" denied_mask="wr" fsuid=0 ouid=0
May 26 17:59:36 maigmo02 ntpd[560168]: Soliciting pool server 146.59.16.42
May 26 17:59:37 maigmo02 ntpd[560168]: Soliciting pool server 5.196.160.139
May 26 17:59:37 maigmo02 ntpd[560168]: Soliciting pool server 51.77.221.70
May 26 17:59:38 maigmo02 ntpd[560168]: Soliciting pool server 91.224.149.41
May 26 17:59:38 maigmo02 ntpd[560168]: Soliciting pool server 62.210.136.184
May 26 17:59:38 maigmo02 ntpd[560168]: Soliciting pool server 62.210.244.146
May 26 17:59:39 maigmo02 ntpd[560168]: Soliciting pool server 151.80.211.8
May 26 17:59:39 maigmo02 ntpd[560168]: Soliciting pool server 212.83.158.83
May 26 17:59:39 maigmo02 ntpd[560168]: Soliciting pool server 51.68.44.27
May 26 17:59:39 maigmo02 ntpd[560168]: Soliciting pool server 95.81.173.8
May 26 17:59:40 maigmo02 ntpd[560168]: Soliciting pool server 94.23.21.189
May 26 17:59:40 maigmo02 ntpd[560168]: Soliciting pool server 37.187.122.11
May 26 17:59:40 maigmo02 ntpd[560168]: Soliciting pool server 194.177.34.116
May 26 17:59:41 maigmo02 ntpd[560168]: Soliciting pool server 51.15.157.154
May 26 17:59:41 maigmo02 ntpd[560168]: Soliciting pool server 162.159.200.1
May 26 17:59:42 maigmo02 ntpd[560168]: Soliciting pool server 2a01:e34:ec2a:94a0::2
May 26 18:00:30 maigmo02 dbus-daemon[2018]: [system] Activating via systemd: service name='org.freedesktop.timesync1' unit='dbus-org.freedesktop.timesync1.service' requested by ':1.145936' (uid=0 pid=562385 comm="timedatectl timesync-status " label="unconfined")
May 26 18:00:30 maigmo02 ntpd[560168]: ntpd exiting on signal 15 (Terminated)
May 26 18:00:30 maigmo02 ntpd[560168]: 146.59.16.42 local addr 51.178.75.34 -> <null>
May 26 18:00:30 maigmo02 systemd[1]: Stopping Network Time Service...
May 26 18:00:30 maigmo02 ntpd[560168]: 5.196.160.139 local addr 51.178.75.34 -> <null>
May 26 18:00:30 maigmo02 systemd[1]: Condition check resulted in Network Time Synchronization being skipped.
May 26 18:00:30 maigmo02 ntpd[560168]: 51.77.221.70 local addr 51.178.75.34 -> <null>
May 26 18:00:30 maigmo02 ntpd[560168]: 62.210.136.184 local addr 51.178.75.34 -> <null>
May 26 18:00:30 maigmo02 ntpd[560168]: 62.210.244.146 local addr 51.178.75.34 -> <null>
May 26 18:00:30 maigmo02 ntpd[560168]: 91.224.149.41 local addr 51.178.75.34 -> <null>
May 26 18:00:30 maigmo02 ntpd[560168]: 51.68.44.27 local addr 51.178.75.34 -> <null>
May 26 18:00:30 maigmo02 ntpd[560168]: 151.80.211.8 local addr 51.178.75.34 -> <null>
May 26 18:00:30 maigmo02 ntpd[560168]: 212.83.158.83 local addr 51.178.75.34 -> <null>
May 26 18:00:30 maigmo02 ntpd[560168]: 95.81.173.8 local addr 51.178.75.34 -> <null>
May 26 18:00:30 maigmo02 systemd[1]: ntp.service: Succeeded.
May 26 18:00:30 maigmo02 ntpd[560168]: 37.187.122.11 local addr 51.178.75.34 -> <null>
May 26 18:00:30 maigmo02 ntpd[560168]: 94.23.21.189 local addr 51.178.75.34 -> <null>
May 26 18:00:30 maigmo02 ntpd[560168]: 194.177.34.116 local addr 51.178.75.34 -> <null>
May 26 18:00:30 maigmo02 ntpd[560168]: 51.15.157.154 local addr 51.178.75.34 -> <null>
May 26 18:00:30 maigmo02 ntpd[560168]: 162.159.200.1 local addr 51.178.75.34 -> <null>
May 26 18:00:30 maigmo02 systemd[1]: Stopped Network Time Service.

Both the "kernel" and "dbus-daemon" messages can be ignored as the show in the third node and in at least 3 other clusters I've checked, were ntp runs properly.

Then, I checked the systemd file for the NTP service (vi /lib/systemd/system/ntp.service). it was the same in all nodes (as expected). That file specifies a "Conflicts=" directive with systemd-timesyncd.service, as expected, too.

I wanted to be sure if the problem was ntp itself or systemd, so I started ntp using the wrapper script configured in the systemd file /usr/lib/ntp/ntp-systemd-wrapper. It did run correctly and did not stop. So the problem lies somewhere in systemd

As I need ntp to be running, I've end up disabling systemd-timesyncd service and then I can start ntp using systemd properly.

Why is systemd thinking that ntp isn't starting properly ("Condition check resulted in Network Time Synchronization being skipped")?
Why this happens only on two of the three servers of this cluster?

Thanks

t.lamprecht · May 26, 2021

Hi,

VictorSTS said:
May 26 18:00:30 maigmo02 dbus-daemon[2018]: [system] Activating via systemd: service name='org.freedesktop.timesync1' unit='dbus-org.freedesktop.timesync1.service' requested by ':1.145936' (uid=0 pid=562385 comm="timedatectl timesync-status " label="unconfined")

That timedatectl timesync-status command is pretty dangerous, it triggers a manual start through dbus which systemd tries to fulfill with priority and thus stop the existing ntp service, as it conflicts with the "requested" action.

The log line looked like it but was IMO to weird to be the cause, so I just tried here with a cluster with chrony setup, and it stopped it too - the "best" part, as the systemd-timesyncd ships a much less flexible condition script:

Code:

# /usr/lib/systemd/system/systemd-timesyncd.service.d/disable-with-time-daemon.conf
[Unit]
# don't run timesyncd if we have another NTP daemon installed
ConditionFileIsExecutable=!/usr/sbin/ntpd
ConditionFileIsExecutable=!/usr/sbin/openntpd
ConditionFileIsExecutable=!/usr/sbin/chronyd
ConditionFileIsExecutable=!/usr/sbin/VBoxService

Which makes it fail to start (as in your case ntpd and in my case chronyd exists and is executable), effectively leaving one with no NTP daemon at all

Code:

May 26 20:02:29 prod1 dbus-daemon[1869]: [system] Activating via systemd: service name='org.freedesktop.timesync1' unit='dbus-org.freedesktop.timesync1.service' requested by ':1.3068' (uid=0 pid=2883170 comm="timedatectl timesync-status " label="unconfined")
May 26 20:02:29 prod1 chronyd[2876097]: chronyd exiting
May 26 20:02:29 prod1 systemd[1]: Stopping chrony, an NTP client/server...
May 26 20:02:29 prod1 systemd[1]: Condition check resulted in Network Time Synchronization being skipped.

I mean, the docs (man timedatectl) hint that those command is for systemd-timesyncd only:

systemd-timesyncd Commands
The following commands are specific to systemd-timesyncd.service(8).

timesync-status
Show current status of systemd-timesyncd.service(8). If --monitor is specified, then this will monitor
the status updates.

But still IMO very dangerous behaviour to trigger such a autostart when most users probably are just executing it to get the status of their actually used NTP daemon.

So, as workaround for now ban that command from being executing anywhere near your server.
Did really not expect that to happen...

VictorSTS · May 27, 2021

But I did not use explicitly the command timedatectl timesync-status. It is triggered somehow by systemd when restarting ntp via systemctl restart ntp and it shows in /var/log/messages.

In fact, in server 3 of this cluster ntp is working as expected until I issue timedatectl timesync-status, as it simply outputs:

Code:

Failed to query server: Connection timed out

Obviously because systemd-timesync stops ntp

. At least in this node I can start ntp again in this node without having to disable systemd-timesyncd service as in the other two.

What really puzzles me is that this seems to happend only on these two servers and not in the third node, which use the same package verions except a few related to IPMI as the this third node does not have such feature.

Search

Search

NTP service does not start on some nodes

VictorSTS

Famous Member

t.lamprecht

Proxmox Staff Member

VictorSTS

Famous Member