Proxmox Ceph issues with NTP

brucexx · Nov 6, 2017

On PVE 5.1 with Ceph, does the NTP issue still persist? Do we still need to switch to an NTP server and disable the time daemon that came with the system (ntpd, I think) to prevent clock skew?

Thank you

Alwin · Nov 7, 2017

Since PVE 4.x, systemd-timesyncd is used, and the default NTP servers (set in /etc/systemd/timesyncd.conf) are the ones from ntp.org. But if you want to use ntpd, you still can.

brucexx · Nov 7, 2017

Right, systemd-timesyncd is used by default, and it was not working right, at least on 4.x. Because of clock skew I had to disable it and use ntpd instead, which has been working with no issues for the last year or so. Many users complained about it, and it also came up as an issue while live-migrating, so I'm kind of surprised by your answer. Let me dig more for threads...

brucexx · Nov 7, 2017

That's what I was referring to: https://forum.proxmox.com/threads/proxmoxve-ceph-clock-issue.20684/#post-153248

I followed the instructions and have had no issues since. At the time, many other users were saying that systemd was a no-go for timekeeping.

I had the same issue as described there. Maybe I was missing something with systemd? Are you guys (Alwin from Proxmox) running a Ceph cluster in a live environment (not for testing) using systemd time sync with no issues?

Thank you

Alwin · Nov 8, 2017

Yes. From personal experience, I didn't even have an issue on PVE 4.x, mostly because I had a local NTP server running that all my systems got their time from. Since timesyncd only talks to one NTP server at a time (SNTP; it takes the first one that responds), different nodes can end up with different timestamps. Local time servers are a good idea anyway, if only to cope with loss of internet connectivity.

brucexx · Nov 8, 2017

I am not sure what you mean; could you elaborate? Are you saying that when using timesyncd and a local NTP server you had no issues with clock skew on Ceph?

We had two local NTP servers (in case one dies; one of them was standalone and not a VM), and with systemd time sync I was getting clock skew. Once we switched to ntpd, the issue went away. We had a cluster of 3 Proxmox servers for running VMs and a separate cluster of 3 nodes running Ceph on top of Proxmox, with 5TB of storage spread among 18 hard drives and 60 to 70 virtual machines running on this setup.

Alwin · Nov 8, 2017

brucexx said:
I am not sure what you mean; could you elaborate? Are you saying that when using timesyncd and a local NTP server you had no issues with clock skew on Ceph?

Yes, except right after rebooting a server, and then the skew disappeared after a minute or two.

ntpd queries three servers to get an accurate time; this may be why you get clock skew with timesyncd. There also was (or is, I don't recall whether it has been resolved) an issue where timesyncd doesn't seem to keep time as well as ntpd. I'm only judging from my own experience of not having issues, and that may just be my case.

brucexx · Nov 8, 2017

ok thank you - we will stick to ntpd for now - seems safer...

RobFantini · Nov 9, 2017

Do you all just use the Debian package's default /etc/ntp.conf (the NTP server configuration file)?

brucexx · Nov 9, 2017

I am not sure about others, but I did.

Alwin · Nov 9, 2017

RobFantini said:
Do you all just use the Debian package's default /etc/ntp.conf (the NTP server configuration file)?

Well, if I use ntpd, then I at least set time servers that are closer to the location of my ntpd server.

lucaferr · Jan 30, 2018

I just built a brand new 3-node Proxmox 5.1 + Ceph cluster and had severe clock skew problems. timesyncd was not precise enough. Even ntp failed. After a lot of research and testing, I installed chrony and everything is finally stable! Here are the steps to reliably disable timesyncd and replace it with chrony on Proxmox 5.1:

timedatectl set-ntp false
systemctl stop systemd-timesyncd
systemctl stop systemd-timedated
systemctl disable systemd-timesyncd
systemctl disable systemd-timedated
apt-get install chrony
cp /lib/systemd/system/pve-cluster.service /etc/systemd/system/pve-cluster.service
Edit /etc/systemd/system/pve-cluster.service and replace "Wants=systemd-timesyncd.service" with "Wants=chrony.service"
Reboot node
Repeat 1 by 1 for each node
Ceph is finally "HEALTH_OK", no more clock skews!
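
The manual unit-file edit in the steps above could also be scripted. The following is only a minimal sketch, assuming GNU sed and assuming the copied unit really contains the "Wants=systemd-timesyncd.service" line quoted in the post; swap_time_unit is a hypothetical helper name, not a Proxmox tool.

```shell
#!/bin/sh
# Hypothetical helper: point a copied unit file at chrony instead of timesyncd.
# Only lines starting with "Wants=" are touched; everything else is left alone.
swap_time_unit() {
    unit_file="$1"
    sed -i '/^Wants=/s/systemd-timesyncd\.service/chrony.service/' "$unit_file"
}

# Example (run as root after copying the unit, as in the steps above):
#   swap_time_unit /etc/systemd/system/pve-cluster.service
#   systemctl daemon-reload
```

Because the function only rewrites the file passed to it, it can be tried on a scratch copy of the unit before touching /etc/systemd/system.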

Alwin · Jan 30, 2018

@lucaferr, timesyncd and ntp failed and chrony worked. So, did you configure chrony differently from timesyncd and ntp?

As for the first two, those services use a pool of servers to get their time from. Also, NTP uses three different sources to calculate a median time. On a different host, ntp can pick different time sources to sync with, which makes clock skew more likely.

For all cluster setups (Ceph, PVE, or anything else), it is recommended to use a local time source (hardware) and have all servers get their time from it. The local NTP server can then use a pool of servers to get its own time from.
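
One way to point every node at the same local source while staying on timesyncd is a drop-in file. This is a rough sketch only: it assumes the installed systemd reads drop-ins from /etc/systemd/timesyncd.conf.d/, the helper name write_local_ntp_dropin is hypothetical, and 192.0.2.10 is a documentation placeholder address, not a real server.

```shell
#!/bin/sh
# Hypothetical helper: write a timesyncd drop-in that names one local NTP server.
# $1 = drop-in directory, $2 = address of the local time server (placeholder).
write_local_ntp_dropin() {
    dropin_dir="$1"
    ntp_server="$2"
    mkdir -p "$dropin_dir"
    printf '[Time]\nNTP=%s\n' "$ntp_server" > "$dropin_dir/local-ntp.conf"
}

# Example (as root, on each node; replace the placeholder address):
#   write_local_ntp_dropin /etc/systemd/timesyncd.conf.d 192.0.2.10
#   systemctl restart systemd-timesyncd
```

The same idea applies to ntpd or chrony: every node lists the same local server, and only that server talks to a public pool.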

lucaferr · Jan 30, 2018

@Alwin I didn't configure chrony at all; I left everything at the defaults. It uses "2.debian.pool.ntp.org" (the sort algorithm works perfectly: it automatically picks Italian NTP sources, and my server infrastructure is in Italy).
With ntpd I had weird results, with misalignments of several seconds (even 10 seconds!) between the nodes. Even with a single NTP source configured, synchronization failed. Very strange, I had never seen that before. After several hours of debugging and trying different configurations with no success, I fixed it by using chrony with its default config.

aderumier · Jan 30, 2018

I'm using chrony in production; it syncs the clock faster than ntpd or openntpd. (And timesyncd is only like a cron'd ntpdate, really not precise enough for Ceph.)

Alwin · Jan 30, 2018

My point was that you need one time source close to your cluster and should let all servers sync from it. Locality is important; as you both stated, Ceph needs precise time on all its servers. In my experience, I had no issues with timesyncd or ntpd, but that said, I always had my time server close to the Ceph and/or PVE clusters.

lucaferr · Jan 30, 2018

I get your point and it does make sense. But I tried to synchronize 3 different nodes with a single NTP source a few hundred kilometers away from the nodes, and they ended up with a time difference of 10 seconds among them. Distance alone cannot explain that, so my system probably had some sort of conflict with ntpd. This happened both with the default ntpd config (Debian NTP pools) and with a custom config (a single NTP source close to the servers). In all cases every source was then marked as "rejected" by ntpd. Please note that timesyncd had been disabled, so it could not have been the cause of the problems.
I'm sure that ntpd works perfectly fine on thousands of servers. I just wanted to tell anyone having severe clock skew like I did that, before banging your head against ntpd for hours, there is also an excellent alternative called chrony, which I didn't know about before today ;-)
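
To compare per-node offsets against a skew limit, something like the sketch below may help. It is only an illustration under stated assumptions: the "node offset_seconds" input format and the helper name report_skew are made up here, the offsets would have to be collected by hand (for example from chronyc tracking or ntpq -p on each node), and 0.05 s is used because it is, as far as I know, Ceph's default for "mon clock drift allowed"; check your own cluster's setting.

```shell
#!/bin/sh
# Hypothetical helper: read "node offset_seconds" lines on stdin and print the
# nodes whose absolute offset exceeds the limit given as $1 (in seconds).
# Returns non-zero if at least one node is over the limit.
report_skew() {
    awk -v limit="$1" '
        {
            off = $2 + 0                 # force numeric interpretation
            if (off < 0) off = -off      # absolute value
            if (off > limit + 0) { print $1; bad = 1 }
        }
        END { exit bad + 0 }
    '
}

# Example with made-up offsets:
#   printf 'pve1 0.002\npve2 -10.4\npve3 0.049\n' | report_skew 0.05
```

On a Luminous or later cluster, ceph time-sync-status reports the skew as the monitors see it, which is the authoritative view; the helper above is only for offsets gathered outside Ceph.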

brucexx · Jan 30, 2018

See this post: https://forum.proxmox.com/threads/proxmoxve-ceph-clock-issue.20684/#post-105441 (the 6th post from the top, by stevendemetrius). These are the instructions I followed, and I have had a Ceph cluster running for 2 years with no clock skew issues.

I have two local NTP servers that I point each node to.

Hope this helps somebody - it did help me a lot.

brucexx · Jan 30, 2018

lucaferr wrote: I tried to synchronize 3 different nodes with a single NTP source a few hundred kilometers away from the nodes...

You need to use local NTP servers for synchronization; I would not recommend using any outside/public servers.

Frederico Siena · Mar 5, 2020

My Solution

-- /etc/systemd/timesyncd.conf --
[Time]
NTP=LOCAL_NTP_IP NTP1.com NTP2.com NTP3.com
FallbackNTP=PROXMOX_HOST1 PROXMOX_HOST2 PROXMOX_HOST3 PROXMOX_HOST[N]
RootDistanceMaxSec=5
PollIntervalMinSec=32
PollIntervalMaxSec=2048

# timedatectl set-ntp true
# systemctl restart systemd-timesyncd.service systemd-timedated.service
# systemctl restart ceph-mon.target
# hwclock -w
# timedatectl status
# journalctl --since -1h -u systemd-timesyncd
# ceph mon sync force --yes-i-really-mean-it --i-know-what-i-am-doing
# ceph health
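
To check the result of the commands above in a script rather than by eye, the timedatectl output can be parsed. A minimal sketch, assuming the status line is worded either "NTP synchronized:" (older systemd) or "System clock synchronized:" (newer systemd); clock_is_synced is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `timedatectl status` output on stdin and succeed
# only if it reports the clock as synchronized. Both the older and the newer
# wording of the status line are accepted.
clock_is_synced() {
    grep -Eq '^[[:space:]]*(NTP|System clock) synchronized:[[:space:]]*yes[[:space:]]*$'
}

# Example:
#   timedatectl status | clock_is_synced && echo "clock synchronized"
```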
