Proxmox Ceph issues with NTP

brucexx

On PVE 5.1 with Ceph, does the NTP issue still persist? Do we still need to switch to an NTP server (ntpd) and disable the time service that came with the system (systemd-timesyncd, I think) to prevent clock skew from happening?

Thank you
 
Since PVE 4.x, systemd-timesyncd is used, and the default NTP servers (set in /etc/systemd/timesyncd.conf) are the ones from ntp.org. But if you want to use ntpd, you still can.
 
Right, systemd-timesyncd is used by default, but at least on 4.x it was not working right: I got clock skew, had to disable it and use ntpd instead, and that has been working with no issues for the last year or so. Many users complained about it, and it also came up as an issue while live-migrating, so I'm kind of surprised by your answer. Let me dig for more threads...
 
That's what I was referring to: https://forum.proxmox.com/threads/proxmoxve-ceph-clock-issue.20684/#post-153248

I followed the instructions and have had no issues since. At the time, many other users were saying that systemd is a no-go for timekeeping so far.

I had the same issue as described there. Maybe I was missing something with systemd? Are you guys (Alwin from Proxmox) running a Ceph cluster in a live environment (not for testing) using systemd time sync with no issues?

Thank you
 
Yes. From personal experience, I didn't have an issue even on PVE 4.x, mostly because I had a local NTP server running that all my systems got their time from. Since timesyncd only talks to one NTP server at a time via SNTP (it takes the first one that responds), different nodes can end up with different timestamps. Local time servers are a good idea anyway, if only for the case where the internet connection is lost.
 
I am not sure what you mean, could you elaborate? Are you saying that when using timesyncd and a local NTP server you had no issues with clock skew on Ceph, is that right?

We had two local NTP servers (in case one dies; one of them was standalone and not a VM), and with systemd time sync I was getting clock skew. Once we switched to ntpd, the issue went away. We had a cluster of 3 Proxmox servers for running VMs and a separate cluster of 3 nodes running Ceph on top of Proxmox, with 5 TB of storage spread across 18 hard drives and 60 to 70 virtual machines running on this setup.
 
I am not sure what you mean, could you elaborate? Are you saying that when using timesyncd and a local NTP server you had no issues with clock skew on Ceph, is that right?
Yes, except right after rebooting a server, and then it disappeared after a minute or two.

NTPd queries 3 servers to get an accurate time; this may be the difference and the reason you get clock skew with timesyncd. There also was/is (I don't recall if it has been resolved) an issue where timesyncd doesn't seem to keep time as well as ntpd. I'm only judging from my own experience, where I didn't have issues, but that may hold only in my case.
 
Do you all just use the Debian package default /etc/ntp.conf (the NTP server configuration file)?
Well, if I use ntpd, then I at least set time servers closer to the location of my ntpd server.
 
I just built a brand new 3-node Proxmox 5.1 + Ceph cluster and had severe clock skew problems. timesyncd was not precise enough. Even ntpd failed. After a lot of research and testing, I installed chrony and everything is finally stable! Here are the steps on Proxmox 5.1 to reliably disable timesyncd and replace it with chrony:
  1. timedatectl set-ntp false
  2. systemctl stop systemd-timesyncd
  3. systemctl stop systemd-timedated
  4. systemctl disable systemd-timesyncd
  5. systemctl disable systemd-timedated
  6. apt-get install chrony
  7. cp /lib/systemd/system/pve-cluster.service /etc/systemd/system/pve-cluster.service
  8. Edit /etc/systemd/system/pve-cluster.service and replace "Wants=systemd-timesyncd.service" with "Wants=chrony.service"
  9. Reboot node
  10. Repeat 1 by 1 for each node
  11. Ceph is finally "HEALTH_OK", no more clock skews!
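
Steps 7 and 8 above could be scripted roughly as follows. This is only a sketch under the assumptions of that post: that the packaged unit file really contains the text "Wants=systemd-timesyncd.service" (check your own copy first) and that the chrony unit is named chrony.service. The function name swap_timesync_dependency and its two arguments are made up for illustration.

```shell
# Sketch only: write a copy of a systemd unit file in which the timesyncd
# dependency is replaced by chrony. The function name is hypothetical.
swap_timesync_dependency() {
    src="$1"   # packaged unit, e.g. /lib/systemd/system/pve-cluster.service
    dst="$2"   # local override, e.g. /etc/systemd/system/pve-cluster.service
    # Copy every line unchanged except the Wants= entry for timesyncd.
    sed 's/Wants=systemd-timesyncd\.service/Wants=chrony.service/' "$src" > "$dst"
}
```

On a node this would be called as swap_timesync_dependency /lib/systemd/system/pve-cluster.service /etc/systemd/system/pve-cluster.service, followed by the reboot from step 9. If the source file has no such line, the copy is simply identical to the original.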
 
@lucaferr, timesyncd and ntp failed and chrony worked. So, did you configure chrony differently from timesyncd and ntp?

As for the first two, those services use a pool of servers that they get their time from. Also, NTP uses three different sources to calculate a median time to use. On a different host, ntp can pick different time sources to sync from. This makes clock skew more likely.

For all cluster setups, Ceph, PVE, or whatever else, it is recommended to use a local time source (hardware) that all servers get their time from. The local NTP server can then use a pool of servers to get its own time from.
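
As a rough illustration of that layout using chrony (mentioned above in this thread), every cluster node could be pointed at the same local time server. The sketch below only prints a minimal chrony.conf. The function name render_chrony_conf and the server name passed to it are hypothetical, and the directives follow what Debian's packaged chrony.conf typically contains, so compare against your own file before using any of it.

```shell
# Sketch only: print a minimal chrony.conf that syncs from one local server.
# "render_chrony_conf" is a made-up helper; the server argument is an assumption.
render_chrony_conf() {
    local_server="$1"
    cat <<EOF
# Single local time source shared by all cluster nodes (assumed hostname or IP).
server ${local_server} iburst
# File in which chrony records the measured clock drift.
driftfile /var/lib/chrony/chrony.drift
# Step the clock if it is off by more than 1 s, only during the first 3 updates.
makestep 1.0 3
# Periodically copy the system time to the hardware clock.
rtcsync
EOF
}
```

For example, render_chrony_conf 192.0.2.10 prints a config for a local server at that (documentation-range) address; on Debian the result would go to /etc/chrony/chrony.conf on each node.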
 
@Alwin I didn't configure chrony at all, I left everything at the defaults. It uses "2.debian.pool.ntp.org" (I can see that the sorting algorithm works perfectly, since it automatically picks Italian NTP sources, and my server infrastructure is in Italy).
With ntpd I had weird results, with misalignments of several seconds (even 10 seconds!) between the nodes. Even with a single NTP source configured, synchronization failed. Very strange, I had never seen that before. After several hours of debugging and trying different configurations with no success, I fixed it by using chrony with its default config.
 
I'm using chrony in production. It syncs the clock faster than ntpd or openntpd. (And timesyncd is little more than ntpdate run from cron, which is really not precise enough for Ceph.)
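
To put "precise enough" in numbers: as far as I know, the Ceph monitors warn about clock skew once their clocks differ by more than mon_clock_drift_allowed, which defaults to 0.05 s. The sketch below checks whether a set of per-node clock offsets (in seconds, collected however you like) stays within such a limit. The function name check_spread is made up, and the offsets in the example are invented values, not measurements from this thread.

```shell
# Sketch only: report whether the spread between clock offsets (in seconds)
# stays within a limit. "check_spread" is a hypothetical helper.
# Usage: check_spread LIMIT OFFSET [OFFSET...]
check_spread() {
    limit="$1"; shift
    printf '%s\n' "$@" | awk -v limit="$limit" '
        { v = $1 + 0 }                      # force numeric interpretation
        NR == 1 { min = v; max = v }
        { if (v < min) min = v; if (v > max) max = v }
        END { if (max - min <= limit + 0) print "ok"; else print "skew" }'
}
```

For example, check_spread 0.05 0.001 -0.002 0.010 prints "ok" (spread of 0.012 s), while check_spread 0.05 0.0 10.0 prints "skew".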
 
My point was that you need one time source that is close to your cluster, and all servers should sync from it. Locality is important; as you both stated, Ceph needs a precise time on all its servers. From my experience, I had no issues with timesyncd or ntpd, but that said, I always had my time server close to the Ceph and/or PVE clusters.
 
I get your point and it does make sense. But I tried to synchronize 3 different nodes with a single NTP source a few hundred kilometers away from the nodes, and they ended up with a time difference of 10 seconds among them. That should be impossible, so I guess my system had some sort of conflict with ntpd. It happened both with the default ntpd config (and the Debian NTP pools) and with a custom config (a single NTP source close to the servers). In all cases every source was then marked as "rejected" by ntpd. Please note that timesyncd had been disabled, so it could not be the cause of the problems.
I'm sure that ntpd works perfectly fine on thousands of servers. I just wanted to tell anyone having severe clock skews like I did that, before banging your head against ntpd for hours, there is also an excellent alternative called chrony, which I didn't know about before today ;-)
 
lucaferr wrote: I tried to synchronize 3 different nodes with a single NTP source a few hundred kilometers away from the nodes...

You need to use local NTP servers for synchronization; I would not recommend using any outside/public servers.
 
My Solution


-- /etc/systemd/timesyncd.conf --
[Time]
NTP=LOCAL_NTP_IP NTP1.com NTP2.com NTP3.com
FallbackNTP=PROXMOX_HOST1 PROXMOX_HOST2 PROXMOX_HOST3 PROXMOX_HOST[N]
RootDistanceMaxSec=5
PollIntervalMinSec=32
PollIntervalMaxSec=2048


# timedatectl set-ntp true
# systemctl restart systemd-timesyncd.service systemd-timedated.service
# systemctl restart ceph-mon.target
# hwclock -w
# timedatectl status
# journalctl --since -1h -u systemd-timesyncd
# ceph mon sync force --yes-i-really-mean-it --i-know-what-i-am-doing
# ceph health
 