[SOLVED] ceph clock skew issue - no way out?

Discussion in 'Proxmox VE: Installation and configuration' started by Knuuut, Sep 13, 2018.

  1. Knuuut

    Knuuut Member

    Joined:
    Jun 7, 2018
    Messages:
    36
    Likes Received:
    3
    Hi,

    there are plenty of posts about clock skew issues within this forum. I'm affected too.

    I've tried several approaches to keep 4 nodes with identical hardware permanently in sync, without success.

    Following this post https://forum.proxmox.com/threads/proxmoxve-ceph-clock-issue.20684/#post-105441 even made things worse, so I switched back to timedatectl (systemd-timesyncd), where clock skew incidents are less frequent than with ntpd.

    With ntpd, I've seen a high jitter value (>200ms) on every node.

    With timedatectl, there is no way to get the jitter value afaik.

    Maybe a change of the Linux clocksource from tsc to hpet would be a solution?
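Before switching, it can help to see which clocksource the kernel is actually using and which alternatives it offers. The following is a minimal sketch, assuming the standard sysfs location for clocksource information; the function name `show_clocksource` is made up for illustration.

```shell
#!/bin/sh
# Print the active clocksource and the ones the kernel offers.
# The directory argument is optional and defaults to the usual sysfs path.
show_clocksource() {
    dir="${1:-/sys/devices/system/clocksource/clocksource0}"
    if [ -r "$dir/current_clocksource" ]; then
        echo "current: $(cat "$dir/current_clocksource")"
        echo "available: $(cat "$dir/available_clocksource")"
    else
        echo "no clocksource information under $dir" >&2
        return 1
    fi
}

show_clocksource || true

# To see whether the kernel itself has complained about the clocksource:
#   dmesg | grep -i clocksource
# To try hpet until the next reboot (as root), provided it is listed as available:
#   echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource
```

Writing to `current_clocksource` only lasts until reboot, which makes it a low-risk way to test whether tsc is really the culprit.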

    Any help would be appreciated.

    Cheers Knuuut
     
    #1 Knuuut, Sep 13, 2018
    Last edited: Sep 13, 2018
  2. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    1,571
    Likes Received:
    138
    Use a time source that is on the local ceph network and on hardware (not virtual).
     
  3. Knuuut

    Knuuut Member

    Joined:
    Jun 7, 2018
    Messages:
    36
    Likes Received:
    3
    I'm using the local ntp servers from my Datacenter-Provider.

    Again, with the same ntp servers and with ntpd on the nodes, things got worse, and I can't reproduce this behavior on other hardware.

    So, my guess is an unstable clocksource (tsc) on all 4 nodes...?

    Does anybody have experience with switching the clocksource from tsc to hpet?

    Cheers Knuuut
     
  4. RobFantini

    RobFantini Active Member
    Proxmox VE Subscriber

    Joined:
    May 24, 2012
    Messages:
    1,375
    Likes Received:
    16
    We install ntp on each node,
    then edit /etc/ntp.conf to use our internet router as the ntp server (the router is pfSense running on hardware).
     
    AlexLup likes this.
  5. judexzhu

    judexzhu New Member

    Joined:
    Aug 22, 2018
    Messages:
    10
    Likes Received:
    0
    Try this.

    Code:
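    # Append the NTP servers; this relies on [Time] being the last section in the file
    echo "NTP=10.1.1.11 10.1.1.12 10.1.1.13" >> /etc/systemd/timesyncd.conf
    
    # Enable network time sync and restart the service so it picks up the new servers
    timedatectl set-ntp true
    systemctl restart systemd-timesyncd
    
    # Check that the service is running
    systemctl status systemd-timesyncd
    
    # Show the current system time
    date
    
    # Write the system time to the hardware clock (RTC)
    hwclock -w
    Replace "10.1.1.11 10.1.1.12 10.1.1.13" with your own ntp server IP addresses.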
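Appending with `>>` adds another `NTP=` line every time the command is run. As an alternative, a drop-in file under `/etc/systemd/timesyncd.conf.d/` can be rewritten as often as needed. This is only a sketch: the helper name `write_ntp_dropin` and the file name `local.conf` are made up for illustration, and the IP addresses are the placeholders from the post above.

```shell
#!/bin/sh
# Write a timesyncd drop-in containing a [Time] section with the given servers.
# $1 = drop-in directory, remaining arguments = NTP servers.
write_ntp_dropin() {
    conf_dir="$1"
    shift
    mkdir -p "$conf_dir"
    # "$*" joins the remaining arguments with single spaces.
    printf '[Time]\nNTP=%s\n' "$*" > "$conf_dir/local.conf"
}

# Usage on a node (as root), followed by a restart of the service:
#   write_ntp_dropin /etc/systemd/timesyncd.conf.d 10.1.1.11 10.1.1.12 10.1.1.13
#   systemctl restart systemd-timesyncd
```

Running the helper twice leaves a single `NTP=` line, so it is safe to repeat on every node.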
     
    #5 judexzhu, Sep 14, 2018
    Last edited: Sep 14, 2018
  6. Knuuut

    Knuuut Member

    Joined:
    Jun 7, 2018
    Messages:
    36
    Likes Received:
    3
    I think you mean "NTP=" instead of "Servers="

    Anyway, that's like my current configuration.

    I've already tried this with the ntp servers inside my datacenter, and also with the Debian pool servers. This configuration never got to "health ok".
     
  7. RobFantini

    RobFantini Active Member
    Proxmox VE Subscriber

    Joined:
    May 24, 2012
    Messages:
    1,375
    Likes Received:
    16
    I think that systemd's ntp client is buggy, or there are worldwide operator errors.

    Search around.
     
  8. RobFantini

    RobFantini Active Member
    Proxmox VE Subscriber

    Joined:
    May 24, 2012
    Messages:
    1,375
    Likes Received:
    16
    Also, ntp just works, like dns and dhcp. If there are still issues after 'apt install ntp' and configuring it, look at the network.
     
  9. Knuuut

    Knuuut Member

    Joined:
    Jun 7, 2018
    Messages:
    36
    Likes Received:
    3
    As I wrote before, I can't reproduce this on other (older) hardware.

    So my focus is on the current (new) hardware:

    Intel S2600STB mainboards with dual Xeons
    Intel X520-DA2 dual 10Gb SFP+ nics
    LSI 9341-4i

    No exotic components at all

    Any ideas anybody?
     
  10. mcdowellster

    mcdowellster New Member

    Joined:
    Jun 13, 2018
    Messages:
    12
    Likes Received:
    3
    I had the same problem on my cluster for a few hours. My firewall was blocking port 123 (NTP), as I place all my servers in a management subnet with minimal access to the internet. In the end I used the default time server, updated the time, and the issue was resolved.
     
  11. Knuuut

    Knuuut Member

    Joined:
    Jun 7, 2018
    Messages:
    36
    Likes Received:
    3
    There is no firewall issue: both ntpq -pn (when running ntpd) and systemctl status systemd-timesyncd.service give me healthy output.
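The `ntpq -pn` output also carries the per-peer jitter in its last column (in milliseconds), which is where values like the >200ms mentioned earlier show up. The filter below is a rough sketch for spotting such peers; the function name `high_jitter_peers` and the 100 ms default are arbitrary choices for illustration, and it assumes the usual layout of two header lines followed by one line per peer.

```shell
#!/bin/sh
# Read `ntpq -pn` output on stdin and print peers whose jitter (last column,
# in ms) exceeds the given limit. $1 = limit in ms (default 100).
high_jitter_peers() {
    limit_ms="${1:-100}"
    awk -v limit="$limit_ms" 'NR > 2 && NF >= 10 {
        peer = $1
        sub(/^[*+#ox.-]/, "", peer)   # drop the tally character in front of the address
        if ($NF + 0 > limit) print peer, $NF
    }'
}

# Usage on a node running ntpd:
#   ntpq -pn | high_jitter_peers 100
```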
     
  12. Knuuut

    Knuuut Member

    Joined:
    Jun 7, 2018
    Messages:
    36
    Likes Received:
    3
    Finally, I solved this issue by myself.

    What I did:

    Set
    Code:
    NTP=0.debian.pool.ntp.org 1.debian.pool.ntp.org 2.debian.pool.ntp.org 3.debian.pool.ntp.org
    in /etc/systemd/timesyncd.conf on every node.

    But the important part was this:
    Code:
    hwclock -w
    several times on every node.

    No more clock skew issues since Friday.
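To confirm that the warning is really gone, the Ceph health output can be checked for a clock skew message. This is a small sketch, assuming the health text contains the phrase "clock skew" when the monitors disagree; the function name `has_clock_skew` is made up for illustration.

```shell
#!/bin/sh
# Read Ceph health output on stdin; succeed if it mentions a clock skew.
has_clock_skew() {
    grep -qi 'clock skew'
}

# Usage on a node with the ceph CLI available:
#   if ceph health detail | has_clock_skew; then
#       echo "clock skew still reported"
#   else
#       echo "no clock skew reported"
#   fi
```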
     
  13. lucaferr

    lucaferr Member

    Joined:
    Jun 21, 2011
    Messages:
    37
    Likes Received:
    1
    #13 lucaferr, Sep 17, 2018
    Last edited: Sep 17, 2018