Hello.
I've just had another random reboot of one of my proxmox servers after a few weeks of stability, but this time I had a centralized syslog which allowed to capture the whole thing. Here is what I got:
And then I get the stuff after the reboot.
The whole loop of systemd-timesyncd stuff started at around 12:44 today. Only this server got this, all 4 others were fine.
Am I right in thinking that this time adjustments lead to the pve-ha daemon thinking it could not lock its file for long enough (or whatever it's trying to do) because the time changed when it tried to do his thing?
I actually have no idea what systemd-timesyncd is trying to do as its servers configuration is commented:
#Servers=0.debian.pool.ntp.org 1.debian.pool.ntp.org 2.debian.pool.ntp.org 3.debian.pool.ntp.org
But timedatectl shows it's enabled:
I don't have ntpd installed either.
Did anyone ever get hit by something like this before?
I actually would very much like for this to be the issue as random reboots isn't exactly something I want to ever see on my hypervisors
I've just had another random reboot of one of my proxmox servers after a few weeks of stability, but this time I had a centralized syslog which allowed to capture the whole thing. Here is what I got:
Code:
Jul 19 15:42:33 def92-proxmox4 systemd-timesyncd[1366]: interval/delta/delay/jitter/drift 32s/-0.872s/0.002s/0.001s/+13ppm
Jul 19 15:43:04 def92-proxmox4 systemd-timesyncd[1366]: interval/delta/delay/jitter/drift 32s/-0.872s/0.002s/0.001s/+13ppm
Jul 19 15:43:04 def92-proxmox4 systemd[1]: Time has been changed
Jul 19 15:43:36 def92-proxmox4 systemd-timesyncd[1366]: interval/delta/delay/jitter/drift 32s/-0.872s/0.002s/0.000s/+13ppm
Jul 19 15:43:36 def92-proxmox4 systemd[1]: Time has been changed
Jul 19 15:44:07 def92-proxmox4 systemd-timesyncd[1366]: interval/delta/delay/jitter/drift 32s/-0.872s/0.002s/0.000s/+13ppm
Jul 19 15:44:07 def92-proxmox4 systemd[1]: Time has been changed
Jul 19 15:44:39 def92-proxmox4 systemd[1]: Time has been changed
Jul 19 15:44:39 def92-proxmox4 systemd-timesyncd[1366]: interval/delta/delay/jitter/drift 32s/-0.872s/0.002s/0.000s/+13ppm
Jul 19 15:45:10 def92-proxmox4 systemd-timesyncd[1366]: interval/delta/delay/jitter/drift 32s/-0.872s/0.002s/0.000s/+13ppm
Jul 19 15:45:10 def92-proxmox4 systemd[1]: Time has been changed
Jul 19 15:45:41 def92-proxmox4 systemd[1]: Time has been changed
Jul 19 15:45:41 def92-proxmox4 systemd-timesyncd[1366]: interval/delta/delay/jitter/drift 32s/-0.872s/0.002s/0.000s/+13ppm
Jul 19 15:45:59 def92-proxmox4 pveproxy[24082]: Clearing outdated entries from certificate cache
Jul 19 15:46:13 def92-proxmox4 systemd[1]: Time has been changed
Jul 19 15:46:13 def92-proxmox4 systemd-timesyncd[1366]: interval/delta/delay/jitter/drift 32s/-0.872s/0.002s/0.000s/+13ppm
Jul 19 15:46:44 def92-proxmox4 systemd[1]: Time has been changed
Jul 19 15:46:44 def92-proxmox4 systemd-timesyncd[1366]: interval/delta/delay/jitter/drift 32s/-0.872s/0.002s/0.000s/+13ppm
Jul 19 15:47:15 def92-proxmox4 systemd-timesyncd[1366]: interval/delta/delay/jitter/drift 32s/-0.872s/0.002s/0.000s/+13ppm
Jul 19 15:47:15 def92-proxmox4 systemd[1]: Time has been changed
Jul 19 15:47:26 def92-proxmox4 pveproxy[24082]: worker exit
Jul 19 15:47:26 def92-proxmox4 pveproxy[18869]: worker 24082 finished
Jul 19 15:47:26 def92-proxmox4 pveproxy[18869]: starting 1 worker(s)
Jul 19 15:47:26 def92-proxmox4 pveproxy[18869]: worker 28777 started
Jul 19 15:48:17 def92-proxmox4 pvedaemon[2237]: <root@pam> successful auth for user 'root@pam'
Jul 19 15:48:17 def92-proxmox4 pveproxy[28777]: Clearing outdated entries from certificate cache
Jul 19 15:48:20 def92-proxmox4 systemd-timesyncd[1366]: Using NTP server 5.196.160.139:123 (2.debian.pool.ntp.org).
Jul 19 15:53:38 def92-proxmox4 systemd[1]: Time has been changed
Jul 19 15:53:38 def92-proxmox4 systemd-timesyncd[1366]: interval/delta/delay/jitter/drift 32s/+317.737s/0.005s/120.423s/+13ppm
Jul 19 15:53:38 def92-proxmox4 pve-ha-lrm[2320]: loop take too long (322 seconds)
Jul 19 15:53:38 def92-proxmox4 pve-ha-lrm[2320]: lost lock 'ha_agent_def92-proxmox4_lock - can't get cfs lock
Jul 19 15:53:38 def92-proxmox4 watchdog-mux[1899]: client watchdog expired - disable watchdog updates
Jul 19 15:53:38 def92-proxmox4 watchdog-mux[1899]: client watchdog expired - disable watchdog updates
Jul 19 15:53:39 def92-proxmox4 pve-ha-crm[2310]: loop take too long (327 seconds)
Jul 19 15:53:39 def92-proxmox4 pve-ha-crm[2310]: lost lock 'ha_manager_lock - can't get cfs lock
Jul 19 15:53:40 def92-proxmox4 pve-ha-lrm[2320]: successfully acquired lock 'ha_agent_def92-proxmox4_lock'
Jul 19 15:53:40 def92-proxmox4 watchdog-mux[1899]: exit watchdog-mux with active connections
Jul 19 15:53:40 def92-proxmox4 kernel: [2358541.231300] IPMI Watchdog: Unexpected close, not stopping watchdog!
Jul 19 15:53:44 def92-proxmox4 pve-ha-crm[2310]: status change master => lost_manager_lock
Jul 19 15:53:44 def92-proxmox4 pve-ha-crm[2310]: watchdog closed (disabled)
Jul 19 15:53:44 def92-proxmox4 pve-ha-crm[2310]: status change lost_manager_lock => wait_for_quorum
Jul 19 15:53:49 def92-proxmox4 pve-ha-crm[2310]: status change wait_for_quorum => slave
Jul 19 15:53:50 def92-proxmox4 pve-ha-lrm[2320]: watchdog update failed - Broken pipe
The whole loop of systemd-timesyncd stuff started at around 12:44 today. Only this server got this, all 4 others were fine.
Am I right in thinking that this time adjustments lead to the pve-ha daemon thinking it could not lock its file for long enough (or whatever it's trying to do) because the time changed when it tried to do his thing?
I actually have no idea what systemd-timesyncd is trying to do as its servers configuration is commented:
#Servers=0.debian.pool.ntp.org 1.debian.pool.ntp.org 2.debian.pool.ntp.org 3.debian.pool.ntp.org
But timedatectl shows it's enabled:
Code:
Local time: Wed 2017-07-19 17:08:23 CEST
Universal time: Wed 2017-07-19 15:08:23 UTC
RTC time: Wed 2017-07-19 15:08:23
Time zone: Europe/Paris (CEST, +0200)
NTP enabled: yes
NTP synchronized: yes
RTC in local TZ: no
DST active: yes
Last DST change: DST began at
Sun 2017-03-26 01:59:59 CET
Sun 2017-03-26 03:00:00 CEST
Next DST change: DST ends (the clock jumps one hour backwards) at
Sun 2017-10-29 02:59:59 CEST
Sun 2017-10-29 02:00:00 CET
I don't have ntpd installed either.
Did anyone ever get hit by something like this before?
I actually would very much like for this to be the issue as random reboots isn't exactly something I want to ever see on my hypervisors