Hello,
I just realized that one node of my PVE HA cluster rebooted for no apparent reason, so I wanted to do some troubleshooting. I suspect the software watchdog (softdog) could be responsible, possibly because of a misconfigured time server. Is it possible to tell from the logs whether softdog triggered the reboot? I've looked at /var/log/syslog, but nothing significant shows up.
Edit: Apparently other people are reporting this behavior recently, so I'm posting some more info here:
proxmox-ve: 4.4-77 (running kernel: 4.4.35-1-pve)
pve-manager: 4.4-5 (running version: 4.4-5/c43015a5)
pve-kernel-4.4.35-1-pve: 4.4.35-77
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-102
pve-firmware: 1.1-10
libpve-common-perl: 4.0-85
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-71
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-1
pve-qemu-kvm: 2.7.0-10
pve-container: 1.0-90
pve-firewall: 2.0-33
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.6-5
lxcfs: 2.0.5-pve2
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80
Feb 12 15:26:00 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.253s/0.000s/0.001s/-500ppm
Feb 12 15:26:32 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.253s/0.000s/0.001s/-500ppm
Feb 12 15:27:04 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.253s/0.000s/0.001s/-500ppm
Feb 12 15:27:36 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.252s/0.000s/0.001s/-500ppm
Feb 12 15:28:09 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.252s/0.000s/0.002s/-500ppm
Feb 12 15:28:41 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.251s/0.000s/0.002s/-500ppm
Feb 12 15:29:13 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.251s/0.000s/0.002s/-500ppm
Feb 12 15:29:45 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.250s/0.000s/0.001s/-500ppm
Feb 12 15:30:18 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.250s/0.000s/0.002s/-500ppm
Feb 12 15:30:50 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.250s/0.000s/0.001s/-500ppm
Feb 12 15:31:22 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.249s/0.000s/0.001s/-500ppm
Feb 12 15:31:54 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.249s/0.000s/0.001s/-500ppm
Feb 12 15:32:27 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.249s/0.000s/0.001s/-500ppm
Feb 12 15:32:59 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.248s/0.000s/0.001s/-500ppm
Feb 12 15:33:31 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.248s/0.000s/0.001s/-500ppm
Feb 12 15:34:36 pve2 systemd-timesyncd[657]: Using NTP server 192.168.7.1:123 (192.168.7.1).
Feb 12 15:35:40 pve2 systemd-timesyncd[657]: Using NTP server 192.168.7.1:123 (192.168.7.1).
Feb 12 15:37:48 pve2 systemd-timesyncd[657]: Using NTP server 192.168.7.1:123 (192.168.7.1).
Feb 12 15:42:04 pve2 systemd-timesyncd[657]: Using NTP server 192.168.7.1:123 (192.168.7.1).
Feb 12 15:44:11 pve2 rrdcached[1752]: flushing old values
Feb 12 15:44:11 pve2 rrdcached[1752]: rotating journals
Feb 12 15:44:11 pve2 rrdcached[1752]: started new journal /var/lib/rrdcached/journal/rrd.journal.1486910651.026202
Feb 12 15:44:11 pve2 rrdcached[1752]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1486903451.026154
Feb 12 15:44:47 pve2 pmxcfs[1884]: [dcdb] notice: data verification successful
Feb 12 15:50:37 pve2 systemd-timesyncd[657]: Using NTP server 192.168.7.1:123 (192.168.7.1).
Feb 12 16:07:41 pve2 systemd-timesyncd[657]: Using NTP server 192.168.7.1:123 (192.168.7.1).
Feb 12 16:17:01 pve2 CRON[1058]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Feb 12 16:41:49 pve2 systemd-timesyncd[657]: Using NTP server 192.168.7.1:123 (192.168.7.1).
Feb 12 16:44:11 pve2 rrdcached[1752]: flushing old values
Feb 12 16:44:11 pve2 rrdcached[1752]: rotating journals
Feb 12 16:44:11 pve2 rrdcached[1752]: started new journal /var/lib/rrdcached/journal/rrd.journal.1486914251.026149
Feb 12 16:44:11 pve2 rrdcached[1752]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1486907051.026145
Feb 12 16:44:47 pve2 pmxcfs[1884]: [dcdb] notice: data verification successful
Feb 12 17:15:57 pve2 systemd-timesyncd[657]: Using NTP server 192.168.7.1:123 (192.168.7.1).
Feb 12 17:17:01 pve2 CRON[7908]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Feb 12 17:44:11 pve2 rrdcached[1752]: flushing old values
Feb 12 17:44:11 pve2 rrdcached[1752]: rotating journals
Feb 12 17:44:11 pve2 rrdcached[1752]: started new journal /var/lib/rrdcached/journal/rrd.journal.1486917851.026207
Feb 12 17:44:11 pve2 rrdcached[1752]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1486910651.026202
Feb 12 17:44:47 pve2 pmxcfs[1884]: [dcdb] notice: data verification successful
--- REBOOT HAPPENED HERE ---
Feb 12 17:52:30 pve2 rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="1777" x-info="http://www.rsyslog.com"] start
Feb 12 17:52:30 pve2 systemd-modules-load[262]: Module 'fuse' is builtin
Feb 12 17:52:30 pve2 systemd-modules-load[262]: Inserted module 'vhost_net'
Feb 12 17:52:30 pve2 systemd[1]: Started LSB: Tune IDE hard disks.
Feb 12 17:52:30 pve2 hdparm[295]: Setting parameters of disc: (none).
Feb 12 17:52:30 pve2 keyboard-setup[294]: Setting preliminary keymap...done.
Feb 12 17:52:30 pve2 systemd[1]: Started LSB: Set preliminary keymap.
Feb 12 17:52:30 pve2 systemd[1]: Starting Remount Root and Kernel File Systems...
Feb 12 17:52:30 pve2 systemd[1]: Started Remount Root and Kernel File Systems.
Feb 12 17:52:30 pve2 systemd[1]: Started Various fixups to make systemd work better on Debian.
Feb 12 17:52:30 pve2 systemd[1]: Starting Load/Save Random Seed...
Feb 12 17:52:30 pve2 systemd[1]: Starting Local File Systems (Pre).
Feb 12 17:52:30 pve2 systemd[1]: Reached target Local File Systems (Pre).
Feb 12 17:52:30 pve2 systemd[1]: Started Load/Save Random Seed.
Feb 12 17:52:30 pve2 systemd[1]: Starting system-lvm2\x2dpvscan.slice.
Feb 12 17:52:30 pve2 systemd[1]: Created slice system-lvm2\x2dpvscan.slice.
Feb 12 17:52:30 pve2 systemd[1]: Starting LVM2 PV scan on device 8:4...
Feb 12 17:52:30 pve2 systemd[1]: Starting LVM2 PV scan on device 8:3...
Feb 12 17:52:30 pve2 systemd[1]: Started udev Wait for Complete Device Initialization.
Feb 12 17:52:30 pve2 systemd[1]: Starting Activation of LVM2 logical volumes...
Feb 12 17:52:30 pve2 systemd[1]: Starting Copy rules generated while the root was ro...
Feb 12 17:52:30 pve2 systemd[1]: Started Copy rules generated while the root was ro.
Feb 12 17:52:30 pve2 systemd[1]: Started LVM2 PV scan on device 8:3.
Feb 12 17:52:30 pve2 systemd[1]: Started LVM2 PV scan on device 8:4.
Feb 12 17:52:30 pve2 lvm[515]: 0 logical volume(s) in volume group "localvm" now active
Feb 12 17:52:30 pve2 lvm[515]: 2 logical volume(s) in volume group "pve" now active
Feb 12 17:52:30 pve2 systemd[1]: Found device /dev/pve/swap.
Feb 12 17:52:30 pve2 systemd[1]: Started Activation of LVM2 logical volumes.
Feb 12 17:52:30 pve2 systemd[1]: Starting Encrypted Volumes.
Feb 12 17:52:30 pve2 systemd[1]: Reached target Encrypted Volumes.
Feb 12 17:52:30 pve2 systemd[1]: Starting Activation of LVM2 logical volumes...
Feb 12 17:52:30 pve2 systemd[1]: Started Import ZFS pools by cache file.
Feb 12 17:52:30 pve2 systemd[1]: Starting Import ZFS pools by device scanning...
Feb 12 17:52:30 pve2 systemd[1]: Activating swap /dev/pve/swap...
Feb 12 17:52:30 pve2 systemd[1]: Activated swap /dev/pve/swap.
Feb 12 17:52:30 pve2 systemd[1]: Starting Swap.
Feb 12 17:52:30 pve2 systemd[1]: Reached target Swap.
Feb 12 17:52:30 pve2 zpool[619]: no pools available to import
Feb 12 17:52:30 pve2 systemd[1]: Started Import ZFS pools by device scanning.
Feb 12 17:52:30 pve2 systemd[1]: Starting Mount ZFS filesystems...
Feb 12 17:52:30 pve2 systemd[1]: Started Mount ZFS filesystems.
Feb 12 17:52:30 pve2 lvm[532]: 0 logical volume(s) in volume group "localvm" now active
Feb 12 17:52:30 pve2 lvm[532]: 2 logical volume(s) in volume group "pve" now active
Feb 12 17:52:30 pve2 systemd[1]: Started Activation of LVM2 logical volumes.
Feb 12 17:52:30 pve2 systemd[1]: Starting Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
Feb 12 17:52:30 pve2 lvm[626]: 2 logical volume(s) in volume group "pve" monitored
Feb 12 17:52:30 pve2 systemd[1]: Started Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
Feb 12 17:52:30 pve2 systemd[1]: Starting Local File Systems.
Feb 12 17:52:30 pve2 systemd[1]: Reached target Local File Systems.
Feb 12 17:52:30 pve2 systemd[1]: Starting Proxmox VE firewall logger...
Feb 12 17:52:30 pve2 systemd[1]: Starting Proxmox VE Login Banner...
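For anyone who wants to check their own syslog the same way, here is a rough Python sketch of what I am looking for: a boot that is not preceded by any clean-shutdown message, plus the length of the silent gap before it. The marker patterns are assumptions based on what rsyslog 8.x usually writes ("start" at boot, "exiting on signal" on an orderly stop), and the function name and `year` parameter are made up for illustration (syslog timestamps carry no year). A hard reset by a watchdog may leave no message at all, so this can only show that the reboot was unclean, not prove what caused it.

```python
import re
from datetime import datetime

# Assumed markers; adjust them to what your own syslog actually contains.
BOOT_MARKER = re.compile(r"rsyslogd: .*\bstart$")
CLEAN_SHUTDOWN_MARKER = re.compile(r"rsyslogd: .*exiting on signal")
STAMP = re.compile(r"^([A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d) ")


def parse_stamp(line, year):
    """Return the syslog timestamp of a line, or None if it has none."""
    m = STAMP.match(line)
    if not m:
        return None
    return datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")


def find_unclean_boots(lines, year=2017):
    """List boots that were not preceded by a clean-shutdown marker."""
    results = []
    prev_line = None   # last timestamped line seen so far
    clean = False      # did we see a clean-shutdown marker since the last boot?
    for raw in lines:
        line = raw.rstrip("\n")
        if BOOT_MARKER.search(line):
            if prev_line is not None and not clean:
                before = parse_stamp(prev_line, year)
                after = parse_stamp(line, year)
                gap = (after - before).total_seconds() if before and after else None
                results.append(
                    {"last_entry": prev_line, "boot_entry": line, "gap_seconds": gap}
                )
            clean = False
        elif CLEAN_SHUTDOWN_MARKER.search(line):
            clean = True
        # Lines without a timestamp (e.g. manual annotations) are skipped here.
        if parse_stamp(line, year) is not None:
            prev_line = line
    return results


if __name__ == "__main__":
    with open("/var/log/syslog", errors="replace") as fh:
        for hit in find_unclean_boots(fh):
            print(f"unclean boot after {hit['gap_seconds']}s of silence")
            print("  last entry:", hit["last_entry"])
            print("  boot entry:", hit["boot_entry"])
```

Run against the excerpt above, this would flag the boot at 17:52:30, with the last entry before it at 17:44:47 and nothing resembling a shutdown sequence in between.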