Node rebooting without apparent reason

Kei · Feb 12, 2017

Hello,
I just realized that a node of my PVE HA cluster rebooted w/out apparent reason so I wanted to make some troubleshooting. I have the feeling that the soft watchdog could be responsible for this, maybe due to some misconfiguration with the time server. Is it possible to understand if the softdog issued a reboot on the machine by the logs? I've looked on /var/log/syslog but nothing significant shows up.

Edit: Apparently other people are reporting this behavior recently, so I'm posting some more info here:

proxmox-ve: 4.4-77 (running kernel: 4.4.35-1-pve)
pve-manager: 4.4-5 (running version: 4.4-5/c43015a5)
pve-kernel-4.4.35-1-pve: 4.4.35-77
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-102
pve-firmware: 1.1-10
libpve-common-perl: 4.0-85
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-71
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-1
pve-qemu-kvm: 2.7.0-10
pve-container: 1.0-90
pve-firewall: 2.0-33
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.6-5
lxcfs: 2.0.5-pve2
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80

Feb 12 15:26:00 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.253s/0.000s/0.001s/-500ppm
Feb 12 15:26:32 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.253s/0.000s/0.001s/-500ppm
Feb 12 15:27:04 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.253s/0.000s/0.001s/-500ppm
Feb 12 15:27:36 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.252s/0.000s/0.001s/-500ppm
Feb 12 15:28:09 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.252s/0.000s/0.002s/-500ppm
Feb 12 15:28:41 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.251s/0.000s/0.002s/-500ppm
Feb 12 15:29:13 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.251s/0.000s/0.002s/-500ppm
Feb 12 15:29:45 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.250s/0.000s/0.001s/-500ppm
Feb 12 15:30:18 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.250s/0.000s/0.002s/-500ppm
Feb 12 15:30:50 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.250s/0.000s/0.001s/-500ppm
Feb 12 15:31:22 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.249s/0.000s/0.001s/-500ppm
Feb 12 15:31:54 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.249s/0.000s/0.001s/-500ppm
Feb 12 15:32:27 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.249s/0.000s/0.001s/-500ppm
Feb 12 15:32:59 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.248s/0.000s/0.001s/-500ppm
Feb 12 15:33:31 pve2 systemd-timesyncd[657]: interval/delta/delay/jitter/drift 32s/-0.248s/0.000s/0.001s/-500ppm
Feb 12 15:34:36 pve2 systemd-timesyncd[657]: Using NTP server 192.168.7.1:123 (192.168.7.1).
Feb 12 15:35:40 pve2 systemd-timesyncd[657]: Using NTP server 192.168.7.1:123 (192.168.7.1).
Feb 12 15:37:48 pve2 systemd-timesyncd[657]: Using NTP server 192.168.7.1:123 (192.168.7.1).
Feb 12 15:42:04 pve2 systemd-timesyncd[657]: Using NTP server 192.168.7.1:123 (192.168.7.1).
Feb 12 15:44:11 pve2 rrdcached[1752]: flushing old values
Feb 12 15:44:11 pve2 rrdcached[1752]: rotating journals
Feb 12 15:44:11 pve2 rrdcached[1752]: started new journal /var/lib/rrdcached/journal/rrd.journal.1486910651.026202
Feb 12 15:44:11 pve2 rrdcached[1752]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1486903451.026154
Feb 12 15:44:47 pve2 pmxcfs[1884]: [dcdb] notice: data verification successful
Feb 12 15:50:37 pve2 systemd-timesyncd[657]: Using NTP server 192.168.7.1:123 (192.168.7.1).
Feb 12 16:07:41 pve2 systemd-timesyncd[657]: Using NTP server 192.168.7.1:123 (192.168.7.1).
Feb 12 16:17:01 pve2 CRON[1058]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Feb 12 16:41:49 pve2 systemd-timesyncd[657]: Using NTP server 192.168.7.1:123 (192.168.7.1).
Feb 12 16:44:11 pve2 rrdcached[1752]: flushing old values
Feb 12 16:44:11 pve2 rrdcached[1752]: rotating journals
Feb 12 16:44:11 pve2 rrdcached[1752]: started new journal /var/lib/rrdcached/journal/rrd.journal.1486914251.026149
Feb 12 16:44:11 pve2 rrdcached[1752]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1486907051.026145
Feb 12 16:44:47 pve2 pmxcfs[1884]: [dcdb] notice: data verification successful
Feb 12 17:15:57 pve2 systemd-timesyncd[657]: Using NTP server 192.168.7.1:123 (192.168.7.1).
Feb 12 17:17:01 pve2 CRON[7908]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Feb 12 17:44:11 pve2 rrdcached[1752]: flushing old values
Feb 12 17:44:11 pve2 rrdcached[1752]: rotating journals
Feb 12 17:44:11 pve2 rrdcached[1752]: started new journal /var/lib/rrdcached/journal/rrd.journal.1486917851.026207
Feb 12 17:44:11 pve2 rrdcached[1752]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1486910651.026202
Feb 12 17:44:47 pve2 pmxcfs[1884]: [dcdb] notice: data verification successful

--- REBOOT HAPPENED HERE ---

Feb 12 17:52:30 pve2 rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="1777" x-info="http://www.rsyslog.com"] start
Feb 12 17:52:30 pve2 systemd-modules-load[262]: Module 'fuse' is builtin
Feb 12 17:52:30 pve2 systemd-modules-load[262]: Inserted module 'vhost_net'
Feb 12 17:52:30 pve2 systemd[1]: Started LSB: Tune IDE hard disks.
Feb 12 17:52:30 pve2 hdparm[295]: Setting parameters of disc: (none).
Feb 12 17:52:30 pve2 keyboard-setup[294]: Setting preliminary keymap...done.
Feb 12 17:52:30 pve2 systemd[1]: Started LSB: Set preliminary keymap.
Feb 12 17:52:30 pve2 systemd[1]: Starting Remount Root and Kernel File Systems...
Feb 12 17:52:30 pve2 systemd[1]: Started Remount Root and Kernel File Systems.
Feb 12 17:52:30 pve2 systemd[1]: Started Various fixups to make systemd work better on Debian.
Feb 12 17:52:30 pve2 systemd[1]: Starting Load/Save Random Seed...
Feb 12 17:52:30 pve2 systemd[1]: Starting Local File Systems (Pre).
Feb 12 17:52:30 pve2 systemd[1]: Reached target Local File Systems (Pre).
Feb 12 17:52:30 pve2 systemd[1]: Started Load/Save Random Seed.
Feb 12 17:52:30 pve2 systemd[1]: Starting system-lvm2\x2dpvscan.slice.
Feb 12 17:52:30 pve2 systemd[1]: Created slice system-lvm2\x2dpvscan.slice.
Feb 12 17:52:30 pve2 systemd[1]: Starting LVM2 PV scan on device 8:4...
Feb 12 17:52:30 pve2 systemd[1]: Starting LVM2 PV scan on device 8:3...
Feb 12 17:52:30 pve2 systemd[1]: Started udev Wait for Complete Device Initialization.
Feb 12 17:52:30 pve2 systemd[1]: Starting Activation of LVM2 logical volumes...
Feb 12 17:52:30 pve2 systemd[1]: Starting Copy rules generated while the root was ro...
Feb 12 17:52:30 pve2 systemd[1]: Started Copy rules generated while the root was ro.
Feb 12 17:52:30 pve2 systemd[1]: Started LVM2 PV scan on device 8:3.
Feb 12 17:52:30 pve2 systemd[1]: Started LVM2 PV scan on device 8:4.
Feb 12 17:52:30 pve2 lvm[515]: 0 logical volume(s) in volume group "localvm" now active
Feb 12 17:52:30 pve2 lvm[515]: 2 logical volume(s) in volume group "pve" now active
Feb 12 17:52:30 pve2 systemd[1]: Found device /dev/pve/swap.
Feb 12 17:52:30 pve2 systemd[1]: Started Activation of LVM2 logical volumes.
Feb 12 17:52:30 pve2 systemd[1]: Starting Encrypted Volumes.
Feb 12 17:52:30 pve2 systemd[1]: Reached target Encrypted Volumes.
Feb 12 17:52:30 pve2 systemd[1]: Starting Activation of LVM2 logical volumes...
Feb 12 17:52:30 pve2 systemd[1]: Started Import ZFS pools by cache file.
Feb 12 17:52:30 pve2 systemd[1]: Starting Import ZFS pools by device scanning...
Feb 12 17:52:30 pve2 systemd[1]: Activating swap /dev/pve/swap...
Feb 12 17:52:30 pve2 systemd[1]: Activated swap /dev/pve/swap.
Feb 12 17:52:30 pve2 systemd[1]: Starting Swap.
Feb 12 17:52:30 pve2 systemd[1]: Reached target Swap.
Feb 12 17:52:30 pve2 zpool[619]: no pools available to import
Feb 12 17:52:30 pve2 systemd[1]: Started Import ZFS pools by device scanning.
Feb 12 17:52:30 pve2 systemd[1]: Starting Mount ZFS filesystems...
Feb 12 17:52:30 pve2 systemd[1]: Started Mount ZFS filesystems.
Feb 12 17:52:30 pve2 lvm[532]: 0 logical volume(s) in volume group "localvm" now active
Feb 12 17:52:30 pve2 lvm[532]: 2 logical volume(s) in volume group "pve" now active
Feb 12 17:52:30 pve2 systemd[1]: Started Activation of LVM2 logical volumes.
Feb 12 17:52:30 pve2 systemd[1]: Starting Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
Feb 12 17:52:30 pve2 lvm[626]: 2 logical volume(s) in volume group "pve" monitored
Feb 12 17:52:30 pve2 systemd[1]: Started Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
Feb 12 17:52:30 pve2 systemd[1]: Starting Local File Systems.
Feb 12 17:52:30 pve2 systemd[1]: Reached target Local File Systems.
Feb 12 17:52:30 pve2 systemd[1]: Starting Proxmox VE firewall logger...
Feb 12 17:52:30 pve2 systemd[1]: Starting Proxmox VE Login Banner...

gkovacs · Feb 16, 2017

Kei said:
Hello,
I just realized that a node of my PVE HA cluster rebooted w/out apparent reason so I wanted to make some troubleshooting. I have the feeling that the soft watchdog could be responsible for this, maybe due to some misconfiguration with the time server. Is it possible to understand if the softdog issued a reboot on the machine by the logs? I've looked on /var/log/syslog but nothing significant shows up.

Are you on ZFS? Is your system low on free memory? If your answer to both questions is yes, then your spontaneous reboot can be prevented by the following:

1. ZFS ARC size
We aggressively limit the ZFS ARC size, as it has led to several spontaneous reboots in the past when left unlimited. Basically, we add up all the memory the system uses without caches and buffers (like all the KVM maximum RAM combined), subtract that from total host RAM, and set the ARC to something a bit less than that, so it has to compete with system cache only. For example: on a 32GB server the maximum RAM allocation of KVM guests is 25 GB, so we set the ARC to max out at 5GB (leaving 2GB for anything else). We also set a lower limit of 1GB to the ARC, as it has been reported that it helps performance.

To do that, you have add the following lines to /etc/modprobe.d/zfs.conf

Code:

options zfs zfs_arc_max=5368709120
options zfs zfs_arc_min=1073741824

and after that run and reboot:

Code:

# update-initramfs -u

Looking at the ARC of this very server with arc_summary.py you can see it stays between the limits:

Code:

ARC Size:                               30.72%  1.54    GiB
        Target Size: (Adaptive)         30.72%  1.54    GiB
        Min Size (Hard Limit):          20.00%  1.00    GiB
        Max Size (High Water):          5:1     5.00    GiB

ARC Size Breakdown:
        Recently Used Cache Size:       35.27%  554.85  MiB
        Frequently Used Cache Size:     64.73%  1018.10 MiB

2. SWAP on ZFS zvol
You also have to make sure that swap behaves well if it resides on a ZFS zvol (default installation places it there). Most important is disabling ARC caching the swap volume, but the other tweaks are important as well (and endorsed by the ZFS on Linux community):
https://github.com/zfsonlinux/zfs/wiki/FAQ

Execute these commands in your shell (left out the # so you can copy all lines at once):

Code:

zfs set primarycache=metadata rpool/swap
zfs set secondarycache=metadata rpool/swap
zfs set compression=zle rpool/swap
zfs set checksum=off rpool/swap
zfs set sync=always rpool/swap
zfs set logbias=throughput rpool/swap

You can verify these settings by running:

Code:

# zfs get all rpool/swap

Kei · Feb 16, 2017

Thank you for your reply.
I'm not sure I have ZFS. I would say no, because I'm supposed to only have logical volumes. I do use ZFS, but on the NAS

I might have figured out what was happening btw. The problem seemed to be cause by some misconfiguration on the internal NTP server. For some reason, when I set all the nodes to use a NTP server on the internet, all errors in /var/log/syslog ceased to grow. No more "time has changed" or time missmatch related issues. It's too soon to say that the problem is no longer persistent, but 30 hours w/out reboot is encouraging.

Search

Search

Node rebooting without apparent reason

Kei

Well-Known Member

gkovacs

Renowned Member

Kei

Well-Known Member