Wanted to roll out last week's changes to PVE 4.3:
grub-common: 2.02-pve4 ==> 2.02-pve5
grub-efi-amd64-bin: 2.02-pve4 ==> 2.02-pve5
grub-efi-ia32-bin: 2.02-pve4 ==> 2.02-pve5
grub-pc: 2.02-pve4 ==> 2.02-pve5
grub-pc-bin: 2.02-pve4 ==> 2.02-pve5
grub2-common: 2.02-pve4 ==> 2.02-pve5
libpve-common-perl: 4.0-79 ==> 4.0-83
lxc-pve: 2.0.5-1 ==> 2.0.6-1
lxcfs: 2.0.4-pve2 ==> 2.0.5-pve1
openvswitch-common: 2.5.0-1 ==> 2.6.0-2
openvswitch-switch: 2.5.0-1 ==> 2.6.0-2
proxmox-ve: 4.3-71 ==> 4.3-72
pve-cluster: 4.0-46 ==> 4.0-47
pve-container: 1.0-80 ==> 1.0-85
pve-docs: 4.3-12 ==> 4.3-17
pve-ha-manager: 1.0-35 ==> 1.0-38
pve-kernel-4.4.24-1-pve: 4.4.24-72 (new)
pve-manager: 4.3-9 ==> 4.3-12
pve-qemu-kvm: 2.7.0-4 ==> 2.7.0-8
qemu-server: 4.0-92 ==> 4.0-96
So we migrated all VMs off the first node and patched it with apt-get upgrade.
The software watchdog then fired an NMI while the pve-cluster package was being patched, and the node rebooted. It came up fine, and we finished the job with dpkg --configure -a, another apt-get upgrade, and one more manual reboot.
At the same time we saw all our other nodes rebooting and restarting all their VMs. What the F...
Wondering what might have caused this, and how to keep it from happening again when patching the rest of the nodes. Hints very much appreciated!
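For reference, the command sequence on the first node can be sketched roughly as below. This is only a sketch: the `run` helper, the `DRY_RUN` switch, and the two function names are made up for illustration, and the plain `reboot` call is an assumption, since the exact reboot command isn't stated above. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch of the steps described above. DRY_RUN, run(), first_attempt() and
# recover_after_reboot() are illustrative helpers, not Proxmox or apt tooling.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 1: the initial upgrade, during which the watchdog NMI fired
# (while the pve-cluster package was being patched).
first_attempt() {
    run apt-get upgrade
}

# Step 2: after the unexpected reboot, finish the interrupted package
# configuration, upgrade again, then reboot manually.
recover_after_reboot() {
    run dpkg --configure -a
    run apt-get upgrade
    run reboot   # assumption: the exact reboot command used is not stated
}

first_attempt
recover_after_reboot
```

With the default `DRY_RUN=1` this just lists the four commands in order; nothing gets executed unless `DRY_RUN=0` is set explicitly.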
This is a snippet of the syslog from one of the rebooting nodes (n2):
Dec 5 11:53:09 n2 pmxcfs[4715]: [status] notice: received log
Dec 5 11:53:20 n2 pmxcfs[4715]: [status] notice: received log
Dec 5 11:57:06 n2 pmxcfs[4715]: [dcdb] notice: members: 1/5577, 2/4715, 3/4773, 4/4774, 5/4729, 6/4745
Dec 5 11:57:06 n2 pmxcfs[4715]: [dcdb] notice: starting data syncronisation
Dec 5 11:57:06 n2 pmxcfs[4715]: [status] notice: members: 1/5577, 2/4715, 3/4773, 4/4774, 5/4729, 6/4745
Dec 5 11:57:06 n2 pmxcfs[4715]: [status] notice: starting data syncronisation
Dec 5 11:57:07 n2 pmxcfs[4715]: [status] notice: received sync request (epoch 1/5577/0000000C)
Dec 5 11:57:07 n2 pmxcfs[4715]: [dcdb] notice: received sync request (epoch 1/5577/0000000C)
Dec 5 11:57:08 n2 pmxcfs[4715]: [dcdb] notice: members: 1/5577, 2/4715, 3/4773, 4/4774, 5/4729, 6/4745, 7/21451
Dec 5 11:57:08 n2 pmxcfs[4715]: [dcdb] notice: queue not emtpy - resening 17 messages
Dec 5 11:57:08 n2 pmxcfs[4715]: [status] notice: members: 1/5577, 2/4715, 3/4773, 4/4774, 5/4729, 6/4745, 7/21451
Dec 5 11:57:08 n2 pmxcfs[4715]: [status] notice: queue not emtpy - resening 9 messages
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Dec 5 12:02:32 n2 rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="4520" x-info="http://www.rsyslog.com"] start
Dec 5 12:02:32 n2 systemd-modules-load[612]: Module 'fuse' is builtin
Dec 5 12:02:32 n2 systemd-modules-load[612]: Inserted module '8021q'
Dec 5 12:02:32 n2 systemd-modules-load[612]: Inserted module 'bonding'
Dec 5 12:02:32 n2 systemd-modules-load[612]: Inserted module 'vhost_net'
Dec 5 12:02:32 n2 systemd[1]: Started Load Kernel Modules.
Snippet from the syslog on n7 around the time the NMI fired, shortly after the last VM (312) was migrated off at 11:53:20:
Dec 5 11:53:10 n7 pve-ha-lrm[15775]: Task 'UPID:n7:00003DA0:0E585480:584546F3:qmigrate:312:root@pam:' still active, waiting
Dec 5 11:53:13 n7 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap312i0
Dec 5 11:53:13 n7 kernel: [2406823.648467] fwbr312i1: port 1(tap312i1) entered disabled state
Dec 5 11:53:13 n7 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln312o1
Dec 5 11:53:13 n7 kernel: [2406823.670292] fwbr312i1: port 2(fwln312o1) entered disabled state
Dec 5 11:53:13 n7 kernel: [2406823.670487] device fwln312o1 left promiscuous mode
Dec 5 11:53:13 n7 kernel: [2406823.670494] fwbr312i1: port 2(fwln312o1) entered disabled state
Dec 5 11:53:14 n7 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap312i1
Dec 5 11:53:14 n7 ovs-vsctl: ovs|00002|db_ctl_base|ERR|no port named tap312i1
Dec 5 11:53:15 n7 pve-ha-lrm[15775]: Task 'UPID:n7:00003DA0:0E585480:584546F3:qmigrate:312:root@pam:' still active, waiting
Dec 5 11:53:16 n7 multipathd: dm-25: remove map (uevent)
Dec 5 11:53:16 n7 multipathd: dm-25: devmap not registered, can't remove
Dec 5 11:53:16 n7 multipathd: dm-25: remove map (uevent)
Dec 5 11:53:18 n7 multipathd: dm-26: remove map (uevent)
Dec 5 11:53:18 n7 multipathd: dm-26: devmap not registered, can't remove
Dec 5 11:53:18 n7 multipathd: dm-26: remove map (uevent)
Dec 5 11:53:19 n7 multipathd: dm-24: remove map (uevent)
Dec 5 11:53:19 n7 multipathd: dm-24: devmap not registered, can't remove
Dec 5 11:53:19 n7 multipathd: dm-24: remove map (uevent)
Dec 5 11:53:20 n7 pve-ha-lrm[15775]: Task 'UPID:n7:00003DA0:0E585480:584546F3:qmigrate:312:root@pam:' still active, waiting
Dec 5 11:53:20 n7 pve-ha-lrm[15775]: <root@pam> end task UPID:n7:00003DA0:0E585480:584546F3:qmigrate:312:root@pam: OK
Dec 5 11:54:38 n7 kernel: [2406907.934905] usb 3-1: new full-speed USB device number 3 using uhci_hcd
Dec 5 11:54:38 n7 kernel: [2406908.080372] usb 3-1: New USB device found, idVendor=03f0, idProduct=7029
Dec 5 11:54:38 n7 kernel: [2406908.080375] usb 3-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Dec 5 11:54:38 n7 kernel: [2406908.080376] usb 3-1: Product: Virtual Keyboard
Dec 5 11:54:38 n7 kernel: [2406908.080378] usb 3-1: Manufacturer: BMC
Dec 5 11:54:38 n7 kernel: [2406908.086659] input: BMC Virtual Keyboard as /devices/pci0000:00/0000:00:1c.2/0000:01:00.4/usb3/3-1/3-1:1.0/0003:03F0:7029.0003/input/input4