Patching PVE 4.3 on one node made whole cluster reboot

stefws

Wanted to roll out last week's changes to PVE 4.3:

grub-common: 2.02-pve4 ==> 2.02-pve5
grub-efi-amd64-bin: 2.02-pve4 ==> 2.02-pve5
grub-efi-ia32-bin: 2.02-pve4 ==> 2.02-pve5
grub-pc: 2.02-pve4 ==> 2.02-pve5
grub-pc-bin: 2.02-pve4 ==> 2.02-pve5
grub2-common: 2.02-pve4 ==> 2.02-pve5
libpve-common-perl: 4.0-79 ==> 4.0-83
lxc-pve: 2.0.5-1 ==> 2.0.6-1
lxcfs: 2.0.4-pve2 ==> 2.0.5-pve1
openvswitch-common: 2.5.0-1 ==> 2.6.0-2
openvswitch-switch: 2.5.0-1 ==> 2.6.0-2
proxmox-ve: 4.3-71 ==> 4.3-72
pve-cluster: 4.0-46 ==> 4.0-47
pve-container: 1.0-80 ==> 1.0-85
pve-docs: 4.3-12 ==> 4.3-17
pve-ha-manager: 1.0-35 ==> 1.0-38
pve-kernel-4.4.24-1-pve: 4.4.24-72 (new)
pve-manager: 4.3-9 ==> 4.3-12
pve-qemu-kvm: 2.7.0-4 ==> 2.7.0-8
qemu-server: 4.0-92 ==> 4.0-96

So we migrated all VMs off the first node and ran the patch through apt-get upgrade.
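
The per-node procedure was roughly this (a minimal sketch; the VM ID and target node are just examples, and apt-get dist-upgrade is generally what Proxmox recommends rather than plain upgrade):

qm migrate 312 n1 --online   # repeat for each VM on the node being patched
apt-get update
apt-get upgrade              # what we ran; dist-upgrade is the usual recommendation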

The SW watchdog then fired an NMI during patching of the pve-cluster package and the node rebooted. It came back up fine and we finished the upgrade with dpkg --configure -a and another apt-get upgrade, plus another manual reboot.
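
So the recovery on the rebooted node amounted to:

dpkg --configure -a   # finish configuring the half-installed packages
apt-get upgrade       # re-run to pick up anything left
reboot                # manual reboot into the new kernel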

At the same time we saw all our other nodes rebooting and restarting all their VMs :( What's F...

Wondering what might have caused this... and how to avoid it happening again when patching the rest of the nodes. Hints very much appreciated!

This is a snip of the syslog from one of the rebooting nodes:
Dec 5 11:53:09 n2 pmxcfs[4715]: [status] notice: received log
Dec 5 11:53:20 n2 pmxcfs[4715]: [status] notice: received log
Dec 5 11:57:06 n2 pmxcfs[4715]: [dcdb] notice: members: 1/5577, 2/4715, 3/4773, 4/4774, 5/4729, 6/4745
Dec 5 11:57:06 n2 pmxcfs[4715]: [dcdb] notice: starting data syncronisation
Dec 5 11:57:06 n2 pmxcfs[4715]: [status] notice: members: 1/5577, 2/4715, 3/4773, 4/4774, 5/4729, 6/4745
Dec 5 11:57:06 n2 pmxcfs[4715]: [status] notice: starting data syncronisation
Dec 5 11:57:07 n2 pmxcfs[4715]: [status] notice: received sync request (epoch 1/5577/0000000C)
Dec 5 11:57:07 n2 pmxcfs[4715]: [dcdb] notice: received sync request (epoch 1/5577/0000000C)
Dec 5 11:57:08 n2 pmxcfs[4715]: [dcdb] notice: members: 1/5577, 2/4715, 3/4773, 4/4774, 5/4729, 6/4745, 7/21451
Dec 5 11:57:08 n2 pmxcfs[4715]: [dcdb] notice: queue not emtpy - resening 17 messages
Dec 5 11:57:08 n2 pmxcfs[4715]: [status] notice: members: 1/5577, 2/4715, 3/4773, 4/4774, 5/4729, 6/4745, 7/21451
Dec 5 11:57:08 n2 pmxcfs[4715]: [status] notice: queue not emtpy - resening 9 messages
^@^@^@^@^@^@^@^@ [... long run of NUL bytes from the unclean reboot omitted ...]
Dec 5 12:02:32 n2 rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="4520" x-info="http://www.rsyslog.com"] start
Dec 5 12:02:32 n2 systemd-modules-load[612]: Module 'fuse' is builtin
Dec 5 12:02:32 n2 systemd-modules-load[612]: Inserted module '8021q'
Dec 5 12:02:32 n2 systemd-modules-load[612]: Inserted module 'bonding'
Dec 5 12:02:32 n2 systemd-modules-load[612]: Inserted module 'vhost_net'
Dec 5 12:02:32 n2 systemd[1]: Started Load Kernel Modules.

Snip from the syslog around the NMI, which fired shortly after the last VM (312) was migrated off at 11:53:20:
Dec 5 11:53:10 n7 pve-ha-lrm[15775]: Task 'UPID:n7:00003DA0:0E585480:584546F3:qmigrate:312:root@pam:' still active, waiting
Dec 5 11:53:13 n7 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap312i0
Dec 5 11:53:13 n7 kernel: [2406823.648467] fwbr312i1: port 1(tap312i1) entered disabled state
Dec 5 11:53:13 n7 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln312o1
Dec 5 11:53:13 n7 kernel: [2406823.670292] fwbr312i1: port 2(fwln312o1) entered disabled state
Dec 5 11:53:13 n7 kernel: [2406823.670487] device fwln312o1 left promiscuous mode
Dec 5 11:53:13 n7 kernel: [2406823.670494] fwbr312i1: port 2(fwln312o1) entered disabled state
Dec 5 11:53:14 n7 ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap312i1
Dec 5 11:53:14 n7 ovs-vsctl: ovs|00002|db_ctl_base|ERR|no port named tap312i1
Dec 5 11:53:15 n7 pve-ha-lrm[15775]: Task 'UPID:n7:00003DA0:0E585480:584546F3:qmigrate:312:root@pam:' still active, waiting
Dec 5 11:53:16 n7 multipathd: dm-25: remove map (uevent)
Dec 5 11:53:16 n7 multipathd: dm-25: devmap not registered, can't remove
Dec 5 11:53:16 n7 multipathd: dm-25: remove map (uevent)
Dec 5 11:53:18 n7 multipathd: dm-26: remove map (uevent)
Dec 5 11:53:18 n7 multipathd: dm-26: devmap not registered, can't remove
Dec 5 11:53:18 n7 multipathd: dm-26: remove map (uevent)
Dec 5 11:53:19 n7 multipathd: dm-24: remove map (uevent)
Dec 5 11:53:19 n7 multipathd: dm-24: devmap not registered, can't remove
Dec 5 11:53:19 n7 multipathd: dm-24: remove map (uevent)
Dec 5 11:53:20 n7 pve-ha-lrm[15775]: Task 'UPID:n7:00003DA0:0E585480:584546F3:qmigrate:312:root@pam:' still active, waiting
Dec 5 11:53:20 n7 pve-ha-lrm[15775]: <root@pam> end task UPID:n7:00003DA0:0E585480:584546F3:qmigrate:312:root@pam: OK
Dec 5 11:54:38 n7 kernel: [2406907.934905] usb 3-1: new full-speed USB device number 3 using uhci_hcd
Dec 5 11:54:38 n7 kernel: [2406908.080372] usb 3-1: New USB device found, idVendor=03f0, idProduct=7029
Dec 5 11:54:38 n7 kernel: [2406908.080375] usb 3-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Dec 5 11:54:38 n7 kernel: [2406908.080376] usb 3-1: Product: Virtual Keyboard
Dec 5 11:54:38 n7 kernel: [2406908.080378] usb 3-1: Manufacturer: BMC
Dec 5 11:54:38 n7 kernel: [2406908.086659] input: BMC Virtual Keyboard as /devices/pci0000:00/0000:00:1c.2/0000:01:00.4/usb3/3-1/3-1:1.0/0003:03F0:7029.0003/input/input4
 
It also fails to migrate a VM onto the patched HN node:

Dec 05 14:46:07 n7 pmxcfs[4754]: [status] notice: received log
Dec 05 14:46:28 n7 pmxcfs[4754]: [status] notice: received log
Dec 05 14:46:28 n7 sshd[12339]: Accepted publickey for root from 10.45.71.1 port 51296 ssh2: RSA c6:64:bc:18:01:11:53:9c:ef:f4:08:00:b7:c0:85:69
Dec 05 14:46:28 n7 sshd[12339]: pam_unix(sshd:session): session opened for user root by (uid=0)
Dec 05 14:46:28 n7 sshd[12339]: Received disconnect from 10.45.71.1: 11: disconnected by user
Dec 05 14:46:28 n7 sshd[12339]: pam_unix(sshd:session): session closed for user root
Dec 05 14:46:28 n7 sshd[12342]: Accepted publickey for root from 10.45.71.1 port 51298 ssh2: RSA c6:64:bc:18:01:11:53:9c:ef:f4:08:00:b7:c0:85:69
Dec 05 14:46:28 n7 sshd[12342]: pam_unix(sshd:session): session opened for user root by (uid=0)
Dec 05 14:46:28 n7 qm[12344]: <root@pam> starting task UPID:n7:00003039:000E7A3F:58456FB4:qmstart:203:root@pam:
Dec 05 14:46:28 n7 qm[12345]: start VM 203: UPID:n7:00003039:000E7A3F:58456FB4:qmstart:203:root@pam:
Dec 05 14:46:33 n7 systemd[1]: Starting qemu.slice.
Dec 05 14:46:33 n7 systemd[1]: Created slice qemu.slice.
Dec 05 14:46:33 n7 systemd[1]: Starting 203.scope.
Dec 05 14:46:33 n7 systemd[1]: Started 203.scope.
Dec 05 14:46:33 n7 systemd-sysctl[12384]: Line is not an assignment in file '/etc/sysctl.d/99-sysctl.conf': (null)
Dec 05 14:46:33 n7 systemd-sysctl[12384]: Overwriting earlier assignment of net/ipv4/conf/all/rp_filter in file '/usr/lib/sysctl.d/pve-firewall.conf'.
Dec 05 14:46:34 n7 kernel: device tap203i0 entered promiscuous mode
Dec 05 14:46:34 n7 ovs-vsctl[12388]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap203i0
Dec 05 14:46:34 n7 ovs-vsctl[12388]: ovs|00002|db_ctl_base|ERR|no port named tap203i0
Dec 05 14:46:34 n7 ovs-vsctl[12389]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl add-port vmbr1 tap203i0 tag=30
Dec 05 14:46:34 n7 systemd-sysctl[12405]: Line is not an assignment in file '/etc/sysctl.d/99-sysctl.conf': (null)
Dec 05 14:46:34 n7 systemd-sysctl[12405]: Overwriting earlier assignment of net/ipv4/conf/all/rp_filter in file '/usr/lib/sysctl.d/pve-firewall.conf'.
Dec 05 14:46:34 n7 kernel: device tap203i1 entered promiscuous mode
Dec 05 14:46:34 n7 ovs-vsctl[12408]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap203i1
Dec 05 14:46:34 n7 ovs-vsctl[12408]: ovs|00002|db_ctl_base|ERR|no port named tap203i1
Dec 05 14:46:34 n7 systemd-sysctl[12414]: Line is not an assignment in file '/etc/sysctl.d/99-sysctl.conf': (null)
Dec 05 14:46:34 n7 systemd-sysctl[12414]: Overwriting earlier assignment of net/ipv4/conf/all/rp_filter in file '/usr/lib/sysctl.d/pve-firewall.conf'.
Dec 05 14:46:34 n7 kernel: fwbr203i1: port 1(tap203i1) entered forwarding state
Dec 05 14:46:34 n7 kernel: fwbr203i1: port 1(tap203i1) entered forwarding state
Dec 05 14:46:34 n7 ovs-vsctl[12419]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl add-port vmbr1 fwln203o1 tag=41 -- set Interface fwln203o1 type=internal
Dec 05 14:46:34 n7 systemd-sysctl[12427]: Line is not an assignment in file '/etc/sysctl.d/99-sysctl.conf': (null)
Dec 05 14:46:34 n7 systemd-sysctl[12427]: Overwriting earlier assignment of net/ipv4/conf/all/rp_filter in file '/usr/lib/sysctl.d/pve-firewall.conf'.
Dec 05 14:46:34 n7 kernel: device fwln203o1 entered promiscuous mode
Dec 05 14:46:34 n7 kernel: fwbr203i1: port 2(fwln203o1) entered forwarding state
Dec 05 14:46:34 n7 kernel: fwbr203i1: port 2(fwln203o1) entered forwarding state
Dec 05 14:46:34 n7 systemd-sysctl[12445]: Line is not an assignment in file '/etc/sysctl.d/99-sysctl.conf': (null)
Dec 05 14:46:34 n7 systemd-sysctl[12445]: Overwriting earlier assignment of net/ipv4/conf/all/rp_filter in file '/usr/lib/sysctl.d/pve-firewall.conf'.
Dec 05 14:46:34 n7 kernel: device tap203i2 entered promiscuous mode
Dec 05 14:46:34 n7 ovs-vsctl[12448]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap203i2
Dec 05 14:46:34 n7 ovs-vsctl[12448]: ovs|00002|db_ctl_base|ERR|no port named tap203i2
Dec 05 14:46:34 n7 ovs-vsctl[12449]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl add-port vmbr1 tap203i2 tag=32
Dec 05 14:46:35 n7 qm[12344]: <root@pam> end task UPID:n7:00003039:000E7A3F:58456FB4:qmstart:203:root@pam: OK
Dec 05 14:46:35 n7 sshd[12342]: Received disconnect from 10.45.71.1: 11: disconnected by user
Dec 05 14:46:35 n7 sshd[12342]: pam_unix(sshd:session): session closed for user root
Dec 05 14:46:35 n7 pmxcfs[4754]: [status] notice: received log
Dec 05 14:46:37 n7 sshd[12480]: Accepted publickey for root from 10.45.71.1 port 51308 ssh2: RSA c6:64:bc:18:01:11:53:9c:ef:f4:08:00:b7:c0:85:69
Dec 05 14:46:37 n7 sshd[12480]: pam_unix(sshd:session): session opened for user root by (uid=0)
Dec 05 14:46:37 n7 qm[12482]: <root@pam> starting task UPID:n7:000030C3:000E7DB4:58456FBD:qmstop:203:root@pam:
Dec 05 14:46:37 n7 qm[12483]: stop VM 203: UPID:n7:000030C3:000E7DB4:58456FBD:qmstop:203:root@pam:
Dec 05 14:46:37 n7 ovs-vsctl[12494]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap203i0
Dec 05 14:46:37 n7 kernel: fwbr203i1: port 1(tap203i1) entered disabled state
Dec 05 14:46:37 n7 ovs-vsctl[12506]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln203o1
Dec 05 14:46:37 n7 kernel: fwbr203i1: port 2(fwln203o1) entered disabled state
Dec 05 14:46:37 n7 kernel: device fwln203o1 left promiscuous mode
Dec 05 14:46:37 n7 kernel: fwbr203i1: port 2(fwln203o1) entered disabled state
Dec 05 14:46:38 n7 ovs-vsctl[12513]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap203i1
Dec 05 14:46:38 n7 ovs-vsctl[12513]: ovs|00002|db_ctl_base|ERR|no port named tap203i1
Dec 05 14:46:38 n7 ovs-vsctl[12517]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap203i2
Dec 05 14:46:39 n7 multipathd[4576]: dm-15: remove map (uevent)
Dec 05 14:46:39 n7 multipathd[4576]: dm-15: devmap not registered, can't remove
Dec 05 14:46:39 n7 multipathd[4576]: dm-15: remove map (uevent)
Dec 05 14:46:39 n7 qm[12482]: <root@pam> end task UPID:n7:000030C3:000E7DB4:58456FBD:qmstop:203:root@pam: OK
Dec 05 14:46:39 n7 sshd[12480]: Received disconnect from 10.45.71.1: 11: disconnected by user
Dec 05 14:46:39 n7 sshd[12480]: pam_unix(sshd:session): session closed for user root
Dec 05 14:46:39 n7 pmxcfs[4754]: [status] notice: received log
 
Any clues as to why all our other hypervisor nodes rebooted at the same time (a rather critical thing to happen :() while we were patching one node? That node got a watchdog NMI and thus rebooted during unpacking of pve-cluster, probably due to some networking issue caused by openvswitch 2.6...
 
There seems to be an issue on some OVS setups, which we could not reproduce so far, that leads to the restart on upgrade taking far too long. If your cluster network is on OVS, this can lead to a timeout and a watchdog trigger on the upgrading node. Normally this should not affect other hosts as long as the rest of the cluster is still quorate. Could you post more complete logs of at least the upgrading node and one of the other nodes?
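
For example, something along these lines (just a sketch) to pull the window around the incident from each node's persistent syslog:

grep -a 'Dec  5 1[12]:' /var/log/syslog > n7-incident.log   # -a because the crash left NUL bytes in the log; note syslog pads single-digit days with two spaces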
 
What more log output, beyond what was given initially, would you prefer to see?

We've set up two corosync rings to try to avoid losing quorum when one network fails: ring 0 runs through OVS on bonded 10Gb/s and ring 1 through bonded 1Gb/s.

root@n1:~# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 10.45.71.1
status = ring 0 active with no faults
RING ID 1
id = <public ip>
status = ring 1 active with no faults

root@n1:~# corosync-quorumtool status
Quorum information
------------------
Date: Tue Dec 6 08:57:41 2016
Quorum provider: corosync_votequorum
Nodes: 7
Node ID: 1
Ring ID: 1/8340
Quorate: Yes

Votequorum information
----------------------
Expected votes: 7
Highest expected: 7
Total votes: 7
Quorum: 4
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
1 1 n1 (local)
2 1 n2
3 1 n3
4 1 n4
5 1 n5
6 1 n6
7 1 n7
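
(With 7 expected votes, votequorum needs a majority of floor(7/2) + 1 = 4 votes, which matches the "Quorum: 4" line above.)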

root@n1:~# more /etc/pve/corosync.conf
logging {
  debug: off
  syslog_facility: local7
  to_syslog: yes
}

nodelist {
  node {
    nodeid: 1
    quorum_votes: 1
    ring1_addr: n1.fqdn
    ring0_addr: n1
  }

  node {
    nodeid: 2
    quorum_votes: 1
    ring1_addr: n2.fqdn
    ring0_addr: n2
  }

  node {
    nodeid: 3
    quorum_votes: 1
    ring1_addr: n3.fqdn
    ring0_addr: n3
  }

  node {
    nodeid: 4
    quorum_votes: 1
    ring1_addr: n4.fqdn
    ring0_addr: n4
  }

  node {
    nodeid: 5
    quorum_votes: 1
    ring1_addr: n5.fqdn
    ring0_addr: n5
  }

  node {
    nodeid: 6
    quorum_votes: 1
    ring1_addr: n6.fqdn
    ring0_addr: n6
  }

  node {
    nodeid: 7
    quorum_votes: 1
    ring1_addr: n7.fqdn
    ring0_addr: n7
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: pve-clst
  config_version: 19
  ip_version: ipv4
  secauth: off
  rrp_mode: active
  version: 2

  interface {
    bindnetaddr: <public ip lan>
    ringnumber: 1
    broadcast: yes
    mcastport: 5405
    transport: udp
    netmtu: 9000
  }

  interface {
    bindnetaddr: 10.45.71.0
    ringnumber: 0
    broadcast: yes
    mcastport: 5407
    transport: udp
    netmtu: 9000
  }
}
 
Turned out the other HN nodes all got a SW watchdog NMI: patching openvswitch somehow also breaks the OVS network on the other nodes for too long (60+ seconds), one of our dual corosync rings runs via that OVS network, and at the same time we were using active rrp_mode (corosync.conf: rrp_mode: active). http://lists.corosync.org/pipermail/discuss/2015-July/003609.html seems to say this about RRP active:

"Only (but big) problem with active is that active doesn't make progress if one of link fails until link is marked as failed. Passive always makes progress (= works) even link is failed and failure is not yet detected."

Hence the HA cluster lost quorum and the nodes rebooted :(

We reverted our corosync rings to rrp_mode: passive to avoid a similar loss of quorum in the future.
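
The change itself was just this (a sketch based on our corosync.conf above; bumping config_version is what makes the edit propagate across the cluster):

totem {
  cluster_name: pve-clst
  config_version: 20   # bumped from 19
  rrp_mode: passive    # was: active
  ...
}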
 
