dist-upgrade / reboot / watchdog

grin

Since we have already talked about reboots, here is a fresh one, from an upgrade from 4.4-5 to 4.4-13. The reboot comes at the end of the log.

Apr 12 14:55:47 srv-01-szd systemd[1]: Stopped Corosync Cluster Engine.
Apr 12 14:55:47 srv-01-szd systemd[1]: Starting Corosync Cluster Engine...
Apr 12 14:55:47 srv-01-szd corosync[26358]: [MAIN ] Corosync Cluster Engine ('2.4.2'): started and ready to provide service.
Apr 12 14:55:47 srv-01-szd corosync[26358]: [MAIN ] Corosync built-in features: augeas systemd pie relro bindnow
Apr 12 14:55:47 srv-01-szd corosync[26359]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Apr 12 14:55:47 srv-01-szd corosync[26359]: [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Apr 12 14:55:47 srv-01-szd corosync[26359]: [TOTEM ] The network interface [10.63.1.211] is now up.
Apr 12 14:55:47 srv-01-szd corosync[26359]: [SERV ] Service engine loaded: corosync configuration map access [0]
Apr 12 14:55:47 srv-01-szd corosync[26359]: [QB ] server name: cmap
Apr 12 14:55:47 srv-01-szd corosync[26359]: [SERV ] Service engine loaded: corosync configuration service [1]
Apr 12 14:55:47 srv-01-szd corosync[26359]: [QB ] server name: cfg
Apr 12 14:55:47 srv-01-szd corosync[26359]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Apr 12 14:55:47 srv-01-szd corosync[26359]: [QB ] server name: cpg
Apr 12 14:55:47 srv-01-szd corosync[26359]: [SERV ] Service engine loaded: corosync profile loading service [4]
Apr 12 14:55:47 srv-01-szd corosync[26359]: [QUORUM] Using quorum provider corosync_votequorum
Apr 12 14:55:47 srv-01-szd corosync[26359]: [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
Apr 12 14:55:47 srv-01-szd corosync[26359]: [QB ] server name: votequorum
Apr 12 14:55:47 srv-01-szd corosync[26359]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Apr 12 14:55:47 srv-01-szd corosync[26359]: [QB ] server name: quorum
Apr 12 14:55:48 srv-01-szd corosync[26359]: [TOTEM ] A new membership (10.63.1.211:308) was formed. Members joined: 1 2 3 4
Apr 12 14:55:48 srv-01-szd corosync[26359]: [QUORUM] This node is within the primary component and will provide service.
Apr 12 14:55:48 srv-01-szd corosync[26359]: [QUORUM] Members[4]: 1 2 3 4
Apr 12 14:55:48 srv-01-szd corosync[26359]: [MAIN ] Completed service synchronization, ready to provide service.
Apr 12 14:55:48 srv-01-szd corosync[26352]: Starting Corosync Cluster Engine (corosync): [ OK ]
Apr 12 14:55:48 srv-01-szd systemd[1]: Started Corosync Cluster Engine.
Apr 12 14:55:48 srv-01-szd systemd[1]: Reloading PVE API Daemon.
Apr 12 14:55:50 srv-01-szd pvedaemon[26381]: send HUP to 3437
Apr 12 14:55:50 srv-01-szd pvedaemon[3437]: received signal HUP
Apr 12 14:55:50 srv-01-szd pvedaemon[3437]: server closing
Apr 12 14:55:50 srv-01-szd pvedaemon[3437]: server shutdown (restart)
Apr 12 14:55:50 srv-01-szd pvedaemon[15325]: worker exit
Apr 12 14:55:50 srv-01-szd pvedaemon[15327]: worker exit
Apr 12 14:55:50 srv-01-szd systemd[1]: Reloaded PVE API Daemon.
Apr 12 14:55:50 srv-01-szd systemd[1]: Reloading PVE Status Daemon.
Apr 12 14:55:50 srv-01-szd pvestatd[26389]: send HUP to 3412
Apr 12 14:55:50 srv-01-szd pvestatd[3412]: received signal HUP
Apr 12 14:55:50 srv-01-szd pvestatd[3412]: server shutdown (restart)
Apr 12 14:55:50 srv-01-szd systemd[1]: Reloaded PVE Status Daemon.
Apr 12 14:55:50 srv-01-szd systemd[1]: Reloading PVE API Proxy Server.
Apr 12 14:55:51 srv-01-szd pvedaemon[26385]: worker exit
Apr 12 14:55:51 srv-01-szd pvedaemon[3437]: restarting server
Apr 12 14:55:51 srv-01-szd pvedaemon[3437]: worker 15326 finished
Apr 12 14:55:51 srv-01-szd pvedaemon[3437]: worker 15327 finished
Apr 12 14:55:51 srv-01-szd pvedaemon[3437]: worker 15325 finished
Apr 12 14:55:51 srv-01-szd pvedaemon[3437]: starting 3 worker(s)
Apr 12 14:55:51 srv-01-szd pvedaemon[3437]: worker 26398 started
Apr 12 14:55:51 srv-01-szd pvedaemon[3437]: worker 26399 started
Apr 12 14:55:51 srv-01-szd pvedaemon[3437]: worker 26400 started

Apr 12 14:55:51 srv-01-szd watchdog-mux[1605]: client watchdog expired - disable watchdog updates
Apr 12 14:55:51 srv-01-szd pvestatd[3412]: restarting server
Apr 12 14:55:51 srv-01-szd pveproxy[26394]: send HUP to 40759
Apr 12 14:55:51 srv-01-szd pveproxy[40759]: received signal HUP
Apr 12 14:55:51 srv-01-szd pveproxy[40759]: server closing
Apr 12 14:55:51 srv-01-szd pveproxy[40759]: server shutdown (restart)
Apr 12 14:55:51 srv-01-szd pveproxy[15355]: worker exit
Apr 12 14:55:51 srv-01-szd pveproxy[15353]: worker exit
Apr 12 14:55:51 srv-01-szd systemd[1]: Reloaded PVE API Proxy Server.
Apr 12 14:55:51 srv-01-szd systemd[1]: Reloading PVE SPICE Proxy Server.
Apr 12 14:55:52 srv-01-szd pveproxy[40759]: Using '/etc/pve/local/pveproxy-ssl.pem' as certificate for the web interface.
Apr 12 14:55:52 srv-01-szd pveproxy[40759]: restarting server
Apr 12 14:55:52 srv-01-szd pveproxy[40759]: worker 15355 finished
Apr 12 14:55:52 srv-01-szd pveproxy[40759]: worker 15353 finished
Apr 12 14:55:52 srv-01-szd pveproxy[40759]: worker 15354 finished
Apr 12 14:55:52 srv-01-szd pveproxy[40759]: starting 3 worker(s)
Apr 12 14:55:52 srv-01-szd spiceproxy[26409]: send HUP to 40788
Apr 12 14:55:52 srv-01-szd spiceproxy[40788]: received signal HUP
Apr 12 14:55:52 srv-01-szd spiceproxy[40788]: server closing
Apr 12 14:55:52 srv-01-szd spiceproxy[40788]: server shutdown (restart)
Apr 12 14:55:52 srv-01-szd pveproxy[40759]: worker 26418 started
Apr 12 14:55:52 srv-01-szd spiceproxy[16296]: worker exit
Apr 12 14:55:52 srv-01-szd pveproxy[40759]: worker 26419 started
Apr 12 14:55:52 srv-01-szd pveproxy[40759]: worker 26420 started
Apr 12 14:55:52 srv-01-szd systemd[1]: Reloaded PVE SPICE Proxy Server.
Apr 12 14:55:53 srv-01-szd pmxcfs[6099]: [status] notice: update cluster info (cluster name nmscluster, version = 4)
Apr 12 14:55:53 srv-01-szd pmxcfs[6099]: [dcdb] notice: start cluster connection
Apr 12 14:55:53 srv-01-szd pmxcfs[6099]: [status] notice: start cluster connection
Apr 12 14:55:53 srv-01-szd pmxcfs[6099]: [status] notice: node has quorum
Apr 12 14:55:53 srv-01-szd pmxcfs[6099]: [dcdb] notice: members: 1/6099, 2/3032, 3/25030, 4/3147
Apr 12 14:55:53 srv-01-szd pmxcfs[6099]: [dcdb] notice: starting data syncronisation
Apr 12 14:55:53 srv-01-szd pmxcfs[6099]: [dcdb] notice: received sync request (epoch 1/6099/00000003)
Apr 12 14:55:53 srv-01-szd pmxcfs[6099]: [status] notice: members: 1/6099, 2/3032, 3/25030, 4/3147
Apr 12 14:55:53 srv-01-szd pmxcfs[6099]: [status] notice: starting data syncronisation
Apr 12 14:55:53 srv-01-szd pmxcfs[6099]: [status] notice: received sync request (epoch 1/6099/00000003)
Apr 12 14:55:53 srv-01-szd pmxcfs[6099]: [dcdb] notice: received all states
Apr 12 14:55:53 srv-01-szd pmxcfs[6099]: [dcdb] notice: leader is 2/3032
Apr 12 14:55:53 srv-01-szd pmxcfs[6099]: [dcdb] notice: synced members: 2/3032, 3/25030, 4/3147
Apr 12 14:55:53 srv-01-szd pmxcfs[6099]: [dcdb] notice: waiting for updates from leader
Apr 12 14:55:53 srv-01-szd pmxcfs[6099]: [status] notice: received all states
Apr 12 14:55:53 srv-01-szd pmxcfs[6099]: [status] notice: all data is up to date
Apr 12 14:55:53 srv-01-szd pmxcfs[6099]: [dcdb] notice: update complete - trying to commit (got 10 inode updates)
Apr 12 14:55:53 srv-01-szd pmxcfs[6099]: [dcdb] notice: all data is up to date
Apr 12 14:55:53 srv-01-szd spiceproxy[40788]: restarting server
Apr 12 14:55:53 srv-01-szd spiceproxy[40788]: worker 16296 finished
Apr 12 14:55:53 srv-01-szd spiceproxy[40788]: starting 1 worker(s)
Apr 12 14:55:53 srv-01-szd spiceproxy[40788]: worker 26962 started
Apr 12 14:55:53 srv-01-szd pveproxy[26405]: worker exit
Apr 12 14:55:55 srv-01-szd pve-ha-lrm[7431]: successfully acquired lock 'ha_agent_srv-01-szd_lock'
Apr 12 14:55:55 srv-01-szd pve-ha-lrm[7431]: status change lost_agent_lock => active
Apr 12 14:55:55 srv-01-szd watchdog-mux[1605]: exit watchdog-mux with active connections

Apr 12 14:55:55 srv-01-szd kernel: [94765.098998] watchdog watchdog0: watchdog did not stop!
Apr 12 14:56:01 srv-01-szd cron[3043]: (*system*zfsutils-linux) RELOAD (/etc/cron.d/zfsutils-linux)
Apr 12 14:56:01 srv-01-szd pve-ha-crm[7657]: status change wait_for_quorum => slave
REBOOT


This is clearly not optimal.
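
For anyone who wants to pull the relevant lines out of a similar syslog, here is a minimal Python sketch. It assumes the classic "Mon DD HH:MM:SS host program[pid]: message" layout seen above. The function names (`parse`, `watchdog_events`, `seconds_between`) are made up for illustration and are not part of any Proxmox tool. It only filters and measures; it does not explain why the watchdog client expired.

```python
import re

# A few lines copied verbatim from the log above.
SAMPLE = """\
Apr 12 14:55:47 srv-01-szd systemd[1]: Stopped Corosync Cluster Engine.
Apr 12 14:55:51 srv-01-szd watchdog-mux[1605]: client watchdog expired - disable watchdog updates
Apr 12 14:55:55 srv-01-szd pve-ha-lrm[7431]: status change lost_agent_lock => active
Apr 12 14:55:55 srv-01-szd watchdog-mux[1605]: exit watchdog-mux with active connections
Apr 12 14:55:55 srv-01-szd kernel: [94765.098998] watchdog watchdog0: watchdog did not stop!
"""

# Assumed layout: "Mon DD HH:MM:SS host program[pid]: message" (pid is optional).
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+(\d\d):(\d\d):(\d\d)\s+\S+\s+([^:\[\s]+)(?:\[\d+\])?:\s*(.*)$"
)


def parse(line):
    """Return (seconds_since_midnight, program, message), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    hh, mm, ss, program, message = m.groups()
    return int(hh) * 3600 + int(mm) * 60 + int(ss), program, message


def watchdog_events(text):
    """Keep only entries whose program name or message mentions the watchdog."""
    events = []
    for line in text.splitlines():
        parsed = parse(line)
        if parsed and "watchdog" in (parsed[1] + " " + parsed[2]).lower():
            events.append(parsed)
    return events


def seconds_between(text, first_marker, second_marker):
    """Seconds from the first message containing first_marker to the next one
    containing second_marker. Ignores midnight rollover."""
    start = None
    for line in text.splitlines():
        parsed = parse(line)
        if parsed is None:
            continue
        if start is None and first_marker in parsed[2]:
            start = parsed[0]
        elif start is not None and second_marker in parsed[2]:
            return parsed[0] - start
    return None


events = watchdog_events(SAMPLE)
for secs, program, message in events:
    print(f"{secs // 3600:02d}:{secs % 3600 // 60:02d}:{secs % 60:02d} {program}: {message}")

gap = seconds_between(SAMPLE, "Stopped Corosync", "client watchdog expired")
print("corosync stop -> client watchdog expired:", gap, "s")
```

To run it against a real file, read the file's contents into a string and pass that in place of `SAMPLE`.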

As a side note, we upgrade the system in three steps to keep the cluster from falling apart, but it seems more granularity is needed... *sigh*
 
