Fabian,
I manually migrated vm100 to node4 at 10:25.
Sadly, after 2 hours node1 restarted for an unknown reason...
How can I solve this? I definitely have no clue.
root@pve03:~# journalctl --since "2017-06-12" --until "2017-06-13" -u 'pve-ha-*'
-- Logs begin at Tue 2017-06-06 21:07:08 CEST, end at Mon 2017-06-12 15:22:17 CEST. --
Jun 12 10:00:17 pve03 pve-ha-crm[2155]: successfully acquired lock 'ha_manager_lock'
Jun 12 10:00:17 pve03 pve-ha-crm[2155]: watchdog active
Jun 12 10:00:17 pve03 pve-ha-crm[2155]: status change wait_for_quorum => master
Jun 12 10:00:17 pve03 pve-ha-crm[2155]: adding new service 'vm:100' on node 'pve01'
Jun 12 10:24:07 pve03 pve-ha-crm[2155]: got crm command: migrate vm:100 pve04
Jun 12 10:24:07 pve03 pve-ha-crm[2155]: migrate service 'vm:100' to node 'pve04'
Jun 12 10:24:07 pve03 pve-ha-crm[2155]: service 'vm:100': state changed from 'started' to 'migrate' (node = pve01, target = pve04)
Jun 12 10:25:27 pve03 pve-ha-crm[2155]: service 'vm:100': state changed from 'migrate' to 'started' (node = pve04)
Jun 12 12:13:18 pve03 pve-ha-crm[2155]: node 'pve01': state changed from 'online' => 'unknown'
Jun 12 12:18:48 pve03 pve-ha-crm[2155]: node 'pve01': state changed from 'unknown' => 'online'
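To put a number on how long pve01 was gone from the cluster's point of view, here is a minimal sketch of my own (not a Proxmox tool) that reads the two node state lines above and computes the gap. The function names `node_transitions` and `outage_seconds` are made up for illustration, and the year is passed in by hand because the short journalctl format does not print it.

```python
import re
from datetime import datetime

# Two lines copied from the pve-ha-crm journal on pve03 above.
LOG = """\
Jun 12 12:13:18 pve03 pve-ha-crm[2155]: node 'pve01': state changed from 'online' => 'unknown'
Jun 12 12:18:48 pve03 pve-ha-crm[2155]: node 'pve01': state changed from 'unknown' => 'online'
"""

# Matches only the "node '<name>': state changed from '<old>' => '<new>'" lines.
PATTERN = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ pve-ha-crm\[\d+\]: "
    r"node '(?P<node>[^']+)': state changed from '(?P<old>\w+)' => '(?P<new>\w+)'$"
)


def node_transitions(text, year=2017):
    """Return (timestamp, node, old_state, new_state) for each node state change."""
    result = []
    for line in text.splitlines():
        m = PATTERN.match(line)
        if m:
            # The log lines carry no year, so it is supplied by the caller.
            ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
            result.append((ts, m["node"], m["old"], m["new"]))
    return result


def outage_seconds(transitions, node):
    """Seconds between the node leaving 'online' and coming back to 'online'."""
    left = None
    for ts, name, old, new in transitions:
        if name != node:
            continue
        if old == "online":
            left = ts
        elif new == "online" and left is not None:
            return (ts - left).total_seconds()
    return None


events = node_transitions(LOG)
print(outage_seconds(events, "pve01"))  # 330.0
```

So the CRM on pve03 saw pve01 as 'unknown' for 5 minutes 30 seconds, which fits the reboot showing up in the pve01 log below.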
root@pve01:~# journalctl --since "2017-06-12" --until "2017-06-13" -u 'pve*'
-- Logs begin at Mon 2017-06-12 12:18:33 CEST, end at Mon 2017-06-12 15:25:01 CEST. --
Jun 12 12:18:34 pve01 systemd[1]: Starting Proxmox VE firewall logger...
Jun 12 12:18:34 pve01 systemd[1]: Starting Proxmox VE Login Banner...
Jun 12 12:18:34 pve01 systemd[1]: Starting Commit Proxmox VE network changes...
Jun 12 12:18:34 pve01 mv[1364]: /bin/mv: cannot stat ‘/etc/network/interfaces.new’: No such file or directory
Jun 12 12:18:34 pve01 systemd[1]: Started Commit Proxmox VE network changes.
Jun 12 12:18:34 pve01 pvepw-logger[1381]: starting pvefw logger
Jun 12 12:18:34 pve01 systemd[1]: Started Proxmox VE firewall logger.
Jun 12 12:18:35 pve01 systemd[1]: Started Proxmox VE Login Banner.
Jun 12 12:18:39 pve01 systemd[1]: Starting The Proxmox VE cluster filesystem...
Jun 12 12:18:39 pve01 pmxcfs[2004]: [quorum] crit: quorum_initialize failed: 2
Jun 12 12:18:39 pve01 pmxcfs[2004]: [quorum] crit: can't initialize service
Jun 12 12:18:39 pve01 pmxcfs[2004]: [confdb] crit: cmap_initialize failed: 2
Jun 12 12:18:39 pve01 pmxcfs[2004]: [confdb] crit: can't initialize service
Jun 12 12:18:39 pve01 pmxcfs[2004]: [dcdb] crit: cpg_initialize failed: 2
Jun 12 12:18:39 pve01 pmxcfs[2004]: [dcdb] crit: can't initialize service
Jun 12 12:18:39 pve01 pmxcfs[2004]: [status] crit: cpg_initialize failed: 2
Jun 12 12:18:39 pve01 pmxcfs[2004]: [status] crit: can't initialize service
Jun 12 12:18:40 pve01 systemd[1]: Started The Proxmox VE cluster filesystem.
Jun 12 12:18:40 pve01 systemd[1]: Starting PVE Status Daemon...
Jun 12 12:18:40 pve01 systemd[1]: Starting Proxmox VE firewall...
Jun 12 12:18:40 pve01 pve-firewall[2234]: starting server
Jun 12 12:18:40 pve01 systemd[1]: Started Proxmox VE firewall.
Jun 12 12:18:40 pve01 pvestatd[2236]: starting server
Jun 12 12:18:40 pve01 systemd[1]: Started PVE Status Daemon.
Jun 12 12:18:40 pve01 systemd[1]: Starting PVE API Daemon...
Jun 12 12:18:41 pve01 pvedaemon[2494]: starting server
Jun 12 12:18:41 pve01 pvedaemon[2494]: starting 3 worker(s)
Jun 12 12:18:41 pve01 pvedaemon[2494]: worker 2501 started
Jun 12 12:18:41 pve01 pvedaemon[2494]: worker 2502 started
Jun 12 12:18:41 pve01 pvedaemon[2494]: worker 2503 started
Jun 12 12:18:41 pve01 systemd[1]: Started PVE API Daemon.
Jun 12 12:18:41 pve01 systemd[1]: Starting PVE Cluster Ressource Manager Daemon...
Jun 12 12:18:41 pve01 pve-ha-crm[2818]: starting server
Jun 12 12:18:41 pve01 pve-ha-crm[2818]: status change startup => wait_for_quorum
Jun 12 12:18:41 pve01 systemd[1]: Started PVE Cluster Ressource Manager Daemon.
Jun 12 12:18:41 pve01 systemd[1]: Starting PVE Local HA Ressource Manager Daemon...
Jun 12 12:18:42 pve01 pve-ha-lrm[3023]: starting server
Jun 12 12:18:42 pve01 pve-ha-lrm[3023]: status change startup => wait_for_agent_lock
Jun 12 12:18:42 pve01 systemd[1]: Started PVE Local HA Ressource Manager Daemon.
Jun 12 12:18:43 pve01 systemd[1]: Starting PVE API Proxy Server...
Jun 12 12:18:43 pve01 pveproxy[4581]: starting server
Jun 12 12:18:43 pve01 pveproxy[4581]: starting 3 worker(s)
Jun 12 12:18:43 pve01 pveproxy[4581]: worker 4582 started
Jun 12 12:18:43 pve01 pveproxy[4581]: worker 4583 started
Jun 12 12:18:43 pve01 pveproxy[4581]: worker 4584 started
Jun 12 12:18:43 pve01 systemd[1]: Started PVE API Proxy Server.
Jun 12 12:18:44 pve01 systemd[1]: Starting PVE VM Manager...
Jun 12 12:18:44 pve01 pve-manager[4701]: <root@pam> starting task UPID:pve01:00001349:00000969:593E6A84:startall::root@pam:
Jun 12 12:18:45 pve01 pmxcfs[2004]: [status] notice: update cluster info (cluster name net4sec1, version = 4)
Jun 12 12:18:45 pve01 pmxcfs[2004]: [status] notice: node has quorum
Jun 12 12:18:45 pve01 pmxcfs[2004]: [dcdb] notice: members: 1/2004, 2/1901, 3/1872, 4/1873
Jun 12 12:18:45 pve01 pmxcfs[2004]: [dcdb] notice: starting data syncronisation
Jun 12 12:18:45 pve01 pmxcfs[2004]: [status] notice: members: 1/2004, 2/1901, 3/1872, 4/1873
Jun 12 12:18:45 pve01 pmxcfs[2004]: [status] notice: starting data syncronisation
Jun 12 12:18:45 pve01 pmxcfs[2004]: [dcdb] notice: received sync request (epoch 1/2004/00000001)
Jun 12 12:18:45 pve01 pmxcfs[2004]: [status] notice: received sync request (epoch 1/2004/00000001)
Jun 12 12:18:45 pve01 pmxcfs[2004]: [dcdb] notice: received all states
Jun 12 12:18:45 pve01 pmxcfs[2004]: [dcdb] notice: leader is 2/1901
Jun 12 12:18:45 pve01 pmxcfs[2004]: [dcdb] notice: synced members: 2/1901, 3/1872, 4/1873
Jun 12 12:18:45 pve01 pmxcfs[2004]: [dcdb] notice: waiting for updates from leader
Jun 12 12:18:45 pve01 pmxcfs[2004]: [dcdb] notice: update complete - trying to commit (got 8 inode updates)
Jun 12 12:18:45 pve01 pmxcfs[2004]: [dcdb] notice: all data is up to date
Jun 12 12:18:45 pve01 pmxcfs[2004]: [status] notice: received all states
Jun 12 12:18:45 pve01 pmxcfs[2004]: [status] notice: all data is up to date
Jun 12 12:18:46 pve01 pve-manager[4701]: <root@pam> end task UPID:pve01:00001349:00000969:593E6A84:startall::root@pam: OK
Jun 12 12:18:46 pve01 systemd[1]: Started PVE VM Manager.
Jun 12 12:18:51 pve01 pve-ha-crm[2818]: status change wait_for_quorum => slave
Jun 12 12:19:03 pve01 pvedaemon[2501]: <root@pam> successful auth for user 'root@pam'
Jun 12 12:33:57 pve01 pvedaemon[2503]: <root@pam> successful auth for user 'root@pam'
Jun 12 12:48:58 pve01 pvedaemon[2502]: <root@pam> successful auth for user 'root@pam'
Jun 12 13:03:59 pve01 pvedaemon[2502]: <root@pam> successful auth for user 'root@pam'
Jun 12 13:15:10 pve01 pveproxy[4584]: worker exit
Jun 12 13:15:10 pve01 pveproxy[4581]: worker 4584 finished
Jun 12 13:15:10 pve01 pveproxy[4581]: starting 1 worker(s)
Jun 12 13:15:10 pve01 pveproxy[4581]: worker 30147 started
Jun 12 13:18:39 pve01 pmxcfs[2004]: [dcdb] notice: data verification successful
Jun 12 13:18:59 pve01 pvedaemon[2502]: <root@pam> successful auth for user 'root@pam'
Jun 12 13:29:14 pve01 pveproxy[4583]: worker exit
Jun 12 13:29:14 pve01 pveproxy[4581]: worker 4583 finished
Jun 12 13:29:14 pve01 pveproxy[4581]: starting 1 worker(s)
Jun 12 13:29:14 pve01 pveproxy[4581]: worker 36718 started
Jun 12 13:34:00 pve01 pvedaemon[2501]: <root@pam> successful auth for user 'root@pam'
Jun 12 13:49:01 pve01 pvedaemon[2501]: <root@pam> successful auth for user 'root@pam'
Jun 12 14:04:01 pve01 pvedaemon[2502]: <root@pam> successful auth for user 'root@pam'
Jun 12 14:13:15 pve01 pveproxy[36718]: worker exit
Jun 12 14:13:15 pve01 pveproxy[4581]: worker 36718 finished
Jun 12 14:13:15 pve01 pveproxy[4581]: starting 1 worker(s)
Jun 12 14:13:15 pve01 pveproxy[4581]: worker 55918 started
Jun 12 14:18:39 pve01 pmxcfs[2004]: [dcdb] notice: data verification successful
Jun 12 14:19:01 pve01 pvedaemon[2501]: <root@pam> successful auth for user 'root@pam'
Jun 12 14:34:02 pve01 pvedaemon[2501]: <root@pam> successful auth for user 'root@pam'
Jun 12 14:44:01 pve01 pvedaemon[2503]: worker exit
Jun 12 14:44:01 pve01 pvedaemon[2494]: worker 2503 finished
Jun 12 14:44:01 pve01 pvedaemon[2494]: starting 1 worker(s)
Jun 12 14:44:01 pve01 pvedaemon[2494]: worker 14814 started
Jun 12 14:49:03 pve01 pvedaemon[2502]: <root@pam> successful auth for user 'root@pam'
Jun 12 14:49:50 pve01 pveproxy[4582]: worker exit
Jun 12 14:49:50 pve01 pveproxy[4581]: worker 4582 finished
Jun 12 14:49:50 pve01 pveproxy[4581]: starting 1 worker(s)
Jun 12 14:49:50 pve01 pveproxy[4581]: worker 17357 started
Jun 12 14:57:20 pve01 pvedaemon[2502]: <root@pam> starting task UPID:pve01:000050D1:000E8E97:593E8FB0:vzstart:101:root@pam:
Jun 12 14:57:20 pve01 pvedaemon[20689]: starting CT 101: UPID:pve01:000050D1:000E8E97:593E8FB0:vzstart:101:root@pam:
Jun 12 14:57:51 pve01 pveproxy[17357]: proxy detected vanished client connection
Jun 12 14:58:03 pve01 pvedaemon[2502]: <root@pam> end task UPID:pve01:000050D1:000E8E97:593E8FB0:vzstart:101:root@pam: OK
Jun 12 14:58:04 pve01 pvestatd[2236]: status update time (33.835 seconds)
Jun 12 15:04:04 pve01 pvedaemon[14814]: <root@pam> successful auth for user 'root@pam'
Jun 12 15:16:23 pve01 pvedaemon[2502]: worker exit
Jun 12 15:16:24 pve01 pvedaemon[2494]: worker 2502 finished
Jun 12 15:16:24 pve01 pvedaemon[2494]: starting 1 worker(s)
Jun 12 15:16:24 pve01 pvedaemon[2494]: worker 28125 started
Jun 12 15:18:01 pve01 pveproxy[30147]: worker exit
Jun 12 15:18:01 pve01 pveproxy[4581]: worker 30147 finished
Jun 12 15:18:01 pve01 pveproxy[4581]: starting 1 worker(s)
Jun 12 15:18:01 pve01 pveproxy[4581]: worker 28708 started
Jun 12 15:18:39 pve01 pmxcfs[2004]: [dcdb] notice: data verification successful
Jun 12 15:19:03 pve01 pvedaemon[2501]: <root@pam> successful auth for user 'root@pam'
Jun 12 15:21:10 pve01 pveproxy[55918]: worker exit
Jun 12 15:21:10 pve01 pveproxy[4581]: worker 55918 finished
Jun 12 15:21:10 pve01 pveproxy[4581]: starting 1 worker(s)
Jun 12 15:21:10 pve01 pveproxy[4581]: worker 29835 started