I have a 3-node cluster and run HA between two of the nodes. I have dual corosync networks, as recommended. With dual networks, I would expect to be able to unplug one of them at any time (say, to move it to a different switch) and have the cluster stay solid. But no: one of my nodes gets rebooted (fencing, maybe?). Is this the expected behavior?
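For reference, this is how I check that corosync actually sees both rings as healthy before pulling a cable (a diagnostic sketch; the exact output format varies by corosync version):

```shell
# Show knet link status for every configured link on this node.
# With dual rings, each peer should report both LINK ID 0 and LINK ID 1
# as connected before any cable is unplugged.
corosync-cfgtool -s

# Confirm quorum and membership before maintenance.
pvecm status
```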
corosync.conf
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: proxmox1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.1.0.1
    ring1_addr: 10.2.0.1
  }
  node {
    name: proxmox2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.1.0.2
    ring1_addr: 10.2.0.2
  }
  node {
    name: proxmox3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.1.0.3
    ring1_addr: 10.2.0.3
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: SWVC
  config_version: 9
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.1.0.0
    ringnumber: 0
  }
  interface {
    bindnetaddr: 10.2.0.0
    ringnumber: 1
  }
}
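Side note on my config: the KNET lines in the syslog below show corosync 3 / knet is in use, where the failover between the two links can be tuned. As I understand it, the `interface` sections can take `linknumber` and, optionally, `knet_link_priority` so that passive mode prefers a specific link. A sketch of what that might look like (illustrative values, not my actual config):

```
totem {
  # In knet "passive" mode, traffic uses one link at a time, preferring
  # the link with the highest priority that is currently up.
  interface {
    linknumber: 0
    knet_link_priority: 10
  }
  interface {
    linknumber: 1
    knet_link_priority: 5
  }
}
```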
syslog
Code:
Aug 02 13:08:35 proxmox2 corosync[1346]: [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]: [KNET ] link: host: 1 link: 0 is down
Aug 02 13:08:35 proxmox2 corosync[1346]: [KNET ] link: host: 1 link: 1 is down
Aug 02 13:08:35 proxmox2 corosync[1346]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:08:35 proxmox2 corosync[1346]: [KNET ] host: host: 1 has no active links
Aug 02 13:08:35 proxmox2 corosync[1346]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:08:35 proxmox2 corosync[1346]: [KNET ] host: host: 1 has no active links
Aug 02 13:08:36 proxmox2 kernel: e1000e 0000:04:00.0 eno1: NIC Link is Down
Aug 02 13:08:37 proxmox2 corosync[1346]: [KNET ] link: host: 3 link: 0 is down
Aug 02 13:08:37 proxmox2 corosync[1346]: [KNET ] link: host: 3 link: 1 is down
Aug 02 13:08:37 proxmox2 corosync[1346]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:08:37 proxmox2 corosync[1346]: [KNET ] host: host: 3 has no active links
Aug 02 13:08:37 proxmox2 corosync[1346]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:08:37 proxmox2 corosync[1346]: [KNET ] host: host: 3 has no active links
Aug 02 13:08:38 proxmox2 corosync[1346]: [TOTEM ] Token has not been received in 2737 ms
Aug 02 13:08:38 proxmox2 pvestatd[1487]: mount error: mount.nfs: mounting proxmox3:/data/backups failed, reason given by server: No such file or directory
Aug 02 13:08:38 proxmox2 corosync[1346]: [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
Aug 02 13:08:43 proxmox2 corosync[1346]: [QUORUM] Sync members[1]: 2
Aug 02 13:08:43 proxmox2 corosync[1346]: [QUORUM] Sync left[2]: 1 3
Aug 02 13:08:43 proxmox2 corosync[1346]: [TOTEM ] A new membership (2.1340) was formed. Members left: 1 3
Aug 02 13:08:43 proxmox2 corosync[1346]: [TOTEM ] Failed to receive the leave message. failed: 1 3
Aug 02 13:08:43 proxmox2 pmxcfs[1291]: [dcdb] notice: members: 2/1291
Aug 02 13:08:43 proxmox2 pmxcfs[1291]: [status] notice: members: 2/1291
Aug 02 13:08:43 proxmox2 corosync[1346]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Aug 02 13:08:43 proxmox2 corosync[1346]: [QUORUM] Members[1]: 2
Aug 02 13:08:43 proxmox2 corosync[1346]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 02 13:08:43 proxmox2 pmxcfs[1291]: [status] notice: node lost quorum
Aug 02 13:08:43 proxmox2 pmxcfs[1291]: [dcdb] crit: received write while not quorate - trigger resync
Aug 02 13:08:43 proxmox2 pmxcfs[1291]: [dcdb] crit: leaving CPG group
Aug 02 13:08:44 proxmox2 pmxcfs[1291]: [dcdb] notice: start cluster connection
Aug 02 13:08:44 proxmox2 pmxcfs[1291]: [dcdb] crit: cpg_join failed: 14
Aug 02 13:08:44 proxmox2 pmxcfs[1291]: [dcdb] crit: can't initialize service
Aug 02 13:08:44 proxmox2 pve-ha-crm[1537]: status change slave => wait_for_quorum
Aug 02 13:08:44 proxmox2 pve-ha-lrm[1599]: unable to write lrm status file - unable to open file '/etc/pve/nodes/proxmox2/lrm_status.tmp.1599' - Permission denied
Aug 02 13:08:45 proxmox2 pve-ha-lrm[1599]: lost lock 'ha_agent_proxmox2_lock - cfs lock update failed - Permission denied
Aug 02 13:08:49 proxmox2 pvestatd[1487]: mount error: mount.nfs: mounting proxmox3:/data/backups failed, reason given by server: No such file or directory
Aug 02 13:08:50 proxmox2 pmxcfs[1291]: [dcdb] notice: members: 2/1291
Aug 02 13:08:50 proxmox2 pmxcfs[1291]: [dcdb] notice: all data is up to date
Aug 02 13:08:50 proxmox2 pve-ha-lrm[1599]: status change active => lost_agent_lock
Aug 02 13:08:58 proxmox2 pvestatd[1487]: mount error: mount.nfs: mounting proxmox3:/data/backups failed, reason given by server: No such file or directory
Aug 02 13:09:08 proxmox2 pvestatd[1487]: mount error: mount.nfs: mounting proxmox3:/data/backups failed, reason given by server: No such file or directory
Aug 02 13:09:09 proxmox2 pvescheduler[16458]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 02 13:09:09 proxmox2 pvescheduler[16457]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 02 13:09:18 proxmox2 pvestatd[1487]: mount error: mount.nfs: mounting proxmox3:/data/backups failed, reason given by server: No such file or directory
Aug 02 13:09:28 proxmox2 pvestatd[1487]: mount error: mount.nfs: mounting proxmox3:/data/backups failed, reason given by server: No such file or directory
Aug 02 13:09:36 proxmox2 watchdog-mux[693]: client watchdog expired - disable watchdog updates
Aug 02 13:09:36 proxmox2 kernel: e1000e 0000:04:00.0 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Aug 02 13:09:38 proxmox2 pvestatd[1487]: mount error: mount.nfs: mounting proxmox3:/data/backups failed, reason given by server: No such file or directory
Aug 02 13:09:38 proxmox2 corosync[1346]: [KNET ] rx: host: 1 link: 0 is up
Aug 02 13:09:38 proxmox2 corosync[1346]: [KNET ] link: Resetting MTU for link 0 because host 1 joined
Aug 02 13:09:38 proxmox2 corosync[1346]: [KNET ] rx: host: 3 link: 0 is up
Aug 02 13:09:38 proxmox2 corosync[1346]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Aug 02 13:09:38 proxmox2 corosync[1346]: [KNET ] rx: host: 3 link: 1 is up
Aug 02 13:09:38 proxmox2 corosync[1346]: [KNET ] link: Resetting MTU for link 1 because host 3 joined
Aug 02 13:09:38 proxmox2 corosync[1346]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:09:38 proxmox2 corosync[1346]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:09:38 proxmox2 corosync[1346]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:09:38 proxmox2 corosync[1346]: [QUORUM] Sync members[3]: 1 2 3
Aug 02 13:09:38 proxmox2 corosync[1346]: [QUORUM] Sync joined[2]: 1 3
Aug 02 13:09:38 proxmox2 corosync[1346]: [TOTEM ] A new membership (1.1344) was formed. Members joined: 1 3
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: members: 1/1290, 2/1291, 3/1321
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: starting data syncronisation
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [status] notice: members: 1/1290, 2/1291, 3/1321
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [status] notice: starting data syncronisation
Aug 02 13:09:38 proxmox2 corosync[1346]: [QUORUM] This node is within the primary component and will provide service.
Aug 02 13:09:38 proxmox2 corosync[1346]: [QUORUM] Members[3]: 1 2 3
Aug 02 13:09:38 proxmox2 corosync[1346]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [status] notice: node has quorum
Aug 02 13:09:38 proxmox2 corosync[1346]: [KNET ] pmtud: Global data MTU changed to: 1397
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: received sync request (epoch 1/1290/00000004)
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [status] notice: received sync request (epoch 1/1290/00000004)
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: received all states
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: leader is 1/1290
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: synced members: 1/1290, 3/1321
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: waiting for updates from leader
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: dfsm_deliver_queue: queue length 1
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [status] notice: received all states
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [status] notice: all data is up to date
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: update complete - trying to commit (got 7 inode updates)
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: all data is up to date
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: dfsm_deliver_sync_queue: queue length 1
Aug 02 13:09:39 proxmox2 corosync[1346]: [KNET ] rx: host: 1 link: 1 is up
Aug 02 13:09:39 proxmox2 corosync[1346]: [KNET ] link: Resetting MTU for link 1 because host 1 joined
Aug 02 13:09:39 proxmox2 corosync[1346]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:09:39 proxmox2 corosync[1346]: [KNET ] pmtud: Global data MTU changed to: 1397
Aug 02 13:09:40 proxmox2 pve-ha-lrm[1599]: successfully acquired lock 'ha_agent_proxmox2_lock'
Aug 02 13:09:40 proxmox2 pve-ha-lrm[1599]: status change lost_agent_lock => active
Aug 02 13:09:40 proxmox2 watchdog-mux[693]: exit watchdog-mux with active connections
Aug 02 13:09:40 proxmox2 systemd-journald[387]: Received client request to sync journal.
Aug 02 13:09:40 proxmox2 kernel: watchdog: watchdog0: watchdog did not stop!
-- Reboot --