Hi,
I have a classic three-node hyper-converged cluster.
Each node is an identical HP DL380p with 96 GB of memory.
12 disk bays, as follows:
1 x SATA 320 GB HDD (PVE host OS)
1 x SATA 120 GB SSD (Ceph WAL)
10 x 1 TB HDD (OSDs)
Networks and interfaces are as follows:
1 GbE corosync (single port on each node, connected to an unmanaged 1 GbE switch)
10 GbE Ceph public network (broadcast bond, directly connected, no switch)
10 GbE Ceph cluster network (separate cluster net, also a directly connected broadcast bond)
10 GbE office network interfaces (Cisco Nexus)
I am running the latest full upgrade from the no-subscription repository.
There are about a dozen VMs running; most are fairly lightweight, and the largest is a 2 TB Linux image running ownCloud.
There are around 20 physical machines on the office network which sync to ownCloud.
I am seeing seemingly random corosync issues where quorum is momentarily lost, sometimes resulting in a node reboot.
The problems are exacerbated during backups to a PBS datastore, and are not as bad when backing up to NFS.
Having said that, removing all NFS mounts has stabilized things, but the logs show corosync still glitching every few minutes.
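For reference, this is roughly how I have been watching the glitches, just with the standard tooling:

# pick the membership churn out of the corosync / pmxcfs logs
journalctl -u corosync -u pve-cluster --since "1 hour ago" | grep -E 'KNET|TOTEM|QUORUM'
# per-link knet status as corosync sees it
corosync-cfgtool -s
# overall quorum view
pvecm status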
A couple of questions:
1) I am doubting the physical corosync setup (NICs/cabling/switch).
Can I replace the corosync switch with a directly connected broadcast bond using the same topology as my 10 GbE Ceph networks
(i.e. using two 1 GbE ports in each node)? See the sketch below these questions for what I have in mind.
Will the latency be acceptable?
2) Removing the NFS mounts seems to have helped, but I am wondering about PBS.
Does it use NFS under the hood?
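For question 1, this is the sort of thing I have in mind on each node, mirroring my Ceph full-mesh setup. Interface names and addresses below are placeholders, not my actual config:

# /etc/network/interfaces (fragment)
auto bond2
iface bond2 inet static
    address 10.10.40.1/24      # unique address per node (hypothetical subnet)
    bond-slaves eno3 eno4      # two spare 1 GbE ports, one cable to each peer (placeholder names)
    bond-mode broadcast        # same mode as my directly connected Ceph bonds
    bond-miimon 100
# corosync network, node-to-node cabling, no switch

I would then point corosync at it, keeping the existing switched 1 GbE network as a second knet link for redundancy, along these lines (addresses again hypothetical):

# /etc/pve/corosync.conf (nodelist fragment)
node {
    name: mfscn01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.40.1     # new direct-cabled bond
    ring1_addr: 192.168.0.11   # existing switched 1 GbE network as fallback
}

I know editing /etc/pve/corosync.conf needs care (bump config_version, changes apply cluster-wide), so this is just the intent.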
Thanks
Here is what I see in syslog:
May 4 11:45:07 mfscn01 corosync[2087]: [MAIN ] Completed service synchronization, ready to provide service.
May 4 11:45:07 mfscn01 pmxcfs[2070]: [dcdb] notice: cpg_send_message retried 39 times
May 4 11:45:07 mfscn01 pmxcfs[2070]: [status] notice: cpg_send_message retried 39 times
May 4 11:45:07 mfscn01 pmxcfs[2070]: [dcdb] notice: cpg_send_message retried 1 times
May 4 11:45:07 mfscn01 pmxcfs[2070]: [status] notice: members: 1/2070, 2/2223, 3/2157
May 4 11:45:07 mfscn01 pmxcfs[2070]: [status] notice: starting data syncronisation
May 4 11:45:07 mfscn01 pmxcfs[2070]: [dcdb] notice: received sync request (epoch 1/2070/00000066)
May 4 11:45:07 mfscn01 pmxcfs[2070]: [status] notice: received sync request (epoch 1/2070/00000042)
May 4 11:45:07 mfscn01 pmxcfs[2070]: [dcdb] notice: received all states
May 4 11:45:07 mfscn01 pmxcfs[2070]: [dcdb] notice: leader is 1/2070
May 4 11:45:07 mfscn01 pmxcfs[2070]: [dcdb] notice: synced members: 1/2070, 2/2223
May 4 11:45:07 mfscn01 pmxcfs[2070]: [dcdb] notice: start sending inode updates
May 4 11:45:07 mfscn01 pmxcfs[2070]: [dcdb] notice: sent all (8) updates
May 4 11:45:07 mfscn01 pmxcfs[2070]: [dcdb] notice: all data is up to date
May 4 11:45:07 mfscn01 pmxcfs[2070]: [dcdb] notice: dfsm_deliver_queue: queue length 5
May 4 11:45:07 mfscn01 pve-ha-crm[3585]: loop take too long (38 seconds)
May 4 11:45:07 mfscn01 pve-ha-lrm[3618]: successfully acquired lock 'ha_agent_mfscn01_lock'
May 4 11:45:07 mfscn01 pve-ha-lrm[3618]: status change lost_agent_lock => active
May 4 11:45:07 mfscn01 pmxcfs[2070]: [status] notice: received all states
May 4 11:45:07 mfscn01 pmxcfs[2070]: [status] notice: all data is up to date
May 4 11:45:07 mfscn01 pmxcfs[2070]: [status] notice: dfsm_deliver_queue: queue length 41
May 4 11:45:07 mfscn01 pvestatd[3151]: status update time (24.251 seconds)
May 4 11:45:07 mfscn01 pmxcfs[2070]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/mfscn01/local: -1
May 4 11:45:07 mfscn01 pmxcfs[2070]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/mfscn01/pbs01: -1
May 4 11:45:07 mfscn01 pmxcfs[2070]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/mfscn01/POOL0: -1
May 4 11:45:08 mfscn01 pmxcfs[2070]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/mfscn02/local: -1
May 4 11:45:08 mfscn01 pmxcfs[2070]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/mfscn02/pbs01: -1
May 4 11:45:08 mfscn01 pmxcfs[2070]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/mfscn02/POOL0: -1
May 4 11:46:51 mfscn01 pveproxy[3610]: worker 311484 finished
May 4 11:46:51 mfscn01 pveproxy[3610]: worker 311484 finished
May 4 11:46:51 mfscn01 pveproxy[3610]: starting 1 worker(s)
May 4 11:46:51 mfscn01 pveproxy[3610]: worker 331992 started
May 4 11:46:53 mfscn01 pveproxy[331991]: worker exit
May 4 11:47:30 mfscn01 pvedaemon[193873]: <root@pam> successful auth for user 'root@pam'
May 4 11:52:02 mfscn01 pveproxy[3610]: worker 310934 finished
May 4 11:52:02 mfscn01 pveproxy[3610]: starting 1 worker(s)
May 4 11:52:02 mfscn01 pveproxy[3610]: worker 336031 started
May 4 11:52:03 mfscn01 pveproxy[336030]: got inotify poll request in wrong process - disabling inotify
May 4 11:52:05 mfscn01 pveproxy[336030]: worker exit
May 4 11:55:06 mfscn01 pvedaemon[259441]: <root@pam> successful auth for user 'root@pam'
May 4 12:01:30 mfscn01 corosync[2087]: [KNET ] link: host: 2 link: 0 is down
May 4 12:01:30 mfscn01 corosync[2087]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
May 4 12:01:30 mfscn01 corosync[2087]: [KNET ] host: host: 2 has no active links
May 4 12:01:33 mfscn01 corosync[2087]: [TOTEM ] Token has not been received in 2737 ms
May 4 12:01:38 mfscn01 corosync[2087]: [QUORUM] Sync members[2]: 1 3
May 4 12:01:38 mfscn01 corosync[2087]: [QUORUM] Sync left[1]: 2
May 4 12:01:38 mfscn01 corosync[2087]: [TOTEM ] A new membership (1.33dd) was formed. Members left: 2
May 4 12:01:38 mfscn01 corosync[2087]: [TOTEM ] Failed to receive the leave message. failed: 2
May 4 12:01:39 mfscn01 pmxcfs[2070]: [status] notice: cpg_send_message retry 10
May 4 12:01:40 mfscn01 pmxcfs[2070]: [status] notice: cpg_send_message retry 20
May 4 12:01:41 mfscn01 pmxcfs[2070]: [status] notice: cpg_send_message retry 30
May 4 12:01:42 mfscn01 pmxcfs[2070]: [status] notice: cpg_send_message retry 40
May 4 12:01:43 mfscn01 corosync[2087]: [QUORUM] Sync members[2]: 1 3
May 4 12:01:43 mfscn01 corosync[2087]: [QUORUM] Sync left[1]: 2
May 4 12:01:43 mfscn01 corosync[2087]: [TOTEM ] A new membership (1.33e1) was formed. Members
May 4 12:01:43 mfscn01 pmxcfs[2070]: [dcdb] notice: members: 1/2070, 3/2157
May 4 12:01:43 mfscn01 pmxcfs[2070]: [dcdb] notice: starting data syncronisation
May 4 12:01:43 mfscn01 corosync[2087]: [QUORUM] Members[2]: 1 3
May 4 12:01:43 mfscn01 corosync[2087]: [MAIN ] Completed service synchronization, ready to provide service.
May 4 12:01:43 mfscn01 pmxcfs[2070]: [status] notice: cpg_send_message retried 43 times
May 4 12:01:43 mfscn01 pmxcfs[2070]: [dcdb] notice: cpg_send_message retried 1 times
May 4 12:01:43 mfscn01 pmxcfs[2070]: [status] notice: members: 1/2070, 3/2157
May 4 12:01:43 mfscn01 pmxcfs[2070]: [status] notice: starting data syncronisation
May 4 12:01:43 mfscn01 pmxcfs[2070]: [dcdb] notice: received sync request (epoch 1/2070/00000067)
May 4 12:01:43 mfscn01 pmxcfs[2070]: [status] notice: received sync request (epoch 1/2070/00000043)
May 4 12:01:43 mfscn01 pmxcfs[2070]: [dcdb] notice: received all states
May 4 12:01:43 mfscn01 pmxcfs[2070]: [dcdb] notice: leader is 1/2070
May 4 12:01:43 mfscn01 pmxcfs[2070]: [dcdb] notice: synced members: 1/2070, 3/2157
May 4 12:01:43 mfscn01 pmxcfs[2070]: [dcdb] notice: start sending inode updates
May 4 12:01:43 mfscn01 pmxcfs[2070]: [dcdb] notice: sent all (0) updates
May 4 12:01:43 mfscn01 pmxcfs[2070]: [dcdb] notice: all data is up to date
May 4 12:01:43 mfscn01 pmxcfs[2070]: [dcdb] notice: dfsm_deliver_queue: queue length 6
May 4 12:01:43 mfscn01 pmxcfs[2070]: [status] notice: received all states
May 4 12:01:43 mfscn01 pmxcfs[2070]: [status] notice: all data is up to date
May 4 12:01:43 mfscn01 pmxcfs[2070]: [status] notice: dfsm_deliver_queue: queue length 24
May 4 12:01:51 mfscn01 corosync[2087]: [QUORUM] Sync members[2]: 1 3
May 4 12:01:51 mfscn01 corosync[2087]: [TOTEM ] A new membership (1.33e5) was formed. Members
May 4 12:01:54 mfscn01 pmxcfs[2070]: [dcdb] notice: cpg_send_message retry 10
May 4 12:01:55 mfscn01 pmxcfs[2070]: [dcdb] notice: cpg_send_message retry 20
May 4 12:01:56 mfscn01 pmxcfs[2070]: [dcdb] notice: cpg_send_message retry 30
May 4 12:01:56 mfscn01 corosync[2087]: [QUORUM] Sync members[2]: 1 3
May 4 12:01:56 mfscn01 corosync[2087]: [TOTEM ] A new membership (1.33e9) was formed. Members
May 4 12:01:57 mfscn01 pmxcfs[2070]: [dcdb] notice: cpg_send_message retry 40
May 4 12:01:58 mfscn01 pmxcfs[2070]: [dcdb] notice: cpg_send_message retry 50
May 4 12:01:59 mfscn01 pmxcfs[2070]: [dcdb] notice: cpg_send_message retry 60
May 4 12:01:59 mfscn01 pmxcfs[2070]: [status] notice: cpg_send_message retry 10
May 4 12:02:00 mfscn01 pmxcfs[2070]: [dcdb] notice: cpg_send_message retry 70
May 4 12:02:00 mfscn01 pmxcfs[2070]: [status] notice: cpg_send_message retry 20
May 4 12:02:00 mfscn01 corosync[2087]: [QUORUM] Sync members[2]: 1 3
May 4 12:02:00 mfscn01 corosync[2087]: [TOTEM ] A new membership (1.33ed) was formed. Members
May 4 12:02:01 mfscn01 pmxcfs[2070]: [dcdb] notice: cpg_send_message retry 80
May 4 12:02:01 mfscn01 pmxcfs[2070]: [status] notice: cpg_send_message retry 30
May 4 12:02:02 mfscn01 pmxcfs[2070]: [dcdb] notice: cpg_send_message retry 90
May 4 12:02:02 mfscn01 pmxcfs[2070]: [status] notice: cpg_send_message retry 40
May 4 12:02:02 mfscn01 corosync[2087]: [KNET ] rx: host: 2 link: 0 is up
May 4 12:02:02 mfscn01 corosync[2087]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
May 4 12:02:02 mfscn01 corosync[2087]: [QUORUM] Sync members[3]: 1 2 3
May 4 12:02:02 mfscn01 corosync[2087]: [QUORUM] Sync joined[1]: 2
May 4 12:02:02 mfscn01 corosync[2087]: [TOTEM ] A new membership (1.33f1) was formed. Members joined: 2
May 4 12:02:02 mfscn01 pmxcfs[2070]: [dcdb] notice: members: 1/2070, 2/2223, 3/2157
May 4 12:02:02 mfscn01 pmxcfs[2070]: [dcdb] notice: starting data syncronisation
May 4 12:02:02 mfscn01 corosync[2087]: [QUORUM] Members[3]: 1 2 3
May 4 12:02:02 mfscn01 corosync[2087]: [MAIN ] Completed service synchronization, ready to provide service.
May 4 12:02:02 mfscn01 pmxcfs[2070]: [status] notice: cpg_send_message retried 42 times
May 4 12:02:02 mfscn01 pmxcfs[2070]: [dcdb] notice: cpg_send_message retried 96 times
May 4 12:02:02 mfscn01 pmxcfs[2070]: [dcdb] notice: cpg_send_message retried 1 times
May 4 12:02:02 mfscn01 pmxcfs[2070]: [status] notice: members: 1/2070, 2/2223, 3/2157
May 4 12:02:02 mfscn01 pmxcfs[2070]: [status] notice: starting data syncronisation
May 4 12:02:02 mfscn01 pmxcfs[2070]: [dcdb] notice: received sync request (epoch 1/2070/00000068)
May 4 12:02:02 mfscn01 pmxcfs[2070]: [status] notice: received sync request (epoch 1/2070/00000044)
May 4 12:02:02 mfscn01 pmxcfs[2070]: [dcdb] notice: received all states
May 4 12:02:02 mfscn01 pmxcfs[2070]: [dcdb] notice: leader is 1/2070
May 4 12:02:02 mfscn01 pmxcfs[2070]: [dcdb] notice: synced members: 1/2070, 3/2157
May 4 12:02:02 mfscn01 pmxcfs[2070]: [dcdb] notice: start sending inode updates
May 4 12:02:02 mfscn01 pmxcfs[2070]: [dcdb] notice: sent all (5) updates
May 4 12:02:02 mfscn01 pmxcfs[2070]: [dcdb] notice: all data is up to date
May 4 12:02:02 mfscn01 pmxcfs[2070]: [dcdb] notice: dfsm_deliver_queue: queue length 6
May 4 12:02:02 mfscn01 pmxcfs[2070]: [status] notice: received all states
May 4 12:02:02 mfscn01 pmxcfs[2070]: [status] notice: all data is up to date
May 4 12:02:02 mfscn01 pmxcfs[2070]: [status] notice: dfsm_deliver_queue: queue length 19
May 4 12:02:30 mfscn01 pvedaemon[259441]: <root@pam> successful auth for user 'root@pam'
May 4 12:03:13 mfscn01 corosync[2087]: [KNET ] link: host: 3 link: 0 is down
May 4 12:03:13 mfscn01 corosync[2087]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
May 4 12:03:13 mfscn01 corosync[2087]: [KNET ] host: host: 3 has no active links
May 4 12:03:13 mfscn01 corosync[2087]: [TOTEM ] Token has not been received in 2737 ms
May 4 12:03:14 mfscn01 corosync[2087]: [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
May 4 12:03:19 mfscn01 corosync[2087]: [QUORUM] Sync members[1]: 1