dual corosync networks, but server reboots

jmckee · Aug 2, 2024

I have a 3 node cluster and I do some HA between 2 of the nodes. I have dual corosync networks as recommended. So, I would think that with dual networks, I could at any given time unplug one of the networks (say to move to a different switch) and my cluster would stay solid. But no... one of my nodes is made to reboot (from fencing maybe???). Is this the expected behavior?

corosync.conf

Code:

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: proxmox1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.1.0.1
    ring1_addr: 10.2.0.1
  }
  node {
    name: proxmox2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.1.0.2
    ring1_addr: 10.2.0.2
  }
  node {
    name: proxmox3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.1.0.3
    ring1_addr: 10.2.0.3
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: SWVC
  config_version: 9
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.1.0.0
    ringnumber: 0
  }
  interface {
    bindnetaddr: 10.2.0.0
    ringnumber: 1
  }
}

syslog

Code:

Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] link: host: 1 link: 0 is down
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] link: host: 1 link: 1 is down
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 has no active links
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 has no active links
Aug 02 13:08:36 proxmox2 kernel: e1000e 0000:04:00.0 eno1: NIC Link is Down
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] link: host: 3 link: 0 is down
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] link: host: 3 link: 1 is down
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 has no active links
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 has no active links
Aug 02 13:08:38 proxmox2 corosync[1346]:   [TOTEM ] Token has not been received in 2737 ms
Aug 02 13:08:38 proxmox2 pvestatd[1487]: mount error: mount.nfs: mounting proxmox3:/data/backups failed, reason given by server: No such file or directory
Aug 02 13:08:38 proxmox2 corosync[1346]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
Aug 02 13:08:43 proxmox2 corosync[1346]:   [QUORUM] Sync members[1]: 2
Aug 02 13:08:43 proxmox2 corosync[1346]:   [QUORUM] Sync left[2]: 1 3
Aug 02 13:08:43 proxmox2 corosync[1346]:   [TOTEM ] A new membership (2.1340) was formed. Members left: 1 3
Aug 02 13:08:43 proxmox2 corosync[1346]:   [TOTEM ] Failed to receive the leave message. failed: 1 3
Aug 02 13:08:43 proxmox2 pmxcfs[1291]: [dcdb] notice: members: 2/1291
Aug 02 13:08:43 proxmox2 pmxcfs[1291]: [status] notice: members: 2/1291
Aug 02 13:08:43 proxmox2 corosync[1346]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
Aug 02 13:08:43 proxmox2 corosync[1346]:   [QUORUM] Members[1]: 2
Aug 02 13:08:43 proxmox2 corosync[1346]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug 02 13:08:43 proxmox2 pmxcfs[1291]: [status] notice: node lost quorum
Aug 02 13:08:43 proxmox2 pmxcfs[1291]: [dcdb] crit: received write while not quorate - trigger resync
Aug 02 13:08:43 proxmox2 pmxcfs[1291]: [dcdb] crit: leaving CPG group
Aug 02 13:08:44 proxmox2 pmxcfs[1291]: [dcdb] notice: start cluster connection
Aug 02 13:08:44 proxmox2 pmxcfs[1291]: [dcdb] crit: cpg_join failed: 14
Aug 02 13:08:44 proxmox2 pmxcfs[1291]: [dcdb] crit: can't initialize service
Aug 02 13:08:44 proxmox2 pve-ha-crm[1537]: status change slave => wait_for_quorum
Aug 02 13:08:44 proxmox2 pve-ha-lrm[1599]: unable to write lrm status file - unable to open file '/etc/pve/nodes/proxmox2/lrm_status.tmp.1599' - Permission denied
Aug 02 13:08:45 proxmox2 pve-ha-lrm[1599]: lost lock 'ha_agent_proxmox2_lock - cfs lock update failed - Permission denied
Aug 02 13:08:49 proxmox2 pvestatd[1487]: mount error: mount.nfs: mounting proxmox3:/data/backups failed, reason given by server: No such file or directory
Aug 02 13:08:50 proxmox2 pmxcfs[1291]: [dcdb] notice: members: 2/1291
Aug 02 13:08:50 proxmox2 pmxcfs[1291]: [dcdb] notice: all data is up to date
Aug 02 13:08:50 proxmox2 pve-ha-lrm[1599]: status change active => lost_agent_lock
Aug 02 13:08:58 proxmox2 pvestatd[1487]: mount error: mount.nfs: mounting proxmox3:/data/backups failed, reason given by server: No such file or directory
Aug 02 13:09:08 proxmox2 pvestatd[1487]: mount error: mount.nfs: mounting proxmox3:/data/backups failed, reason given by server: No such file or directory
Aug 02 13:09:09 proxmox2 pvescheduler[16458]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 02 13:09:09 proxmox2 pvescheduler[16457]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 02 13:09:18 proxmox2 pvestatd[1487]: mount error: mount.nfs: mounting proxmox3:/data/backups failed, reason given by server: No such file or directory
Aug 02 13:09:28 proxmox2 pvestatd[1487]: mount error: mount.nfs: mounting proxmox3:/data/backups failed, reason given by server: No such file or directory
Aug 02 13:09:36 proxmox2 watchdog-mux[693]: client watchdog expired - disable watchdog updates
Aug 02 13:09:36 proxmox2 kernel: e1000e 0000:04:00.0 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Aug 02 13:09:38 proxmox2 pvestatd[1487]: mount error: mount.nfs: mounting proxmox3:/data/backups failed, reason given by server: No such file or directory
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] rx: host: 1 link: 0 is up
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] rx: host: 3 link: 0 is up
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] rx: host: 3 link: 1 is up
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] link: Resetting MTU for link 1 because host 3 joined
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:09:38 proxmox2 corosync[1346]:   [QUORUM] Sync members[3]: 1 2 3
Aug 02 13:09:38 proxmox2 corosync[1346]:   [QUORUM] Sync joined[2]: 1 3
Aug 02 13:09:38 proxmox2 corosync[1346]:   [TOTEM ] A new membership (1.1344) was formed. Members joined: 1 3
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: members: 1/1290, 2/1291, 3/1321
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: starting data syncronisation
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [status] notice: members: 1/1290, 2/1291, 3/1321
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [status] notice: starting data syncronisation
Aug 02 13:09:38 proxmox2 corosync[1346]:   [QUORUM] This node is within the primary component and will provide service.
Aug 02 13:09:38 proxmox2 corosync[1346]:   [QUORUM] Members[3]: 1 2 3
Aug 02 13:09:38 proxmox2 corosync[1346]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [status] notice: node has quorum
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: received sync request (epoch 1/1290/00000004)
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [status] notice: received sync request (epoch 1/1290/00000004)
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: received all states
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: leader is 1/1290
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: synced members: 1/1290, 3/1321
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: waiting for updates from leader
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: dfsm_deliver_queue: queue length 1
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [status] notice: received all states
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [status] notice: all data is up to date
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: update complete - trying to commit (got 7 inode updates)
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: all data is up to date
Aug 02 13:09:38 proxmox2 pmxcfs[1291]: [dcdb] notice: dfsm_deliver_sync_queue: queue length 1
Aug 02 13:09:39 proxmox2 corosync[1346]:   [KNET  ] rx: host: 1 link: 1 is up
Aug 02 13:09:39 proxmox2 corosync[1346]:   [KNET  ] link: Resetting MTU for link 1 because host 1 joined
Aug 02 13:09:39 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:09:39 proxmox2 corosync[1346]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Aug 02 13:09:40 proxmox2 pve-ha-lrm[1599]: successfully acquired lock 'ha_agent_proxmox2_lock'
Aug 02 13:09:40 proxmox2 pve-ha-lrm[1599]: status change lost_agent_lock => active
Aug 02 13:09:40 proxmox2 watchdog-mux[693]: exit watchdog-mux with active connections
Aug 02 13:09:40 proxmox2 systemd-journald[387]: Received client request to sync journal.
Aug 02 13:09:40 proxmox2 kernel: watchdog: watchdog0: watchdog did not stop!
-- Reboot -

esi_y · Aug 2, 2024

Thanks for posting good logs in the first post already. No, it's not expected behaviour, but you lose both links at once.

What's your network topology and how are you performing the test?

EDIT: E.g. are you sure that both IPs are not on eno1?

jmckee · Aug 8, 2024

what the most concise way to show my topology? How about /etc/network/interfaces or is there some better way?

esi_y · Aug 8, 2024

jmckee said:
what the most concise way to show my topology? How about /etc/network/interfaces or is there some better way?

I meant the physical one, literally how are your corosync link switched / routed - you would have to describe it, literally. If your two corosync links share a single point of failure, they are not really "dual", that's why I asked.

jmckee · Aug 10, 2024

First network is on eno1 device on all 3 nodes. They all connect to my big netgear managed switch, segmented into their own vlan. Second network is on enp0s25 device on all 3 nodes. They all connect to a small cheap unmanaged switch.

some more diagnostics:

Code:

root@proxmox2:~# corosync-cfgtool -s
Local node ID 2, transport knet
LINK ID 0 udp
        addr    = 10.1.0.2
        status:
                nodeid:          1:     connected
                nodeid:          2:     localhost
                nodeid:          3:     connected
LINK ID 1 udp
        addr    = 10.2.0.2
        status:
                nodeid:          1:     connected
                nodeid:          2:     localhost
                nodeid:          3:     connected

Code:

root@proxmox2:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:25:90:aa:32:fb brd ff:ff:ff:ff:ff:ff
    inet 10.2.0.2/24 scope global enp0s25
       valid_lft forever preferred_lft forever
    inet6 fe80::225:90ff:feaa:32fb/64 scope link
       valid_lft forever preferred_lft forever
3: enp1s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP group default qlen 1000
    link/ether 00:1b:21:89:cf:ee brd ff:ff:ff:ff:ff:ff
4: enp1s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master bond0 state UP group default qlen 1000
    link/ether 00:1b:21:89:cf:ee brd ff:ff:ff:ff:ff:ff permaddr 00:1b:21:89:cf:ef
5: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:25:90:aa:32:fa brd ff:ff:ff:ff:ff:ff
    altname enp4s0
    inet 10.1.0.2/24 scope global eno1
       valid_lft forever preferred_lft forever
    inet6 fe80::225:90ff:feaa:32fa/64 scope link
       valid_lft forever preferred_lft forever
6: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:02:c9:4e:e2:e6 brd ff:ff:ff:ff:ff:ff
    inet 10.0.1.2/24 scope global enp2s0
       valid_lft forever preferred_lft forever
    inet6 fe80::202:c9ff:fe4e:e2e6/64 scope link
       valid_lft forever preferred_lft forever
7: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr0 state UP group default qlen 1000
    link/ether 00:1b:21:89:cf:ee brd ff:ff:ff:ff:ff:ff
8: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:1b:21:89:cf:ee brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.22/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::21b:21ff:fe89:cfee/64 scope link
       valid_lft forever preferred_lft forever
9: bond0.4088@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr4088 state UP group default qlen 1000
    link/ether 00:1b:21:89:cf:ee brd ff:ff:ff:ff:ff:ff
10: vmbr4088: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:1b:21:89:cf:ee brd ff:ff:ff:ff:ff:ff
    inet6 fe80::21b:21ff:fe89:cfee/64 scope link
       valid_lft forever preferred_lft forever
11: bond0.4089@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr4089 state UP group default qlen 1000
    link/ether 00:1b:21:89:cf:ee brd ff:ff:ff:ff:ff:ff
12: vmbr4089: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:1b:21:89:cf:ee brd ff:ff:ff:ff:ff:ff
    inet6 fe80::21b:21ff:fe89:cfee/64 scope link
       valid_lft forever preferred_lft forever
13: bond0.10@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr10 state UP group default qlen 1000
    link/ether 00:1b:21:89:cf:ee brd ff:ff:ff:ff:ff:ff
14: vmbr10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:1b:21:89:cf:ee brd ff:ff:ff:ff:ff:ff
    inet6 fe80::21b:21ff:fe89:cfee/64 scope link
       valid_lft forever preferred_lft forever
15: tap204i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UNKNOWN group default qlen 1000
    link/ether 32:bc:cb:f9:a5:95 brd ff:ff:ff:ff:ff:ff
    ...

esi_y · Aug 10, 2024

jmckee said:
First network is on eno1 device on all 3 nodes. They all connect to my big netgear managed switch, segmented into their own vlan. Second network is on enp0s25 device on all 3 nodes. They all connect to a small cheap unmanaged switch.

That sounds okay.

jmckee said:

Code:

2: enp0s25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 10.2.0.2/24 scope global enp0s25
       valid_lft forever preferred_lft forever
5: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:25:90:aa:32:fa brd ff:ff:ff:ff:ff:ff
    altname enp4s0
    inet 10.1.0.2/24 scope global eno1
       valid_lft forever preferred_lft forever

Is eno1 = enp4... on all nodes as well?

Can you share more of the log earlier (a bit more of entries before "Token has not been received". And from all 3 nodes (yes, the same piece of log from each node during the same time, 15min prior NIC down would be fine, you can limit it to corosync and kernel messages.

jmckee said:

Code:

Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] link: host: 1 link: 0 is down
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] link: host: 1 link: 1 is down
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 has no active links
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 has no active links
Aug 02 13:08:36 proxmox2 kernel: e1000e 0000:04:00.0 eno1: NIC Link is Down
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] link: host: 3 link: 0 is down
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] link: host: 3 link: 1 is down
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 has no active links
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 has no active links
Aug 02 13:08:38 proxmox2 corosync[1346]:   [TOTEM ] Token has not been received in 2737 ms

Can you rule out e.g. power issue that is common for both switches? Do you have maybe logs of the switches to check what was happening at the same time?

jmckee · Aug 11, 2024

esi_y said:
Is eno1 = enp4... on all nodes as well?

enp4 or enp3. What is the significance?

Code:

Aug 02 12:59:34 proxmox2 kernel: tap501i0: entered promiscuous mode
Aug 02 12:59:34 proxmox2 kernel: vmbr10: port 3(fwpr501p0) entered blocking state
Aug 02 12:59:34 proxmox2 kernel: vmbr10: port 3(fwpr501p0) entered disabled state
Aug 02 12:59:34 proxmox2 kernel: fwpr501p0: entered allmulticast mode
Aug 02 12:59:34 proxmox2 kernel: fwpr501p0: entered promiscuous mode
Aug 02 12:59:34 proxmox2 kernel: vmbr10: port 3(fwpr501p0) entered blocking state
Aug 02 12:59:34 proxmox2 kernel: vmbr10: port 3(fwpr501p0) entered forwarding state
Aug 02 12:59:34 proxmox2 kernel: fwbr501i0: port 1(fwln501i0) entered blocking state
Aug 02 12:59:34 proxmox2 kernel: fwbr501i0: port 1(fwln501i0) entered disabled state
Aug 02 12:59:34 proxmox2 kernel: fwln501i0: entered allmulticast mode
Aug 02 12:59:34 proxmox2 kernel: fwln501i0: entered promiscuous mode
Aug 02 12:59:34 proxmox2 kernel: fwbr501i0: port 1(fwln501i0) entered blocking state
Aug 02 12:59:34 proxmox2 kernel: fwbr501i0: port 1(fwln501i0) entered forwarding state
Aug 02 12:59:34 proxmox2 kernel: fwbr501i0: port 2(tap501i0) entered blocking state
Aug 02 12:59:34 proxmox2 kernel: fwbr501i0: port 2(tap501i0) entered disabled state
Aug 02 12:59:34 proxmox2 kernel: tap501i0: entered allmulticast mode
Aug 02 12:59:34 proxmox2 kernel: fwbr501i0: port 2(tap501i0) entered blocking state
Aug 02 12:59:34 proxmox2 kernel: fwbr501i0: port 2(tap501i0) entered forwarding state
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [TOTEM ] Retransmit List: 3cd8 3cd9
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] link: host: 1 link: 0 is down
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] link: host: 1 link: 1 is down
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 has no active links
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 has no active links
Aug 02 13:08:36 proxmox2 kernel: e1000e 0000:04:00.0 eno1: NIC Link is Down
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] link: host: 3 link: 0 is down
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] link: host: 3 link: 1 is down
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 has no active links
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 has no active links
Aug 02 13:08:38 proxmox2 corosync[1346]:   [TOTEM ] Token has not been received in 2737 ms
Aug 02 13:08:38 proxmox2 corosync[1346]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
Aug 02 13:08:43 proxmox2 corosync[1346]:   [QUORUM] Sync members[1]: 2
Aug 02 13:08:43 proxmox2 corosync[1346]:   [QUORUM] Sync left[2]: 1 3
Aug 02 13:08:43 proxmox2 corosync[1346]:   [TOTEM ] A new membership (2.1340) was formed. Members left: 1 3
Aug 02 13:08:43 proxmox2 corosync[1346]:   [TOTEM ] Failed to receive the leave message. failed: 1 3
Aug 02 13:08:43 proxmox2 corosync[1346]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
Aug 02 13:08:43 proxmox2 corosync[1346]:   [QUORUM] Members[1]: 2
Aug 02 13:08:43 proxmox2 corosync[1346]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug 02 13:09:36 proxmox2 kernel: e1000e 0000:04:00.0 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] rx: host: 1 link: 0 is up
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] rx: host: 3 link: 0 is up
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] rx: host: 3 link: 1 is up
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] link: Resetting MTU for link 1 because host 3 joined
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:09:38 proxmox2 corosync[1346]:   [QUORUM] Sync members[3]: 1 2 3
Aug 02 13:09:38 proxmox2 corosync[1346]:   [QUORUM] Sync joined[2]: 1 3
Aug 02 13:09:38 proxmox2 corosync[1346]:   [TOTEM ] A new membership (1.1344) was formed. Members joined: 1 3
Aug 02 13:09:38 proxmox2 corosync[1346]:   [QUORUM] This node is within the primary component and will provide service.
Aug 02 13:09:38 proxmox2 corosync[1346]:   [QUORUM] Members[3]: 1 2 3
Aug 02 13:09:38 proxmox2 corosync[1346]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Aug 02 13:09:39 proxmox2 corosync[1346]:   [KNET  ] rx: host: 1 link: 1 is up
Aug 02 13:09:39 proxmox2 corosync[1346]:   [KNET  ] link: Resetting MTU for link 1 because host 1 joined
Aug 02 13:09:39 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:09:39 proxmox2 corosync[1346]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Aug 02 13:09:40 proxmox2 systemd-journald[387]: Received client request to sync journal.
Aug 02 13:09:40 proxmox2 kernel: watchdog: watchdog0: watchdog did not stop!

jmckee · Aug 11, 2024

Code:

Aug 02 12:59:39 proxmox1 kernel: tap501i0: left allmulticast mode
Aug 02 12:59:39 proxmox1 kernel: fwbr501i0: port 2(tap501i0) entered disabled state
Aug 02 12:59:39 proxmox1 kernel: fwbr501i0: port 1(fwln501i0) entered disabled state
Aug 02 12:59:39 proxmox1 kernel: vmbr10: port 6(fwpr501p0) entered disabled state
Aug 02 12:59:39 proxmox1 kernel: fwln501i0 (unregistering): left allmulticast mode
Aug 02 12:59:39 proxmox1 kernel: fwln501i0 (unregistering): left promiscuous mode
Aug 02 12:59:39 proxmox1 kernel: fwbr501i0: port 1(fwln501i0) entered disabled state
Aug 02 12:59:40 proxmox1 kernel: fwpr501p0 (unregistering): left allmulticast mode
Aug 02 12:59:40 proxmox1 kernel: fwpr501p0 (unregistering): left promiscuous mode
Aug 02 12:59:40 proxmox1 kernel: vmbr10: port 6(fwpr501p0) entered disabled state
Aug 02 13:08:34 proxmox1 kernel: e1000e 0000:00:19.0 enp0s25: NIC Link is Down
Aug 02 13:08:35 proxmox1 corosync[1302]:   [KNET  ] link: host: 3 link: 1 is down
Aug 02 13:08:35 proxmox1 corosync[1302]:   [KNET  ] link: host: 2 link: 0 is down
Aug 02 13:08:35 proxmox1 corosync[1302]:   [KNET  ] link: host: 2 link: 1 is down
Aug 02 13:08:35 proxmox1 corosync[1302]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:08:35 proxmox1 corosync[1302]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 02 13:08:35 proxmox1 corosync[1302]:   [KNET  ] host: host: 2 has no active links
Aug 02 13:08:35 proxmox1 corosync[1302]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 02 13:08:35 proxmox1 corosync[1302]:   [KNET  ] host: host: 2 has no active links
Aug 02 13:08:38 proxmox1 corosync[1302]:   [TOTEM ] Token has not been received in 2737 ms
Aug 02 13:08:38 proxmox1 corosync[1302]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
Aug 02 13:08:43 proxmox1 corosync[1302]:   [QUORUM] Sync members[2]: 1 3
Aug 02 13:08:43 proxmox1 corosync[1302]:   [QUORUM] Sync left[1]: 2
Aug 02 13:08:43 proxmox1 corosync[1302]:   [TOTEM ] A new membership (1.1340) was formed. Members left: 2
Aug 02 13:08:43 proxmox1 corosync[1302]:   [TOTEM ] Failed to receive the leave message. failed: 2
Aug 02 13:08:43 proxmox1 corosync[1302]:   [QUORUM] Members[2]: 1 3
Aug 02 13:08:43 proxmox1 corosync[1302]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug 02 13:09:05 proxmox1 kernel: e1000e 0000:00:19.0 enp0s25: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Aug 02 13:09:20 proxmox1 corosync[1302]:   [KNET  ] rx: host: 3 link: 1 is up
Aug 02 13:09:20 proxmox1 corosync[1302]:   [KNET  ] link: Resetting MTU for link 1 because host 3 joined
Aug 02 13:09:20 proxmox1 corosync[1302]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:09:20 proxmox1 corosync[1302]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Aug 02 13:09:38 proxmox1 corosync[1302]:   [KNET  ] link: Resetting MTU for link 0 because host 2 joined
Aug 02 13:09:38 proxmox1 corosync[1302]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 02 13:09:38 proxmox1 corosync[1302]:   [QUORUM] Sync members[3]: 1 2 3
Aug 02 13:09:38 proxmox1 corosync[1302]:   [QUORUM] Sync joined[1]: 2
Aug 02 13:09:38 proxmox1 corosync[1302]:   [TOTEM ] A new membership (1.1344) was formed. Members joined: 2
Aug 02 13:09:38 proxmox1 corosync[1302]:   [QUORUM] Members[3]: 1 2 3
Aug 02 13:09:38 proxmox1 corosync[1302]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug 02 13:09:38 proxmox1 corosync[1302]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Aug 02 13:09:38 proxmox1 corosync[1302]:   [KNET  ] rx: host: 2 link: 1 is up
Aug 02 13:09:38 proxmox1 corosync[1302]:   [KNET  ] link: Resetting MTU for link 1 because host 2 joined
Aug 02 13:09:38 proxmox1 corosync[1302]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 02 13:09:38 proxmox1 corosync[1302]:   [KNET  ] pmtud: Global data MTU changed to: 1397

Code:

Aug 02 13:08:35 proxmox3 corosync[1400]:   [KNET  ] link: host: 1 link: 1 is down
Aug 02 13:08:35 proxmox3 corosync[1400]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:08:37 proxmox3 corosync[1400]:   [KNET  ] link: host: 2 link: 0 is down
Aug 02 13:08:37 proxmox3 corosync[1400]:   [KNET  ] link: host: 2 link: 1 is down
Aug 02 13:08:37 proxmox3 corosync[1400]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 02 13:08:37 proxmox3 corosync[1400]:   [KNET  ] host: host: 2 has no active links
Aug 02 13:08:37 proxmox3 corosync[1400]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 02 13:08:37 proxmox3 corosync[1400]:   [KNET  ] host: host: 2 has no active links
Aug 02 13:08:38 proxmox3 corosync[1400]:   [TOTEM ] Token has not been received in 2737 ms
Aug 02 13:08:38 proxmox3 kernel: e1000e 0000:00:19.0 enp0s25: NIC Link is Down
Aug 02 13:08:38 proxmox3 corosync[1400]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
Aug 02 13:08:43 proxmox3 corosync[1400]:   [QUORUM] Sync members[2]: 1 3
Aug 02 13:08:43 proxmox3 corosync[1400]:   [QUORUM] Sync left[1]: 2
Aug 02 13:08:43 proxmox3 corosync[1400]:   [TOTEM ] A new membership (1.1340) was formed. Members left: 2
Aug 02 13:08:43 proxmox3 corosync[1400]:   [TOTEM ] Failed to receive the leave message. failed: 2
Aug 02 13:08:43 proxmox3 corosync[1400]:   [QUORUM] Members[2]: 1 3
Aug 02 13:08:43 proxmox3 corosync[1400]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug 02 13:09:19 proxmox3 kernel: e1000e 0000:00:19.0 enp0s25: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Aug 02 13:09:21 proxmox3 corosync[1400]:   [KNET  ] rx: host: 1 link: 1 is up
Aug 02 13:09:21 proxmox3 corosync[1400]:   [KNET  ] link: Resetting MTU for link 1 because host 1 joined
Aug 02 13:09:21 proxmox3 corosync[1400]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:09:21 proxmox3 corosync[1400]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Aug 02 13:09:38 proxmox3 corosync[1400]:   [KNET  ] rx: host: 2 link: 1 is up
Aug 02 13:09:38 proxmox3 corosync[1400]:   [KNET  ] link: Resetting MTU for link 1 because host 2 joined
Aug 02 13:09:38 proxmox3 corosync[1400]:   [KNET  ] host: host: 2 (passive) best link: 1 (pri: 1)
Aug 02 13:09:38 proxmox3 corosync[1400]:   [KNET  ] rx: host: 2 link: 0 is up
Aug 02 13:09:38 proxmox3 corosync[1400]:   [KNET  ] link: Resetting MTU for link 0 because host 2 joined
Aug 02 13:09:38 proxmox3 corosync[1400]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 02 13:09:38 proxmox3 corosync[1400]:   [QUORUM] Sync members[3]: 1 2 3
Aug 02 13:09:38 proxmox3 corosync[1400]:   [QUORUM] Sync joined[1]: 2
Aug 02 13:09:38 proxmox3 corosync[1400]:   [TOTEM ] A new membership (1.1344) was formed. Members joined: 2
Aug 02 13:09:38 proxmox3 corosync[1400]:   [QUORUM] Members[3]: 1 2 3
Aug 02 13:09:38 proxmox3 corosync[1400]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug 02 13:09:38 proxmox3 corosync[1400]:   [KNET  ] pmtud: Global data MTU changed to: 1397

esi_y · Aug 11, 2024

Hey!

jmckee said:
enp4 or enp3. What is the significance?

I was literally trying to second-guess what's going on before the other two logs are shown (and whether it's the same on them - it is not).

I will sum it up (nothing new for you, but maybe someone glances at this and it speeds up further help) meanwhile:

Three nodes, only one of which is self-fencing (log below) - you did not report the other two doing the same at other times.
This despite there's 2 corosync links because both are lost at once, however only one NIC (eno1) is reported down by the kernel.
Each NIC is hooked up onto different switch. The other nodes do not experiency any NICs down during the same time.

jmckee said:
Code:

Aug 02 13:08:35 proxmox2 corosync[1346]: [TOTEM ] Retransmit List: 3cd8

Both corosync links down, only eno1 NIC down.

jmckee said:

Code:

Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] link: host: 1 link: 0 is down
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] link: host: 1 link: 1 is down
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 has no active links
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:08:35 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 has no active links
Aug 02 13:08:36 proxmox2 kernel: e1000e 0000:04:00.0 eno1: NIC Link is Down
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] link: host: 3 link: 0 is down
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] link: host: 3 link: 1 is down
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 has no active links
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:08:37 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 has no active links
Aug 02 13:08:38 proxmox2 corosync[1346]:   [TOTEM ] Token has not been received in 2737 ms
Aug 02 13:08:38 proxmox2 corosync[1346]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
Aug 02 13:08:43 proxmox2 corosync[1346]:   [QUORUM] Sync members[1]: 2
Aug 02 13:08:43 proxmox2 corosync[1346]:   [QUORUM] Sync left[2]: 1 3
Aug 02 13:08:43 proxmox2 corosync[1346]:   [TOTEM ] A new membership (2.1340) was formed. Members left: 1 3
Aug 02 13:08:43 proxmox2 corosync[1346]:   [TOTEM ] Failed to receive the leave message. failed: 1 3
Aug 02 13:08:43 proxmox2 corosync[1346]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
Aug 02 13:08:43 proxmox2 corosync[1346]:   [QUORUM] Members[1]: 2
Aug 02 13:08:43 proxmox2 corosync[1346]:   [MAIN  ] Completed service synchronization, ready to provide service.

Almost a whole minute later, the eno1 NIC is back up (meanwhile, the enp0s25 never went down).

jmckee said:

Suddenly both corosync links are up.

jmckee said:

Code:

Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] rx: host: 1 link: 0 is up
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] rx: host: 3 link: 0 is up
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] rx: host: 3 link: 1 is up
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] link: Resetting MTU for link 1 because host 3 joined
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Aug 02 13:09:38 proxmox2 corosync[1346]:   [QUORUM] Sync members[3]: 1 2 3
Aug 02 13:09:38 proxmox2 corosync[1346]:   [QUORUM] Sync joined[2]: 1 3
Aug 02 13:09:38 proxmox2 corosync[1346]:   [TOTEM ] A new membership (1.1344) was formed. Members joined: 1 3
Aug 02 13:09:38 proxmox2 corosync[1346]:   [QUORUM] This node is within the primary component and will provide service.
Aug 02 13:09:38 proxmox2 corosync[1346]:   [QUORUM] Members[3]: 1 2 3
Aug 02 13:09:38 proxmox2 corosync[1346]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug 02 13:09:38 proxmox2 corosync[1346]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Aug 02 13:09:39 proxmox2 corosync[1346]:   [KNET  ] rx: host: 1 link: 1 is up
Aug 02 13:09:39 proxmox2 corosync[1346]:   [KNET  ] link: Resetting MTU for link 1 because host 1 joined
Aug 02 13:09:39 proxmox2 corosync[1346]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 02 13:09:39 proxmox2 corosync[1346]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Aug 02 13:09:40 proxmox2 systemd-journald[387]: Received client request to sync journal.

Just it's too late as the fencing has already started.

jmckee said:
Code:

Aug 02 13:09:40 proxmox2 kernel: watchdog: watchdog0: watchdog did not stop!

This is a bit of a guessing game, I am afraid.

It might help looking at: lshw -class network

I would also wonder about ethtool eno1 and ethtool enp0s25

Perhaps journalctl -b -1 | grep -e eno1 -e enp0s25

Also cat /proc/net/dev

Also do you mind sharing /etc/network/interfaces?

Are these hardware-identical nodes? What hardware is proxmox2 actually?

Someone might perhaps suggest trying different network cable and different port on the switch, but I find it weird the other link is down at the same time. What are these two network cards?

Is there any pattern to it? E.g. replication starts up (even on another NIC) and then it goes down like this?

Is this new install, did this happen after e.g. kernel upgrade.

There are all same version nodes?

jmckee · Aug 13, 2024

Well, it is somewhat old hardware - supermicro x9 boards. I have been running proxmox on this hardware since version 2 or 3 with no such problems. But.. yes I did use the script to upgrade version 7 to version 8. And now I have a bad feeling about that. So I think I will reinstall from iso and see if the problem still exists. *sigh*

esi_y · Aug 13, 2024

jmckee said:
So I think I will reinstall from iso and see if the problem still exists. *sigh*

It might be the fastest way, but if it persists, the outputs above might help nailiing it down (it could be something as weird as mismatch in flow control between switch and card).

Search

Search

dual corosync networks, but server reboots

jmckee

Renowned Member

esi_y

Renowned Member

jmckee

Renowned Member

esi_y

Renowned Member

jmckee

Renowned Member

esi_y

Renowned Member

jmckee

Renowned Member

jmckee

Renowned Member

esi_y

Renowned Member

jmckee

Renowned Member

esi_y

Renowned Member

We value your privacy