VM loses connectivity periodically for no visible reason

atlasloewenherz

Oct 23, 2022
Hello Community!

I have the following topology:


Network layout (diagram)



The following network configuration is identical on all 3 nodes:


Bash:
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

iface eno3 inet manual

iface eno4 inet manual

auto vmbr0
iface vmbr0 inet dhcp
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
    hwaddress 00:03:ff:31:b4:bc


iface usb0 inet manual

auto vmbr1
iface vmbr1 inet dhcp
    bridge-ports eno2
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
    hwaddress 00:03:ff:31:b4:bd

auto vmbr2
iface vmbr2 inet manual
    bridge-ports eno3
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
    hwaddress 00:03:ff:31:b4:be

auto vmbr3
iface vmbr3 inet manual
    bridge-ports eno4
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
    hwaddress 00:03:ff:31:b4:bf

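Each bridge above pins a fixed `hwaddress`, and the configuration is described as identical on all 3 nodes. It may therefore be worth confirming which MAC each bridge really gets on each node. A minimal sketch, assuming the standard ifupdown file at `/etc/network/interfaces` (the function name `bridge_macs` is hypothetical), prints one line per bridge with its first bridge port and its `hwaddress`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "bridge first-bridge-port hwaddress" for every
# vmbr* stanza in an ifupdown-style interfaces file.
# Default path is an assumption; pass another file (or /dev/stdin) as $1.
bridge_macs() {
  awk '
    # A new "iface" line ends the previous stanza: flush it if it was a bridge.
    $1 == "iface" {
      if (br != "") print br, port, mac
      br = ""; port = "-"; mac = "-"
      if ($2 ~ /^vmbr/) br = $2
      next
    }
    br != "" && $1 == "bridge-ports" { port = $2 }   # first listed port only
    br != "" && $1 == "hwaddress"    { mac = $2 }
    END { if (br != "") print br, port, mac }         # flush the last stanza
  ' "${1:-/etc/network/interfaces}"
}
```

Running `bridge_macs` on each node and diffing the outputs shows at a glance whether the `hwaddress` lines really are the same everywhere.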

And here is a snippet of the syslog on node 01:


Bash:
Oct 28 15:14:47 pve01 corosync[2594]:   [TOTEM ] Token has not been received in 2737 ms
Oct 28 15:14:52 pve01 kernel: [251841.854565] sd 6:0:0:0: alua: supports implicit TPGS
Oct 28 15:15:07 pve01 corosync[2594]:   [KNET  ] link: host: 3 link: 0 is down
Oct 28 15:15:07 pve01 corosync[2594]:   [KNET  ] link: host: 2 link: 0 is down
Oct 28 15:15:07 pve01 corosync[2594]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Oct 28 15:15:07 pve01 corosync[2594]:   [KNET  ] host: host: 3 has no active links
Oct 28 15:15:07 pve01 corosync[2594]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Oct 28 15:15:07 pve01 corosync[2594]:   [KNET  ] host: host: 2 has no active links
Oct 28 15:15:09 pve01 corosync[2594]:   [TOTEM ] Token has not been received in 2737 ms
Oct 28 15:15:10 pve01 corosync[2594]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3650ms), waiting 4380ms for consensus.
Oct 28 15:15:10 pve01 corosync[2594]:   [KNET  ] rx: host: 3 link: 0 is up
Oct 28 15:15:10 pve01 corosync[2594]:   [KNET  ] rx: host: 2 link: 0 is up
Oct 28 15:15:10 pve01 corosync[2594]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Oct 28 15:15:10 pve01 corosync[2594]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Oct 28 15:15:10 pve01 corosync[2594]:   [QUORUM] Sync members[3]: 1 2 3
Oct 28 15:15:10 pve01 corosync[2594]:   [TOTEM ] A new membership (1.57d9) was formed. Members
Oct 28 15:15:10 pve01 corosync[2594]:   [QUORUM] Members[3]: 1 2 3
Oct 28 15:15:10 pve01 corosync[2594]:   [MAIN  ] Completed service synchronization, ready to provide service.
Oct 28 15:15:12 pve01 kernel: [251861.458895] sd 6:0:0:0: alua: supports implicit TPGS
Oct 28 15:15:27 pve01 corosync[2594]:   [TOTEM ] Token has not been received in 2737 ms
Oct 28 15:15:32 pve01 kernel: [251882.049199] sd 6:0:0:0: alua: supports implicit TPGS
Oct 28 15:15:37 pve01 corosync[2594]:   [KNET  ] link: host: 3 link: 0 is down
Oct 28 15:15:37 pve01 corosync[2594]:   [KNET  ] link: host: 2 link: 0 is down
Oct 28 15:15:37 pve01 corosync[2594]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Oct 28 15:15:37 pve01 corosync[2594]:   [KNET  ] host: host: 3 has no active links
Oct 28 15:15:37 pve01 corosync[2594]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Oct 28 15:15:37 pve01 corosync[2594]:   [KNET  ] host: host: 2 has no active links
Oct 28 15:15:39 pve01 corosync[2594]:   [TOTEM ] Token has not been received in 2737 ms
Oct 28 15:15:39 pve01 corosync[2594]:   [KNET  ] rx: host: 2 link: 0 is up
Oct 28 15:15:39 pve01 corosync[2594]:   [KNET  ] rx: host: 3 link: 0 is up
Oct 28 15:15:39 pve01 corosync[2594]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Oct 28 15:15:39 pve01 corosync[2594]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Oct 28 15:15:52 pve01 kernel: [251901.695690] sd 7:0:0:0: alua: supports implicit TPGS
Oct 28 15:16:13 pve01 kernel: [251922.518204] sd 7:0:0:0: alua: supports implicit TPGS
Oct 28 15:16:23 pve01 corosync[2594]:   [KNET  ] link: host: 3 link: 0 is down
Oct 28 15:16:23 pve01 corosync[2594]:   [KNET  ] link: host: 2 link: 0 is down
Oct 28 15:16:23 pve01 corosync[2594]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Oct 28 15:16:23 pve01 corosync[2594]:   [KNET  ] host: host: 3 has no active links
Oct 28 15:16:23 pve01 corosync[2594]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Oct 28 15:16:23 pve01 corosync[2594]:   [KNET  ] host: host: 2 has no active links
Oct 28 15:16:25 pve01 corosync[2594]:   [TOTEM ] Token has not been received in 2737 ms
Oct 28 15:16:25 pve01 corosync[2594]:   [KNET  ] rx: host: 2 link: 0 is up
Oct 28 15:16:25 pve01 corosync[2594]:   [KNET  ] rx: host: 3 link: 0 is up
Oct 28 15:16:25 pve01 corosync[2594]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Oct 28 15:16:25 pve01 corosync[2594]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Oct 28 15:16:32 pve01 kernel: [251942.053477] sd 7:0:0:0: alua: supports implicit TPGS
Oct 28 15:16:52 pve01 kernel: [251961.680030] sd 7:0:0:0: alua: supports implicit TPGS
Oct 28 15:17:01 pve01 CRON[705837]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Oct 28 15:17:13 pve01 kernel: [251982.449812] sd 7:0:0:0: alua: supports implicit TPGS
Oct 28 15:17:32 pve01 kernel: [252001.938536] sd 6:0:0:0: alua: supports implicit TPGS
Oct 28 15:17:42 pve01 pvedaemon[397892]: <root@pam> successful auth for user 'root@pam'
Oct 28 15:17:52 pve01 kernel: [252021.586332] sd 7:0:0:0: alua: supports implicit TPGS
Oct 28 15:18:12 pve01 kernel: [252042.254475] sd 7:0:0:0: alua: supports implicit TPGS
Oct 28 15:18:17 pve01 pmxcfs[2501]: [dcdb] notice: data verification successful
Oct 28 15:18:33 pve01 corosync[2594]:   [KNET  ] link: host: 3 link: 0 is down
Oct 28 15:18:33 pve01 corosync[2594]:   [KNET  ] link: host: 2 link: 0 is down
Oct 28 15:18:33 pve01 corosync[2594]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Oct 28 15:18:33 pve01 corosync[2594]:   [KNET  ] host: host: 3 has no active links
Oct 28 15:18:33 pve01 corosync[2594]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Oct 28 15:18:33 pve01 corosync[2594]:   [KNET  ] host: host: 2 has no active links
Oct 28 15:18:33 pve01 kernel: [252062.874552] sd 7:0:0:0: alua: supports implicit TPGS
Oct 28 15:18:35 pve01 corosync[2594]:   [KNET  ] rx: host: 2 link: 0 is up
Oct 28 15:18:35 pve01 corosync[2594]:   [KNET  ] rx: host: 3 link: 0 is up
Oct 28 15:18:35 pve01 corosync[2594]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Oct 28 15:18:35 pve01 corosync[2594]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
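To see how regularly the corosync KNET link flaps, the link transitions can be pulled out of the log. This is a minimal sketch; the helper name `knet_events` is hypothetical and the default path `/var/log/syslog` is an assumption. It prints the time, peer host ID and new state for every "link: 0 is down/up" line:

```bash
#!/usr/bin/env bash
# Hypothetical helper: list KNET link transitions as "time host state".
# Assumes syslog-style lines: "Mon DD HH:MM:SS host corosync[pid]: [KNET  ] ..."
knet_events() {
  awk '/\[KNET  \]/ && / link: 0 is (down|up)$/ {
         # The peer host ID follows the "host:" token.
         for (i = 1; i <= NF; i++) if ($i == "host:") h = $(i + 1)
         print $3, h, $NF   # $3 = time of day, $NF = "down" or "up"
       }' "${1:-/var/log/syslog}"
}
```

Because `journalctl` uses the same default line format, `journalctl -u corosync | knet_events /dev/stdin` should work as well. On the snippet above it would list link-down events at 15:15:07, 15:15:37, 15:16:23 and 15:18:33, each followed by a link-up event two to three seconds later.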
 

Attachments

  • Screenshot 2022-10-28 at 15.30.47.png
  • Screenshot 2022-10-28 at 15.31.55.png

*bump this thread*

I'm still struggling with this and hope someone gets the chance to have a look!

Current dmesg output while one node is disconnected from the network:

Dmesg Output

Thanks