Hello guys,
We have had some strange network problems since we upgraded the cluster to version 6.x.
There are 11 nodes in the cluster, and the network on all nodes drops at random, but on all of them at the same time, as if there were some strange traffic.
No other servers connected to the same switches have problems.
In the syslog we see strange messages, which I post below. Maybe there is a driver or kernel bug? All nodes run the same kernel/PVE version.
The cluster ran fine before upgrading to 6. I hope someone can help me.
Errors from another node: https://pastebin.com/raw/rKff9643
[276065.034418] drv_pulse_mb[00001d0b]
[276065.034423] bnx2 0000:1c:00.1 eth5: DEBUG: dev_info_signature[44564903] reset_type[01005254]
[276065.034778] condition[0003e10e]
[276065.034785] bnx2 0000:1c:00.1 eth5: DEBUG: 000001c0: 01005254 42530000 0003e10e 00000000
[276065.035148] bnx2 0000:1c:00.1 eth5: DEBUG: 000003cc: 44444444 44444444 44444444 00000a00
[276065.035513] bnx2 0000:1c:00.1 eth5: DEBUG: 000003dc: 0ffeffff 0000ffff ffffffff ffffffff
[276065.035884] bnx2 0000:1c:00.1 eth5: DEBUG: 000003ec: 00000000 00000000 00000000 00a60630
[276065.036257] bnx2 0000:1c:00.1 eth5: DEBUG: 0x3fc[0000ffff]
[276065.036631] bnx2 0000:1c:00.1 eth5: <--- end MCP states dump --->
[276070.125979] bnx2 0000:1c:00.1 eth5: <--- start FTQ dump --->
[276070.126522] bnx2 0000:1c:00.1 eth5: RV2P_PFTQ_CTL 00010000
[276070.126930] bnx2 0000:1c:00.1 eth5: RV2P_TFTQ_CTL 00020000
[276070.127329] bnx2 0000:1c:00.1 eth5: RV2P_MFTQ_CTL 00004000
[276070.127724] bnx2 0000:1c:00.1 eth5: TBDR_FTQ_CTL 00004002
[276070.128118] bnx2 0000:1c:00.1 eth5: TDMA_FTQ_CTL 00010002
[276070.128513] bnx2 0000:1c:00.1 eth5: TXP_FTQ_CTL 00010002
[276070.128911] bnx2 0000:1c:00.1 eth5: TXP_FTQ_CTL 00010002
[276070.129305] bnx2 0000:1c:00.1 eth5: TPAT_FTQ_CTL 00010000
[276070.129703] bnx2 0000:1c:00.1 eth5: RXP_CFTQ_CTL 00008000
[276070.130118] bnx2 0000:1c:00.1 eth5: RXP_FTQ_CTL 00100000
[276070.130547] bnx2 0000:1c:00.1 eth5: COM_COMXQ_FTQ_CTL 00010000
[276070.130953] bnx2 0000:1c:00.1 eth5: COM_COMTQ_FTQ_CTL 00020000
[276070.131356] bnx2 0000:1c:00.1 eth5: COM_COMQ_FTQ_CTL 00010000
[276070.131762] bnx2 0000:1c:00.1 eth5: CP_CPQ_FTQ_CTL 00004000
[276070.132170] bnx2 0000:1c:00.1 eth5: CPU states:
[276070.132587] bnx2 0000:1c:00.1 eth5: 045000 mode b84c state 80001000 evt_mask 500 pc 800128c pc 8001284 instr 38640001
[276070.133048] bnx2 0000:1c:00.1 eth5: 085000 mode b84c state 80001000 evt_mask 500 pc 8000a48 pc 8000a58 instr 8f820014
[276070.133506] bnx2 0000:1c:00.1 eth5: 0c5000 mode b84c state 80001000 evt_mask 500 pc 8004c20 pc 8004c14 instr 10e00088
[276070.133975] bnx2 0000:1c:00.1 eth5: 105000 mode b8cc state 80000000 evt_mask 500 pc 8000a98 pc 8000b24 instr 1040ffd9
[276070.134436] bnx2 0000:1c:00.1 eth5: 145000 mode b800 state 80000000 evt_mask 500 pc 800af54 pc 800aedc instr 1440000e
[276070.134900] bnx2 0000:1c:00.1 eth5: 185000 mode b8cc state 80004000 evt_mask 500 pc 8000c6c pc 8000920 instr 8ce800e8
[276070.135369] bnx2 0000:1c:00.1 eth5: <--- end FTQ dump --->
[276070.135838] bnx2 0000:1c:00.1 eth5: <--- start TBDC dump --->
[276070.136307] bnx2 0000:1c:00.1 eth5: TBDC free cnt: 32
[276070.136777] bnx2 0000:1c:00.1 eth5: LINE CID BIDX CMD VALIDS
[276070.137259] bnx2 0000:1c:00.1 eth5: 00 001180 49e0 00 [0]
[276070.137741] bnx2 0000:1c:00.1 eth5: 01 001080 44b0 00 [0]
[276070.138230] bnx2 0000:1c:00.1 eth5: 02 001100 45e0 00 [0]
[276070.138701] bnx2 0000:1c:00.1 eth5: 03 001100 45e8 00 [0]
[276070.139162] bnx2 0000:1c:00.1 eth5: 04 001000 3568 00 [0]
[276070.139612] bnx2 0000:1c:00.1 eth5: 05 001000 3570 00 [0]
[276070.140052] bnx2 0000:1c:00.1 eth5: 06 001100 4480 00 [0]
[276070.140486] bnx2 0000:1c:00.1 eth5: 07 001100 43c8 00 [0]
[276070.140914] bnx2 0000:1c:00.1 eth5: 08 001000 2b40 00 [0]
[276070.141337] bnx2 0000:1c:00.1 eth5: 09 001080 4070 00 [0]
[276070.141749] bnx2 0000:1c:00.1 eth5: 0a 001100 6788 00 [0]
[276070.142162] bnx2 0000:1c:00.1 eth5: 0b 001300 4e88 00 [0]
[276070.142553] bnx2 0000:1c:00.1 eth5: 0c 001080 1698 00 [0]
[276070.142936] bnx2 0000:1c:00.1 eth5: 0d 001180 69e8 00 [0]
[276070.143308] bnx2 0000:1c:00.1 eth5: 0e 001100 96d0 00 [0]
[276070.143669] bnx2 0000:1c:00.1 eth5: 0f 001000 abf8 00 [0]
[276070.144022] bnx2 0000:1c:00.1 eth5: 10 001080 4e70 00 [0]
[276070.144369] bnx2 0000:1c:00.1 eth5: 11 001000 fc38 00 [0]
[276070.144709] bnx2 0000:1c:00.1 eth5: 12 001180 9f40 00 [0]
[276070.145043] bnx2 0000:1c:00.1 eth5: 13 001100 e2d0 00 [0]
[276070.145368] bnx2 0000:1c:00.1 eth5: 14 001280 e180 00 [0]
[276070.145688] bnx2 0000:1c:00.1 eth5: 15 001300 b570 00 [0]
[276070.146014] bnx2 0000:1c:00.1 eth5: 16 001280 1110 00 [0]
[276070.146329] bnx2 0000:1c:00.1 eth5: 17 001100 2b88 00 [0]
[276070.146641] bnx2 0000:1c:00.1 eth5: 18 001180 3a48 00 [0]
[276070.146953] bnx2 0000:1c:00.1 eth5: 19 001180 3a50 00 [0]
[276070.147262] bnx2 0000:1c:00.1 eth5: 1a 001080 acf8 00 [0]
[276070.147571] bnx2 0000:1c:00.1 eth5: 1b 001080 ad00 00 [0]
[276070.147878] bnx2 0000:1c:00.1 eth5: 1c 001300 e4f8 00 [0]
[276070.148187] bnx2 0000:1c:00.1 eth5: 1d 001300 e500 00 [0]
[276070.148493] bnx2 0000:1c:00.1 eth5: 1e 000800 94a8 00 [0]
[276070.148798] bnx2 0000:1c:00.1 eth5: 1f 1ffe80 f7a8 fd [0]
[276070.149096] bnx2 0000:1c:00.1 eth5: <--- end TBDC dump --->
[276070.149398] bnx2 0000:1c:00.1 eth5: DEBUG: intr_sem[0] PCI_CMD[00100446]
[276070.149711] bnx2 0000:1c:00.1 eth5: DEBUG: PCI_PM[19002008] PCI_MISC_CFG[92000088]
[276070.150041] bnx2 0000:1c:00.1 eth5: DEBUG: EMAC_TX_STATUS[00000008] EMAC_RX_STATUS[00000000]
[276070.150404] bnx2 0000:1c:00.1 eth5: DEBUG: RPM_MGMT_PKT_CTRL[40000088]
[276070.150733] bnx2 0000:1c:00.1 eth5: DEBUG: HC_STATS_INTERRUPT_STATUS[01800019]
[276070.151051] bnx2 0000:1c:00.1 eth5: DEBUG: PBA[00000000]
[276070.151368] bnx2 0000:1c:00.1 eth5: <--- start MCP states dump --->
[276070.151692] bnx2 0000:1c:00.1 eth5: DEBUG: MCP_STATE_P0[0003e10e] MCP_STATE_P1[0003e10e]
[276070.152030] bnx2 0000:1c:00.1 eth5: DEBUG: MCP mode[0000b800] state[80000000] evt_mask[00000500]
[276070.152377] bnx2 0000:1c:00.1 eth5: DEBUG: pc[0800d974] pc[0800d900] instr[10400033]
[276070.152724] bnx2 0000:1c:00.1 eth5: DEBUG: shmem states:
[276070.153058] bnx2 0000:1c:00.1 eth5: DEBUG: drv_mb[01030009] fw_mb[00000009] link_status[0000006f]
[276070.153407] drv_pulse_mb[00001d10]
[276070.153412] bnx2 0000:1c:00.1 eth5: DEBUG: dev_info_signature[44564903] reset_type[01005254]
[276070.153766] condition[0003e10e]
[276070.153772] bnx2 0000:1c:00.1 eth5: DEBUG: 000001c0: 01005254 42530000 0003e10e 00000000
[276070.154146] bnx2 0000:1c:00.1 eth5: DEBUG: 000003cc: 44444444 44444444 44444444 00000a00
[276070.154559] bnx2 0000:1c:00.1 eth5: DEBUG: 000003dc: 0ffeffff 0000ffff ffffffff ffffffff
[276070.154929] bnx2 0000:1c:00.1 eth5: DEBUG: 000003ec: 00000000 00000000 00000000 00a60630
[276070.155302] bnx2 0000:1c:00.1 eth5: DEBUG: 0x3fc[0000ffff]
[276070.155675] bnx2 0000:1c:00.1 eth5: <--- end MCP states dump --->
[276071.263394] bnx2 0000:1c:00.1 eth5: NIC Copper Link is Down
[276071.383949] bnx2 0000:1c:00.1 eth5: speed changed to 0 for port eth5
[276071.393975] bond1: link status definitely down for interface eth5, disabling it
[276074.489064] bnx2 0000:1c:00.1 eth5: NIC Copper Link is Up, 1000 Mbps full duplex
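To compare the outages between nodes, the link down/up lines can be pulled out of the kernel log with a small script. This is only a rough sketch: the function names `link_events` and `outages` are made up for illustration and are not part of Proxmox or the bnx2 driver, and the regular expression assumes the message format shown in the log above. Note that these timestamps are seconds since boot, so they are not directly comparable across nodes without converting them to wall-clock time first.

```python
import re

# Assumed line format, taken from the log above, e.g.:
# [276071.263394] bnx2 0000:1c:00.1 eth5: NIC Copper Link is Down
LINK_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+bnx2\s+\S+\s+(?P<iface>\S+):\s+"
    r"NIC \w+ Link is (?P<state>Up|Down)"
)


def link_events(lines):
    """Return (timestamp, interface, state) tuples for bnx2 link changes."""
    events = []
    for line in lines:
        m = LINK_RE.match(line)
        if m:
            events.append((float(m.group("ts")), m.group("iface"), m.group("state")))
    return events


def outages(events):
    """Pair each Down with the next Up on the same interface.

    Returns (interface, down_timestamp, seconds_down) tuples.
    """
    down_at = {}
    result = []
    for ts, iface, state in events:
        if state == "Down":
            down_at[iface] = ts
        elif iface in down_at:
            start = down_at.pop(iface)
            result.append((iface, start, round(ts - start, 3)))
    return result


# Sample lines copied from the log above.
SAMPLE = [
    "[276070.155675] bnx2 0000:1c:00.1 eth5: <--- end MCP states dump --->",
    "[276071.263394] bnx2 0000:1c:00.1 eth5: NIC Copper Link is Down",
    "[276071.383949] bnx2 0000:1c:00.1 eth5: speed changed to 0 for port eth5",
    "[276071.393975] bond1: link status definitely down for interface eth5, disabling it",
    "[276074.489064] bnx2 0000:1c:00.1 eth5: NIC Copper Link is Up, 1000 Mbps full duplex",
]

if __name__ == "__main__":
    for iface, start, seconds in outages(link_events(SAMPLE)):
        print(f"{iface}: down at {start}, back after {seconds} s")
```

Running the same script over the kernel log of each node should show whether the link flaps really line up in time on all 11 nodes.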
pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-1-pve)
pve-manager: 6.0-7 (running version: 6.0-7/28984024)
pve-kernel-5.0: 6.0-7
pve-kernel-helper: 6.0-7
pve-kernel-5.0.21-1-pve: 5.0.21-2
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 14.2.2-pve1
ceph-fuse: 14.2.2-pve1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.11-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-4
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-8
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-64
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-7
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.1-pve2