Nodes Reboot After Upgrade to 7.1

JRG

Member
Please provide the output of pveversion -v.
Do you have HA enabled?

We do write to the log before fencing, but it might not reach the disk anymore in time.
If you want as much information as possible, configure a remote syslog via UDP.
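For reference, forwarding syslog over UDP is usually a one-line rule. The following is only a minimal sketch, assuming rsyslog is the syslog daemon on the node. The helper name `rsyslog_udp_rule`, the collector address 192.0.2.10, port 514, and the drop-in file name are placeholders and do not come from this thread.

```shell
# Hypothetical helper: print an rsyslog forwarding rule for all messages.
# In rsyslog's legacy syntax a single "@" means UDP ("@@" would be TCP).
rsyslog_udp_rule() {
    host="$1"
    port="${2:-514}"   # default syslog port, assumed here
    printf '*.* @%s:%s\n' "$host" "$port"
}

# Example usage as root (address and file name are placeholders):
#   rsyslog_udp_rule 192.0.2.10 > /etc/rsyslog.d/90-remote.conf
#   systemctl restart rsyslog
```

Because UDP forwarding does not wait for the local disk, messages written just before fencing have a better chance of reaching the remote collector.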
Hi,

This is the output of
Bash:
pveversion -v

proxmox-ve: 7.2-1 (running kernel: 5.15.35-1-pve)
pve-manager: 7.2-3 (running version: 7.2-3/c743d6c1)
pve-kernel-5.15: 7.2-3
pve-kernel-helper: 7.2-3
pve-kernel-5.15.35-1-pve: 5.15.35-2
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-8
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-6
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.2-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.1.8-1
proxmox-backup-file-restore: 2.1.8-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-10
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-1
pve-qemu-kvm: 6.2.0-5
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1

I stopped and disabled the following services on this node after the cluster reboot. I assume this disables HA on this particular node while I troubleshoot the network, so that the flapping network on this server will not cause all HA nodes to reboot again. @mira, can you confirm that this is correct?

pve-ha-lrm
pve-ha-crm
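
The steps described above could be sketched roughly as follows. This is an illustration only: the helper name `ha_cmds` is hypothetical, and it only prints the `systemctl` commands so they can be reviewed before being run as root on the affected node.

```shell
# The two HA services named in this post, in the order listed.
HA_SERVICES="pve-ha-lrm pve-ha-crm"

# Hypothetical helper: print one systemctl command per HA service.
# $1 is the systemctl action, e.g. stop, disable, enable, start.
ha_cmds() {
    action="$1"
    for svc in $HA_SERVICES; do
        printf 'systemctl %s %s\n' "$action" "$svc"
    done
}

# Review the commands first, then pipe them to a shell as root:
#   ha_cmds stop
#   ha_cmds stop | sh
#   ha_cmds disable | sh
# "ha-manager status" shows the HA state as seen by the cluster.
```

Running `ha_cmds enable | sh` and `ha_cmds start | sh` later would reverse the change once the network is stable again.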


Currently all network ports, bond interfaces, and bridges are UP; however, there is no connectivity to the VLANs attached to the bond and bridges on the SFP+ ports. I think that disabling the SFP+ ports on this NIC (Intel X710) will make the bond and vmbr interfaces work again.

The following are the logs I'm getting related to the network:

[Tue May 10 14:22:01 2022] i40e 0000:19:00.1: Error I40E_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
[Tue May 10 14:22:01 2022] i40e 0000:19:00.1: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
[Tue May 10 14:22:01 2022] i40e 0000:19:00.1: Error I40E_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
[Tue May 10 14:22:01 2022] i40e 0000:19:00.1: Error I40E_AQ_RC_ENOSPC, forcing overflow promiscuous on PF


Bash:
lshw -class network -short

H/W path       Device      Class    Description
=======================================================
/0/100/1c/0    eno3        network  I350 Gigabit Network Connection
/0/100/1c/0.1  eno4        network  I350 Gigabit Network Connection
/0/102/0       eno1        network  Ethernet Controller X710 for 10GbE SFP+
/0/102/0.1     eno2        network  Ethernet Controller X710 for 10GbE SFP+
/0/103/0       enp94s0f0   network  NetXtreme II BCM57810 10 Gigabit Ethernet
/0/103/0.1     enp94s0f1   network  NetXtreme II BCM57810 10 Gigabit Ethernet
/3             vmbr0       network  Ethernet interface
/4             bond0       network  Ethernet interface
/5             vmbr1       network  Ethernet interface
/6             bond1       network  Ethernet interface
/7             vmbr2       network  Ethernet interface
/8             vlanEDITED  network  Ethernet interface
/9             vlanEDITED  network  Ethernet interface
/a             vlanEDITED  network  Ethernet interface
/b             vlanEDITED  network  Ethernet interface

Bash:
ethtool eno1

Settings for eno1:
Supported ports: [ FIBRE ]
Supported link modes: 10000baseSR/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10000baseSR/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Speed: 10000Mb/s
Duplex: Full
Auto-negotiation: off
Port: FIBRE
PHYAD: 0
Transceiver: internal
Supports Wake-on: g
Wake-on: g
Current message level: 0x00000007 (7)
drv probe link
Link detected: yes

Bash:
ethtool eno2

Settings for eno2:
Supported ports: [ FIBRE ]
Supported link modes: 10000baseSR/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10000baseSR/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Speed: 10000Mb/s
Duplex: Full
Auto-negotiation: off
Port: FIBRE
PHYAD: 0
Transceiver: internal
Supports Wake-on: g
Wake-on: g
Current message level: 0x00000007 (7)
drv probe link
Link detected: yes

Bash:
ethtool -i eno1

driver: i40e
version: 5.15.35-1-pve
firmware-version: 8.40 0x8000af80 20.5.13
expansion-rom-version:
bus-info: 0000:19:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

Bash:
modinfo i40e

filename: /lib/modules/5.15.35-1-pve/kernel/drivers/net/ethernet/intel/i40e/i40e.ko
license: GPL v2
description: Intel(R) Ethernet Connection XL710 Network Driver
author: Intel Corporation, <e1000-devel@lists.sourceforge.net>
srcversion: CEFF8ACB01F180F6F7BB885
alias: pci:v00008086d0000158Bsv*sd*bc*sc*i*
alias: pci:v00008086d0000158Asv*sd*bc*sc*i*
alias: pci:v00008086d00000D58sv*sd*bc*sc*i*
alias: pci:v00008086d00000CF8sv*sd*bc*sc*i*
alias: pci:v00008086d00001588sv*sd*bc*sc*i*
alias: pci:v00008086d00001587sv*sd*bc*sc*i*
alias: pci:v00008086d000037D3sv*sd*bc*sc*i*
alias: pci:v00008086d000037D2sv*sd*bc*sc*i*
alias: pci:v00008086d000037D1sv*sd*bc*sc*i*
alias: pci:v00008086d000037D0sv*sd*bc*sc*i*
alias: pci:v00008086d000037CFsv*sd*bc*sc*i*
alias: pci:v00008086d000037CEsv*sd*bc*sc*i*
alias: pci:v00008086d0000104Fsv*sd*bc*sc*i*
alias: pci:v00008086d0000104Esv*sd*bc*sc*i*
alias: pci:v00008086d000015FFsv*sd*bc*sc*i*
alias: pci:v00008086d00001589sv*sd*bc*sc*i*
alias: pci:v00008086d00001586sv*sd*bc*sc*i*
alias: pci:v00008086d00001585sv*sd*bc*sc*i*
alias: pci:v00008086d00001584sv*sd*bc*sc*i*
alias: pci:v00008086d00001583sv*sd*bc*sc*i*
alias: pci:v00008086d00001581sv*sd*bc*sc*i*
alias: pci:v00008086d00001580sv*sd*bc*sc*i*
alias: pci:v00008086d00001574sv*sd*bc*sc*i*
alias: pci:v00008086d00001572sv*sd*bc*sc*i*
depends:
retpoline: Y
intree: Y
name: i40e
vermagic: 5.15.35-1-pve SMP mod_unload modversions
parm: debug:Debug level (0=none,...,16=all), Debug mask (0x8XXXXXXX) (uint)

This is my network config:

proxmox-node-net-config.png
 

mira

Proxmox Staff Member
Yes, disabling those services will disable HA on that node. But you can also remove all HA resources from that node.

[Tue May 10 14:22:01 2022] i40e 0000:19:00.1: Error I40E_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
[Tue May 10 14:22:01 2022] i40e 0000:19:00.1: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
[Tue May 10 14:22:01 2022] i40e 0000:19:00.1: Error I40E_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
[Tue May 10 14:22:01 2022] i40e 0000:19:00.1: Error I40E_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
These usually appear when your interface has more VLANs configured than it can handle via offloading.
Try disabling offloading via ethtool or limit the VLANs to at most 128 (should be a safe default) in your /etc/network/interfaces -> bridge-vids.
The bridge-vids option supports ranges (-) and single IDs separated by a comma. Just make sure the number of IDs stays below 128.
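
To check how many IDs a given bridge-vids value expands to, something like the following could be used. It is a minimal sketch: the function name `count_vids` is hypothetical, it accepts commas or spaces as separators, and it does not detect overlapping ranges, so overlapping entries would be counted twice.

```shell
# Hypothetical helper: count the VLAN IDs in a bridge-vids style list,
# e.g. "2-100,200,300-310". Ranges are inclusive.
count_vids() {
    total=0
    # Treat commas like spaces, then walk each entry.
    for item in $(printf '%s' "$1" | tr ',' ' '); do
        case "$item" in
            *-*)
                first=${item%-*}   # part before the dash
                last=${item#*-}    # part after the dash
                total=$((total + last - first + 1))
                ;;
            *)
                total=$((total + 1))
                ;;
        esac
    done
    echo "$total"
}

# Example: count_vids "2-100,200,300-310"
# Current offload settings can be listed with: ethtool -k eno2 | grep vlan
# (eno2 is assumed to be the port at 0000:19:00.1; verify with ethtool -i)
```

A result of 128 or less would match the limit suggested above. Which offload feature to switch off with `ethtool -K` depends on the NIC; `rx-vlan-filter` is one candidate, but that is an assumption and not something confirmed in this thread.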
 

JRG

Member
Thanks. I think that stopping and disabling those services is an easy way to troubleshoot networking issues like this one without the risk of triggering node reboots due to HA. I'm waiting for a new NIC to replace the one causing all these issues. Thanks again for your time.
 
