Same MAC on all LACP bonds/bridges after upgrade to Proxmox 8

houbidoo

Renowned Member
Mar 16, 2015
Hey all,

I just upgraded a three-node cluster (newly set up) and the cluster links (2x 10G LACP) stopped working.

I can see that the MAC addresses of the cluster links (2x 10G, LACP bond, bridge) are now identical on all three servers. Even after a fresh installation the result is the same.
Only the cluster links with the Intel 10G cards show this behavior. I also have 2x 10G LACP bonds and bridges for management and storage on Broadcom cards, and those still work fine.

Is this a bug?
What is the best way to fix it?
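
For reference, a quick way to confirm the duplicate from one machine (the node names and the root login below are just examples, not the real hostnames):

# compare the bond's LACP system MAC across the three nodes
for node in server01 server02 server03; do
    ssh root@$node "hostname; grep 'System MAC address' /proc/net/bonding/bond2"
done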


@netzwerkcluster-server01:~# pveversion -v
proxmox-ve: 8.0.2 (running kernel: 6.2.16-19-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.3
proxmox-kernel-6.2.16-19-pve: 6.2.16-19
proxmox-kernel-6.2: 6.2.16-19
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph: 17.2.7-pve1
ceph-fuse: 17.2.7-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx5
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.9
libpve-guest-common-perl: 5.0.5
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.4-1
proxmox-backup-file-restore: 3.0.4-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.9
pve-cluster: 8.0.4
pve-container: 5.0.5
pve-docs: 8.0.5
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.8-3
pve-ha-manager: 4.0.2
pve-i18n: 3.0.7
pve-qemu-kvm: 8.0.2-7
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.13-pve1

@netzwerkcluster-server01:~# cat /proc/net/bonding/bond2
Ethernet Channel Bonding Driver: v6.2.16-19-pve

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

802.3ad info
LACP active: on
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 98:b7:85:55:22:11 <- same on all nodes
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 15
Partner Key: 11
Partner Mac Address: 80:db:17:3b:49:00

Slave Interface: ens1f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 98:b7:85:55:22:11
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 98:b7:85:55:22:11
port key: 15
port priority: 255
port number: 1
port state: 61
details partner lacp pdu:
system priority: 127
system mac address: 80:db:17:3b:49:00
oper key: 11
port priority: 127
port number: 3
port state: 63

Slave Interface: ens1f1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 98:b7:85:55:22:12
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 98:b7:85:55:22:11
port key: 15
port priority: 255
port number: 2
port state: 61
details partner lacp pdu:
system priority: 127
system mac address: 80:db:17:3b:49:00
oper key: 11
port priority: 127
port number: 6
port state: 63

@netzwerkcluster-server01:~# cat /etc/network/interfaces

auto bond0
iface bond0 inet manual
    bond-slaves eno5 eno6
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
#Management

auto bond1
iface bond1 inet manual
    bond-slaves eno3 eno4
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
#VM-Netzwerk

auto bond2
iface bond2 inet manual
    bond-slaves ens1f0 ens1f1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
#Cluster-Link

auto vmbr0
iface vmbr0 inet static
    address 10.10.4.41/24
    gateway 10.10.4.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
#Management

auto vmbr1
iface vmbr1 inet manual
    bridge-ports bond1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
#VM-Netzwerk

auto vmbr2
iface vmbr2 inet static
    address 172.31.0.1/24
    bridge-ports bond2
    bridge-stp off
    bridge-fd 0
#Cluster-Link
 
A recent patch changed this: the bond now inherits the MAC address of its first slave device instead of using a randomly generated one (the random MAC was causing other problems).

What is strange is the "98:b7:85:55:22:11" on each node ...
https://macvendors.com/ shows that OUI belongs to Shenzhen 10Gtek Transceivers Co. ??? (transceivers normally don't have a MAC)
(Is it a fiber card?)

What are the MACs of the interfaces if you don't create a bond?
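
For example (assuming the ports are still named ens1f0/ens1f1 once the bond is removed), the burned-in and currently assigned MACs can be read directly:

# permanent (burned-in) hardware address of each port
ethtool -P ens1f0
ethtool -P ens1f1
# currently assigned MACs
ip -br link | grep ens1f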




Can you do some debugging? (I can't reproduce this myself, and I have seen other users with different problems.)


Can you try to roll back:

apt install ifupdown2=3.2.0-1+pmx4

then reboot.

Send the result of cat /proc/net/bonding/*
(this should be a random, systemd-generated MAC)

Then do a reload (ifreload -a)

and look at cat /proc/net/bonding/* again.

(The reload changes the MAC to that of the first slave device, so it should break like the current version does at boot.)
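
A condensed version of that sequence, with the package version taken from above:

apt install ifupdown2=3.2.0-1+pmx4
reboot
# after the reboot:
cat /proc/net/bonding/*   # expected: a random, systemd-generated MAC
ifreload -a
cat /proc/net/bonding/*   # expected: the MAC of the first slave again, i.e. broken like pmx5 at boot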



Then, still with ifupdown2=3.2.0-1+pmx4,
edit /etc/systemd/network/99-default.link
and replace
"MACAddressPolicy=Persistent"
with
"MACAddressPolicy=none"

Then reboot,

and send the result of
cat /proc/net/bonding/*

(this should be the MAC of the first NIC)
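
For example, the edit can also be done in one line (assuming the file exists at the path above):

sed -i 's/^MACAddressPolicy=.*/MACAddressPolicy=none/' /etc/systemd/network/99-default.link
reboot
# after the reboot:
cat /proc/net/bonding/*   # expected: the MAC of the first NIC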
 
Hi,

Rolling back to ifupdown2=3.2.0-1+pmx4 worked. I saw some other posts where this worked as well.
Unfortunately I cannot do any more testing right now.

Does anyone know whether this will be fixed in an upcoming version? Or do we have to change some config somewhere to get the "normal" behavior back?

For information:
The NICs are from 10Gtek and use the Intel X520 chipset, so they run on the ixgbe driver.
In them we use Flexoptix SFP+ SR transceivers coded as Intel.
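
Regarding the config-change question above: one possible workaround (untested here, and not confirmed in this thread) would be to pin a unique, locally administered MAC per node on the bond via the hwaddress option that ifupdown2 supports, e.g. on node 1:

auto bond2
iface bond2 inet manual
    bond-slaves ens1f0 ens1f1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    # example locally administered MAC, chosen to be unique per node
    hwaddress 02:00:00:00:04:41
#Cluster-Link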
 
I can't roll back the package without the "MACAddressPolicy=none" + ifupdown2=3.2.0-1+pmx4 test, because pmx5 fixes other bugs.
 
