Same MAC on all LACP bonds/bridges after upgrade to Proxmox 8

houbidoo

Hey all,

I just upgraded a three-node cluster (a new setup) and the cluster links (2x 10G LACP) stopped working.

I can see that the MAC addresses of all cluster links (2x 10G, LACP bond, bridge) are now identical on all three servers. Even after a fresh installation it is the same.
Only the cluster links with the Intel 10G cards show this behavior. The 2x 10G LACP bonds and bridges for management and storage, which use Broadcom cards, still work fine.

Is it a bug?
What is the best way to fix it?


@netzwerkcluster-server01:~# pveversion -v
proxmox-ve: 8.0.2 (running kernel: 6.2.16-19-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.3
proxmox-kernel-6.2.16-19-pve: 6.2.16-19
proxmox-kernel-6.2: 6.2.16-19
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph: 17.2.7-pve1
ceph-fuse: 17.2.7-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx5
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.9
libpve-guest-common-perl: 5.0.5
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.4-1
proxmox-backup-file-restore: 3.0.4-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.9
pve-cluster: 8.0.4
pve-container: 5.0.5
pve-docs: 8.0.5
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.8-3
pve-ha-manager: 4.0.2
pve-i18n: 3.0.7
pve-qemu-kvm: 8.0.2-7
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.13-pve1

@netzwerkcluster-server01:~# cat /proc/net/bonding/bond2
Ethernet Channel Bonding Driver: v6.2.16-19-pve

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

802.3ad info
LACP active: on
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: 98:b7:85:55:22:11 <- same on all nodes
Active Aggregator Info:
Aggregator ID: 1
Number of ports: 2
Actor Key: 15
Partner Key: 11
Partner Mac Address: 80:db:17:3b:49:00

Slave Interface: ens1f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 98:b7:85:55:22:11
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 98:b7:85:55:22:11
port key: 15
port priority: 255
port number: 1
port state: 61
details partner lacp pdu:
system priority: 127
system mac address: 80:db:17:3b:49:00
oper key: 11
port priority: 127
port number: 3
port state: 63

Slave Interface: ens1f1
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 98:b7:85:55:22:12
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
system priority: 65535
system mac address: 98:b7:85:55:22:11
port key: 15
port priority: 255
port number: 2
port state: 61
details partner lacp pdu:
system priority: 127
system mac address: 80:db:17:3b:49:00
oper key: 11
port priority: 127
port number: 6
port state: 63

@netzwerkcluster-server01:~# cat /etc/network/interfaces

auto bond0
iface bond0 inet manual
bond-slaves eno5 eno6
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer3+4
#Management

auto bond1
iface bond1 inet manual
bond-slaves eno3 eno4
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer3+4
#VM-Netzwerk

auto bond2
iface bond2 inet manual
bond-slaves ens1f0 ens1f1
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer3+4
#Cluster-Link

auto vmbr0
iface vmbr0 inet static
address 10.10.4.41/24
gateway 10.10.4.1
bridge-ports bond0
bridge-stp off
bridge-fd 0
#Management

auto vmbr1
iface vmbr1 inet manual
bridge-ports bond1
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 2-4094
#VM-Netzwerk

auto vmbr2
iface vmbr2 inet static
address 172.31.0.1/24
bridge-ports bond2
bridge-stp off
bridge-fd 0
#Cluster-Link
 
A recent patch changed this: the bond now inherits the MAC address of its first slave device instead of using a randomly generated MAC (the random MAC was causing other problems).

What is strange is that "98:b7:85:55:22:11" shows up on every node ...
https://macvendors.com/ says that OUI belongs to Shenzhen 10Gtek Transceivers Co. ??? (a transceiver normally doesn't have a MAC)
(is it a fiber card?)

What are the MACs of the interfaces if you don't create a bond?
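For example, a quick sketch using the interface names from your config (ethtool -P prints the permanent, burned-in address of a port):

ip -br link show ens1f0 ens1f1
ethtool -P ens1f0   # permanent (burned-in) MAC of the first port
ethtool -P ens1f1   # permanent (burned-in) MAC of the second port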




Can you do some debugging? (I can't reproduce it myself, and I have seen other users with different problems.)


Can you try to roll back:

apt install ifupdown2=3.2.0-1+pmx4

then reboot.

then send the result of cat /proc/net/bonding/*
(this should show a random, systemd-generated MAC)

then do a reload (ifreload -a)

and look at cat /proc/net/bonding/*

(the reload changes the MAC to that of the first slave device, so after the reload it should break just like the current version does at boot)
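Putting those steps together, the check sequence would look something like this (grep just pulls the relevant line out of the bonding status files):

apt install ifupdown2=3.2.0-1+pmx4
reboot
# after the reboot:
grep -H 'System MAC address' /proc/net/bonding/*
ifreload -a
# after the reload, compare again:
grep -H 'System MAC address' /proc/net/bonding/*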



Then, still with ifupdown2=3.2.0-1+pmx4
edit /etc/systemd/network/99-default.link,
and replace
"MACAddressPolicy=Persistent"
with
"MACAddressPolicy=none"

Then reboot,

and send result of
cat /proc/net/bonding/*

(this should be the MAC of the first NIC)
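For reference, a minimal sketch of what the relevant part of that file could look like after the change (your copy may contain additional entries under [Link]; leave those untouched):

# /etc/systemd/network/99-default.link (relevant section only)
[Link]
# was: MACAddressPolicy=persistent
MACAddressPolicy=none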
 
Hi,

The rollback to ifupdown2=3.2.0-1+pmx4 worked. I saw some other posts where this worked as well.
Unfortunately I cannot do any more testing right now.

Does anyone know whether this will be fixed in the next versions? Or do we have to change some config somewhere to get the "normal" behaviour back?

For information:
The brand of the NICs is 10Gtek. They use the Intel X520 chipset, so the driver in use is ixgbe.
For the NICs we use Flexoptix SFP+ SR transceivers coded for Intel.
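For reference, the driver binding can be confirmed with something like this (a sketch, nothing specific to this thread beyond the interface name):

ethtool -i ens1f0                  # shows "driver: ixgbe" for the X520-based ports
lspci -nnk | grep -A3 -i ethernet  # lists the NICs and the kernel driver in use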
 
I can't roll back the change without the "MACAddressPolicy=none" + ifupdown2=3.2.0-1+pmx4 test, because pmx5 fixes other bugs.