After upgrading from PVE5 to PVE6, the network interfaces no longer come up

KaiS · Feb 25, 2022

Hello,

I have just upgraded a 3-node cluster from PVE5 to PVE6, strictly following the manual.
Beforehand I also tested the upgrade in a test lab setup, where it worked without errors.

Ceph has not been touched yet (i.e. it is still running Luminous).

After I had upgraded the first two nodes (without errors) and then tried to migrate machines from node 3 to node 1, I noticed that nothing worked anymore -> timeouts.

After some digging I found out that all interfaces except one are DOWN.

I have 2 interfaces as bond0-vmbr0 and, correspondingly, 2 interfaces as bond1-vmbr1 (for Ceph).

After the server boots, however, only one interface of bond0 comes up.

Now I am at a loss. Here is the interfaces file:

Code:

root@Prox1:/etc/network# cat interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

auto enp96s0f0
iface enp96s0f0 inet manual

auto enp96s0f1
iface enp96s0f1 inet manual

auto enp216s0f0
iface enp216s0f0 inet manual

auto enp216s0f1
iface enp216s0f1 inet manual

iface eno0 inet manual

iface ens4f0 inet manual

iface ens4f1 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves enp96s0f0 enp96s0f1
    bond-miimon 100
    bond-mode balance-alb
#LAN Traffic

auto bond1
iface bond1 inet manual
    bond-slaves enp216s0f0 enp216s0f1
    bond-miimon 100
    bond-mode balance-rr
#Ceph Traffic

auto vmbr0
iface vmbr0 inet static
    address 192.168.3.243/24
    gateway 192.168.3.134
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
#LAN Traffic

auto vmbr1
iface vmbr1 inet static
    address 10.0.0.1/24
    bridge-ports bond1
    bridge-stp off
    bridge-fd 0
#Ceph Traffic

And here is an excerpt from syslog during boot:

Code:

Feb 25 23:17:49 Prox1 kernel: [   10.765370] i40e 0000:60:00.1 enp96s0f1: already using mac address ac:1f:6b:ba:67:87
Feb 25 23:17:49 Prox1 kernel: [   10.765457] bond0: (slave enp96s0f1): making interface the new active one
Feb 25 23:17:49 Prox1 kernel: [   10.765460] i40e 0000:60:00.1 enp96s0f1: already using mac address ac:1f:6b:ba:67:87
Feb 25 23:17:49 Prox1 kernel: [   10.765474] i40e 0000:60:00.1 enp96s0f1: already using mac address ac:1f:6b:ba:67:87
Feb 25 23:17:49 Prox1 kernel: [   10.765496] bond0: (slave enp96s0f1): Enslaving as an active interface with an up link
Feb 25 23:17:49 Prox1 systemd-udevd[927]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Feb 25 23:17:49 Prox1 systemd-udevd[927]: Could not generate persistent MAC address for vmbr0: No such file or directory
Feb 25 23:17:49 Prox1 kernel: [   10.789842] vmbr0: port 1(bond0) entered blocking state
Feb 25 23:17:49 Prox1 kernel: [   10.789844] vmbr0: port 1(bond0) entered disabled state
Feb 25 23:17:49 Prox1 kernel: [   10.789925] device bond0 entered promiscuous mode
Feb 25 23:17:49 Prox1 kernel: [   10.789926] device enp96s0f1 entered promiscuous mode
Feb 25 23:17:49 Prox1 kernel: [   10.794421] vmbr0: port 1(bond0) entered blocking state
Feb 25 23:17:49 Prox1 kernel: [   10.794424] vmbr0: port 1(bond0) entered forwarding state
Feb 25 23:17:49 Prox1 ifup[1383]: Waiting for vmbr0 to get ready (MAXWAIT is 2 seconds).
Feb 25 23:17:49 Prox1 kernel: [   10.795956] i40e 0000:60:00.1: entering allmulti mode.
Feb 25 23:17:49 Prox1 systemd-udevd[927]: link_config: autonegotiation is unset or enabled, the speed and duplex are not writable.
Feb 25 23:17:49 Prox1 systemd-udevd[927]: Could not generate persistent MAC address for vmbr1: No such file or directory
Feb 25 23:17:49 Prox1 ifup[1383]: interface enp216s0f0 does not exist!
Feb 25 23:17:49 Prox1 ifup[1383]: Waiting for vmbr1 to get ready (MAXWAIT is 2 seconds).
Feb 25 23:17:50 Prox1 systemd[1]: Started Raise network interfaces.
Feb 25 23:17:50 Prox1 systemd[1]: Reached target Network.
Feb 25 23:17:50 Prox1 systemd[1]: Started Zabbix Agent.
Feb 25 23:17:50 Prox1 systemd[1]: Starting OpenBSD Secure Shell server...
Feb 25 23:17:50 Prox1 systemd[1]: Condition check resulted in fast remote file copy program daemon being skipped.
Feb 25 23:17:50 Prox1 systemd[1]: Reached target Network is Online.
Feb 25 23:17:50 Prox1 systemd[1]: Starting Map RBD devices...
Feb 25 23:17:50 Prox1 systemd[1]: Starting iSCSI initiator daemon (iscsid)...
Feb 25 23:17:50 Prox1 systemd[1]: Starting Postfix Mail Transport Agent (instance -)...
Feb 25 23:17:50 Prox1 systemd[1]: Starting LXC network bridge setup...
Feb 25 23:17:50 Prox1 systemd[1]: Started LXC Container Monitoring Daemon.
Feb 25 23:17:50 Prox1 systemd[1]: Started Map RBD devices.

It looks to me as if my interfaces really no longer exist ...

"interface enp216s0f0 does not exist!"

I am really at a loss.

(a little later)

I have found out that the names of my interfaces were evidently changed during/by the update.

As a result, the names defined in the bond are no longer found, so vmbr1 ultimately no longer works, which is why Ceph does not work either.

What I cannot work out is the scheme by which the names were changed, or what exactly I now have to adjust.

I experimented a bit and have now reconfigured my vmbr1 (Ceph) so that it no longer points to bond1 but directly to one of the newly appeared interfaces, ens4f0.

As a result, Ceph can communicate again and works.

But if I configure ens4f0 and ens4f1 together as bond1, nothing works at all again, so I suspect that, despite their similar names, the two interfaces are not on the same card?
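
One way to check which ports actually joined a bond is to read the kernel's bonding status file, /proc/net/bonding/bond1. The following is only a minimal Python sketch, assuming the usual "Slave Interface:" / "MII Status:" layout of that file; the function name bond_slaves is made up for illustration and is not part of any Proxmox tool.

```python
def bond_slaves(status_text):
    """Map each slave interface of a bond to its MII status.

    status_text is the content of /proc/net/bonding/<bond>.
    """
    slaves = {}
    current = None
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            # A new per-slave section starts here.
            current = value
            slaves[current] = None
        elif key == "MII Status" and current is not None and slaves[current] is None:
            # First "MII Status" after a "Slave Interface" line belongs to that slave;
            # the bond-wide "MII Status" appears before any slave and is skipped.
            slaves[current] = value
    return slaves
```

On the affected node this could be called as bond_slaves(open("/proc/net/bonding/bond1").read()). A slave that is missing from the result was never enslaved, for example because its name does not exist; a slave reported as "down" exists but has no link.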

This is really awful.

Here are 2 screenshots: the first shows my original configuration from PVE5 (as it still runs on node 3), which has been running great for 3 years.

Screenshot 2 shows the network config after the upgrade to PVE6 (with the change described above, which I made manually to get it running again somehow):

What I urgently need now is some help getting this cleaned up properly.

How do I find out what the interfaces are really called now, or how do I identify them, so that I can straighten out the config?

What surprises me is that enp96s0f1 works but enp96s0f0 does not. And I do not know where eno0 belongs either :-(

It would be great if someone could help me, because with things running as inconsistently as they are right now, I am reluctant to continue with the next steps (Ceph upgrade), and on Monday the system is supposed to be back in production...

Regards,
Kai

KaiS · Feb 26, 2022

Could someone give me a tip on how to reconfigure my network, i.e. how do I find out what names the operating system gives the interfaces? My interfaces file is hard-wired to the old names, and that throws everything off.

noPa$$word · Feb 26, 2022

lshw -c network and/or lspci on the console. lshw may need to be installed first.
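
To see quickly which names in /etc/network/interfaces no longer match anything the kernel knows, the file can also be compared against the entries in /sys/class/net. This is a rough Python sketch, not a Proxmox tool: the function names are hypothetical, and it assumes that member ports are listed after bond-slaves or bridge-ports (plus the older slaves / bridge_ports spellings) as in the file posted above.

```python
import os

# Options whose values are lists of member interfaces.
PORT_OPTIONS = ("bond-slaves", "bridge-ports", "slaves", "bridge_ports")


def missing_interfaces(config_text, existing):
    """Return member interfaces named in the config that are not in `existing`.

    Bonds and bridges defined in the config itself are not reported,
    because they are created by ifup rather than discovered by the kernel.
    """
    referenced = set()
    virtual = set()
    current = None
    for line in config_text.splitlines():
        parts = line.split("#", 1)[0].split()  # drop comments
        if len(parts) >= 2 and parts[0] == "iface":
            current = parts[1]
        elif len(parts) >= 2 and parts[0] in PORT_OPTIONS:
            referenced.update(p for p in parts[1:] if p != "none")
            if current is not None:
                virtual.add(current)
    return sorted(referenced - set(existing) - virtual)


def check_host(config_path="/etc/network/interfaces", sysfs="/sys/class/net"):
    """Run the comparison against the live system."""
    with open(config_path) as fh:
        return missing_interfaces(fh.read(), os.listdir(sysfs))
```

Calling check_host() on the node returns the stale names; an empty list means every bond slave and bridge port in the file exists. The current names themselves are simply the directory entries in /sys/class/net.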

KaiS · Feb 27, 2022

Many thanks for your tip. With the correct names I was able to adjust my interfaces config and get the cluster, including Ceph, running again, and now everything works again.

Perhaps a small warning / note about this problem should be added to the otherwise exemplary upgrade documentation.

KaiS · Feb 28, 2022

Me again. I do not understand why the logical names of my network cards are assigned the way they are:

Code:

root@Prox1:/etc/pve# lshw -c network
  *-network:0               
       description: Ethernet interface
       product: Ethernet Connection X722 for 10GBASE-T
       vendor: Intel Corporation
       physical id: 0
       bus info: pci@0000:60:00.0
       logical name: eno0
       version: 09
       serial: ac:1f:6b:ba:67:86
       size: 10Gbit/s
       capacity: 10Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi msix pciexpress bus_master cap_list rom ethernet physical tp 1000bt-fd 10000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=i40e driverversion=5.13.19-3-pve duplex=full firmware=3.33 0x80000e48 1.1876.0 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=10Gbit/s
       resources: iomemory:38c00-38bff iomemory:38c00-38bff irq:29 memory:38c000000000-38c000ffffff memory:38c002800000-38c002807fff memory:c5d00000-c5d7ffff memory:38c002000000-38c0023fffff memory:38c002810000-38c00288ffff
  *-network:1
       description: Ethernet interface
       product: Ethernet Connection X722 for 10GBASE-T
       vendor: Intel Corporation
       physical id: 0.1
       bus info: pci@0000:60:00.1
       logical name: enp96s0f1
       version: 09
       serial: ac:1f:6b:ba:67:87
       size: 10Gbit/s
       capacity: 10Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi msix pciexpress bus_master cap_list rom ethernet physical tp 1000bt-fd 10000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=i40e driverversion=5.13.19-3-pve duplex=full firmware=3.33 0x80000e48 1.1876.0 latency=0 link=yes multicast=yes port=twisted pair speed=10Gbit/s
       resources: iomemory:38c00-38bff iomemory:38c00-38bff irq:29 memory:38c001000000-38c001ffffff memory:38c002808000-38c00280ffff memory:c5d80000-c5dfffff memory:38c002400000-38c0027fffff memory:38c002890000-38c00290ffff
  *-network:0
       description: Ethernet interface
       product: Ethernet Controller 10G X550T
       vendor: Intel Corporation
       physical id: 0
       bus info: pci@0000:d8:00.0
       logical name: enp216s0f0
       version: 01
       serial: a0:36:9f:28:04:f0
       size: 10Gbit/s
       capacity: 10Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi msix pciexpress vpd bus_master cap_list rom ethernet physical tp 100bt-fd 1000bt-fd 10000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=ixgbe driverversion=5.13.19-3-pve duplex=full firmware=0x80000483, 17.5.9 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=10Gbit/s
       resources: irq:37 memory:fb800000-fbbfffff memory:fbc04000-fbc07fff memory:fbe80000-fbefffff
  *-network:1
       description: Ethernet interface
       product: Ethernet Controller 10G X550T
       vendor: Intel Corporation
       physical id: 0.1
       bus info: pci@0000:d8:00.1
       logical name: enp216s0f1
       version: 01
       serial: a0:36:9f:28:04:f2
       size: 10Gbit/s
       capacity: 10Gbit/s
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi msix pciexpress vpd bus_master cap_list rom ethernet physical tp 100bt-fd 1000bt-fd 10000bt-fd autonegotiation
       configuration: autonegotiation=on broadcast=yes driver=ixgbe driverversion=5.13.19-3-pve duplex=full firmware=0x80000483, 17.5.9 latency=0 link=yes multicast=yes port=twisted pair slave=yes speed=10Gbit/s
       resources: irq:358 memory:fb400000-fb7fffff memory:fbc00000-fbc03fff memory:fbe00000-fbe7ffff
  *-network:0
       description: Ethernet interface
       physical id: 3
       logical name: bond0
       serial: ac:1f:6b:ba:67:86
       size: 10Gbit/s
       capabilities: ethernet physical
       configuration: autonegotiation=off broadcast=yes driver=bonding driverversion=5.13.19-3-pve duplex=full firmware=2 link=yes master=yes multicast=yes speed=10Gbit/s
  *-network:1
       description: Ethernet interface
       physical id: 4
       logical name: bond1
       serial: a0:36:9f:28:04:f0
       capabilities: ethernet physical
       configuration: autonegotiation=off broadcast=yes driver=bonding driverversion=5.13.19-3-pve duplex=full firmware=2 link=yes master=yes multicast=yes
  *-network:2
       description: Ethernet interface
       physical id: 5
       logical name: vmbr0
       serial: ac:1f:6b:ba:67:87
       size: 10Gbit/s
       capabilities: ethernet physical
       configuration: autonegotiation=off broadcast=yes driver=bridge driverversion=2.3 firmware=N/A ip=192.168.3.243 link=yes multicast=yes speed=10Gbit/s
  *-network:3
       description: Ethernet interface
       physical id: 6
       logical name: vmbr1
       serial: a0:36:9f:28:04:f0
       capabilities: ethernet physical
       configuration: autonegotiation=off broadcast=yes driver=bridge driverversion=2.3 firmware=N/A ip=10.0.0.1 link=yes multicast=yes
root@Prox1:/etc/pve#

In particular the first entry of the X722 card: it gets the logical name "eno0",
while the 2nd port of the same card is named "enp96s0f1".

Does this point to some configuration error on my part?

Regards,
Kai
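
For reference, names such as enp96s0f1 follow systemd's path-based naming: the PCI bus, slot and function from the "bus info" line, written in decimal (0x60 = 96, 0xd8 = 216). Names such as eno0 come from an onboard index reported by the firmware, which systemd's default policy prefers over the path-based name when it is available. Why only the first X722 port gets such an index cannot be determined from the lshw output alone. The Python sketch below only illustrates the path-based part; the function name and the multifunction parameter are assumptions made for this example.

```python
import re

# Matches "0000:60:00.1", optionally prefixed with "pci@" as printed by lshw.
PCI_ADDRESS = re.compile(
    r"^(?:pci@)?([0-9a-f]{4}):([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])$", re.IGNORECASE
)


def path_based_name(bus_info, multifunction=True):
    """Derive the path-based Ethernet name (enp<bus>s<slot>[f<function>])."""
    match = PCI_ADDRESS.match(bus_info.strip())
    if match is None:
        raise ValueError(f"not a PCI address: {bus_info!r}")
    domain, bus, slot, function = (int(group, 16) for group in match.groups())
    name = "en"
    if domain:
        # The PCI domain only appears in the name when it is non-zero.
        name += f"P{domain}"
    name += f"p{bus}s{slot}"
    if multifunction or function:
        # The function suffix is used for multi-function devices such as dual-port NICs.
        name += f"f{function}"
    return name
```

For example, path_based_name("pci@0000:60:00.0") gives the name that the port currently shown as eno0 would have under path-based naming, which matches the enp96s0f0 entry in the old interfaces file.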
