[SOLVED] Host network access lost after Proxmox upgrade 7.0 to 7.1 (Router VM with pass through NIC)

patch

Member
Aug 5, 2021
38
3
8
32
I'm new to Proxmox and not sure how to investigate this fault further. Any ideas would be appreciated.

Fault
  • Proxmox hypervisor has no network access. Pinging the gateway 192.168.11.1 shows 100% packet loss, chrony fails to reach any time servers, and the update servers cannot be refreshed.
  • LAN web access to the Proxmox hypervisor works
  • VMs on Proxmox with a pass through NIC all function OK

Recent changes
  • Proxmox fresh install on hardware not long after 7.0 was released
  • Enabled the pve-no-subscription repository yesterday and upgraded Debian and Proxmox to the current version -> fault started
  • Noted the hosts file had the old pve IP, so updated it to the correct IP & subnet. Ensured gateway & DNS are also current -> no change in fault
  • Checked that the modifications to the Proxmox hypervisor were intact. Changes reapplied -> no change
  • Restarted the Proxmox hypervisor -> no change

System hardware summary
  • Proxmox runs on a single board computer with 6 Intel NICs
  • One NIC has a bridge in Proxmox and is used for the management console and for Proxmox internet access via the LAN gateway through pfsense.
  • pfsense runs on a VM with 4 NICs passed through (WAN and 3 LAN)
  • 3cx runs as a VM with 1 NIC passed through (externally connected to VoIP lan with pfsense gateway)

Network configuration
Main network
lan: Gateway / DNS / DHCP / 192.168.11.1/24 (via pfsense)
Proxmox: 192.168.11.50/24, VLAN not used

VoIP network
3CX: 192.168.12.55/24, VLAN not used

pfsense: VLAN aware, Main on VLAN 11, VoIP on VLAN 12
Netgear managed switch used to connect devices & servers

11 Network.jpg

11 DNS.jpg

12 Certificates.jpg

Configuration settings
/etc/network/interfaces
Code:
auto lo
iface lo inet loopback

iface enp2s0 inet manual
#Lan5

iface enp1s0 inet manual
#Lan6(Wan passthrough)

iface enp3s0 inet manual
#Lan4(3cx)

iface enp4s0 inet manual
#Lan3(Passthrough)

iface enp5s0 inet manual
#Lan2(Passthrough)

iface enp6s0 inet manual
#Lan1(Passthrough)

auto vmbr0
iface vmbr0 inet static
        address 192.168.11.50/24
        gateway 192.168.11.1
        bridge-ports enp2s0
        bridge-stp off
        bridge-fd 0
#Lan5

/etc/hosts
Code:
127.0.0.1 localhost.localdomain localhost
192.168.11.50 pve.home.arpa pve

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

systemctl status networking.service
Code:
● networking.service - Network initialization
     Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
     Active: active (exited) since Sun 2021-11-21 12:13:02 ACDT; 45min ago
       Docs: man:interfaces(5)
             man:ifup(8)
             man:ifdown(8)
    Process: 1266 ExecStart=/usr/share/ifupdown2/sbin/start-networking start (code=exited, status=0/SUCCESS)
   Main PID: 1266 (code=exited, status=0/SUCCESS)
        CPU: 365ms

Nov 21 12:13:01 pve systemd[1]: Starting Network initialization...
Nov 21 12:13:02 pve networking[1266]: networking: Configuring network interfaces
Nov 21 12:13:02 pve systemd[1]: Finished Network initialization.

ip addr
Code:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
    link/ether 00:f4:21:68:27:50 brd ff:ff:ff:ff:ff:ff
8: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:f4:21:68:27:50 brd ff:ff:ff:ff:ff:ff
    inet 192.168.11.50/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::2f4:21ff:fe68:2750/64 scope link
       valid_lft forever preferred_lft forever


Code:
root@pve:~# cat /etc/default/pveproxy
cat: /etc/default/pveproxy: No such file or directory

Code:
ss -tlpn
Code:
State    Recv-Q    Send-Q    Local Address:Port    Peer Address:Port    Process
LISTEN    0    4096    0.0.0.0:111    0.0.0.0:*    users:(("rpcbind",pid=1295,fd=4),("systemd",pid=1,fd=35))
LISTEN    0    4096    127.0.0.1:85    0.0.0.0:*    users:(("pvedaemon worke",pid=1787,fd=6),("pvedaemon worke",pid=1786,fd=6),("pvedaemon worke",pid=1784,fd=6),("pvedaemon",pid=1783,fd=6))
LISTEN    0    128    0.0.0.0:22    0.0.0.0:*    users:(("sshd",pid=1496,fd=3))
LISTEN    0    100    127.0.0.1:25    0.0.0.0:*    users:(("master",pid=1729,fd=13))
LISTEN    0    4096    127.0.0.1:61000    0.0.0.0:*    users:(("kvm",pid=1920,fd=15))
LISTEN    0    4096    127.0.0.1:61001    0.0.0.0:*    users:(("kvm",pid=3131,fd=15))
LISTEN    0    4096    [::]:111    [::]:*    users:(("rpcbind",pid=1295,fd=6),("systemd",pid=1,fd=37))
LISTEN    0    128    [::]:22    [::]:*    users:(("sshd",pid=1496,fd=4))
LISTEN    0    4096    *:3128    *:*    users:(("spiceproxy work",pid=1808,fd=6),("spiceproxy",pid=1807,fd=6))
LISTEN    0    100    [::1]:25    [::]:*    users:(("master",pid=1729,fd=14))
LISTEN    0    4096    *:8006    *:*    users:(("pveproxy worker",pid=408946,fd=6),("pveproxy worker",pid=405516,fd=6),("pveproxy worker",pid=403942,fd=6),("pveproxy",pid=1801,fd=6))
 

Attachments

  • syslog.log

patch

I'm not sure whether there are any other tests I can run to inspect the NIC configuration and confirm that nothing is being sent from the hardware.
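
One possible test is to read the kernel's per-interface packet counters before and after a ping. The sketch below assumes the standard /sys/class/net/<iface>/statistics files; the function name nic_counters and the SYSFS_NET variable are made up for illustration, and it has not been run on this hardware.

```shell
# Sketch: print the packet counters the kernel keeps for one interface.
# SYSFS_NET is a hypothetical override for dry runs; the default is the real sysfs path.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

nic_counters() {
    iface="$1"
    stats="$SYSFS_NET/$iface/statistics"
    printf '%s tx_packets=%s rx_packets=%s\n' "$iface" \
        "$(cat "$stats/tx_packets")" "$(cat "$stats/rx_packets")"
}

# Usage on the host (enp2s0 is the bridge port from /etc/network/interfaces above):
#   nic_counters enp2s0; ping -c 3 192.168.11.1; nic_counters enp2s0
```

If tx_packets rises across the ping while rx_packets does not, that would suggest frames are leaving the NIC but nothing is coming back, which narrows the search without proving where the replies are lost.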

Given the other issues reported with pass through, I suspect the problem may be there.
I'm not sure this is correct:
Code:
root@pve:~# dmesg | grep -e DMAR -e IOMMU
[    0.017684] ACPI: DMAR 0x000000008C5306F0 0000A8 (v01 INTEL  EDK2     00000002      01000013)
[    0.017731] ACPI: Reserving DMAR table memory at [mem 0x8c5306f0-0x8c530797]
[    0.057620] DMAR: IOMMU enabled
[    0.155501] DMAR: Host address width 39
[    0.155502] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[    0.155510] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 1c0000c40660462 ecap 19e2ff0505e
[    0.155514] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[    0.155519] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
[    0.155522] DMAR: RMRR base: 0x0000008b80c000 end: 0x0000008b82bfff
[    0.155525] DMAR: RMRR base: 0x0000008d800000 end: 0x0000008fffffff
[    0.155529] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 1
[    0.155531] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[    0.155533] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.158276] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    0.546897] DMAR: No ATSR found
[    0.546898] DMAR: No SATC found
[    0.546900] DMAR: IOMMU feature fl1gp_support inconsistent
[    0.546902] DMAR: IOMMU feature pgsel_inv inconsistent
[    0.546904] DMAR: IOMMU feature nwfs inconsistent
[    0.546905] DMAR: IOMMU feature pasid inconsistent
[    0.546906] DMAR: IOMMU feature eafs inconsistent
[    0.546907] DMAR: IOMMU feature prs inconsistent
[    0.546908] DMAR: IOMMU feature nest inconsistent
[    0.546908] DMAR: IOMMU feature mts inconsistent
[    0.546909] DMAR: IOMMU feature sc_support inconsistent
[    0.546910] DMAR: IOMMU feature dev_iotlb_support inconsistent
[    0.546912] DMAR: dmar0: Using Queued invalidation
[    0.546916] DMAR: dmar1: Using Queued invalidation
[    0.547775] DMAR: Intel(R) Virtualization Technology for Directed I/O

In contrast, on almost identically configured hardware which has not been upgraded and is still on Proxmox 7.0 (pve-manager/7.0-11, Linux 5.11.22-4-pve #1 SMP PVE 5.11.22-8):
Code:
root@pve2:~# dmesg | grep -e DMAR -e IOMMU
[    0.018342] ACPI: DMAR 0x000000008C5316F0 0000A8 (v01 INTEL  EDK2     00000002      01000013)
[    0.018398] ACPI: Reserving DMAR table memory at [mem 0x8c5316f0-0x8c531797]
[    0.059226] DMAR: IOMMU enabled
[    0.159141] DMAR: Host address width 39
[    0.159143] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[    0.159152] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 1c0000c40660462 ecap 19e2ff0505e
[    0.159156] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[    0.159162] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
[    0.159166] DMAR: RMRR base: 0x0000008b80d000 end: 0x0000008b82cfff
[    0.159168] DMAR: RMRR base: 0x0000008d800000 end: 0x0000008fffffff
[    0.159172] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 1
[    0.159174] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[    0.159176] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.162148] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    1.972447] DMAR: No ATSR found
[    1.972457] DMAR: dmar0: Using Queued invalidation
[    1.972463] DMAR: dmar1: Using Queued invalidation
[    1.973373] DMAR: Intel(R) Virtualization Technology for Directed I/O
root@pve2:~#



However the above listed
Code:
ip addr
looks OK to me

Looking at the IOMMU groups, everything also looks OK to my untrained eye
Code:
root@pve:~# for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
IOMMU group 0 00:00.0 Host bridge [0600]: Intel Corporation Coffee Lake HOST and DRAM Controller [8086:3e34] (rev 0c)
IOMMU group 10 00:1a.0 SD Host controller [0805]: Intel Corporation Device [8086:9dc4] (rev 30)
IOMMU group 11 00:1c.0 PCI bridge [0604]: Intel Corporation Cannon Point-LP PCI Express Root Port #5 [8086:9dbc] (rev f0)
IOMMU group 12 00:1c.5 PCI bridge [0604]: Intel Corporation Device [8086:9dbd] (rev f0)
IOMMU group 13 00:1c.6 PCI bridge [0604]: Intel Corporation Cannon Point-LP PCI Express Root Port #7 [8086:9dbe] (rev f0)
IOMMU group 14 00:1c.7 PCI bridge [0604]: Intel Corporation Cannon Point PCI Express Root Port #8 [8086:9dbf] (rev f0)
IOMMU group 15 00:1d.0 PCI bridge [0604]: Intel Corporation Cannon Point-LP PCI Express Root Port #9 [8086:9db0] (rev f0)
IOMMU group 16 00:1d.1 PCI bridge [0604]: Intel Corporation Cannon Point-LP PCI Express Root Port #10 [8086:9db1] (rev f0)
IOMMU group 17 00:1f.0 ISA bridge [0601]: Intel Corporation Cannon Point-LP LPC Controller [8086:9d84] (rev 30)
IOMMU group 17 00:1f.3 Audio device [0403]: Intel Corporation Cannon Point-LP High Definition Audio Controller [8086:9dc8] (rev 30)
IOMMU group 17 00:1f.4 SMBus [0c05]: Intel Corporation Cannon Point-LP SMBus Controller [8086:9da3] (rev 30)
IOMMU group 17 00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Point-LP SPI Controller [8086:9da4] (rev 30)
IOMMU group 18 01:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
IOMMU group 19 02:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
IOMMU group 1 00:02.0 VGA compatible controller [0300]: Intel Corporation WhiskeyLake-U GT2 [UHD Graphics 620] [8086:3ea0] (rev 02)
IOMMU group 20 03:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
IOMMU group 21 04:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
IOMMU group 22 05:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
IOMMU group 23 06:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
IOMMU group 2 00:04.0 Signal processing controller [1180]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [8086:1903] (rev 0c)
IOMMU group 3 00:08.0 System peripheral [0880]: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model [8086:1911]
IOMMU group 4 00:12.0 Signal processing controller [1180]: Intel Corporation Cannon Point-LP Thermal Controller [8086:9df9] (rev 30)
IOMMU group 5 00:14.0 USB controller [0c03]: Intel Corporation Cannon Point-LP USB 3.1 xHCI Controller [8086:9ded] (rev 30)
IOMMU group 5 00:14.2 RAM memory [0500]: Intel Corporation Cannon Point-LP Shared SRAM [8086:9def] (rev 30)
IOMMU group 5 00:14.5 SD Host controller [0805]: Intel Corporation BayHubTech Integrated SD controller [8086:9df5] (rev 30)
IOMMU group 6 00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Point-LP Serial IO I2C Controller #0 [8086:9de8] (rev 30)
IOMMU group 6 00:15.1 Serial bus controller [0c80]: Intel Corporation Cannon Point-LP Serial IO I2C Controller #1 [8086:9de9] (rev 30)
IOMMU group 7 00:16.0 Communication controller [0780]: Intel Corporation Cannon Point-LP MEI Controller #1 [8086:9de0] (rev 30)
IOMMU group 8 00:17.0 SATA controller [0106]: Intel Corporation Cannon Point-LP SATA Controller [AHCI Mode] [8086:9dd3] (rev 30)
IOMMU group 9 00:19.0 Serial bus controller [0c80]: Intel Corporation Cannon Point-LP Serial IO I2C Host Controller [8086:9dc5] (rev 30)
root@pve:~#
 

patch

Further testing demonstrated:
  • The differences shown above for dmesg | grep -e DMAR -e IOMMU are due to differences between kernel 5.11.22-7-pve and 5.13.19-1-pve (demonstrated by connecting a keyboard & screen to the hardware so the kernel could be selected from the boot options).
  • Proxmox 7.1 handles network issues during boot less robustly than Proxmox 7.0, and does not recover when the network comes online

The fragility is revealed in my setup because the network router (pfsense) runs as one of the Proxmox virtual machines, so it is not running while Proxmox is booting.
  • Replacing the network router with an old hardware router enables booting Proxmox with either kernel and results in apparently normal function (network up when Proxmox boots).
  • Disconnecting the hardware router from the switch connected to the Proxmox hardware enables booting to 5.11.22-7-pve with apparently normal function (within the limits of having no router connection, and with functionality restored when reconnected)
  • Repeating the above but booting to 5.13.19-1-pve results in Proxmox failing to start, instead reporting over a screen of entries ending in USB Logitech keyboard messages. Confusingly, on repeat testing it boots with no console messages or just "SGX Disabled in bios".

Using the pfsense router VM in Proxmox and allowing the default boot to the latest kernel, 5.13.19-1-pve, continues to result in the hypervisor having no router access. Unsuccessful workarounds tried:
  • Restarting Proxmox with only the single dedicated NIC connected (LAN5), then adding the other Ethernet connections after the host & all VMs finished starting up.
  • Renewing the network connections in Proxmox by pve -> Network -> edit the comment for the Linux Bridge -> Apply Configuration
  • Cycling power on the network switch interconnecting all LAN NICs
  • Running root@pve:~# systemctl restart networking.service
Which suggests to me the problem is not just in the kernel.

For reference, the modifications added for pass through are shown below. Access to the passed through NICs by the hypervisor has not been explicitly blocked, instead relying on dynamic inactivation, which may be a mistake.
Bash:
root@pve:~# cat /etc/kernel/cmdline
root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on
root@pve:~# cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
root@pve:~#

During the update from Proxmox 7.0 to 7.1 it was
Bash:
root@pve:~# cat /etc/kernel/cmdline
root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet intel_iommu=on iommu=pt

I have also made the equivalent changes for grub, but I believe they are not used for an EFI boot
Bash:
root@pve:~# cat /etc/default/grub
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox VE"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/pve-1 boot=zfs"

# Disable os-prober, it might add menu entries for each guest
GRUB_DISABLE_OS_PROBER=true

# Uncomment to enable BadRAM filtering, modify to suit your needs
# This works with Linux (no patch required) and with any kernel that obtains
# the memory map information from GRUB (GNU Mach, kernel of FreeBSD ...)
#GRUB_BADRAM="0x01234567,0xfefefefe,0x89abcdef,0xefefefef"

# Uncomment to disable graphical terminal (grub-pc only)
#GRUB_TERMINAL=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command `vbeinfo'
#GRUB_GFXMODE=640x480

# Uncomment if you don't want GRUB to pass "root=UUID=xxx" parameter to Linux
#GRUB_DISABLE_LINUX_UUID=true

# Disable generation of recovery mode menu entries
GRUB_DISABLE_RECOVERY="true"

# Uncomment to get a beep at grub start
#GRUB_INIT_TUNE="480 440 1"
root@pve:~#
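
Since it is unclear which of the two files is actually used, one way to check is to look at what the running kernel was booted with. A rough sketch, assuming the usual /proc/cmdline and /sys/firmware/efi locations; the function names cmdline_has and boot_mode and the EFI_DIR variable are made up for illustration.

```shell
# Succeeds when the given kernel parameter appears as a whole word in a cmdline file.
# The file defaults to /proc/cmdline, i.e. what the running kernel was booted with.
cmdline_has() {
    param="$1"
    file="${2:-/proc/cmdline}"
    tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

# Prints "uefi" when the EFI firmware directory exists, otherwise "bios".
# EFI_DIR is a hypothetical override for dry runs.
boot_mode() {
    if [ -d "${EFI_DIR:-/sys/firmware/efi}" ]; then echo uefi; else echo bios; fi
}

# Usage on the host:
#   boot_mode
#   cmdline_has intel_iommu=on && echo "active" || echo "not active"
```

As far as I know, on installs that boot via systemd-boot, edits to /etc/kernel/cmdline only take effect after proxmox-boot-tool refresh and a reboot, so comparing /proc/cmdline against the file shows whether that step has happened.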
 

sbellon

New Member
Oct 12, 2021
10
0
1
45
After your reports, I'm now even more frightened to upgrade from 7.0 to 7.1 because I'm using (almost) exactly the same setup as yours (except for OPNsense instead of pfsense): a 6 NIC fanless Core i5 box with Proxmox VE 7.0 on it to run OPNsense as a VM and pihole, unifi and Proxmox BS as containers. Two NICs are configured as PCI-passthrough for WAN and LAN of OPNsense.

Configuration modification in my setup is identical to yours (except for iommu=pt, which I do not have).

Hoping to see some resolutions on this thread as otherwise I'm reluctant to move to 7.1.
 

patch

I believe the problem is that Proxmox has trouble moving PCI devices / NICs to the vfio-pci kernel driver when its own network configuration is not established. It throws lots of errors on changing to a later kernel version, disables the Proxmox NIC and cannot shut down, but eventually becomes more stable (though with no hypervisor network access).

To simplify the task, statically excluding the NICs that the Proxmox hypervisor will not have access to may help. I'm not sure how to do that though.

lspci -nn (or an extract of it)
Bash:
01:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
02:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
03:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
04:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
05:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
06:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)

shows the 6 NICs have different bus addresses (01:00.0 to 06:00.0) but the same PCI ID, 8086:1533.

This is a problem for /etc/modprobe.d/local.conf, which would normally have this added:
Bash:
options vfio-pci ids=8086:1533

but that would match all 6 NICs, and the NIC at bus address 02:00.0 is used by the Proxmox hypervisor, so it cannot be handed to vfio-pci.

As shown above, all NICs are in different IOMMU groups, so I assume it is possible. Perhaps this is why the boot code gets it wrong.
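
A minimal sketch of selecting devices by PCI address rather than by vendor:device ID, using the kernel's per-device driver_override sysfs attribute. The address list is taken from the lspci output above, leaving out 02:00.0 for the host. The function name set_vfio_override and the SYSFS_PCI variable are made up for illustration, and none of this has been tested on this hardware.

```shell
# Sketch: mark selected PCI functions for vfio-pci by address instead of by ID.
# SYSFS_PCI is a hypothetical override for dry runs; the default is the real sysfs path.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

# Addresses of the NICs to pass through; 0000:02:00.0 (host NIC) is deliberately absent.
PASSTHROUGH_DEVS="0000:01:00.0 0000:03:00.0 0000:04:00.0 0000:05:00.0 0000:06:00.0"

set_vfio_override() {
    for dev in $PASSTHROUGH_DEVS; do
        if [ -d "$SYSFS_PCI/$dev" ]; then
            # Only vfio-pci may bind to this device from now on.
            echo vfio-pci > "$SYSFS_PCI/$dev/driver_override"
        else
            echo "skipping $dev: not present" >&2
        fi
    done
}

# Usage (as root): set_vfio_override
```

For the override to matter, the function has to run before igb claims the devices (for example from an early boot script), or each device has to be unbound and re-probed afterwards. Which of those is reliable on the Proxmox kernel is left open here.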
 

sbellon

Now I'm even more terrified, because I think we have very similar - if not identical - hardware:

Code:
pve:~# lspci -nn | grep Ethernet
01:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
02:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
03:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
04:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
05:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
06:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
 

patch

Statically excluding the NICs that the Proxmox hypervisor will not have access to must be possible, because that is what appears to happen dynamically in
Kernel Version: Linux 5.13.19-1-pve #1 SMP PVE 5.13.19-3
PVE Manager Version: pve-manager/7.1-6
It is not really a solution to the current pass through instability, but it may make systems more resilient to such bugs in the future.

Bash:
root@pve:~# lspci -nnk
00:00.0 Host bridge [0600]: Intel Corporation Coffee Lake HOST and DRAM Controller [8086:3e34] (rev 0c)
        DeviceName: Onboard - Other
        Subsystem: Intel Corporation Coffee Lake HOST and DRAM Controller [8086:7270]
        Kernel driver in use: skl_uncore
00:02.0 VGA compatible controller [0300]: Intel Corporation WhiskeyLake-U GT2 [UHD Graphics 620] [8086:3ea0] (rev 02)
        DeviceName: Onboard - Video
        Subsystem: Intel Corporation UHD Graphics 620 (Whiskey Lake) [8086:2212]
        Kernel driver in use: i915
        Kernel modules: i915
00:04.0 Signal processing controller [1180]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [8086:1903] (rev 0c)
        DeviceName: Onboard - Other
        Subsystem: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [8086:7270]
        Kernel modules: processor_thermal_device
00:08.0 System peripheral [0880]: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model [8086:1911]
        DeviceName: Onboard - Other
        Subsystem: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model [8086:7270]
00:12.0 Signal processing controller [1180]: Intel Corporation Cannon Point-LP Thermal Controller [8086:9df9] (rev 30)
        DeviceName: Onboard - Other
        Subsystem: Intel Corporation Cannon Point-LP Thermal Controller [8086:7270]
        Kernel driver in use: intel_pch_thermal
        Kernel modules: intel_pch_thermal
00:14.0 USB controller [0c03]: Intel Corporation Cannon Point-LP USB 3.1 xHCI Controller [8086:9ded] (rev 30)
        DeviceName: Onboard - Other
        Subsystem: Intel Corporation Cannon Point-LP USB 3.1 xHCI Controller [8086:7270]
        Kernel driver in use: xhci_hcd
        Kernel modules: xhci_pci
00:14.2 RAM memory [0500]: Intel Corporation Cannon Point-LP Shared SRAM [8086:9def] (rev 30)
        DeviceName: Onboard - Other
        Subsystem: Intel Corporation Cannon Point-LP Shared SRAM [8086:7270]
00:14.5 SD Host controller [0805]: Intel Corporation BayHubTech Integrated SD controller [8086:9df5] (rev 30)
        DeviceName: Onboard - Other
        Subsystem: Intel Corporation BayHubTech Integrated SD controller [8086:7270]
        Kernel driver in use: sdhci-pci
        Kernel modules: sdhci_pci
00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Point-LP Serial IO I2C Controller #0 [8086:9de8] (rev 30)
        DeviceName: Onboard - Other
        Subsystem: Intel Corporation Cannon Point-LP Serial IO I2C Controller [8086:7270]
        Kernel driver in use: intel-lpss
        Kernel modules: intel_lpss_pci
00:15.1 Serial bus controller [0c80]: Intel Corporation Cannon Point-LP Serial IO I2C Controller #1 [8086:9de9] (rev 30)
        DeviceName: Onboard - Other
        Subsystem: Intel Corporation Cannon Point-LP Serial IO I2C Controller [8086:7270]
        Kernel driver in use: intel-lpss
        Kernel modules: intel_lpss_pci
00:16.0 Communication controller [0780]: Intel Corporation Cannon Point-LP MEI Controller #1 [8086:9de0] (rev 30)
        DeviceName: Onboard - Other
        Subsystem: Intel Corporation Cannon Point-LP MEI Controller [8086:7270]
        Kernel driver in use: mei_me
        Kernel modules: mei_me
00:17.0 SATA controller [0106]: Intel Corporation Cannon Point-LP SATA Controller [AHCI Mode] [8086:9dd3] (rev 30)
        DeviceName: Onboard - SATA
        Subsystem: Intel Corporation Cannon Point-LP SATA Controller [AHCI Mode] [8086:7270]
        Kernel driver in use: ahci
        Kernel modules: ahci
00:19.0 Serial bus controller [0c80]: Intel Corporation Cannon Point-LP Serial IO I2C Host Controller [8086:9dc5] (rev 30)
        DeviceName: Onboard - Other
        Subsystem: Intel Corporation Cannon Point-LP Serial IO I2C Host Controller [8086:7270]
        Kernel driver in use: intel-lpss
        Kernel modules: intel_lpss_pci
00:1a.0 SD Host controller [0805]: Intel Corporation Device [8086:9dc4] (rev 30)
        DeviceName: Onboard - Other
        Subsystem: Intel Corporation Device [8086:7270]
        Kernel driver in use: sdhci-pci
        Kernel modules: sdhci_pci
00:1c.0 PCI bridge [0604]: Intel Corporation Cannon Point-LP PCI Express Root Port #5 [8086:9dbc] (rev f0)
        Kernel driver in use: pcieport
00:1c.5 PCI bridge [0604]: Intel Corporation Device [8086:9dbd] (rev f0)
        Kernel driver in use: pcieport
00:1c.6 PCI bridge [0604]: Intel Corporation Cannon Point-LP PCI Express Root Port #7 [8086:9dbe] (rev f0)
        Kernel driver in use: pcieport
00:1c.7 PCI bridge [0604]: Intel Corporation Cannon Point PCI Express Root Port #8 [8086:9dbf] (rev f0)
        Kernel driver in use: pcieport
00:1d.0 PCI bridge [0604]: Intel Corporation Cannon Point-LP PCI Express Root Port #9 [8086:9db0] (rev f0)
        Kernel driver in use: pcieport
00:1d.1 PCI bridge [0604]: Intel Corporation Cannon Point-LP PCI Express Root Port #10 [8086:9db1] (rev f0)
        Kernel driver in use: pcieport
00:1f.0 ISA bridge [0601]: Intel Corporation Cannon Point-LP LPC Controller [8086:9d84] (rev 30)
        DeviceName: Onboard - Other
        Subsystem: Intel Corporation Cannon Point-LP LPC Controller [8086:7270]
00:1f.3 Audio device [0403]: Intel Corporation Cannon Point-LP High Definition Audio Controller [8086:9dc8] (rev 30)
        DeviceName: Onboard - Sound
        Subsystem: Intel Corporation Cannon Point-LP High Definition Audio Controller [8086:7270]
        Kernel driver in use: snd_hda_intel
        Kernel modules: snd_hda_intel, snd_sof_pci_intel_cnl
00:1f.4 SMBus [0c05]: Intel Corporation Cannon Point-LP SMBus Controller [8086:9da3] (rev 30)
        DeviceName: Onboard - Other
        Subsystem: Intel Corporation Cannon Point-LP SMBus Controller [8086:7270]
        Kernel driver in use: i801_smbus
        Kernel modules: i2c_i801
00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Point-LP SPI Controller [8086:9da4] (rev 30)
        DeviceName: Onboard - Other
        Subsystem: Intel Corporation Cannon Point-LP SPI Controller [8086:7270]
01:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
        Kernel driver in use: vfio-pci
        Kernel modules: igb
02:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
        Kernel driver in use: igb
        Kernel modules: igb
03:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
        Kernel driver in use: vfio-pci
        Kernel modules: igb
04:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
        Kernel driver in use: vfio-pci
        Kernel modules: igb
05:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
        Kernel driver in use: vfio-pci
        Kernel modules: igb
06:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
        Kernel driver in use: vfio-pci
        Kernel modules: igb
root@pve:~#

The passed through NICs have Kernel driver in use: vfio-pci, whereas the NIC used by Proxmox has Kernel driver in use: igb.
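
To pull just that comparison out of the long listing, a small filter along these lines may help. It is only a sketch: the function name nic_drivers is made up, and it assumes the lspci -nnk layout shown above (device line at column 0, details indented).

```shell
# Print "<address> <driver>" for each Ethernet controller in `lspci -nnk` output on stdin.
nic_drivers() {
    awk '
        # A device line starts at column 0; remember its address only for Ethernet controllers.
        /^[0-9a-f]/ { addr = ($0 ~ /Ethernet controller/) ? $1 : "" }
        # The driver line belongs to the most recent device line.
        /Kernel driver in use:/ && addr != "" { print addr, $NF }
    '
}

# Usage on the host:
#   lspci -nnk | nic_drivers
```

On the listing above this should print vfio-pci for five of the NICs and igb for 02:00.0.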
 

patch

To update / answer my own questions


The following were tried but did not help:
  • Restarting Proxmox with only the single dedicated NIC connected (LAN5), then adding the other Ethernet connections after the host & all VMs finished starting up.
  • Renewing the network connections in Proxmox by pve -> Network -> edit the comment for the Linux Bridge -> Apply Configuration
  • Cycling power on the network switch interconnecting all LAN NICs
  • Running root@pve:~# systemctl restart networking.service
  • Reverting to Proxmox kernel 5.11.22-7-pve
  • Not using a VM with pass through: clone the pfsense VM, delete the pass through NICs, create equivalent bridges in Proxmox and add these to the VM clone. Start the pfsense clone, assign the NICs to the new bridges and the VLANs to the old VLANs. Log in to pfsense using a LAN (not VLAN) interface and correct which NIC each VLAN is based on (as the old NICs are no longer available, pfsense defaults to the first NIC, i.e. the WAN NIC). Note I did not try removing the changes to /etc/kernel/cmdline and /etc/modules.
  • I did not try statically passing selected NICs, as that requires a script run early during Linux boot and I wasn't sure how reliably this would work with the Proxmox kernel, so it is not a great debugging tool. For an example of the likely code required see here


What did work was correcting the pfsense DHCP server static mapping for Proxmox. Correcting the MAC address ensured that the IP address assigned by the pfsense DHCP server (after Proxmox boots) was the same as the one set in Proxmox, not just one from the pool.
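
A quick way to compare the MAC in a static mapping against what the host actually presents is sketched below. It assumes the standard /sys/class/net/<iface>/address file; the function name mac_matches and the SYSFS_NET variable are made up for illustration, and the MAC in the usage line is the one shown for vmbr0 in the ip addr output above.

```shell
# SYSFS_NET is a hypothetical override for dry runs; the default is the real sysfs path.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Succeeds when the interface's MAC equals the expected one (case-insensitive).
mac_matches() {
    iface="$1"
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    actual=$(tr 'A-F' 'a-f' < "$SYSFS_NET/$iface/address")
    [ "$actual" = "$expected" ]
}

# Usage on the host, with the MAC entered in the pfsense static mapping as the second argument:
#   mac_matches vmbr0 00:f4:21:68:27:50 && echo "mapping matches" || echo "mismatch"
```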

In more detail:
With the incorrect MAC address, after booting Proxmox there is no network access, although the network interface appears OK
Bash:
root@pve:~# ping 192.168.11.1
PING 192.168.11.1 (192.168.11.1) 56(84) bytes of data.
^C
--- 192.168.11.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2029ms

root@pve:~# ip a && ip r
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
    link/ether 00:f4:21:68:27:50 brd ff:ff:ff:ff:ff:ff
8: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:f4:21:68:27:50 brd ff:ff:ff:ff:ff:ff
    inet 192.168.11.50/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::2f4:21ff:fe68:2750/64 scope link
       valid_lft forever preferred_lft forever
9: vmbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:54 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2f4:21ff:fe68:2754/64 scope link
       valid_lft forever preferred_lft forever
10: vmbr2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:53 brd ff:ff:ff:ff:ff:ff
11: vmbr3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:52 brd ff:ff:ff:ff:ff:ff
12: vmbr4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:51 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2f4:21ff:fe68:2751/64 scope link
       valid_lft forever preferred_lft forever
13: vmbr6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:4f brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2f4:21ff:fe68:274f/64 scope link
       valid_lft forever preferred_lft forever
default via 192.168.11.1 dev vmbr0 proto kernel onlink
192.168.11.0/24 dev vmbr0 proto kernel scope link src 192.168.11.50
root@pve:~#

Refreshing the DHCP lease results in a different IP address (192.168.11.135) being issued. The Proxmox GUI responds on both IP addresses, but outgoing network access is still not functional
Bash:
root@pve:~# dhclient vmbr0
root@pve:~# ip a && ip r
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
    link/ether 00:f4:21:68:27:50 brd ff:ff:ff:ff:ff:ff
8: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:f4:21:68:27:50 brd ff:ff:ff:ff:ff:ff
    inet 192.168.11.50/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet 192.168.11.135/24 brd 192.168.11.255 scope global secondary dynamic vmbr0
       valid_lft 7185sec preferred_lft 7185sec
    inet6 fe80::2f4:21ff:fe68:2750/64 scope link
       valid_lft forever preferred_lft forever
9: vmbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:54 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2f4:21ff:fe68:2754/64 scope link
       valid_lft forever preferred_lft forever
10: vmbr2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:53 brd ff:ff:ff:ff:ff:ff
11: vmbr3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:52 brd ff:ff:ff:ff:ff:ff
12: vmbr4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:51 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2f4:21ff:fe68:2751/64 scope link
       valid_lft forever preferred_lft forever
13: vmbr6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:4f brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2f4:21ff:fe68:274f/64 scope link
       valid_lft forever preferred_lft forever
default via 192.168.11.1 dev vmbr0 proto kernel onlink
192.168.11.0/24 dev vmbr0 proto kernel scope link src 192.168.11.50
root@pve:~# ping 192.168.11.1
PING 192.168.11.1 (192.168.11.1) 56(84) bytes of data.
^C
--- 192.168.11.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2033ms

root@pve:~#
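With both 192.168.11.50 and 192.168.11.135 on vmbr0, it can help to see which source address the kernel actually picks for traffic to the gateway. The following is a minimal sketch, not something run in this thread: `route_src` is a hypothetical helper name, and it only parses the text that `ip route get` prints.

```shell
# Print the source address from `ip route get` output read on stdin.
# route_src is a hypothetical helper name used for illustration only.
route_src() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit } }'
}

# Usage on the host (gateway address taken from the output above):
#   ip route get 192.168.11.1 | route_src
```

If the printed address is the static one, pings to the gateway leave from 192.168.11.50 even though a leased address is also present on the bridge.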

Correcting the MAC address for Proxmox pve in pfsense -> DHCP server -> "DHCP Static Mappings for this Interface" restored network access, and it is maintained after restarting Proxmox.

Bash:
root@pve:~# ping 192.168.11.1
PING 192.168.11.1 (192.168.11.1) 56(84) bytes of data.
64 bytes from 192.168.11.1: icmp_seq=1 ttl=64 time=0.266 ms
64 bytes from 192.168.11.1: icmp_seq=2 ttl=64 time=0.338 ms
64 bytes from 192.168.11.1: icmp_seq=3 ttl=64 time=0.275 ms
^C
--- 192.168.11.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2025ms
rtt min/avg/max/mdev = 0.266/0.293/0.338/0.032 ms
root@pve:~# ip a && ip r
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
    link/ether 00:f4:21:68:27:50 brd ff:ff:ff:ff:ff:ff
8: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:f4:21:68:27:50 brd ff:ff:ff:ff:ff:ff
    inet 192.168.11.50/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::2f4:21ff:fe68:2750/64 scope link
       valid_lft forever preferred_lft forever
9: vmbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:54 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2f4:21ff:fe68:2754/64 scope link
       valid_lft forever preferred_lft forever
10: vmbr2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:53 brd ff:ff:ff:ff:ff:ff
11: vmbr3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:52 brd ff:ff:ff:ff:ff:ff
12: vmbr4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:51 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2f4:21ff:fe68:2751/64 scope link
       valid_lft forever preferred_lft forever
13: vmbr6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:4f brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2f4:21ff:fe68:274f/64 scope link
       valid_lft forever preferred_lft forever
default via 192.168.11.1 dev vmbr0 proto kernel onlink
192.168.11.0/24 dev vmbr0 proto kernel scope link src 192.168.11.50
root@pve:~#


However, refreshing the DHCP lease now responds differently.
Bash:
root@pve:~# dhclient vmbr0
RTNETLINK answers: File exists
root@pve:~#


It appears that updating from Proxmox 7.0 to 7.1 resulted in different behavior when the static IP binding differs from the address issued by the DHCP server.
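To see which address dhclient was last handed, the lease file can be inspected. This is a hedged sketch: `last_leased_addr` is a hypothetical helper, and the default path is an assumption (the usual Debian location); it may differ on other setups.

```shell
# Print the most recent fixed-address recorded in a dhclient lease file.
# $1 = lease file. The default path is an assumption, not confirmed
# anywhere in this thread.
last_leased_addr() {
    awk '$1 == "fixed-address" { gsub(";", "", $2); addr = $2 }
         END { if (addr != "") print addr }' \
        "${1:-/var/lib/dhcp/dhclient.leases}"
}

# Usage on the host:
#   last_leased_addr
#   pgrep -a dhclient    # shows whether a dhclient process is still running
```

Comparing that address with the one set in /etc/network/interfaces shows whether the static binding and the DHCP server disagree.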
 

sbellon

New Member
Oct 12, 2021
10
0
1
45
I'm trying to understand ... but I'm not sure I followed correctly.

Where exactly does this different DHCP behaviour come into play? I assume that you don't hand out IPs via DHCP to either the Proxmox VE server or the pfsense running on a VM. I assume both have statically set IPs?

So, even if the pfsense acts as a DHCP server, that will only come into play once PVE as well as pfsense have booted with their respective static IPs. However, while Proxmox is booting - and before the pfsense VM is up and running - most likely Proxmox will not have access to DNS and will not have a route to the gateway and beyond.

But I'm still unsure where your mentioned DHCP issue comes into play ...
 

patch

I'm trying to understand ...
This applies where your router is a VM running on the Proxmox hypervisor, so when Proxmox boots there is no route to the Internet, no DHCP server, and no DNS.

In Proxmox 7.0
  1. Set the IP address of Proxmox in Proxmox. Nothing else needs to be done.

In Proxmox 7.1
  1. Leave the IP address of Proxmox set in Proxmox (as per step 1. above)
  2. Add a DHCP entry in your DHCP server to set the IP address of Proxmox to the same value set in 1. above

I am aware step 2. is not normally required. It's a workaround which worked for me. I'm yet to test if it is still required with the current Proxmox build (getting Proxmox working again after the last update was a painful experience, so not something I'm in a hurry to repeat).
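For step 2, the DHCP entry needs the MAC address the host actually uses on its management bridge. A minimal sketch for reading it from sysfs; `iface_mac` is a hypothetical helper name, and the optional second argument exists only so the function can be pointed at a different directory.

```shell
# Print the MAC address of an interface as recorded in sysfs.
# $1 = interface name, $2 = sysfs net directory (defaults to /sys/class/net).
# iface_mac is a hypothetical helper name used for illustration only.
iface_mac() {
    cat "${2:-/sys/class/net}/$1/address"
}

# Usage on the host:
#   iface_mac vmbr0
#   iface_mac enp2s0
```

In the `ip a` output earlier in the thread both vmbr0 and enp2s0 report 00:f4:21:68:27:50, so either call should print the value to enter in the pfsense static mapping.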
 

sbellon

Ok, that relieves me a bit for two reasons:

1. For documentation purposes I put the statically set IPs into "reserved IPs" of the DHCPv4 configuration anyway, so your "step 2" is already done in my setup.

2. I now understand that you lost IPv4 access to the management of the PVE host due to the "mis-assigned" IPv4 address. In my case I "ssh" into PVE management via link-local IPv6 address configured in the .ssh/config file (I do this for all local servers "for reasons"), so I would have this fallback available anyway.

Looks like I might try the update on the weekend.

Thanks for your intensive research and explanation of the issue.
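The link-local IPv6 fallback mentioned above can be derived from the bridge MAC address when the kernel uses the modified EUI-64 scheme. The sketch below is an illustration under that assumption (hosts configured for privacy or stable-privacy addresses will differ), and `mac_to_link_local` is a hypothetical helper name.

```shell
# Derive the modified EUI-64 link-local IPv6 address from a MAC address.
# Assumes the interface uses EUI-64 address generation.
mac_to_link_local() {
    # Split the MAC into six octets; inside a function `set --` only
    # replaces the function's own positional parameters.
    set -- $(printf '%s' "$1" | tr ':' ' ')
    # Flip the universal/local bit (0x02) of the first octet.
    first=$(printf '%02x' $(( 0x$1 ^ 0x02 )))
    # Insert ff:fe in the middle; %x drops leading zeros in each group.
    printf 'fe80::%x:%x:%x:%x\n' \
        $(( 0x$first$2 )) $(( 0x${3}ff )) $(( 0xfe$4 )) $(( 0x$5$6 ))
}

# Usage:
#   mac_to_link_local 00:f4:21:68:27:50
```

For the vmbr0 MAC shown in this thread this yields fe80::2f4:21ff:fe68:2750, matching the `ip a` output. From a client on the same LAN the address needs a zone suffix, for example `ssh root@fe80::2f4:21ff:fe68:2750%eth0`, where `eth0` stands in for the client's own interface name.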
 

patch

For documentation purposes I put the statically set IPs into "reserved IPs" of the DHCPv4 configuration anyway
Yep, that's what I thought I had done too, until I discovered the MAC address was for the NIC, not the bridge.

you lost IPv4 access to the management of the PVE host due to the "mis-assigned" IPv4 address
No, access to the Proxmox GUI was not lost.
What did happen is that Proxmox lost outgoing access to the network (as a result ping, chrony, and program updates all failed).

Looks like I might try the update on the weekend.
Sounds reasonable, the fix is not hard to do if you know it helps.

By the way, as part of the debugging I also set up
  • A fallback hardware router to provide minimum functionality. That way, if Proxmox is broken, I can swap in the hardware router so my core network still functions and has Internet access.
  • A fallback VM router with no pass through. That way pass through bugs can be readily excluded (as pass through is less tested).
In hindsight both are useful backup systems to have available.
 

patch

Looks like I might try the update on the weekend.
Against my better judgment I tried it too.
The brief summary: the Proxmox hypervisor still needs the same workaround to get outgoing network access, but behaves subtly differently from earlier fix attempts.

In more detail.

With the work around in place
Proxmox 7.1
  1. Leave the IP address of Proxmox set in Proxmox (as per step 1. above)
  2. Add a DHCP entry in your DHCP server to set the IP address of Proxmox to the same value set in 1. above
I updated to the current no subscription version without issues. The Proxmox hypervisor maintained network access, as did the VMs (with pass through NICs).

To check if the work around was still required I:
A. Removed the DHCP entry (step 2. above) by changing the MAC address. The Proxmox hypervisor GUI continued to work without change (I didn't have a monitor connected to the hardware so I don't know if any status change information was displayed there).
B. Rebooted Proxmox pve.
C. Checked the network interface again, which looked OK to me; however, Proxmox hypervisor outgoing network access was again lost (ping failed)
Bash:
root@pve:~# ip a && ip r
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
    link/ether 00:f4:21:68:27:50 brd ff:ff:ff:ff:ff:ff
8: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:f4:21:68:27:50 brd ff:ff:ff:ff:ff:ff
    inet 192.168.11.50/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::2f4:21ff:fe68:2750/64 scope link
       valid_lft forever preferred_lft forever
9: vmbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:54 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2f4:21ff:fe68:2754/64 scope link
       valid_lft forever preferred_lft forever
10: vmbr2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:53 brd ff:ff:ff:ff:ff:ff
11: vmbr3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:52 brd ff:ff:ff:ff:ff:ff
12: vmbr4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:51 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2f4:21ff:fe68:2751/64 scope link
       valid_lft forever preferred_lft forever
13: vmbr6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:4f brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2f4:21ff:fe68:274f/64 scope link
       valid_lft forever preferred_lft forever
default via 192.168.11.1 dev vmbr0 proto kernel onlink
192.168.11.0/24 dev vmbr0 proto kernel scope link src 192.168.11.50
root@pve:~# ping 192.168.11.1
PING 192.168.11.1 (192.168.11.1) 56(84) bytes of data.
^C
--- 192.168.11.1 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4077ms

root@pve:~#

D. Checked what happened with
Bash:
root@pve:~# dhclient vmbr0
Doing so resulted in a different response.
  • dhclient vmbr0 did not respond with RTNETLINK answers: File exists, which always happens with Proxmox 7.0 and has happened with Proxmox 7.1 the second time this command is run after restarting the hypervisor.
  • Proxmox hypervisor GUI stopped working on its prior IP address 192.168.11.50 (which I have not seen in the past)
  • Proxmox hypervisor GUI started responding on the pool IP address given by the DHCP server, 192.168.11.135 (as seen with other versions of Proxmox v7.1)
  • Proxmox hypervisor outgoing network access was restored (ping worked) which didn't happen in prior versions of Proxmox 7.1
From the pool IP address
Bash:
root@pve:~# ip a && ip r
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP group default qlen 1000
    link/ether 00:f4:21:68:27:50 brd ff:ff:ff:ff:ff:ff
8: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:f4:21:68:27:50 brd ff:ff:ff:ff:ff:ff
    inet 192.168.11.135/24 brd 192.168.11.255 scope global dynamic vmbr0
       valid_lft 6884sec preferred_lft 6884sec
    inet6 fe80::2f4:21ff:fe68:2750/64 scope link
       valid_lft forever preferred_lft forever
9: vmbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:54 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2f4:21ff:fe68:2754/64 scope link
       valid_lft forever preferred_lft forever
10: vmbr2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:53 brd ff:ff:ff:ff:ff:ff
11: vmbr3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:52 brd ff:ff:ff:ff:ff:ff
12: vmbr4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:51 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2f4:21ff:fe68:2751/64 scope link
       valid_lft forever preferred_lft forever
13: vmbr6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 00:f4:21:68:27:4f brd ff:ff:ff:ff:ff:ff
    inet6 fe80::2f4:21ff:fe68:274f/64 scope link
       valid_lft forever preferred_lft forever
default via 192.168.11.1 dev vmbr0
192.168.11.0/24 dev vmbr0 proto kernel scope link src 192.168.11.135
root@pve:~# dhclient vmbr0
RTNETLINK answers: File exists
root@pve:~# ping 192.168.11.1
PING 192.168.11.1 (192.168.11.1) 56(84) bytes of data.
64 bytes from 192.168.11.1: icmp_seq=1 ttl=64 time=0.293 ms
64 bytes from 192.168.11.1: icmp_seq=2 ttl=64 time=0.316 ms
64 bytes from 192.168.11.1: icmp_seq=3 ttl=64 time=0.335 ms
64 bytes from 192.168.11.1: icmp_seq=4 ttl=64 time=0.295 ms
64 bytes from 192.168.11.1: icmp_seq=5 ttl=64 time=0.288 ms
64 bytes from 192.168.11.1: icmp_seq=6 ttl=64 time=0.261 ms
^C
--- 192.168.11.1 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5127ms
rtt min/avg/max/mdev = 0.261/0.298/0.335/0.023 ms
root@pve:~#
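The `dynamic` flag in the output above is what distinguishes a leased address from the one set in /etc/network/interfaces. A small sketch for spotting it; `list_dynamic_addrs` is a hypothetical helper, and it only parses the one-line-per-address output of `ip -o -4 addr show`.

```shell
# Print "<interface> <address/prefix>" for every IPv4 address that carries
# the "dynamic" flag in `ip -o -4 addr show` output read on stdin.
# list_dynamic_addrs is a hypothetical helper name used for illustration.
list_dynamic_addrs() {
    awk '{ for (i = 5; i <= NF; i++) if ($i == "dynamic") { print $2, $4; break } }'
}

# Usage on the host:
#   ip -o -4 addr show dev vmbr0 | list_dynamic_addrs
```

No output means vmbr0 carries only statically configured addresses. If a leased address does show up, `dhclient -r vmbr0` asks dhclient to release it, though whether that restores the earlier state on 7.1 was not tested in this thread.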

Restoring the fix was also a little different
  1. DHCP entry was restored (MAC address set back to the vmbr0 MAC address)
  2. Proxmox hypervisor GUI continued to work on the pool address
  3. Proxmox hypervisor was restarted.
  4. Boot up was much slower (I assume lots of warnings would have shown on the hardware monitor but it wasn't plugged in); eventually the VMs started and the pass through NICs worked.
  5. Plugged in the hardware console monitor & keyboard and logged in as root
  6. ip a && ip r showed the network configuration was corrupted: no bridge interfaces (including vmbr0), no enp2s0 NIC (used for Proxmox access), but enp3s0 was present (normally passed through to a VM)
  7. cat /etc/network/interfaces showed no corruption.
  8. systemctl status networking.service gave an error due to a dependency problem
  9. Hardware restart button resulted in Proxmox shutting down, but it needed power cycling to actually restart
  10. After power cycling, functionality returned to baseline (although I may have had to shut it down twice, including power cycling, to achieve that)

So, in summary: the same, but different
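When the config file looks intact but `ip a` shows interfaces missing, listing which declared interfaces are absent narrows things down. A hedged sketch: `missing_ifaces` is a hypothetical helper, and it only compares `iface` stanzas against sysfs; it does not explain why an interface is missing.

```shell
# List interfaces declared in an interfaces(5) file that are absent from sysfs.
# $1 = config file, $2 = sysfs net directory (defaults are the usual paths).
# missing_ifaces is a hypothetical helper name used for illustration only.
missing_ifaces() {
    conf="${1:-/etc/network/interfaces}"
    sysnet="${2:-/sys/class/net}"
    awk '$1 == "iface" { print $2 }' "$conf" | sort -u | while read -r name; do
        [ -e "$sysnet/$name" ] || echo "$name"
    done
}

# Usage on the host:
#   missing_ifaces
```

NICs passed through to a running VM are expected to appear in this list, so the interesting entries are the management NIC and the bridges.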
 

patch

Updated to current no subscription 7.1 version
  • Kernel Linux 5.13.19-2-pve #1 SMP PVE 5.13.19-4
  • Manager pve-manager/7.1-8
Summary -> no change that I could detect.
  • With a DHCP server entry setting the IPv4 address based on the vmbr0 / enp2s0 MAC address (as well as the fixed IPv4 specified in Proxmox), the system works
  • dhclient vmbr0 continues to convert the IP to dynamic when first run on v7.1. If run again, or run on v7.0, the command returns RTNETLINK answers: File exists
  • If the DHCP entry MAC address is changed (+/- the IP address), then Proxmox does a low level boot up, verbosely discovering hardware resources, which results in no bridges being created (so no remote access), and a Proxmox reboot requires a hardware reset to finish.
 

patch

Did some more debugging, and found a better solution
Bash:
systemctl restart networking

The system must have been left in a weird state after the update from Proxmox 7.0 to 7.1

In more detail
  1. Having a fixed DHCP mapping in pfsense masked the problem and achieved apparently normal function.
  2. Removing the fixed DHCP mapping in pfsense resulted in no overt change until Proxmox was rebooted. The physical console then showed a "Watch dog timer not stopped" warning. During the subsequent restart Proxmox appeared to start in recovery mode, finding and reporting all hardware resources, running a disk test on the zfs partitions, and showing an error on starting network services. From the physically attached console, a reboot showed about 1/3 page of messages and closed the VMs down, but didn't actually stop the hardware (it required a hard shutdown of the hardware, then a physical restart). It then started up OK and subsequent reboots ran OK, except the Proxmox hypervisor had no network access (can't ping gateway, can't update, chrony can't access time servers), although the GUI worked, as did sftp. So I had left the fixed DHCP mapping in pfsense for Proxmox. Note points 1. & 2. reiterate what is described in earlier posts above.
  3. With the fixed DHCP mapping in pfsense for Proxmox in place, running systemctl restart networking resulted in the same reboot behavior as described in 2. above; after two reboots it returned to the behavior described in 1.
  4. Removing the fixed DHCP mapping in pfsense and then running systemctl restart networking resulted in systemctl status networking.service showing, for each pass through bridge, a message like error: vmbr1: bridge port enp6s0 does not exist. However, the subsequent reboot ran without errors. The fixed DHCP mapping in pfsense could then be added and removed without apparent effect on Proxmox function.

Step 4. above in more detail
With no pfsense static binding for Proxmox, running systemctl restart networking left the GUI, ping, and Proxmox update working, but systemctl status networking.service reported errors
Bash:
root@pve:~# systemctl status networking.service
● networking.service - Network initialization
     Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
     Active: active (exited) since Sun 2021-12-19 14:22:02 ACDT; 6min ago
       Docs: man:interfaces(5)
             man:ifup(8)
             man:ifdown(8)
    Process: 1462 ExecStart=/usr/share/ifupdown2/sbin/start-networking start (code=exited, status=0/SUCCESS)
   Main PID: 1462 (code=exited, status=0/SUCCESS)
        CPU: 526ms

Dec 19 14:22:01 pve systemd[1]: Starting Network initialization...
Dec 19 14:22:01 pve networking[1462]: networking: Configuring network interfaces
Dec 19 14:22:02 pve systemd[1]: Finished Network initialization.
root@pve:~# systemctl restart networking
root@pve:~# systemctl status networking.service
● networking.service - Network initialization
     Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
     Active: active (exited) since Sun 2021-12-19 14:29:38 ACDT; 24min ago
       Docs: man:interfaces(5)
             man:ifup(8)
             man:ifdown(8)
    Process: 15660 ExecStart=/usr/share/ifupdown2/sbin/start-networking start (code=exited, status=0/SUCCESS)
   Main PID: 15660 (code=exited, status=0/SUCCESS)
        CPU: 579ms

Dec 19 14:29:37 pve systemd[1]: Starting Network initialization...
Dec 19 14:29:37 pve networking[15660]: networking: Configuring network interfaces
Dec 19 14:29:37 pve networking[15672]: error: vmbr1: bridge port enp6s0 does not exist
Dec 19 14:29:37 pve networking[15672]: error: vmbr2: bridge port enp5s0 does not exist
Dec 19 14:29:38 pve networking[15672]: error: vmbr3: bridge port enp4s0 does not exist
Dec 19 14:29:38 pve networking[15672]: error: vmbr4: bridge port enp3s0 does not exist
Dec 19 14:29:38 pve networking[15672]: error: vmbr6: bridge port enp1s0 does not exist
Dec 19 14:29:38 pve systemd[1]: Finished Network initialization.
root@pve:~#

After this, Proxmox could be restarted cleanly on multiple occasions. The static binding for Proxmox in pfsense could also be added and removed without causing errors on Proxmox.
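The "bridge port ... does not exist" lines can be pulled out of the journal to confirm they only concern the pass through NICs. A minimal sketch; `missing_bridge_ports` is a hypothetical helper that parses log text in the format shown above.

```shell
# Extract "<bridge> <port>" pairs from "bridge port ... does not exist"
# messages read on stdin.
# missing_bridge_ports is a hypothetical helper name used for illustration.
missing_bridge_ports() {
    sed -n 's/.*error: \([^:]*\): bridge port \([^ ]*\) does not exist.*/\1 \2/p'
}

# Usage on the host:
#   journalctl -u networking -b | missing_bridge_ports
```

If every port listed is a NIC assigned to a running VM, the errors are probably just a side effect of restarting networking while those devices are detached from the host; `lspci -k` shows which kernel driver currently holds each device.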

So I'm hoping Proxmox is more stable after step 4 above.
 