boot timing causing network naming issue

bBracelet

New Member
Jan 24, 2023
Hello,

I've worked through the process of building a DKMS kernel module to add the vendor-provided Realtek r8125 driver (for the RTL8125B NIC) to the current kernel, and all seems well. The driver loads as expected and eliminates the previous problem of the NIC dropping connections or renegotiating speeds at increasingly shorter intervals. However, in the process of adding the driver module, something has changed in the overall device naming or timing of the boot process, or an extra step has been added that creates a naming conflict.

Essentially, my console and logs are flooded with martian-source packet notifications because seemingly normal broadcast traffic is being logged as arriving on eth0. I did not change the default predictable interface names, yet something (presumably the new driver) initially brings the NIC up as eth0 and then renames it to the standard predictable name during boot. Occasionally the timing lines up during boot and zero martian packets are logged until the next reboot, which starts the martian packets all over again.
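As a side note for anyone debugging the same flood: the martian logging itself is controlled by a sysctl, so it can be silenced while investigating. This only hides the symptom, not the underlying naming/timing problem (the file name below is just an example):
Code:
# /etc/sysctl.d/90-martian.conf  (example file name)
# 0 = stop logging martian packets; this does NOT fix the naming race
net.ipv4.conf.all.log_martians = 0
net.ipv4.conf.default.log_martians = 0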

Details:
Proxmox 7.3-4 (kernel 5.15.83-1-pve)

Log entries related to r8169 (previous driver):
Code:
[    0.213850] pci 0000:26:00.0: [10ec:8125] type 00 class 0x020000
[    0.213877] pci 0000:26:00.0: reg 0x10: [io  0xe000-0xe0ff]
[    0.213912] pci 0000:26:00.0: reg 0x18: [mem 0xfb700000-0xfb70ffff 64bit]
[    0.213933] pci 0000:26:00.0: reg 0x20: [mem 0xfb710000-0xfb713fff 64bit]
[    0.214084] pci 0000:26:00.0: supports D1 D2
[    0.214085] pci 0000:26:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.241828] pci 0000:26:00.0: Adding to iommu group 25
[    0.977120] r8169 0000:26:00.0 eth0: RTL8125B, d8:bb:c1:4d:43:45, XID 641, IRQ 97
[    0.977124] r8169 0000:26:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]

[    1.693533] r8169 0000:26:00.0 enp38s0: renamed from eth0

[   14.480441] r8169 0000:26:00.0 enp38s0: Link is Down
[   20.778231] r8169 0000:26:00.0 enp38s0: Link is Up - 1Gbps/Full - flow control off

Log entries related to r8125/eth0:
Code:
[    0.700279] pci 0000:26:00.0: [10ec:8125] type 00 class 0x020000
[    0.700306] pci 0000:26:00.0: reg 0x10: [io  0xe000-0xe0ff]
[    0.700341] pci 0000:26:00.0: reg 0x18: [mem 0xfb700000-0xfb70ffff 64bit]
[    0.700362] pci 0000:26:00.0: reg 0x20: [mem 0xfb710000-0xfb713fff 64bit]
[    0.700513] pci 0000:26:00.0: supports D1 D2
[    0.700514] pci 0000:26:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[    0.727429] pci 0000:26:00.0: Adding to iommu group 25
[    4.826868] r8125 2.5Gigabit Ethernet driver 9.011.00-NAPI loaded

[    4.897005] r8125 0000:26:00.0 enp38s0: renamed from eth0

[   18.598815] eth0: renamed from vethz361BT
[   18.813640] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   19.481924] eth0: renamed from vethr9zIdh
[   19.743433] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   20.790636] eth0: renamed from vethEaF48t
[   21.002453] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   23.905747] r8125: enp38s0: link up
[   31.895463] eth0: renamed from vetheyrzBf
[   34.503827] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   36.318630] IPv4: martian source 255.255.255.255 from 192.168.1.2, on dev eth0
[   36.535913] IPv4: martian source 255.255.255.255 from 192.168.1.2, on dev eth0
[   45.820659] eth0: renamed from vethdc12yx
[   46.303899] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   51.476852] IPv4: martian source 255.255.255.255 from 192.168.1.10, on dev eth0
[   59.863383] IPv4: martian source 255.255.255.255 from 192.168.1.120, on dev eth0
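One approach I'm considering to force the rename earlier: a systemd .link file matching the NIC's MAC address, so udev assigns the final name as soon as the device appears, regardless of which driver binds or when. The MAC below is the one from my r8169 log above; adjust for your hardware, and note I haven't yet verified this with the DKMS r8125 module:
Code:
# /etc/systemd/network/10-enp38s0.link  (untested sketch)
[Match]
MACAddress=d8:bb:c1:4d:43:45

[Link]
Name=enp38s0
If the driver is loaded from the initramfs, the link file may also need to be included there (e.g. via update-initramfs -u) to take effect early enough.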

/etc/network/interfaces file:
Code:
auto lo
iface lo inet loopback

auto enp38s0
allow-hotplug enp38s0
iface enp38s0 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0
        ovs_options tag=10 vlan_mode=native-untagged

auto vlan10
iface vlan10 inet static
        address 192.168.1.11/24
        gateway 192.168.1.1
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=10
        ovs_extra set interface ${IFACE} external-ids:iface-id=$(hostname -s)-${IFACE}-vif

auto vmbr0
iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports enp38s0 vlan10

ip address output:
Code:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp38s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovs-system state UP group default qlen 1000
    link/ether c9:aa:b2:4d:34:56 brd ff:ff:ff:ff:ff:ff
    inet6 [...]/64 scope link
       valid_lft forever preferred_lft forever
3: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 46:f4:1c:4c:03:b9 brd ff:ff:ff:ff:ff:ff
4: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether d8:bc:d1:3d:44:56 brd ff:ff:ff:ff:ff:ff
    inet6 [...]/64 scope link
       valid_lft forever preferred_lft forever
5: vlan10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 96:53:6c:3a:3b:24 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.11/24 scope global vlan10
       valid_lft forever preferred_lft forever
    inet6 [...]/64 scope global dynamic mngtmpaddr
       valid_lft 1608sec preferred_lft 1608sec
    inet6 [...]/64 scope link
       valid_lft forever preferred_lft forever

Any insight into initiating the network interface rename earlier in the boot process (similar to how the previous driver loaded) would be greatly helpful. Even a way to track down the eth0 record after boot would be appreciated. I can't find any trace of the eth0 name after boot, but the logs clearly indicate otherwise.
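For anyone else hunting a stray eth0: the name can live inside a container's network namespace rather than on the host, so it won't show up in the host's ip address output. A quick way to check (the container ID 101 is just an example):
Code:
# list links inside a running container's namespace (ID is an example)
pct exec 101 -- ip -br link show
# or enumerate all network namespaces present on the host
lsns -t net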

Thanks.

Update: Added additional log entries to highlight the difference in boot timing of the eth0-to-predictable-name rename between the two drivers.
 
So after digging a bit more, I've found that an LXC container that auto-starts on boot did not bring up its ethernet interface (also named eth0) correctly during the Proxmox server boot. After manually rebooting the container, the problematic eth0 link came up inside the container and the martian packets stopped logging on the host. Looking through the logs, the container isn't started until after the host NIC is renamed to its predictable name (enp38s0) during the host boot process. Oddly, the container that had trouble loading is the 2nd container in the auto-boot order. Something still seems off in the timing.

Would using Node's "Start on boot delay" possibly help resolve this? Do individual container boot delays then initiate after this initial server delay? Is there a way to force the network driver to load earlier? I'm stumped as to why a container's network eth0 is getting tripped up by this new host network driver. Would renaming the container's network interface from the default eth0 make an impact?
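Both ideas can be tested in the container's config file (the values below are placeholders, not my actual config):
Code:
# /etc/pve/lxc/<CTID>.conf  (excerpt; values are examples)
# up=N delays the NEXT guest by N seconds, so to delay this container,
# set up= on the guest that boots before it in the startup order
startup: order=2,up=30
# rename the container-side interface away from the default eth0
net0: name=ct0,bridge=vmbr0,tag=10,ip=dhcp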