Host losing default gateway on reboot

Aug 2, 2024
On one of my Proxmox hosts, I started losing network connectivity on reboot. No updates or changes immediately preceded the issue, so I'm at a loss as to what caused the change in behavior. I'm not seeing this on the other two hosts.

Looking closer, it appears that when ifupdown2 tries to set the default gateway, the interface it goes through (vlan100) is not yet up, so the routing table update fails.

Network connectivity may take a few seconds to come up, but I feel like that alone shouldn't prevent a default gateway from being present until manual intervention. If I take bond0 or vlan100 down and back up, the gateway is set and I can continue as normal until the next reboot.
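For anyone hitting the same thing, the manual recovery above can be sketched as a small shell helper. This is only an illustration: `has_default_route` and `ensure_default_route` are hypothetical names of mine, and I'm assuming `ip`, `ifdown` and `ifup` are on the PATH and that the script runs as root.

```shell
#!/bin/sh
# Succeeds if the routing-table text on stdin contains a default route.
has_default_route() {
    grep -q '^default '
}

# Hypothetical helper: cycle IFACE only when no IPv4 default route exists.
ensure_default_route() {
    iface="$1"
    if ip -4 route show default | has_default_route; then
        echo "default route present"
        return 0
    fi
    echo "no default route, cycling $iface"
    ifdown "$iface" && ifup "$iface"
}
```

After sourcing it, `ensure_default_route vlan100` would do what I have been doing by hand after each reboot. It is a band-aid, not a fix for the boot-time ordering.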

I also understand that there are some reported workarounds, e.g. moving my VLAN interfaces from the physical interface to the bridge (confirmed as a workaround), or adding the following to the affected interface:
Bash:
      post-up ifconfig $IFACE up
      pre-down ifconfig $IFACE down
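For reference, the bridge-based workaround would look roughly like the stanza below in /etc/network/interfaces. This is a sketch under assumptions: it replaces the existing vlan100 stanza, it relies on vmbr0 staying VLAN-aware, and the name vmbr0.100 is simply the bridge name plus the VLAN ID rather than something copied from a working config.

```
# Sketch only: management VLAN 100 moved from bond0 onto the VLAN-aware bridge
auto vmbr0.100
iface vmbr0.100 inet static
        address 172.20.1.4/27
        gateway 172.20.1.1
#cloud-mgmt
```

The address and gateway are the ones from my current vlan100 stanza; the only intended change is the parent device.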

But for something that has been working without issues, the sudden instability seems odd. I have a (blanket) expectation that anything I can configure through the UI without validation warnings should work reliably, without my having to apply workarounds. That matters particularly as Proxmox is poised to be a great replacement for VMware.

If this setup isn't expected to work, could we have a best-practices document (maybe I missed it), or validation in code to warn users that their network configuration may not be reliable?





Bash:
root@host1:~# journalctl -xe | grep ifup
Aug 02 00:44:14 host1 systemd[1]: Starting ifupdown2-pre.service - Helper to synchronize boot up for ifupdown...
░░ Subject: A start job for unit ifupdown2-pre.service has begun execution
░░ A start job for unit ifupdown2-pre.service has begun execution.
Aug 02 00:44:16 host1 systemd[1]: Finished ifupdown2-pre.service - Helper to synchronize boot up for ifupdown.
░░ Subject: A start job for unit ifupdown2-pre.service has finished successfully
░░ A start job for unit ifupdown2-pre.service has finished successfully.
Aug 02 00:44:17 host1 /usr/sbin/ifup[1313]: error: vlan100: cmd '/bin/ip route replace default via 172.20.1.1 proto kernel dev vlan100 onlink' failed: returned 2 (Error: Nexthop device is not up.
Aug 02 00:44:17 host1 networking[1313]: error: >>> Full logs available in: /var/log/ifupdown2/network_config_ifupdown2_86062_Aug-02-2024_00:44:17.166770 <<<
Aug 02 00:44:17 host1 /usr/sbin/ifup[1313]: >>> Full logs available in: /var/log/ifupdown2/network_config_ifupdown2_86062_Aug-02-2024_00:44:17.166770 <<<

In the log file, I see a more verbose version of the "Nexthop device is not up" error:
Bash:
2024-08-02 00:44:17,460: MainThread: ifupdown.address: modulebase.py:124:log_error(): debug:   File "/usr/sbin/ifup", line 135, in <module>
    sys.exit(main())
   File "/usr/sbin/ifup", line 123, in main
    return stand_alone()
   File "/usr/sbin/ifup", line 103, in stand_alone
    status = ifupdown2.main()
   File "/usr/share/ifupdown2/ifupdown/main.py", line 77, in main
    self.handlers.get(self.op)(self.args)
   File "/usr/share/ifupdown2/ifupdown/main.py", line 193, in run_up
    ifupdown_handle.up(['pre-up', 'up', 'post-up'],
   File "/usr/share/ifupdown2/ifupdown/ifupdownmain.py", line 1845, in up
    ret = self._sched_ifaces(filtered_ifacenames, ops,
   File "/usr/share/ifupdown2/ifupdown/ifupdownmain.py", line 1568, in _sched_ifaces
    ifaceScheduler.sched_ifaces(self, ifacenames, ops,
   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 595, in sched_ifaces
    cls.run_iface_list(ifupdownobj, run_queue, ops,
   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 325, in run_iface_list
    cls.run_iface_graph(ifupdownobj, ifacename, ops, parent,
   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 315, in run_iface_graph
    cls.run_iface_list_ops(ifupdownobj, ifaceobjs, ops)
   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 188, in run_iface_list_ops
    cls.run_iface_op(ifupdownobj, ifaceobj, op,
   File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 106, in run_iface_op
    m.run(ifaceobj, op,
   File "/usr/share/ifupdown2/addons/address.py", line 1606, in run
    op_handler(self, ifaceobj,
   File "/usr/share/ifupdown2/addons/address.py", line 1231, in _up
    self._add_delete_gateway(ifaceobj, gateways, prev_gw)
   File "/usr/share/ifupdown2/addons/address.py", line 759, in _add_delete_gateway
    self.log_error('%s: %s' % (ifaceobj.name, str(e)))
   File "/usr/share/ifupdown2/ifupdownaddons/modulebase.py", line 121, in log_error
    stack = traceback.format_stack()
2024-08-02 00:44:17,460: MainThread: ifupdown.address: modulebase.py:125:log_error(): debug: Traceback (most recent call last):
  File "/usr/share/ifupdown2/addons/address.py", line 757, in _add_delete_gateway
    self.iproute2.route_add_gateway(ifaceobj.name, add_gw, vrf, metric, onlink=self.l3_intf_default_gateway_set_onlink)
  File "/usr/share/ifupdown2/lib/iproute2.py", line 881, in route_add_gateway
    utils.exec_command(cmd)
  File "/usr/share/ifupdown2/ifupdown/utils.py", line 420, in exec_command
    return cls._execute_subprocess(shlex.split(cmd),
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/ifupdown2/ifupdown/utils.py", line 398, in _execute_subprocess
    raise Exception(cls._format_error(cmd,
Exception: cmd '/bin/ip route replace default via 172.20.1.1 proto kernel dev vlan100 onlink' failed: returned 2 (Error: Nexthop device is not up.
)
2024-08-02 00:44:17,461: MainThread: ifupdown: scheduler.py:114:run_iface_op(): error: vlan100: cmd '/bin/ip route replace default via 172.20.1.1 proto kernel dev vlan100 onlink' failed: returned 2 (Error: Nexthop device is not up.
)
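Since the error is about the device not being up, one way to check the interface's administrative state is to look at the flags in `ip -o link show`. The helper below is a minimal sketch; `link_is_admin_up` is a hypothetical name, and I'm assuming the usual iproute2 flag format of a comma-separated list inside angle brackets.

```shell
#!/bin/sh
# Succeeds if the `ip -o link show` line on stdin carries the administrative UP flag.
# LOWER_UP (carrier present) on its own does not count.
link_is_admin_up() {
    grep -Eq '<([^>]*,)?UP(,[^>]*)?>'
}
```

Usage would be something like `ip -o link show dev vlan100 | link_is_admin_up && echo up || echo down`. Run after boot, it only shows the state at that moment, not the state at the time ifupdown2 tried to add the route.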

Bash:
root@host1:/var/log/ifupdown2/network_config_ifupdown2_86062_Aug-02-2024_00:44:17.166770# cat interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual

auto enp3s0
iface enp3s0 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves eno1 enp3s0
        bond-miimon 100
        bond-mode balance-xor
        bond-xmit-hash-policy layer3+4
        mtu 9000

auto vmbr0
iface vmbr0 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
        mtu 9000

auto vlan100
iface vlan100 inet static
        address 172.20.1.4/27
        gateway 172.20.1.1
        vlan-raw-device bond0
#cloud-mgmt

auto vlan103
iface vlan103 inet static
        address 172.20.1.111/27
        mtu 9000
        vlan-raw-device bond0
#cloud-storage

auto vlan101
iface vlan101 inet static
        address 172.20.1.36/27
        mtu 9000
        vlan-raw-device bond0
#cloud-cluster

source /etc/network/interfaces.d/*


root@host1:/var/log/ifupdown2/network_config_ifupdown2_86062_Aug-02-2024_00:44:17.166770# cat interfaces.d/sdn
#version:3

auto vlan200
iface vlan200
        bridge_ports vmbr0.200
        bridge_stp off
        bridge_fd 0
        mtu 9000
        alias srvr_prod

auto vlan201
iface vlan201
        bridge_ports vmbr0.201
        bridge_stp off
        bridge_fd 0
        mtu 9000
        alias srvr_preprod

auto vlan202
iface vlan202
        bridge_ports vmbr0.202
        bridge_stp off
        bridge_fd 0
        mtu 9000
        alias srvr_dev
 
I did find a note suggesting setting link_master_slave=0 in /etc/network/ifupdown2/ifupdown2.conf. That did (at least temporarily) work around the issue and allowed the default route to be created at boot time.
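To confirm what a host is actually using, the setting can be read back from the config file. The snippet is a sketch that assumes the file uses plain key=value lines; `get_ifupdown2_setting` is a hypothetical helper name, not part of ifupdown2.

```shell
#!/bin/sh
# Print the value of KEY from an ifupdown2.conf-style file (key=value lines).
# Commented-out lines are ignored; if KEY appears more than once, the last value is printed.
get_ifupdown2_setting() {
    key="$1"
    file="${2:-/etc/network/ifupdown2/ifupdown2.conf}"
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}
```

On the affected host, `get_ifupdown2_setting link_master_slave` should print 0 once the workaround is in place. Comparing the output across the three hosts might also show whether they differ.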
 
