On one of my Proxmox hosts, I started losing network connectivity after every reboot. No updates or changes immediately preceded the issue, so I'm at a loss as to what changed the behavior. I'm not seeing this on the other two hosts.
Looking closer, it appears that when ifupdown2 tries to set the default gateway, the nexthop device (vlan100) is not yet up, so the routing table update fails.
Network connectivity may take a few seconds to come up, but I feel like that alone shouldn't prevent a default gateway from being present until manual intervention. If I take bond0 or vlan100 down and back up, the gateway is set and I can continue as normal until the next reboot.
I also understand that there are some reported workarounds, e.g. moving my VLAN interfaces from the bond to the bridge (which I've confirmed works), or adding the following to the affected interface:
Bash:
post-up ifconfig $IFACE up
pre-down ifconfig $IFACE down
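For reference, the first workaround (VLAN on the bridge instead of the bond) would look roughly like the fragment below for the management interface. This is only a sketch: the address and gateway are taken from the vlan100 stanza in my config, and the `vmbr0.100` name assumes the VLAN-aware vmbr0 bridge carries VLAN 100. It would replace the existing vlan100 stanza rather than sit alongside it.

```
auto vmbr0.100
iface vmbr0.100 inet static
    address 172.20.1.4/27
    gateway 172.20.1.1
#cloud-mgmt
```

The other VLAN interfaces on bond0 (vlan101, vlan103) would presumably need the same treatment, keeping their `mtu 9000` lines.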
But for something that has been working without issues, the sudden instability seems odd. My (blanket) expectation is that anything I can set through the UI without validation warnings should work reliably, without my having to apply workarounds, particularly as Proxmox is poised to be a great replacement for VMware.
If this setup isn't expected to work, could we get a best-practices document (maybe I missed it), or validation in code that warns users when their network configuration may not be reliable?
Bash:
root@host1:~# journalctl -xe | grep ifup
Aug 02 00:44:14 host1 systemd[1]: Starting ifupdown2-pre.service - Helper to synchronize boot up for ifupdown...
░░ Subject: A start job for unit ifupdown2-pre.service has begun execution
░░ A start job for unit ifupdown2-pre.service has begun execution.
Aug 02 00:44:16 host1 systemd[1]: Finished ifupdown2-pre.service - Helper to synchronize boot up for ifupdown.
░░ Subject: A start job for unit ifupdown2-pre.service has finished successfully
░░ A start job for unit ifupdown2-pre.service has finished successfully.
Aug 02 00:44:17 host1 /usr/sbin/ifup[1313]: error: vlan100: cmd '/bin/ip route replace default via 172.20.1.1 proto kernel dev vlan100 onlink' failed: returned 2 (Error: Nexthop device is not up.
Aug 02 00:44:17 host1 networking[1313]: error: >>> Full logs available in: /var/log/ifupdown2/network_config_ifupdown2_86062_Aug-02-2024_00:44:17.166770 <<<
Aug 02 00:44:17 host1 /usr/sbin/ifup[1313]: >>> Full logs available in: /var/log/ifupdown2/network_config_ifupdown2_86062_Aug-02-2024_00:44:17.166770 <<<
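To check quickly which interfaces hit this error on a given boot, the journal output can be filtered down to just the interface names. The following is a minimal sketch that assumes the message format shown above; `failed_gateway_ifaces` is a name I made up for illustration, not part of ifupdown2.

```shell
#!/bin/sh
# Hypothetical helper: read journal lines on stdin and print each interface
# for which ifupdown2 logged "Nexthop device is not up" (one name per line).
failed_gateway_ifaces() {
    # Capture the interface name between "error: " and ": cmd".
    sed -n 's/.*error: \([^:]*\): cmd .*Nexthop device is not up.*/\1/p' | sort -u
}

# Example (run on the affected host):
#   journalctl -b | failed_gateway_ifaces
```

On the host above this should list only vlan100, since that is the only interface carrying a gateway.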
In the log file, I see a more verbose version of "Nexthop device is not up.":
Bash:
2024-08-02 00:44:17,460: MainThread: ifupdown.address: modulebase.py:124:log_error(): debug: File "/usr/sbin/ifup", line 135, in <module>
sys.exit(main())
File "/usr/sbin/ifup", line 123, in main
return stand_alone()
File "/usr/sbin/ifup", line 103, in stand_alone
status = ifupdown2.main()
File "/usr/share/ifupdown2/ifupdown/main.py", line 77, in main
self.handlers.get(self.op)(self.args)
File "/usr/share/ifupdown2/ifupdown/main.py", line 193, in run_up
ifupdown_handle.up(['pre-up', 'up', 'post-up'],
File "/usr/share/ifupdown2/ifupdown/ifupdownmain.py", line 1845, in up
ret = self._sched_ifaces(filtered_ifacenames, ops,
File "/usr/share/ifupdown2/ifupdown/ifupdownmain.py", line 1568, in _sched_ifaces
ifaceScheduler.sched_ifaces(self, ifacenames, ops,
File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 595, in sched_ifaces
cls.run_iface_list(ifupdownobj, run_queue, ops,
File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 325, in run_iface_list
cls.run_iface_graph(ifupdownobj, ifacename, ops, parent,
File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 315, in run_iface_graph
cls.run_iface_list_ops(ifupdownobj, ifaceobjs, ops)
File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 188, in run_iface_list_ops
cls.run_iface_op(ifupdownobj, ifaceobj, op,
File "/usr/share/ifupdown2/ifupdown/scheduler.py", line 106, in run_iface_op
m.run(ifaceobj, op,
File "/usr/share/ifupdown2/addons/address.py", line 1606, in run
op_handler(self, ifaceobj,
File "/usr/share/ifupdown2/addons/address.py", line 1231, in _up
self._add_delete_gateway(ifaceobj, gateways, prev_gw)
File "/usr/share/ifupdown2/addons/address.py", line 759, in _add_delete_gateway
self.log_error('%s: %s' % (ifaceobj.name, str(e)))
File "/usr/share/ifupdown2/ifupdownaddons/modulebase.py", line 121, in log_error
stack = traceback.format_stack()
2024-08-02 00:44:17,460: MainThread: ifupdown.address: modulebase.py:125:log_error(): debug: Traceback (most recent call last):
File "/usr/share/ifupdown2/addons/address.py", line 757, in _add_delete_gateway
self.iproute2.route_add_gateway(ifaceobj.name, add_gw, vrf, metric, onlink=self.l3_intf_default_gateway_set_onlink)
File "/usr/share/ifupdown2/lib/iproute2.py", line 881, in route_add_gateway
utils.exec_command(cmd)
File "/usr/share/ifupdown2/ifupdown/utils.py", line 420, in exec_command
return cls._execute_subprocess(shlex.split(cmd),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/share/ifupdown2/ifupdown/utils.py", line 398, in _execute_subprocess
raise Exception(cls._format_error(cmd,
Exception: cmd '/bin/ip route replace default via 172.20.1.1 proto kernel dev vlan100 onlink' failed: returned 2 (Error: Nexthop device is not up.
)
2024-08-02 00:44:17,461: MainThread: ifupdown: scheduler.py:114:run_iface_op(): error: vlan100: cmd '/bin/ip route replace default via 172.20.1.1 proto kernel dev vlan100 onlink' failed: returned 2 (Error: Nexthop device is not up.
)
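As far as I can tell, the kernel returns "Nexthop device is not up" when the device named in the route does not yet have its administrative UP flag (IFF_UP) set, which would fit with the gateway being accepted once the interface is cycled by hand. The sketch below polls that flag before retrying the route. The function names and the `SYSFS_NET` override are hypothetical and only there for illustration and testing; the flag is read from the interface's `flags` file in sysfs, where bit 0x1 is IFF_UP.

```shell
#!/bin/sh
# Hypothetical helpers, not part of ifupdown2.
# SYSFS_NET may be overridden for testing; defaults to the real sysfs path.

# Return 0 if the interface has IFF_UP (bit 0x1) set in its sysfs flags.
iface_is_admin_up() {
    flags_file="${SYSFS_NET:-/sys/class/net}/$1/flags"
    [ -r "$flags_file" ] || return 1
    flags=$(cat "$flags_file")
    [ $(( $flags & 1 )) -eq 1 ]
}

# Poll once per second, up to $2 attempts (default 10), until the interface is up.
wait_for_admin_up() {
    iface=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        iface_is_admin_up "$iface" && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Example (as root on the affected host):
#   wait_for_admin_up vlan100 && ip route replace default via 172.20.1.1 dev vlan100
```

This only helps to confirm the timing theory or to recover manually; it does not explain why the behavior changed on this one host.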
Bash:
root@host1:/var/log/ifupdown2/network_config_ifupdown2_86062_Aug-02-2024_00:44:17.166770# cat interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual

auto enp3s0
iface enp3s0 inet manual

auto bond0
iface bond0 inet manual
    bond-slaves eno1 enp3s0
    bond-miimon 100
    bond-mode balance-xor
    bond-xmit-hash-policy layer3+4
    mtu 9000

auto vmbr0
iface vmbr0 inet manual
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
    mtu 9000

auto vlan100
iface vlan100 inet static
    address 172.20.1.4/27
    gateway 172.20.1.1
    vlan-raw-device bond0
#cloud-mgmt

auto vlan103
iface vlan103 inet static
    address 172.20.1.111/27
    mtu 9000
    vlan-raw-device bond0
#cloud-storage

auto vlan101
iface vlan101 inet static
    address 172.20.1.36/27
    mtu 9000
    vlan-raw-device bond0
#cloud-cluster

source /etc/network/interfaces.d/*
root@host1:/var/log/ifupdown2/network_config_ifupdown2_86062_Aug-02-2024_00:44:17.166770# cat interfaces.d/sdn
#version:3

auto vlan200
iface vlan200
    bridge_ports vmbr0.200
    bridge_stp off
    bridge_fd 0
    mtu 9000
    alias srvr_prod

auto vlan201
iface vlan201
    bridge_ports vmbr0.201
    bridge_stp off
    bridge_fd 0
    mtu 9000
    alias srvr_preprod

auto vlan202
iface vlan202
    bridge_ports vmbr0.202
    bridge_stp off
    bridge_fd 0
    mtu 9000
    alias srvr_dev