[SOLVED] Restarting networking on the host disconnects all clients (permanently)

dmgeurts

Proxmox 8.3.2, recent install.
The host has a single NIC, routing enabled and the VMs are connected to two OVS bridges without physical interfaces.

One bridge uses VLAN tags and the other doesn't. The problem occurs on both, so I think this is irrelevant.

Two clients, OPNsense (BSD) and Fedora, are both affected in the same way.

The networking service isn't restarted often, but while I'm building VMs and configuring the host it does happen. The same also happens when the host is rebooted, which would mean that without manual intervention all VMs remain disconnected.

I've disabled the Datacenter firewall to see if that makes any difference, but it doesn't. I've seen some references to the firewall setting on the VM network device breaking connectivity, but for me it doesn't matter which way the toggle is set: connectivity breaks either way, and flipping the toggle in either direction brings it back. I'm stumped...
 
Hi,
I had a similar error with Windows Server; what fixed it for me was resetting the network adapter in Windows.
Restarting networking in Linux does not fix the issue. And even if it did, having to do this on each VM whenever the host was rebooted or restarted its network would not be workable for a remotely managed Proxmox server.
 
That just reinserts the iptables rules. Do you have other methods configuring it on the host? Like a script?
Do you need/use the proxmox firewall at all?
The only thing I (want to) use the Proxmox firewall for is for restricting access to Proxmox itself to some static IP addresses. I do not want Proxmox firewalling the VMs at all.

The behaviour remains the same if I turn off the Proxmox firewall at the host and datacenter level. TBH, I'm not familiar with how Proxmox implements firewalling. As I want more from a firewall, I run an OPNsense firewall in a VM, which will protect the VMs behind it. So the less interference from Proxmox, the better.
 
Toggling it off also brings it back, so it doesn't appear to be particularly iptables related as far as the rules go. But more about how packets from a VM are handled in general, as if something gets stuck or the OVS bridge gets confused.
 
Well, on my Hetzner machine, I have turned off all of the Proxmox-provided firewalls and wrote an iptables script to lock down what I need.
But I still think that you need to check the iptables config on your side when it fails (so right after you restart the machine).
 
Whatever the root cause is, we have to investigate the differences between before and after the "network" restart. I would say that the routing is fine; I would rather suspect iptables as the source of the problem.
 
iptables-save output is the same before and after a host network restart.
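
A minimal sketch of how such a before/after comparison can be made reliable. A raw diff of two `iptables-save` dumps almost always reports differences, because the comment lines carry timestamps and the chain lines carry packet/byte counters. The function name `normalize_ruleset` and the file names in the usage comments are made up for illustration.

```shell
# Strip the parts of `iptables-save` output that change on every run:
# comment lines (they carry timestamps) and [packets:bytes] counters.
normalize_ruleset() {
    sed -e '/^#/d' -e 's/\[[0-9][0-9]*:[0-9][0-9]*\]/[0:0]/'
}

# Usage on the host (as root); file names are placeholders:
#   iptables-save | normalize_ruleset > /tmp/rules.before
#   ... restart networking ...
#   iptables-save | normalize_ruleset > /tmp/rules.after
#   diff -u /tmp/rules.before /tmp/rules.after && echo "rulesets identical"
```

If the normalized dumps are identical, the rules themselves did not change, which points away from iptables as the cause.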

When toggling the VM network device firewall, I have to do this for each NIC on each VM; it only restores the connectivity of that specific network device. So it's something tied to the VM interface.
 
Hah! OVS doesn't recreate the VM interfaces when networking is restarted.

tail -f /var/log/openvswitch/ovs-vswitchd.log

Then I restarted networking on the host:

Code:
2024-12-20T11:03:40.650Z|00620|bridge|INFO|bridge vmbr1: deleted interface vlan117 on port 1
2024-12-20T11:03:40.726Z|00621|bridge|INFO|bridge vmbr1: deleted interface fwln100o1 on port 5
2024-12-20T11:03:40.726Z|00622|bridge|INFO|bridge vmbr1: deleted interface fwln101o0 on port 4
2024-12-20T11:03:40.726Z|00623|bridge|INFO|bridge vmbr1: deleted interface fwln199o0 on port 3
2024-12-20T11:03:40.726Z|00624|bridge|INFO|bridge vmbr1: deleted interface fwln199o1 on port 7
2024-12-20T11:03:40.726Z|00625|bridge|INFO|bridge vmbr1: deleted interface vmbr1 on port 65534
2024-12-20T11:03:40.949Z|00626|bridge|INFO|bridge vmbr0: deleted interface vmbr0 on port 65534
2024-12-20T11:03:40.949Z|00627|bridge|INFO|bridge vmbr0: deleted interface fwln100o0 on port 1
2024-12-20T11:03:41.606Z|00628|dpif_netlink|INFO|Datapath dispatch mode: per-cpu
2024-12-20T11:03:41.606Z|00629|ofproto_dpif|INFO|system@ovs-system: Datapath supports recirculation
2024-12-20T11:03:41.606Z|00630|ofproto_dpif|INFO|system@ovs-system: VLAN header stack length probed as 2
2024-12-20T11:03:41.606Z|00631|ofproto_dpif|INFO|system@ovs-system: MPLS label stack length probed as 3
2024-12-20T11:03:41.606Z|00632|ofproto_dpif|INFO|system@ovs-system: Datapath supports truncate action
2024-12-20T11:03:41.606Z|00633|ofproto_dpif|INFO|system@ovs-system: Datapath supports unique flow ids
2024-12-20T11:03:41.606Z|00634|ofproto_dpif|INFO|system@ovs-system: Datapath supports clone action
2024-12-20T11:03:41.606Z|00635|ofproto_dpif|INFO|system@ovs-system: Max sample nesting level probed as 10
2024-12-20T11:03:41.606Z|00636|ofproto_dpif|INFO|system@ovs-system: Datapath supports eventmask in conntrack action
2024-12-20T11:03:41.606Z|00637|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_clear action
2024-12-20T11:03:41.606Z|00638|ofproto_dpif|INFO|system@ovs-system: Max dp_hash algorithm probed to be 1
2024-12-20T11:03:41.606Z|00639|ofproto_dpif|INFO|system@ovs-system: Datapath supports check_pkt_len action
2024-12-20T11:03:41.607Z|00640|ofproto_dpif|INFO|system@ovs-system: Datapath supports timeout policy in conntrack action
2024-12-20T11:03:41.607Z|00641|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_zero_snat
2024-12-20T11:03:41.607Z|00642|ofproto_dpif|INFO|system@ovs-system: Datapath supports add_mpls action
2024-12-20T11:03:41.607Z|00643|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state
2024-12-20T11:03:41.607Z|00644|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_zone
2024-12-20T11:03:41.607Z|00645|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_mark
2024-12-20T11:03:41.607Z|00646|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2024-12-20T11:03:41.607Z|00647|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state_nat
2024-12-20T11:03:41.607Z|00648|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple
2024-12-20T11:03:41.607Z|00649|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_orig_tuple6
2024-12-20T11:03:41.607Z|00650|ofproto_dpif|INFO|system@ovs-system: Datapath does not support IPv6 ND Extensions
2024-12-20T11:03:41.607Z|00651|ofproto_dpif_upcall|INFO|Overriding n-handler-threads to 32, setting n-revalidator-threads to 9
2024-12-20T11:03:41.607Z|00652|ofproto_dpif_upcall|INFO|Starting 41 threads
2024-12-20T11:03:41.682Z|00653|netdev|WARN|failed to set MTU for network device vmbr1: No such device
2024-12-20T11:03:41.682Z|00654|bridge|INFO|bridge vmbr1: added interface vmbr1 on port 65534
2024-12-20T11:03:41.683Z|00655|bridge|INFO|bridge vmbr1: using datapath ID 00006ea37f4e524e
2024-12-20T11:03:41.683Z|00656|connmgr|INFO|vmbr1: added service controller "punix:/var/run/openvswitch/vmbr1.mgmt"
2024-12-20T11:03:41.761Z|00657|netdev|WARN|failed to set MTU for network device vlan117: No such device
2024-12-20T11:03:41.761Z|00658|bridge|INFO|bridge vmbr1: added interface vlan117 on port 1
2024-12-20T11:03:41.831Z|00659|bridge|INFO|bridge vmbr0: added interface vmbr0 on port 65534
2024-12-20T11:03:41.831Z|00660|bridge|INFO|bridge vmbr0: using datapath ID 00004af755d6f449
2024-12-20T11:03:41.831Z|00661|connmgr|INFO|vmbr0: added service controller "punix:/var/run/openvswitch/vmbr0.mgmt"

I unticked all the firewall tickboxes on three VMs, and the following lines were logged. New tap interfaces are added; this should already have happened during the restart above:

Code:
2024-12-20T11:04:01.636Z|00662|bridge|INFO|bridge vmbr0: added interface tap100i0 on port 1
2024-12-20T11:04:07.201Z|00663|bridge|INFO|bridge vmbr1: added interface tap100i1 on port 2
2024-12-20T11:04:13.466Z|00664|bridge|INFO|bridge vmbr1: added interface tap101i0 on port 3
2024-12-20T11:04:19.995Z|00665|bridge|INFO|bridge vmbr1: added interface tap199i0 on port 4
2024-12-20T11:04:24.972Z|00666|bridge|INFO|bridge vmbr1: added interface tap199i1 on port 5

What happens when the firewall is again selected? The tap interface is replaced with a fwln interface:

Code:
2024-12-20T11:04:45.650Z|00667|bridge|INFO|bridge vmbr1: deleted interface tap199i1 on port 5
2024-12-20T11:04:45.710Z|00668|netdev|WARN|failed to set MTU for network device fwln199o1: No such device
2024-12-20T11:04:45.710Z|00669|bridge|INFO|bridge vmbr1: added interface fwln199o1 on port 6

And when reverting back, we see that the opposite happens:

Code:
2024-12-20T11:05:25.657Z|00670|bridge|INFO|bridge vmbr1: deleted interface fwln199o1 on port 6
2024-12-20T11:05:25.757Z|00671|bridge|INFO|bridge vmbr1: added interface tap199i1 on port 7
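
The pattern in the logs above (interfaces deleted during the restart and never added back) can also be checked mechanically. This is only a sketch: it assumes the `bridge|INFO|bridge <name>: added/deleted interface <if> on port <n>` line format shown in this thread, and the function name `missing_after_restart` is made up.

```shell
# List interfaces that ovs-vswitchd logged as deleted but never re-added.
# Reads log lines on stdin; prints "bridge interface" pairs, sorted.
missing_after_restart() {
    awk '
        /\|bridge\|INFO\|bridge [^ ]+: (added|deleted) interface / {
            # Message is the last "|" field:
            #   bridge <br>: <action> interface <if> on port <n>
            n = split($0, parts, "|")
            split(parts[n], w, " ")
            br = w[2]; sub(/:$/, "", br)
            key = br " " w[5]
            if (w[3] == "deleted") gone[key] = 1
            else delete gone[key]
        }
        END { for (k in gone) print k }
    ' | LC_ALL=C sort
}

# Usage on the host:
#   missing_after_restart < /var/log/openvswitch/ovs-vswitchd.log
```

Note that it matches on exact interface names, so a fwln interface that was later replaced by a tap interface still shows up as missing.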

Is this a bug or a config error?
 
You cannot restart the networking service without interrupting networking; it's not supposed to be restarted. With ifupdown2, you can use the built-in reload feature to apply changes at runtime, if they can be applied.
 
It seems this has been reported before: https://forum.proxmox.com/threads/m...onnectivity-after-host-network-restart.98101/ I'd like to avoid the hacked workarounds used there.

I agree that restarting networking will interrupt traffic. But if a reboot or a network restart ends up removing all the VM network devices from the OVS bridge ports, then this is quite a serious problem for users.

As stated, restarting networking is not a usual task, but during maintenance, it does happen. Having to go through all the VMs to reconfigure the network devices is far from ideal. As for a reboot, nobody likes them, but they're a necessary evil and do happen.
 
To be on the safe side, I just installed the latest update v8.3.3 and rebooted. All the VMs came back with network connectivity. So, I'm going to have to assume that my initial problem with reboots may have been flukes. I will do some more reboots over the next few days, but for now I think it's safe to say that when using OVS the networking service should not be restarted, but instead ifupdown2's ifreload -a should be used.
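
For anyone who wants to verify this on their own host, here is a rough sketch that snapshots the OVS ports before and after a reload and reports any that disappeared. The function names and file paths are made up; `ovs-vsctl list-br`, `ovs-vsctl list-ports` and `ifreload -a` are the only host commands it relies on.

```shell
# Print every "bridge port" pair currently known to OVS, sorted for comm(1).
snapshot_ports() {
    for br in $(ovs-vsctl list-br); do
        ovs-vsctl list-ports "$br" | sed "s/^/$br /"
    done | LC_ALL=C sort
}

# Print pairs present in snapshot file $1 but absent from snapshot file $2.
lost_ports() {
    LC_ALL=C comm -23 "$1" "$2"
}

# Usage on the host (as root); file names are placeholders:
#   snapshot_ports > /tmp/ports.before
#   ifreload -a
#   snapshot_ports > /tmp/ports.after
#   lost_ports /tmp/ports.before /tmp/ports.after
```

An empty result from `lost_ports` means every tap/fwln port that was attached before the reload is still attached afterwards.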
 
