[SOLVED] Network fails on VM/CTs after migration to another node

mart.v

Mar 21, 2018
Hello,

I'm experiencing a problem after migrating a VM/CT (it happens with both) to another node in a Proxmox cluster (the nodes are on the same network and subnet and have the exact same setup). It does not depend on the type of migration (online/offline) either.

The thing is that the public network interface (eth0 in the VM, bridged to vmbr0 on the host) does not work on any node other than the one on which the VM/CT was created. I even tried to set a new (unused) IP address after migration, but nothing changed. I cannot even ping the host node from inside the CT/VM. Everything starts working again (without any other changes) when I move the CT/VM back to the node where I created it.

Interestingly, when I add a second interface (private network only, bridged through vmbr1), it works fine even after live migration.

Can you help me debug/resolve this issue? I tried to analyze the logs, but found nothing suspicious.
Thanks!
 
Check your network configuration on the hosts. The numbering of the NIC interfaces may not be the same on both machines, so the network configuration may end up pointing at the wrong interface on one of the hosts.
 
Check your network configuration on the hosts. The numbering of the NIC interfaces may not be the same on both machines, so the network configuration may end up pointing at the wrong interface on one of the hosts.

Thank you. You are right, the physical interfaces on the host nodes have different names (one server has the 10G NIC on board, the other one via PCIe). But the bridge is always vmbr0.

How can I resolve this?
 
OK, I tried to unify the interface names across the cluster. I tried two different approaches (described, for example, here: https://forum.proxmox.com/threads/no-70-persistent-net-rules-on-proxmox-5.38501/):

1) Creating a *.link file in /etc/systemd/network/ (see the sketch below)
2) Creating a rules file in /etc/udev/rules.d/
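
For reference, the *.link file (approach 1) looked roughly like this; the MAC address and the target name "lan0" are just the ones from my setup, and the file went into something like /etc/systemd/network/10-lan0.link:
Code:
# match the physical NIC by its MAC address
[Match]
MACAddress=ac:1f:6b:09:cb:58

# give it a stable name
[Link]
Name=lan0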

Well, neither of them really worked.

After a few reboots I somehow managed to make this partially work. The strange thing is that on SOME (not all) reboots the system waits for this task: "A start job is running for udev wait for Complete Device Initialization" (with a countdown of up to 3 minutes), and after that some of the interfaces get a "rename*" name, for example "rename7" instead of "lan0" (a similar problem is described here: https://forum.proxmox.com/threads/pve5-fix-those-predictable-network-interface-names.37210/).

Another reboot usually fixes it (and the interface gets the correct name), but c'mon, that is not a real solution. I googled this and tried to disable the responsible service with "systemctl mask systemd-udev-settle". Well, that didn't work either, because there appears to be another (different) service that does a similar thing (there is a countdown during boot again, this time up to 5 minutes, with the text "A start job is running for raise network interfaces").

After that I didn't try any further, because turning off system services does not seem like the way to go.
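
In case someone wants to identify which unit the boot is actually waiting for, something like this should show it (run from another console or over SSH while it hangs, or look at the timings after boot):
Code:
# while the boot hangs: list the queued start jobs and what they are waiting on
systemctl list-jobs
# after boot: see which units took the longest to start
systemd-analyze blame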

Any ideas?

EDIT: I have found out that the "renamed" interface is actually the bridge:
Code:
Mar 21 22:53:34 node3 kernel: [ 7.504439] rename6: renamed from vmbr0
I have no clue why this is happening :(
 
I finally managed to rename the interfaces. I added this line to the file /etc/udev/rules.d/70-persistent-net.rules:

Code:
ACTION=="add", SUBSYSTEMS=="pci", SUBSYSTEM=="net", ATTR{address}=="ac:1f:6b:09:cb:58", NAME="net0"

The key was to include SUBSYSTEMS=="pci", because without that match udev tried to rename the bridge as well (it has the same MAC address).
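
In case it helps someone else, the rule can be re-applied without a reboot roughly like this (note that an interface that is currently up cannot be renamed, so the new name may only appear after the next boot anyway):
Code:
# reload the udev rules from /etc/udev/rules.d/
udevadm control --reload-rules
# re-trigger "add" events for network devices so the rule is evaluated again
udevadm trigger --subsystem-match=net --action=add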

BUT unfortunately this didn't solve my problem. I now have the same interface names on all servers and the network still doesn't work after migration.
 
In my opinion, unifying the interface names so that they match on every host is cumbersome and unnecessary. You set up your network interfaces once and then forget about them.

OK, I may have misunderstood your first reply. I thought you were saying that my network interfaces have to be named exactly the same on all hosts in the cluster. Could you please clarify your first reply?
 
Check your network configuration on the hosts. The numbering of the NIC interfaces may not be the same on both machines, so the network configuration may end up pointing at the wrong interface on one of the hosts.
The interface naming may differ between hosts, as a different naming scheme may apply. Instead of trying to name all the interfaces the same, configure the appropriate interface on each host (regardless of its name). From a VM/CT point of view, the only thing that needs to have the same name is the bridge "vmbrX".
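
As an illustration (the addresses and NIC names below are made up), two hosts can look like this and still migrate VMs/CTs between each other, because the guest only ever sees the bridge name:
Code:
# /etc/network/interfaces on host 1 (onboard NIC)
auto vmbr0
iface vmbr0 inet static
        address  192.0.2.11
        netmask  255.255.255.0
        gateway  192.0.2.1
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0

# /etc/network/interfaces on host 2 (PCIe NIC)
auto vmbr0
iface vmbr0 inet static
        address  192.0.2.12
        netmask  255.255.255.0
        gateway  192.0.2.1
        bridge_ports enp5s0f0
        bridge_stp off
        bridge_fd 0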
 
I have vmbr0 (external) and vmbr1 (internal) interface on all cluster nodes. Names are exactly the same.

Below I attached a screenshot of the VM's hardware settings. net0 and net1 are the interfaces created (in this example) on HN1. After I move the VM to HN2, net0 stops working (net1 still works fine). When I add a new net2 interface on the same bridge (but on HN2), the external connection starts working through the net2 interface (net0 is still unreachable).

After I move the VM back to HN1, the net2 interface stops working, but net0 starts working again. I have literally no idea how to resolve this.

[Attached screenshot: VM hardware settings showing the network devices]
 
Do you use any firewall rules or other network filters (host or switch)?
Code:
+--------+      +----------------+      +----------------+      +---------------+
| switch | <--> | NIC port (PVE) | <--> | bridge (vmbrX) | <--> | NIC port (VM) |
+--------+      +----------------+      +----------------+      +---------------+
Somewhere in that path there is either a misconfiguration or a firewall/filter.

Thank you. You are right, the physical interfaces on the host nodes have different names (one server has the 10G NIC on board, the other one via PCIe). But the bridge is always vmbr0.
The interface naming on the PVE hosts is expected to be different, onboard vs. PCIe.
 
No, I do not use any kind of hardware or software firewall at this moment. The whole Proxmox installation is clean with no manual edits (except for the network interface renaming, which was useless :)).
 
The PVE hosts can ping each other on the net0 (vmbr0) interface?
Does a clean start of the VM on all the hosts result in a working network in the VM?

And to be safe, I suggest removing the custom udev naming rules.
 
The PVE hosts can ping each other on the net0 (vmbr0) interface?
Does a clean start of the VM on all the hosts result in a working network in the VM?

And to be safe, I suggest removing the custom udev naming rules.

Yes, they are able to ping each other through vmbr0 (public IP addresses).

Unfortunately, no. When I do a clean VM shutdown on HN2, migrate the VM to HN1 and power it on, the network interfaces bridged to vmbr0 do not work. The connection through vmbr1 still works, though.

I have removed all the custom names and reverted back to the predictable interface names (the default).

I also tried to remove vmbr0 (a Linux bridge) and replace it with an Open vSwitch bridge. The problem still persists.
 
Then there might be a misconfiguration inside the VM or some setting on the switch that connects the hosts.
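
A few host-side checks can help narrow down where the traffic stops (a rough sketch, the interface names are examples):
Code:
# is the physical port and the bridge up?
ip -br link show
# which ports are attached to vmbr0?
bridge link show
# does the VM's MAC address appear in the bridge forwarding table?
bridge fdb show br vmbr0
# do ARP requests from the VM go out and do replies come back?
tcpdump -ni vmbr0 arp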
 
Thank you for your replies; they helped me resolve the problem.

It was the switch. Apparently there is a feature that binds a MAC address to a physical port and does not allow the MAC to move to a different port. It is called port security and it must be switched off on all the switch ports connected to the Proxmox HNs.
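
For anyone hitting the same thing: on a Cisco-style CLI, disabling it looks roughly like the following (this is only an example; the feature name and syntax differ between vendors, so check your switch's manual):
Code:
! adjust the interface range to the ports where the Proxmox hosts are connected
configure terminal
 interface range GigabitEthernet1/0/1 - 3
  no switchport port-security
 end
write memory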
 
