Migration to new nodes fails: "VLAN aware" bug?

NdK73

Hello all.
At the end of 2021 I upgraded all six existing nodes to 7.1 (migrating VMs back and forth with no issues, apart from the higher load).
While at it, I also installed 3 new nodes and added them to the cluster.
The cluster now shows all 9 nodes online and OK.
But when I try to live-migrate a VM from one of the older nodes to a new one, I get:

Code:
2022-01-10 09:29:54 starting migration of VM 142 to node 'virt7' (192.168.1.37)
2022-01-10 09:29:54 starting VM 142 on remote node 'virt7'
2022-01-10 09:29:58 [virt7] ovs-vsctl: no bridge named vmbr1
2022-01-10 09:29:58 [virt7] can't add ovs port 'fwln142o0' - command '/usr/bin/ovs-vsctl -- add-port vmbr1 fwln142o0 -- set Interface fwln142o0 'type=internal'' failed: exit code 1
2022-01-10 09:29:58 [virt7]
2022-01-10 09:29:58 [virt7] kvm: -netdev type=tap,id=net0,ifname=tap142i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on: network script /var/lib/qemu-server/pve-bridge failed with status 256
2022-01-10 09:29:58 [virt7] start failed: QEMU exited with code 1
2022-01-10 09:29:58 ERROR: online migrate failure - remote command failed with exit code 255
2022-01-10 09:29:58 aborting phase 2 - cleanup resources
2022-01-10 09:29:58 migrate_cancel
2022-01-10 09:30:00 ERROR: migration finished with problems (duration 00:00:07)
TASK ERROR: migration problems

Of course I checked the network config to make sure vmbr1 also exists on the target node and maps to the same VLAN. The strangest part is that it's a Linux bridge, and we have never used OVS for anything.

BUT if I remove these two spurious lines:
Code:
root@virt7:/etc/network# diff -u interfaces{bad,}
--- interfacesbad       2022-01-10 11:10:45.975357535 +0100
+++ interfaces  2022-01-10 11:11:13.987952707 +0100
@@ -30,8 +30,6 @@
        bridge-ports enp3s0
        bridge-stp off
        bridge-fd 0
-       vlan-id 1
-       vlan-raw-device enp3s0
 #Public net
 
 auto vmbr2
the error goes away.
Maybe it's a bug triggered when removing the "VLAN aware" flag? We experimented with it a bit. Getting rid of those lines from the GUI means removing and recreating the affected interfaces from scratch; just unchecking the "VLAN aware" flag is not enough.
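
In case it helps anyone check their other nodes for the same leftovers, here is a minimal Python sketch that scans the text of an interfaces file and reports bridge stanzas (those with a bridge-ports line) that also carry vlan-id or vlan-raw-device. The function name find_vlan_leftovers and the simplified stanza parsing are my own assumptions, not anything shipped with Proxmox. The sample text is reconstructed from the diff above: the "inet manual" part and the whole vmbr2 stanza (including enp4s0) are made up for illustration.

```python
# Hypothetical helper: flag bridge stanzas that still carry VLAN options.
# Parsing is deliberately simplified: a stanza starts at an "iface" line and
# ends at the next "iface", "auto", "allow-*", "source*" or "mapping" line.

VLAN_KEYS = {"vlan-id", "vlan-raw-device"}
STANZA_ENDERS = ("auto", "source", "source-directory", "mapping")

# Sample modelled on the diff in this post; "inet manual" and the vmbr2
# stanza (enp4s0) are assumptions for illustration only.
SAMPLE = """\
auto vmbr1
iface vmbr1 inet manual
        bridge-ports enp3s0
        bridge-stp off
        bridge-fd 0
        vlan-id 1
        vlan-raw-device enp3s0
#Public net

auto vmbr2
iface vmbr2 inet manual
        bridge-ports enp4s0
        bridge-stp off
        bridge-fd 0
"""


def find_vlan_leftovers(text):
    """Return {iface: [option lines]} for bridges that have VLAN options."""
    stanzas = {}   # iface name -> list of (option key, stripped line)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        words = line.split()
        if words[0] == "iface" and len(words) > 1:
            current = words[1]
            stanzas[current] = []
        elif words[0] in STANZA_ENDERS or words[0].startswith("allow-"):
            current = None  # these lines close the current stanza
        elif current is not None:
            stanzas[current].append((words[0], line))

    leftovers = {}
    for name, options in stanzas.items():
        keys = {key for key, _ in options}
        if "bridge-ports" in keys:  # only look at bridge stanzas
            hits = [text_line for key, text_line in options if key in VLAN_KEYS]
            if hits:
                leftovers[name] = hits
    return leftovers


if __name__ == "__main__":
    for name, lines in find_vlan_leftovers(SAMPLE).items():
        print(name, lines)
```

To run it against a real node, replace SAMPLE with the contents of /etc/network/interfaces (for example via open(...).read()). It only reports candidates and never edits the file, so any cleanup still has to be done by hand or from the GUI.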