[SOLVED] Live migration bug during upgrade to PVE 7.3

bfwdd

Hi,

We are having serious issues with live migration during the upgrade to PVE 7.3.
Our cluster has 15 nodes with external Ceph storage (no HA configured).

For 4 years we have been updating our cluster with an Ansible script (upgrade packages on the host, migrate the VMs to a spare host, reboot, and migrate them back).
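
To make that sequence concrete, here is a minimal Python sketch of the four steps. It is not the actual Ansible script: the function name `rolling_upgrade_plan` and the node names are hypothetical, and it assumes the steps are run over SSH with `apt-get` and `qm migrate ... --online`. It only builds the command list and does not execute anything.

```python
from typing import List


def rolling_upgrade_plan(node: str, spare: str, vmids: List[int]) -> List[List[str]]:
    """Return the commands for upgrading one node, in order, without running them.

    `node` and `spare` are hypothetical host names; `vmids` are the VMs
    currently running on `node`.
    """
    plan: List[List[str]] = []

    # 1. Upgrade packages on the host being serviced.
    plan.append(["ssh", node, "apt-get", "update"])
    plan.append(["ssh", node, "apt-get", "-y", "dist-upgrade"])

    # 2. Live-migrate every VM to the spare host.
    for vmid in vmids:
        plan.append(["ssh", node, "qm", "migrate", str(vmid), spare, "--online"])

    # 3. Reboot the upgraded host (a real script would wait for it to return).
    plan.append(["ssh", node, "reboot"])

    # 4. Live-migrate the VMs back from the spare host.
    for vmid in vmids:
        plan.append(["ssh", spare, "qm", "migrate", str(vmid), node, "--online"])

    return plan


if __name__ == "__main__":
    for cmd in rolling_upgrade_plan("pve01", "pve15", [409]):
        print(" ".join(cmd))
```

In this sketch, the migration in step 2 is the one that fails in the log below: the memory transfer completes, but the resume on the target host does not.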

But today all VMs (2x Debian Linux & 2x WIN2012) got stuck with this error:
2022-11-24 15:58:13 start migrate command to tcp:10.40.0.66:60000
2022-11-24 15:58:14 migration active, transferred 52.5 MiB of 8.0 GiB VM-state, 6.4 GiB/s
2022-11-24 15:58:15 average migration speed: 4.0 GiB/s - downtime 119 ms
2022-11-24 15:58:15 migration status: completed
RTNETLINK answers: Operation not supported
2022-11-24 15:58:15 ERROR: tunnel replied 'ERR: resume failed - command '/sbin/bridge fdb append 5E:3C:C0:F3:E0:37 dev tap409i1 master static' failed: exit code 255' to command 'resume 409'
2022-11-24 15:58:19 ERROR: migration finished with problems (duration 00:00:12)
TASK ERROR: migration problems

The VM on the new host is paused, and the resume task fails:

RTNETLINK answers: Operation not supported
TASK ERROR: command '/sbin/bridge fdb append 6A:41:31:8F:4C:18 dev tap409i0 master static' failed: exit code 255

I had to unlock each VM, kill the kvm process on both hosts, and start the VMs again.
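
For anyone hitting the same state, the manual recovery above can be sketched roughly as follows. This is an illustration under stated assumptions, not a recommended procedure: the helper name `recovery_commands` is hypothetical, and the PID file path `/var/run/qemu-server/<vmid>.pid` is assumed to be where the kvm process ID is stored. The function only returns the shell commands as strings; the `kill` step would be run on both the source and the target host, and `unlock` and `start` on the node that holds the VM config.

```python
from typing import Dict


def recovery_commands(vmid: int) -> Dict[str, str]:
    """Return the manual recovery commands for one stuck VM, in order.

    Nothing is executed here; the strings are meant to be reviewed and
    run by hand on the appropriate host.
    """
    # Assumed location of the kvm PID file for this VM.
    pidfile = f"/var/run/qemu-server/{vmid}.pid"
    return {
        # Remove the migration lock left behind by the failed task.
        "unlock": f"qm unlock {vmid}",
        # Kill the leftover kvm process (run on both hosts).
        "kill": f"kill $(cat {pidfile})",
        # Start the VM again from a clean state.
        "start": f"qm start {vmid}",
    }


if __name__ == "__main__":
    for step, command in recovery_commands(409).items():
        print(f"{step}: {command}")
```

Note that killing the process discards the migrated VM state, so the guest goes through a cold boot.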

We are using OVSBridge.


I hope somebody can help.

Kind regards,
Konrad
 
