Live migration fails, but offline migration OK after server reinstallation

fatalerrors

On behalf of a couple of foundations, I run and manage four Proxmox VE nodes with Ceph as the main storage for VMs. After changing the hardware of one of the nodes, I had to reinstall it, since the new motherboard doesn't support legacy (BIOS) boot and I had to switch to UEFI. I did this as cleanly as I could:
* Successful restoration of the SSH keys in /root and of the node configuration in /var/lib/pve-cluster
* Successful cleanup of the reinstalled node's former system keys on the older nodes
* Successful restoration of Ceph (not simple at all, though; IMHO the documentation lacks a clean procedure, unless I just couldn't find it)
=> As far as I know, everything works as expected, except online migration (with or without HA) to the reinstalled machine. Online migration from the reinstalled node to the older nodes works.

This is what syslog shows when I launch a migration:
Code:
May 25 20:13:05 mayon pve-ha-lrm[138560]: <root@pam> starting task UPID:mayon:00021D45:18E456BB:628E71B1:qmigrate:108:root@pam:
May 25 20:13:05 pinatubo pmxcfs[1551671]: [status] notice: received log
May 25 20:13:05 ragang pmxcfs[1659329]: [status] notice: received log
May 25 20:13:05 taal pmxcfs[2715]: [status] notice: received log
May 25 20:13:06 ragang pmxcfs[1659329]: [status] notice: received log
May 25 20:13:06 pinatubo pmxcfs[1551671]: [status] notice: received log
May 25 20:13:06 taal qm[2677758]: <root@pam> starting task UPID:taal:0028DC53:00A24237:628E71B2:qmstart:108:root@pam:
May 25 20:13:06 taal qm[2677843]: start VM 108: UPID:taal:0028DC53:00A24237:628E71B2:qmstart:108:root@pam:
May 25 20:13:06 mayon pmxcfs[1039713]: [status] notice: received log
May 25 20:13:06 taal systemd[1]: Started 108.scope.
May 25 20:13:06 taal systemd-udevd[2677874]: Using default interface naming scheme 'v247'.
May 25 20:13:06 taal systemd-udevd[2677874]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
May 25 20:13:06 taal kernel: [106338.699827] device tap108i0 entered promiscuous mode
May 25 20:13:06 taal kernel: [106338.707070] vmbr0: port 3(tap108i0) entered blocking state
May 25 20:13:06 taal kernel: [106338.707837] vmbr0: port 3(tap108i0) entered disabled state
May 25 20:13:06 taal kernel: [106338.708621] vmbr0: port 3(tap108i0) entered blocking state
May 25 20:13:06 taal kernel: [106338.709350] vmbr0: port 3(tap108i0) entered forwarding state
May 25 20:13:06 taal systemd-udevd[2677877]: Using default interface naming scheme 'v247'.
May 25 20:13:06 taal systemd-udevd[2677877]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
May 25 20:13:07 taal kernel: [106339.114985] device tap108i2 entered promiscuous mode
May 25 20:13:07 taal kernel: [106339.122053] vmbr2: port 2(tap108i2) entered blocking state
May 25 20:13:07 taal kernel: [106339.122760] vmbr2: port 2(tap108i2) entered disabled state
May 25 20:13:07 taal kernel: [106339.123603] vmbr2: port 2(tap108i2) entered blocking state
May 25 20:13:07 taal kernel: [106339.124280] vmbr2: port 2(tap108i2) entered forwarding state
May 25 20:13:07 ragang pmxcfs[1659329]: [status] notice: received log
May 25 20:13:07 pinatubo pmxcfs[1551671]: [status] notice: received log
May 25 20:13:07 mayon pmxcfs[1039713]: [status] notice: received log
May 25 20:13:07 taal systemd-udevd[2677877]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
May 25 20:13:07 taal kernel: [106339.516429] device tap108i3 entered promiscuous mode
May 25 20:13:07 taal kernel: [106339.523566] vmbr3: port 2(tap108i3) entered blocking state
May 25 20:13:07 taal kernel: [106339.524259] vmbr3: port 2(tap108i3) entered disabled state
May 25 20:13:07 taal kernel: [106339.525049] vmbr3: port 2(tap108i3) entered blocking state
May 25 20:13:07 taal kernel: [106339.525804] vmbr3: port 2(tap108i3) entered forwarding state
May 25 20:13:07 taal qm[2677758]: <root@pam> end task UPID:taal:0028DC53:00A24237:628E71B2:qmstart:108:root@pam: OK
May 25 20:13:08 mayon QEMU[2260]: kvm: Unable to write to socket: Broken pipe
May 25 20:13:09 taal qm[2677933]: <root@pam> starting task UPID:taal:0028DCAE:00A243AA:628E71B5:qmstop:108:root@pam:
May 25 20:13:09 taal qm[2677934]: stop VM 108: UPID:taal:0028DCAE:00A243AA:628E71B5:qmstop:108:root@pam:
May 25 20:13:09 taal QEMU[2677852]: kvm: terminating on signal 15 from pid 2677934 (task UPID:taal:0028DCAE:00A243AA:628E71B5:qmstop:108:root@pam:)
May 25 20:13:09 taal qm[2677933]: <root@pam> end task UPID:taal:0028DCAE:00A243AA:628E71B5:qmstop:108:root@pam: OK
May 25 20:13:10 taal kernel: [106342.293536] vmbr0: port 3(tap108i0) entered disabled state
May 25 20:13:10 taal kernel: [106342.524051] vmbr2: port 2(tap108i2) entered disabled state
May 25 20:13:10 taal kernel: [106342.753718] vmbr3: port 2(tap108i3) entered disabled state
May 25 20:13:10 taal qmeventd[1868]: read: Connection reset by peer
May 25 20:13:10 taal systemd[1]: 108.scope: Succeeded.
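
For anyone reading the log above, the task identifiers can be decoded to see which node and process each task belongs to. This is only a minimal sketch: it assumes the layout `UPID:node:pid:pstart:starttime:type:id:user:` with `pid`, `pstart` and `starttime` in hexadecimal, and `decode_upid` is a hypothetical helper name, not a Proxmox tool. The assumption looks consistent with the log, since the decoded pid of the `qmstart` task matches the `qm[2677843]` entry.

```python
from datetime import datetime, timezone


def decode_upid(upid: str) -> dict:
    """Split a task UPID into its fields.

    Assumed layout (not taken from the post itself):
    UPID:node:pid:pstart:starttime:type:id:user:
    where pid, pstart and starttime are hexadecimal numbers.
    """
    parts = upid.split(":")
    if len(parts) < 8 or parts[0] != "UPID":
        raise ValueError(f"not a UPID: {upid!r}")
    _, node, pid, pstart, start, task_type, vmid, user = parts[:8]
    return {
        "node": node,
        "pid": int(pid, 16),        # worker process id
        "pstart": int(pstart, 16),  # process start marker, kept as a raw number
        "started": datetime.fromtimestamp(int(start, 16), tz=timezone.utc),
        "type": task_type,
        "vmid": vmid,
        "user": user,
    }


if __name__ == "__main__":
    # UPID copied from the qmstart line in the log above
    task = decode_upid("UPID:taal:0028DC53:00A24237:628E71B2:qmstart:108:root@pam:")
    for key, value in task.items():
        print(f"{key:8} {value}")
```

The start time is printed in UTC, so it will be offset from the local timestamps in syslog.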


mayon, pinatubo and ragang are the old nodes that didn't change; the reinstalled node is named taal. In these logs, mayon is the source node initiating the migration. I suspect the LRM is in trouble: its status is active on all nodes except taal, where it is idle. Restarting the pve-ha-lrm service does not fix the issue.
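
Since the combined syslog mixes four nodes, it may help to reduce it to the source (mayon) and the target (taal) and to drop the cluster-status, udev and bridge noise. The snippet below is a rough sketch of that filtering, not a diagnosis; `parse_line`, `timeline` and the list of skipped tags are hypothetical choices, and the sample lines are copied from the log above.

```python
import re

# Matches e.g. "May 25 20:13:08 mayon QEMU[2260]: kvm: Unable to write ..."
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s[\d:]{8})\s(\S+)\s([^:\[]+)(?:\[(\d+)\])?:\s(.*)$"
)

# Tags treated as noise for this purpose (an arbitrary, illustrative choice)
NOISE = ("pmxcfs", "systemd-udevd", "kernel")


def parse_line(line: str):
    """Return (time, node, tag, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    time, node, tag, _pid, message = m.groups()
    return time, node, tag, message


def timeline(lines, nodes, skip=NOISE):
    """Keep only entries from the given nodes whose tag is not in skip."""
    entries = []
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[1] in nodes and parsed[2] not in skip:
            entries.append(parsed)
    return entries


SAMPLE = """\
May 25 20:13:05 mayon pve-ha-lrm[138560]: <root@pam> starting task UPID:mayon:00021D45:18E456BB:628E71B1:qmigrate:108:root@pam:
May 25 20:13:05 pinatubo pmxcfs[1551671]: [status] notice: received log
May 25 20:13:06 taal kernel: [106338.699827] device tap108i0 entered promiscuous mode
May 25 20:13:07 taal qm[2677758]: <root@pam> end task UPID:taal:0028DC53:00A24237:628E71B2:qmstart:108:root@pam: OK
May 25 20:13:08 mayon QEMU[2260]: kvm: Unable to write to socket: Broken pipe
May 25 20:13:09 taal qm[2677934]: stop VM 108: UPID:taal:0028DCAE:00A243AA:628E71B5:qmstop:108:root@pam:
""".splitlines()

if __name__ == "__main__":
    for time, node, tag, message in timeline(SAMPLE, nodes={"mayon", "taal"}):
        print(f"{time}  {node:6} {tag:11} {message}")
```

Read this way, the sequence is: the VM is started on taal and the start task ends OK, one second later QEMU on mayon reports the broken pipe, and then the VM on taal is stopped again.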

Thank you for your help.
 