For a couple of foundations, I run and manage 4 Proxmox VE nodes with Ceph as the main storage for VMs. After changing the hardware of one of the nodes, I had to reinstall it: the new motherboard doesn't support classic (legacy BIOS) boot, so I had to switch to UEFI. I did the reinstall as cleanly as I could:
* Successful restoration of the SSH keys in /root and of the node configuration in /var/lib/pve-cluster
* Successful cleanup of the reinstalled node's former system keys on the older nodes
* Successful restoration of Ceph (not simple at all; IMHO the documentation lacks a clean procedure for this, unless I just couldn't find it)
=> As far as I know, everything works as expected except online migration (with or without HA) to the reinstalled machine. Online migration from the reinstalled node to the older nodes works.
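The key cleanup in the second bullet can be sketched in Python. This assumes the stale keys are entries in an OpenSSH known_hosts file (as far as I know, the cluster-wide one on PVE is /etc/pve/priv/known_hosts); `drop_host_keys` is a hypothetical helper, and the usual tool for this job is `ssh-keygen -R <host> -f <file>`. The sketch only matches plain-text host names, comma-separated host lists and `[host]:port` entries; hashed (`|1|...`) entries are not matched, and lines with `@cert-authority`/`@revoked` markers are left for manual review.

```python
def drop_host_keys(known_hosts_lines, host):
    """Return the known_hosts lines that do not belong to `host`.

    Hypothetical helper: matches plain-text entries only. Hashed entries
    (|1|...) and marker lines (@cert-authority, @revoked) are kept as-is.
    """
    kept = []
    for line in known_hosts_lines:
        stripped = line.strip()
        # Keep blank lines, comments and marker lines untouched.
        if not stripped or stripped.startswith(("#", "@")):
            kept.append(line)
            continue
        # First field is the host list, e.g. "10.0.0.4,taal" or "[taal]:2222".
        hosts_field = stripped.split()[0]
        # "a,b,[c]:2222" -> {"a", "b", "c"}
        names = {p.split("]")[0].lstrip("[") for p in hosts_field.split(",")}
        if host not in names:
            kept.append(line)
    return kept
```

Feeding it the lines of a copy of the file (e.g. from `readlines()`) and writing the result back with `writelines()` keeps the original line endings, so the output can be diffed against the original before replacing it.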
Here is what syslog shows when launching a migration:
Code:
May 25 20:13:05 mayon pve-ha-lrm[138560]: <root@pam> starting task UPID:mayon:00021D45:18E456BB:628E71B1:qmigrate:108:root@pam:
May 25 20:13:05 pinatubo pmxcfs[1551671]: [status] notice: received log
May 25 20:13:05 ragang pmxcfs[1659329]: [status] notice: received log
May 25 20:13:05 taal pmxcfs[2715]: [status] notice: received log
May 25 20:13:06 ragang pmxcfs[1659329]: [status] notice: received log
May 25 20:13:06 pinatubo pmxcfs[1551671]: [status] notice: received log
May 25 20:13:06 taal qm[2677758]: <root@pam> starting task UPID:taal:0028DC53:00A24237:628E71B2:qmstart:108:root@pam:
May 25 20:13:06 taal qm[2677843]: start VM 108: UPID:taal:0028DC53:00A24237:628E71B2:qmstart:108:root@pam:
May 25 20:13:06 mayon pmxcfs[1039713]: [status] notice: received log
May 25 20:13:06 taal systemd[1]: Started 108.scope.
May 25 20:13:06 taal systemd-udevd[2677874]: Using default interface naming scheme 'v247'.
May 25 20:13:06 taal systemd-udevd[2677874]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
May 25 20:13:06 taal kernel: [106338.699827] device tap108i0 entered promiscuous mode
May 25 20:13:06 taal kernel: [106338.707070] vmbr0: port 3(tap108i0) entered blocking state
May 25 20:13:06 taal kernel: [106338.707837] vmbr0: port 3(tap108i0) entered disabled state
May 25 20:13:06 taal kernel: [106338.708621] vmbr0: port 3(tap108i0) entered blocking state
May 25 20:13:06 taal kernel: [106338.709350] vmbr0: port 3(tap108i0) entered forwarding state
May 25 20:13:06 taal systemd-udevd[2677877]: Using default interface naming scheme 'v247'.
May 25 20:13:06 taal systemd-udevd[2677877]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
May 25 20:13:07 taal kernel: [106339.114985] device tap108i2 entered promiscuous mode
May 25 20:13:07 taal kernel: [106339.122053] vmbr2: port 2(tap108i2) entered blocking state
May 25 20:13:07 taal kernel: [106339.122760] vmbr2: port 2(tap108i2) entered disabled state
May 25 20:13:07 taal kernel: [106339.123603] vmbr2: port 2(tap108i2) entered blocking state
May 25 20:13:07 taal kernel: [106339.124280] vmbr2: port 2(tap108i2) entered forwarding state
May 25 20:13:07 ragang pmxcfs[1659329]: [status] notice: received log
May 25 20:13:07 pinatubo pmxcfs[1551671]: [status] notice: received log
May 25 20:13:07 mayon pmxcfs[1039713]: [status] notice: received log
May 25 20:13:07 taal systemd-udevd[2677877]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
May 25 20:13:07 taal kernel: [106339.516429] device tap108i3 entered promiscuous mode
May 25 20:13:07 taal kernel: [106339.523566] vmbr3: port 2(tap108i3) entered blocking state
May 25 20:13:07 taal kernel: [106339.524259] vmbr3: port 2(tap108i3) entered disabled state
May 25 20:13:07 taal kernel: [106339.525049] vmbr3: port 2(tap108i3) entered blocking state
May 25 20:13:07 taal kernel: [106339.525804] vmbr3: port 2(tap108i3) entered forwarding state
May 25 20:13:07 taal qm[2677758]: <root@pam> end task UPID:taal:0028DC53:00A24237:628E71B2:qmstart:108:root@pam: OK
May 25 20:13:08 mayon QEMU[2260]: kvm: Unable to write to socket: Broken pipe
May 25 20:13:09 taal qm[2677933]: <root@pam> starting task UPID:taal:0028DCAE:00A243AA:628E71B5:qmstop:108:root@pam:
May 25 20:13:09 taal qm[2677934]: stop VM 108: UPID:taal:0028DCAE:00A243AA:628E71B5:qmstop:108:root@pam:
May 25 20:13:09 taal QEMU[2677852]: kvm: terminating on signal 15 from pid 2677934 (task PID:taal:0028DCAE:00A243AA:628E71B5:qmstop:108:root@pam:)
May 25 20:13:09 taal qm[2677933]: <root@pam> end task UPID:taal:0028DCAE:00A243AA:628E71B5:qmstop:108:root@pam: OK
May 25 20:13:10 taal kernel: [106342.293536] vmbr0: port 3(tap108i0) entered disabled state
May 25 20:13:10 taal kernel: [106342.524051] vmbr2: port 2(tap108i2) entered disabled state
May 25 20:13:10 taal kernel: [106342.753718] vmbr3: port 2(tap108i3) entered disabled state
May 25 20:13:10 taal qmeventd[1868]: read: Connection reset by peer
May 25 20:13:10 taal systemd[1]: 108.scope: Succeeded.
mayon, pinatubo and ragang are the old nodes that didn't change; the reinstalled node is taal. In these logs, mayon is the source node initiating the migration. I suspect the LRM is in trouble: its status is active on all the other nodes, but idle on taal. Restarting the pve-ha-lrm service does not fix the issue.
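To make the task sequence in the log easier to follow, here is a minimal Python sketch that decodes the task IDs (UPIDs). It assumes the layout `UPID:node:pid:pstart:starttime:type:id:user:` with the three numeric fields as 8-digit hex and `starttime` as a Unix epoch; that is my reading of the log rather than something taken from the PVE docs, although the decoded PIDs do line up with it (0x0028DC53 = 2677843, the `qm` process that logs `start VM 108`). `decode_upid` and `task_timeline` are hypothetical names.

```python
import re
from datetime import datetime, timezone

# Assumed UPID layout, as seen in the log above:
# UPID:<node>:<pid>:<pstart>:<starttime>:<type>:<id>:<user>:
# pid, pstart and starttime are 8-digit hex; starttime is a Unix epoch.
UPID_RE = re.compile(
    r"UPID:(?P<node>[^:]+):(?P<pid>[0-9A-Fa-f]{8}):(?P<pstart>[0-9A-Fa-f]{8}):"
    r"(?P<starttime>[0-9A-Fa-f]{8}):(?P<type>[^:]+):(?P<id>[^:]*):(?P<user>[^:]+):"
)


def _fields(m):
    """Turn a UPID regex match into a dict with decoded numeric fields."""
    return {
        "node": m["node"],
        "pid": int(m["pid"], 16),
        "starttime": datetime.fromtimestamp(int(m["starttime"], 16), tz=timezone.utc),
        "type": m["type"],
        "id": m["id"],
        "user": m["user"],
    }


def decode_upid(text):
    """Decode the first UPID found in `text`."""
    m = UPID_RE.search(text)
    if m is None:
        raise ValueError(f"no UPID in {text!r}")
    return _fields(m)


def task_timeline(log_lines):
    """List (starttime, node, type, id, event) for each 'starting task'/'end task' line."""
    events = []
    for line in log_lines:
        m = UPID_RE.search(line)
        if m is None:
            continue
        if "starting task" in line:
            event = "start"
        elif "end task" in line:
            # Whatever follows the UPID is the exit status, e.g. "OK".
            event = f"end {line[m.end():].strip()}".strip()
        else:
            continue  # worker lines such as "start VM 108: UPID:..."
        t = _fields(m)
        events.append((t["starttime"], t["node"], t["type"], t["id"], event))
    return events
```

Run against the `starting task`/`end task` lines above, it lists qmigrate of VM 108 starting on mayon at 18:13:05 UTC (the syslog timestamps are two hours ahead), then qmstart on taal ending OK, then a qmstop on taal started at 18:13:09 UTC, one second after mayon's QEMU logs `Unable to write to socket: Broken pipe`.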
Thank you for your help.