LRM Hangs when updating while Migration is running

Sebastian Schubert

Hi there,
Today I updated some of our clusters and started the upgrade while the VMs were still being evacuated off the host. The dpkg configure step for pve-ha-lrm tries to restart the service, but gets stuck:



Code:
Apr 26 13:02:37 an1-kvm01-bt01-b pve-ha-lrm[2936243]: Task 'UPID:an1-kvm01-bt01-b:002CCDB4:23DD61B2:6267D049:qmigrate:146:root@pam:' still active, waiting
Apr 26 13:02:42 an1-kvm01-bt01-b pve-ha-lrm[2936243]: Task 'UPID:an1-kvm01-bt01-b:002CCDB4:23DD61B2:6267D049:qmigrate:146:root@pam:' still active, waiting
Apr 26 13:02:47 an1-kvm01-bt01-b pve-ha-lrm[2936243]: Task 'UPID:an1-kvm01-bt01-b:002CCDB4:23DD61B2:6267D049:qmigrate:146:root@pam:' still active, waiting
Apr 26 13:02:50 an1-kvm01-bt01-b systemd[1]: Stopping PVE Local HA Resource Manager Daemon...
Apr 26 13:02:50 an1-kvm01-bt01-b pve-ha-lrm[4559]: received signal TERM
Apr 26 13:02:50 an1-kvm01-bt01-b pve-ha-lrm[4559]: restart LRM, freeze all services
Apr 26 13:02:52 an1-kvm01-bt01-b pve-ha-lrm[2936243]: Task 'UPID:an1-kvm01-bt01-b:002CCDB4:23DD61B2:6267D049:qmigrate:146:root@pam:' still active, waiting
Apr 26 13:02:57 an1-kvm01-bt01-b pve-ha-lrm[2936243]: Task 'UPID:an1-kvm01-bt01-b:002CCDB4:23DD61B2:6267D049:qmigrate:146:root@pam:' still active, waiting
Apr 26 13:02:58 an1-kvm01-bt01-b pve-ha-lrm[2936243]: <root@pam> end task UPID:an1-kvm01-bt01-b:002CCDB4:23DD61B2:6267D049:qmigrate:146:root@pam: OK

systemctl stop pve-ha-lrm does not finish. Manually killing the PID and restarting pve-ha-lrm results in the node instantly getting fenced and rebooted.

Any ideas how this can be done better, or shall I open a ticket?
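
One possible precaution, sketched here under assumptions: before starting the upgrade, check the pve-ha-lrm journal for worker tasks that were reported as "still active" and have not yet logged an "end task" line. The function name unfinished_tasks is hypothetical, and the two patterns are taken only from the log lines quoted above; this is not a Proxmox API and other log formats are not handled.

```python
import re

# Patterns are derived from the journal excerpt in this thread (an assumption:
# other pve-ha-lrm versions may word these messages differently).
ACTIVE = re.compile(r"Task '(UPID:[^']+)' still active")
ENDED = re.compile(r"end task (UPID:\S+)")


def unfinished_tasks(journal_lines):
    """Return the UPIDs that were reported active and never reported ended."""
    pending = set()
    for line in journal_lines:
        match = ACTIVE.search(line)
        if match:
            pending.add(match.group(1))
            continue
        match = ENDED.search(line)
        if match:
            # discard() is a no-op if the task was never seen as active
            pending.discard(match.group(1))
    return pending
```

The input could be the lines printed by journalctl -u pve-ha-lrm. An empty result only means that no migration worker shows up as pending in the lines that were read; it does not by itself prove that restarting the LRM is safe.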
 
What do you mean by 'does not finish'? At which point did you kill the LRM? It waits for the workers to finish before restarting.
 
In the pop-up window where "apt-get dist-upgrade" is running, it has been hanging for more than 20 minutes at:
Code:
Processing triggers for pve-ha-manager (3.3-3) ...
Meanwhile, in the journal we can see that the trigger for pve-ha-manager seems to invoke systemd to restart the LRM.

After the last LRM task has finished, the LRM is still not restarted, and from what I could see, the machine that was being migrated off the node during the update is stuck in "fencing" (according to the output of ha-manager status).
 
Okay, I'll attempt to reproduce! Which versions were installed prior to the upgrade? (/var/log/apt/history.log should contain the relevant data.)
 
Code:
root@an1-kvm01-bt01-b:~# cat /var/log/apt/history.log

Start-Date: 2022-04-26  13:01:51
Commandline: apt-get dist-upgrade
Install: proxmox-websocket-tunnel:amd64 (0.1.0-1, automatic), libproxmox-rs-perl:amd64 (0.1.0, automatic)
Upgrade: udev:amd64 (247.3-6, 247.3-7), proxmox-widget-toolkit:amd64 (3.4-5, 3.4-7), libpve-rs-perl:amd64 (0.5.1, 0.6.0), pve-firmware:amd64 (3.3-5, 3.3-6), gpg:amd64 (2.2.27-2, 2.2.27-2+deb11u1), systemd-timesyncd:amd64 (247.3-6, 247.3-7), tzdata:amd64 (2021a-1+deb11u2, 2021a-1+deb11u3), zfs-zed:amd64 (2.1.2-pve1, 2.1.4-pve1), libpam-systemd:amd64 (247.3-6, 247.3-7), zfs-initramfs:amd64 (2.1.2-pve1, 2.1.4-pve1), libarchive13:amd64 (3.4.3-2+b1, 3.4.3-2+deb11u1), usb.ids:amd64 (2021.06.06-1, 2022.02.15-0+deb11u1), liblzma5:amd64 (5.2.5-2, 5.2.5-2.1~deb11u1), spl:amd64 (2.1.2-pve1, 2.1.4-pve1), pve-qemu-kvm:amd64 (6.1.1-1, 6.1.1-2), libnvpair3linux:amd64 (2.1.2-pve1, 2.1.4-pve1), tasksel-data:amd64 (3.68, 3.68+deb11u1), swtpm-libs:amd64 (0.7.0~rc1+2, 0.7.1~bpo11+1), swtpm-tools:amd64 (0.7.0~rc1+2, 0.7.1~bpo11+1), libuutil3linux:amd64 (2.1.2-pve1, 2.1.4-pve1), tasksel:amd64 (3.68, 3.68+deb11u1), libpve-storage-perl:amd64 (7.0-15, 7.1-1), libtiff5:amd64 (4.2.0-1, 4.2.0-1+deb11u1), libsystemd0:amd64 (247.3-6, 247.3-7), libzpool5linux:amd64 (2.1.2-pve1, 2.1.4-pve1), libnss-systemd:amd64 (247.3-6, 247.3-7), libpve-guest-common-perl:amd64 (4.0-3, 4.1-1), gnupg:amd64 (2.2.27-2, 2.2.27-2+deb11u1), gpg-wks-server:amd64 (2.2.27-2, 2.2.27-2+deb11u1), libflac8:amd64 (1.3.3-2, 1.3.3-2+deb11u1), swtpm:amd64 (0.7.0~rc1+2, 0.7.1~bpo11+1), libxml2:amd64 (2.9.10+dfsg-6.7, 2.9.10+dfsg-6.7+deb11u1), xz-utils:amd64 (5.2.5-2, 5.2.5-2.1~deb11u1), systemd:amd64 (247.3-6, 247.3-7), libudev1:amd64 (247.3-6, 247.3-7), gpg-agent:amd64 (2.2.27-2, 2.2.27-2+deb11u1), novnc-pve:amd64 (1.3.0-1, 1.3.0-2), libc6:amd64 (2.31-13+deb11u2, 2.31-13+deb11u3), locales:amd64 (2.31-13+deb11u2, 2.31-13+deb11u3), gpgv:amd64 (2.2.27-2, 2.2.27-2+deb11u1), libpve-access-control:amd64 (7.1-6, 7.1-7), pve-container:amd64 (4.1-3, 4.1-4), base-files:amd64 (11.1+deb11u2, 11.1+deb11u3), gzip:amd64 (1.10-4, 1.10-4+deb11u1), libtpms0:amd64 (0.9.0+1, 0.9.2~bpo11+1), gpgsm:amd64 (2.2.27-2, 2.2.27-2+deb11u1), 
pve-kernel-5.13.19-6-pve:amd64 (5.13.19-14, 5.13.19-15), libc-dev-bin:amd64 (2.31-13+deb11u2, 2.31-13+deb11u3), libssl1.1:amd64 (1.1.1k-1+deb11u2, 1.1.1n-0+deb11u1), pve-manager:amd64 (7.1-10, 7.1-12), libpve-common-perl:amd64 (7.1-2, 7.1-5), libc-l10n:amd64 (2.31-13+deb11u2, 2.31-13+deb11u3), libc-bin:amd64 (2.31-13+deb11u2, 2.31-13+deb11u3), libc-devtools:amd64 (2.31-13+deb11u2, 2.31-13+deb11u3), libc6-dev:amd64 (2.31-13+deb11u2, 2.31-13+deb11u3), dirmngr:amd64 (2.2.27-2, 2.2.27-2+deb11u1), libzfs4linux:amd64 (2.1.2-pve1, 2.1.4-pve1), systemd-sysv:amd64 (247.3-6, 247.3-7), gnupg-utils:amd64 (2.2.27-2, 2.2.27-2+deb11u1), sysvinit-utils:amd64 (2.96-7, 2.96-7+deb11u1), gnupg-l10n:amd64 (2.2.27-2, 2.2.27-2+deb11u1), gpg-wks-client:amd64 (2.2.27-2, 2.2.27-2+deb11u1), zlib1g:amd64 (1:1.2.11.dfsg-2, 1:1.2.11.dfsg-2+deb11u1), libpve-u2f-server-perl:amd64 (1.1-1, 1.1-2), gpgconf:amd64 (2.2.27-2, 2.2.27-2+deb11u1), pve-kernel-helper:amd64 (7.1-12, 7.1-14), zfsutils-linux:amd64 (2.1.2-pve1, 2.1.4-pve1), openssl:amd64 (1.1.1k-1+deb11u2, 1.1.1n-0+deb11u1), linux-libc-dev:amd64 (5.10.103-1, 5.10.106-1)
Error: Sub-process /usr/bin/dpkg returned an error code (1)
End-Date: 2022-04-26  13:22:01

Start-Date: 2022-04-26  13:26:51
Commandline: apt-get dist-upgrade
End-Date: 2022-04-26  13:26:59

Start-Date: 2022-04-26  13:27:23
Commandline: apt autoremove
Remove: pve-kernel-5.11.22-2-pve:amd64 (5.11.22-4), pve-kernel-5.11.22-3-pve:amd64 (5.11.22-7), pve-kernel-5.11.22-5-pve:amd64 (5.11.22-10)
End-Date: 2022-04-26  13:27:40
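
The Upgrade entry in the history.log output above is hard to scan by eye. As a minimal sketch, the following pulls the old and new version of each upgraded package out of such a log. The name parse_upgrades is hypothetical, and the pattern assumes the "name:arch (old, new)" layout seen above, with each Upgrade entry on a single line.

```python
import re

# One entry looks like "pve-manager:amd64 (7.1-10, 7.1-12)" in the log above.
ENTRY = re.compile(
    r"(?P<name>[^\s:,]+):(?P<arch>[^\s(]+) \((?P<old>[^,()]+), (?P<new>[^()]+)\)"
)


def parse_upgrades(history_text):
    """Map package name -> (old_version, new_version) for all Upgrade lines."""
    upgrades = {}
    for line in history_text.splitlines():
        # Install/Remove lines use a different layout, so they are skipped.
        if not line.startswith("Upgrade: "):
            continue
        for match in ENTRY.finditer(line[len("Upgrade: "):]):
            upgrades[match.group("name")] = (match.group("old"), match.group("new"))
    return upgrades
```

For example, after reading /var/log/apt/history.log into a string, parse_upgrades(text).get("pve-manager") would give the version pair for that package, or None if it was not part of any upgrade in the file.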
 
I can reproduce the issue and will keep you posted on a fix!