LRM Hangs when updating while Migration is running

Aug 28, 2017
Hi there,
Today I updated some of our clusters and started the "Upgrade" while still evacuating the VMs off the host. The dpkg configure step for pve-ha-lrm tries to restart the service, but gets stuck.



Code:
Apr 26 13:02:37 an1-kvm01-bt01-b pve-ha-lrm[2936243]: Task 'UPID:an1-kvm01-bt01-b:002CCDB4:23DD61B2:6267D049:qmigrate:146:root@pam:' still active, waiting
Apr 26 13:02:42 an1-kvm01-bt01-b pve-ha-lrm[2936243]: Task 'UPID:an1-kvm01-bt01-b:002CCDB4:23DD61B2:6267D049:qmigrate:146:root@pam:' still active, waiting
Apr 26 13:02:47 an1-kvm01-bt01-b pve-ha-lrm[2936243]: Task 'UPID:an1-kvm01-bt01-b:002CCDB4:23DD61B2:6267D049:qmigrate:146:root@pam:' still active, waiting
Apr 26 13:02:50 an1-kvm01-bt01-b systemd[1]: Stopping PVE Local HA Resource Manager Daemon...
Apr 26 13:02:50 an1-kvm01-bt01-b pve-ha-lrm[4559]: received signal TERM
Apr 26 13:02:50 an1-kvm01-bt01-b pve-ha-lrm[4559]: restart LRM, freeze all services
Apr 26 13:02:52 an1-kvm01-bt01-b pve-ha-lrm[2936243]: Task 'UPID:an1-kvm01-bt01-b:002CCDB4:23DD61B2:6267D049:qmigrate:146:root@pam:' still active, waiting
Apr 26 13:02:57 an1-kvm01-bt01-b pve-ha-lrm[2936243]: Task 'UPID:an1-kvm01-bt01-b:002CCDB4:23DD61B2:6267D049:qmigrate:146:root@pam:' still active, waiting
Apr 26 13:02:58 an1-kvm01-bt01-b pve-ha-lrm[2936243]: <root@pam> end task UPID:an1-kvm01-bt01-b:002CCDB4:23DD61B2:6267D049:qmigrate:146:root@pam: OK

systemctl stop pve-ha-lrm does not finish. Manually killing the PID and restarting pve-ha-lrm results in the node instantly getting fenced and rebooted.

Any ideas how this can be done better, or shall I open a ticket?
 
What do you mean by 'does not finish'? At which point did you kill the LRM? It waits for the workers to finish before restarting.
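To see whether any worker is still outstanding at the moment the LRM is killed, the journal can be scanned for the two message shapes that appear in the log above ("still active, waiting" and "end task"). The following is only a rough sketch under that assumption: the function name `pending_tasks` and both regular expressions are made up for illustration and are derived from the log lines in this thread, not from the pve-ha-manager source.

```python
import re

# Assumption: these patterns mirror the journal lines quoted in this thread.
ACTIVE_RE = re.compile(r"Task '(UPID:[^']+)' still active, waiting")
END_RE = re.compile(r"end task (UPID:\S+)")


def pending_tasks(lines):
    """Return UPIDs reported as still active that have no later 'end task' line."""
    pending = {}  # dict keeps first-seen order of the UPIDs
    for line in lines:
        match = ACTIVE_RE.search(line)
        if match:
            pending[match.group(1)] = True
            continue
        match = END_RE.search(line)
        if match:
            # The worker finished, so it no longer blocks the LRM restart.
            pending.pop(match.group(1), None)
    return list(pending)
```

Feeding it the journal output for pve-ha-lrm line by line should give an empty list once all workers have ended. If the list is empty and the service still does not restart, the hang is probably somewhere other than a running migration worker.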
 
In the pop-up window where "apt-get dist-upgrade" is running, it has been hanging for more than 20 minutes at
Code:
Processing triggers for pve-ha-manager (3.3-3) ...
Meanwhile, in the journal we can see that the "trigger for pve-ha-manager" seems to invoke systemd to restart the LRM.

After the last LRM task has finished, the LRM is still not restarted, and from what I could see, the machine that was being migrated off the node during the update is stuck in "fencing" (output from ha-manager status).
 
Okay, I'll attempt to reproduce! Which versions were installed prior to the upgrade? (/var/log/apt/history.log should contain the relevant data.)
 
Code:
root@an1-kvm01-bt01-b:~# cat /var/log/apt/history.log

Start-Date: 2022-04-26  13:01:51
Commandline: apt-get dist-upgrade
Install: proxmox-websocket-tunnel:amd64 (0.1.0-1, automatic), libproxmox-rs-perl:amd64 (0.1.0, automatic)
Upgrade: udev:amd64 (247.3-6, 247.3-7), proxmox-widget-toolkit:amd64 (3.4-5, 3.4-7), libpve-rs-perl:amd64 (0.5.1, 0.6.0), pve-firmware:amd64 (3.3-5, 3.3-6), gpg:amd64 (2.2.27-2, 2.2.27-2+deb11u1), systemd-timesyncd:amd64 (247.3-6, 247.3-7), tzdata:amd64 (2021a-1+deb11u2, 2021a-1+deb11u3), zfs-zed:amd64 (2.1.2-pve1, 2.1.4-pve1), libpam-systemd:amd64 (247.3-6, 247.3-7), zfs-initramfs:amd64 (2.1.2-pve1, 2.1.4-pve1), libarchive13:amd64 (3.4.3-2+b1, 3.4.3-2+deb11u1), usb.ids:amd64 (2021.06.06-1, 2022.02.15-0+deb11u1), liblzma5:amd64 (5.2.5-2, 5.2.5-2.1~deb11u1), spl:amd64 (2.1.2-pve1, 2.1.4-pve1), pve-qemu-kvm:amd64 (6.1.1-1, 6.1.1-2), libnvpair3linux:amd64 (2.1.2-pve1, 2.1.4-pve1), tasksel-data:amd64 (3.68, 3.68+deb11u1), swtpm-libs:amd64 (0.7.0~rc1+2, 0.7.1~bpo11+1), swtpm-tools:amd64 (0.7.0~rc1+2, 0.7.1~bpo11+1), libuutil3linux:amd64 (2.1.2-pve1, 2.1.4-pve1), tasksel:amd64 (3.68, 3.68+deb11u1), libpve-storage-perl:amd64 (7.0-15, 7.1-1), libtiff5:amd64 (4.2.0-1, 4.2.0-1+deb11u1), libsystemd0:amd64 (247.3-6, 247.3-7), libzpool5linux:amd64 (2.1.2-pve1, 2.1.4-pve1), libnss-systemd:amd64 (247.3-6, 247.3-7), libpve-guest-common-perl:amd64 (4.0-3, 4.1-1), gnupg:amd64 (2.2.27-2, 2.2.27-2+deb11u1), gpg-wks-server:amd64 (2.2.27-2, 2.2.27-2+deb11u1), libflac8:amd64 (1.3.3-2, 1.3.3-2+deb11u1), swtpm:amd64 (0.7.0~rc1+2, 0.7.1~bpo11+1), libxml2:amd64 (2.9.10+dfsg-6.7, 2.9.10+dfsg-6.7+deb11u1), xz-utils:amd64 (5.2.5-2, 5.2.5-2.1~deb11u1), systemd:amd64 (247.3-6, 247.3-7), libudev1:amd64 (247.3-6, 247.3-7), gpg-agent:amd64 (2.2.27-2, 2.2.27-2+deb11u1), novnc-pve:amd64 (1.3.0-1, 1.3.0-2), libc6:amd64 (2.31-13+deb11u2, 2.31-13+deb11u3), locales:amd64 (2.31-13+deb11u2, 2.31-13+deb11u3), gpgv:amd64 (2.2.27-2, 2.2.27-2+deb11u1), libpve-access-control:amd64 (7.1-6, 7.1-7), pve-container:amd64 (4.1-3, 4.1-4), base-files:amd64 (11.1+deb11u2, 11.1+deb11u3), gzip:amd64 (1.10-4, 1.10-4+deb11u1), libtpms0:amd64 (0.9.0+1, 0.9.2~bpo11+1), gpgsm:amd64 (2.2.27-2, 2.2.27-2+deb11u1), 
pve-kernel-5.13.19-6-pve:amd64 (5.13.19-14, 5.13.19-15), libc-dev-bin:amd64 (2.31-13+deb11u2, 2.31-13+deb11u3), libssl1.1:amd64 (1.1.1k-1+deb11u2, 1.1.1n-0+deb11u1), pve-manager:amd64 (7.1-10, 7.1-12), libpve-common-perl:amd64 (7.1-2, 7.1-5), libc-l10n:amd64 (2.31-13+deb11u2, 2.31-13+deb11u3), libc-bin:amd64 (2.31-13+deb11u2, 2.31-13+deb11u3), libc-devtools:amd64 (2.31-13+deb11u2, 2.31-13+deb11u3), libc6-dev:amd64 (2.31-13+deb11u2, 2.31-13+deb11u3), dirmngr:amd64 (2.2.27-2, 2.2.27-2+deb11u1), libzfs4linux:amd64 (2.1.2-pve1, 2.1.4-pve1), systemd-sysv:amd64 (247.3-6, 247.3-7), gnupg-utils:amd64 (2.2.27-2, 2.2.27-2+deb11u1), sysvinit-utils:amd64 (2.96-7, 2.96-7+deb11u1), gnupg-l10n:amd64 (2.2.27-2, 2.2.27-2+deb11u1), gpg-wks-client:amd64 (2.2.27-2, 2.2.27-2+deb11u1), zlib1g:amd64 (1:1.2.11.dfsg-2, 1:1.2.11.dfsg-2+deb11u1), libpve-u2f-server-perl:amd64 (1.1-1, 1.1-2), gpgconf:amd64 (2.2.27-2, 2.2.27-2+deb11u1), pve-kernel-helper:amd64 (7.1-12, 7.1-14), zfsutils-linux:amd64 (2.1.2-pve1, 2.1.4-pve1), openssl:amd64 (1.1.1k-1+deb11u2, 1.1.1n-0+deb11u1), linux-libc-dev:amd64 (5.10.103-1, 5.10.106-1)
Error: Sub-process /usr/bin/dpkg returned an error code (1)
End-Date: 2022-04-26  13:22:01

Start-Date: 2022-04-26  13:26:51
Commandline: apt-get dist-upgrade
End-Date: 2022-04-26  13:26:59

Start-Date: 2022-04-26  13:27:23
Commandline: apt autoremove
Remove: pve-kernel-5.11.22-2-pve:amd64 (5.11.22-4), pve-kernel-5.11.22-3-pve:amd64 (5.11.22-7), pve-kernel-5.11.22-5-pve:amd64 (5.11.22-10)
End-Date: 2022-04-26  13:27:40
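For anyone who needs to pull the prior versions out of a long history.log entry like the one above, here is a small sketch that splits an "Upgrade:" line into old and new versions per package. The helper name `parse_upgrades` and the regular expression are assumptions based only on the format visible in this log. It expects the whole "Upgrade:" entry as a single string, so an entry that got wrapped when pasted has to be joined first.

```python
import re

# Assumption: each entry looks like "name:arch (old-version, new-version)",
# as in the history.log excerpt above.
PKG_RE = re.compile(r"(\S+?):(\w+) \(([^,()]+), ([^()]+)\)")


def parse_upgrades(line):
    """Map package name -> (old_version, new_version) for one 'Upgrade:' line."""
    if not line.startswith("Upgrade:"):
        return {}
    body = line[len("Upgrade:"):]
    return {
        match.group(1): (match.group(3), match.group(4))
        for match in PKG_RE.finditer(body)
    }
```

Looking up a package name in the returned dict then gives the version that was installed before the upgrade as the first element of the tuple.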
 
I can reproduce the issue, will keep you posted on a fix!
 
