[SOLVED] cluster stuck upon last updates

ieronymous · Mar 28, 2023

Hi

I know there have been similar issues reported since earlier versions of Proxmox, which is why I am including the output of the commands that were requested in those threads.

The issue happened 2 hours ago when I tried to update the first node. I let it sit for quite a long time, but it stayed stuck at 98%. The output of the update procedure follows.

Code:

root@dellprox3:~# apt upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following NEW packages will be installed:
  pve-kernel-5.15.102-1-pve
The following packages have been kept back:
  proxmox-ve pve-kernel-helper
The following packages will be upgraded:
  libnss-systemd libpam-systemd libpve-access-control libpve-cluster-api-perl libpve-cluster-perl libpve-common-perl
  libpve-guest-common-perl libpve-http-server-perl libpve-rs-perl libpve-storage-perl libsystemd0 libudev1
  proxmox-widget-toolkit pve-cluster pve-container pve-docs pve-edk2-firmware pve-firewall pve-firmware pve-ha-manager
  pve-i18n pve-kernel-5.15 pve-manager pve-qemu-kvm qemu-server systemd systemd-sysv tzdata udev
29 upgraded, 1 newly installed, 0 to remove and 2 not upgraded.
Need to get 229 MB of archives.
After this operation, 408 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 libsystemd0 amd64 247.3-7+1-pmx11u1 [376 kB]
Get:2 http://ftp.gr.debian.org/debian bullseye-updates/main amd64 tzdata all 2021a-1+deb11u9 [286 kB]
Get:3 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 libpam-systemd amd64 247.3-7+1-pmx11u1 [283 kB]
Get:4 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 libnss-systemd amd64 247.3-7+1-pmx11u1 [199 kB]
Get:5 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 systemd amd64 247.3-7+1-pmx11u1 [4,501 kB]
Get:6 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 udev amd64 247.3-7+1-pmx11u1 [1,464 kB]
Get:7 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 libudev1 amd64 247.3-7+1-pmx11u1 [168 kB]
Get:8 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 systemd-sysv amd64 247.3-7+1-pmx11u1 [113 kB]
Get:9 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 libpve-rs-perl amd64 0.7.5 [1,885 kB]
Get:10 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 libpve-cluster-api-perl all 7.3-3 [46.2 kB]
Get:11 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 libpve-cluster-perl all 7.3-3 [28.1 kB]
Found initrd image: /boot/initrd.img-5.15.85-1-pve
Found linux image: /boot/vmlinuz-5.15.74-1-pve
Found initrd image: /boot/initrd.img-5.15.74-1-pve
Found memtest86+ image: /ROOT/pve-1@/boot/memtest86+.bin
Found memtest86+ multiboot image: /ROOT/pve-1@/boot/memtest86+_multiboot.bin
Warning: os-prober will not be executed to detect other bootable partitions.
Systems on them will not be added to the GRUB boot configuration.
Check GRUB_DISABLE_OS_PROBER documentation entry.
Adding boot menu entry for UEFI Firmware Settings ...
done
Setting up udev (247.3-7+1-pmx11u1) ...
Setting up pve-i18n (2.11-1) ...
Setting up libpam-systemd:amd64 (247.3-7+1-pmx11u1) ...
Setting up libpve-cluster-perl (7.3-3) ...
Setting up libpve-http-server-perl (4.2-1) ...
Setting up pve-edk2-firmware (3.20230228-1) ...
Setting up pve-kernel-5.15 (7.3-3) ...
Setting up libpve-storage-perl (7.4-2) ...
Setting up libpve-access-control (7.4-2) ...
Setting up libpve-cluster-api-perl (7.3-3) ...
Setting up libpve-guest-common-perl (4.2-4) ...
Setting up pve-firewall (4.3-1) ...
Setting up qemu-server (7.4-3) ...
Setting up pve-container (4.4-3) ...
Setting up pve-ha-manager (3.6.0) ...
watchdog-mux.service is a disabled or a static unit, not starting it.
Setting up pve-manager (7.4-3) ...
Progress: [ 98%] [################################################################################################..]

SSH-ing into the server again and issuing an upgrade command gives me:
Waiting for cache lock: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 6729 (apt)

The same happened on the 3rd node I tried to update: it got stuck at 98% as well, and running the upgrade again just showed that a different process ID was holding /var/lib/dpkg/lock-frontend.

Some useful output for you (based on previous threads I found)

pveversion -v

Code:

proxmox-ve: 7.3-1 (running kernel: 5.15.83-1-pve)
pve-manager: not correctly installed (running version: 7.4-3/9002ab8a)
pve-kernel-helper: 7.3-2
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: not correctly installed
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: not correctly installed
libpve-apiclient-perl: 3.2-1
libpve-common-perl: not correctly installed
libpve-guest-common-perl: not correctly installed
libpve-http-server-perl: not correctly installed
libpve-rs-perl: not correctly installed
libpve-storage-perl: not correctly installed
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: not correctly installed
lxcfs: not correctly installed
novnc-pve: not correctly installed
proxmox-backup-client: not correctly installed
proxmox-backup-file-restore: not correctly installed
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: not correctly installed
pve-cluster: not correctly installed
pve-container: not correctly installed
pve-docs: not correctly installed
pve-edk2-firmware: not correctly installed
pve-firewall: not correctly installed
pve-firmware: not correctly installed
pve-ha-manager: not correctly installed
pve-i18n: not correctly installed
pve-qemu-kvm: not correctly installed
pve-xtermjs: 4.16.0-1
qemu-server: not correctly installed
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: not correctly installed
vncterm: 1.7-1
zfsutils-linux: not correctly installed

...which shows a bunch of packages as not correctly installed.
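To pull just the affected package names out of that listing, something like the following could be used. This is a minimal sketch: `broken_packages` is a made-up helper name, and it only filters the text that `pveversion -v` prints. The packages are most likely flagged because dpkg had unpacked them but not yet configured them while the upgrade was hanging.

```bash
# List package names that `pveversion -v` reports as "not correctly installed".
# Reads the pveversion output from stdin. Hypothetical helper, not a Proxmox tool.
broken_packages() {
    grep 'not correctly installed' | cut -d: -f1
}

# Example on a node (commented out so the snippet has no side effects):
#   pveversion -v | broken_packages
```

For dpkg's own view of half-installed or unconfigured packages, `dpkg --audit` can be run on the node as well.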

ps waux | grep pveproxy

Code:

www-data    2055  0.0  0.8 352860 144044 ?       Ss   12:06   0:00 pveproxy
www-data   25536  0.0  0.8 353376 131688 ?       S    12:29   0:00 pveproxy worker
www-data   25537  0.0  0.8 353376 131688 ?       S    12:29   0:00 pveproxy worker
www-data   25538  0.0  0.8 353376 131688 ?       S    12:29   0:00 pveproxy worker
root       52354  0.0  0.0   6244   644 pts/2    S+   15:11   0:00 grep pveproxy

cat /proc/6729/stack

Code:

[<0>] do_select+0x57c/0x870
[<0>] core_sys_select+0x1b0/0x3e0
[<0>] do_pselect.constprop.0+0xca/0x170
[<0>] __x64_sys_pselect6+0x5c/0xa0
[<0>] do_syscall_64+0x59/0xc0
[<0>] entry_SYSCALL_64_after_hwframe+0x61/0xcb

and finally the long output of ps faxl (I have to upload it as a txt file since it has too many lines).

In a cluster configuration, do all nodes need to be online in order to update each one individually?

What can I do now, other than rebooting and rendering the cluster useless?

Thank you

Moayad · Mar 28, 2023

Hi,

Thank you for the above outputs!

May I ask what kind of server you have? I remember seeing a similar issue on older servers. Could you also provide the additional information below:

Bash:

cat /var/log/apt/term.log
apt-cache policy pve-manager
apt list --upgradable
df -h

Chris · Mar 28, 2023

Hi,
First of all, never run apt upgrade to upgrade your PVE nodes; this is asking for trouble. Run apt-get dist-upgrade instead!

Regarding the stuck setup: I can remember a similar issue where the setup hangs with systemd-tty-ask-password-agent during upgrades when the cluster nodes have no quorum, see e.g. [0]. Please check pvecm status on the hanging node and try to temporarily give it quorum with pvecm expected 1.

[0] https://forum.proxmox.com/threads/proxmox-7-1-update-fails.108215/
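A quorum check before starting an upgrade could be sketched roughly like this. It assumes `pvecm status` prints a `Quorate:` line with `Yes` or `No`; `is_quorate` is a hypothetical helper name, not a Proxmox command.

```bash
# Succeeds (exit status 0) if the `pvecm status` text on stdin reports
# "Quorate: Yes". Hypothetical helper for illustration only.
is_quorate() {
    grep -Eq '^Quorate:[[:space:]]+Yes'
}

# Example on a node (commented out so the snippet has no side effects):
#   if pvecm status | is_quorate; then
#       apt-get update && apt-get dist-upgrade
#   else
#       echo "Node has no quorum; bring the other nodes up first." >&2
#   fi
```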

ieronymous · Mar 28, 2023

Moayad said:
cat /var/log/apt/term.log

See the attached file, although I didn't notice anything weird there.

Moayad said:
apt-cache policy pve-manager

Code:

pve-manager:
  Installed: 7.4-3
  Candidate: 7.4-3
  Version table:
 *** 7.4-3 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
        100 /var/lib/dpkg/status
     7.3-6 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.3-4 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.3-3 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.2-15 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.2-14 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.2-13 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.2-11 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.2-7 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.2-6 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.2-5 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.2-4 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.2-3 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.1-13 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.1-12 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.1-11 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.1-10 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.1-9 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.1-8 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.1-7 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.1-6 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.1-5 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.1-4 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.0-15 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.0-14+1 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.0-14 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.0-13 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.0-12 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.0-11 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.0-10 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.0-9 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.0-8 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.0-6 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.0-5 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.0-4 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages

Moayad said:
apt list --upgradable

Code:

proxmox-ve/stable 7.4-1 all [upgradable from: 7.3-1]
pve-kernel-helper/stable 7.3-8 all [upgradable from: 7.3-2]

Moayad said:
df -h

Code:

Filesystem        Size  Used Avail Use% Mounted on
udev              7.8G     0  7.8G   0% /dev
tmpfs             1.6G  1.2M  1.6G   1% /run
rpool/ROOT/pve-1  216G  2.4G  213G   2% /
tmpfs             7.8G   42M  7.8G   1% /dev/shm
tmpfs             5.0M     0  5.0M   0% /run/lock
dellstorage       900G  128K  900G   1% /dellstorage
rpool             213G  128K  213G   1% /rpool
rpool/ROOT        213G  128K  213G   1% /rpool/ROOT
rpool/data        213G  128K  213G   1% /rpool/data
tmpfs             1.6G     0  1.6G   0% /run/user/0
/dev/fuse         128M   28K  128M   1% /etc/pve

As for the hardware specs you asked about: all 3 nodes are Dell OptiPlex 3050 SFF machines with a 6700 / 16 GB RAM / 256 GB SSD for the boot pool and a 1 TB NVMe for VM storage.

ieronymous · Mar 28, 2023

Chris said:
First of all, never run apt upgrade to upgrade your PVE nodes; this is asking for trouble. Run apt-get dist-upgrade instead!

... I thought this was still up for debate. Some people noticed corruption with upgrade, others with dist-upgrade.
I also thought apt-get was deprecated and that apt alone does the job with the same results (at least I am not using apt-get).

So in a cluster environment, is there a specific way to run updates, other than updating each node separately?

When I ran the updates on the first node, the other two were closed. Maybe this caused the issue?
I tried on a standalone node (not part of the cluster) and it updated fine.
Is there an issue with the latest updates in a cluster environment, or is it just that I had the other 2 nodes closed?

Chris said:
pvecm status

Code:

Cluster information
-------------------
Name:             dellopti-ring
Config Version:   3
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Mar 28 16:41:55 2023
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.53
Quorate:          No

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      1
Quorum:           2 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.10.1 (local)

Activity blocked... so it did matter that the other 2 nodes were closed.
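The numbers in this output fit the usual majority rule: with 3 expected votes the quorum is floor(3 / 2) + 1 = 2, and a single vote falls short of that, hence "Activity blocked". A minimal sketch of the arithmetic, assuming default votequorum behaviour with no special options; the function names are made up for illustration.

```bash
# Majority quorum for a given number of expected votes: floor(n / 2) + 1.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds if the total votes reach the quorum for the expected votes.
# Usage: has_quorum TOTAL_VOTES EXPECTED_VOTES
has_quorum() {
    [ "$1" -ge "$(quorum_needed "$2")" ]
}

# Values from the output above: 1 total vote, 3 expected votes.
#   quorum_needed 3    prints 2
#   has_quorum 1 3     fails (activity blocked)
#   has_quorum 1 1     succeeds (the effect of `pvecm expected 1`)
```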

ieronymous · Mar 28, 2023

Chris said:
pvecm expected 1

Hey, that did the trick, as you mentioned. I issued this on the first and third node, and the upgrade that was stuck at 98% ran all the way to the finish line.
So was my previous assumption right? Do I need all nodes active in order to update them?

After pvecm expected 1, how do I restore the normal quorum behavior?
Since I ran it on 2 nodes, do I just enter pvecm expected 3 on each of them?

cave · Mar 28, 2023

ieronymous said:
When I run updates on first node, the other two were closed.

what is the state "closed"?

Chris · Mar 28, 2023

ieronymous said:
Hey, that did the trick, as you mentioned. I issued this on the first and third node, and the upgrade that was stuck at 98% ran all the way to the finish line.
So was my previous assumption right? Do I need all nodes active in order to update them?

After pvecm expected 1, how do I restore the normal quorum behavior?
Since I ran it on 2 nodes, do I just enter pvecm expected 3 on each of them?

Bring the other nodes up in order for the cluster to get quorate again. You should not have shut down the other nodes in the cluster in order to start the upgrade process. The cluster needs to be quorate for that.

ieronymous said:
... thought that this was still on debate.

There is no debate about whether to use apt-get upgrade or apt-get dist-upgrade: always use the latter, as it will also handle changing dependencies and remove packages if needed. apt has no stable API, so the use of apt-get is preferred, although this is not really an issue (I also prefer to use apt most of the time).
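One way to spot the difference on a node is to look for packages that a plain upgrade keeps back, as happened with proxmox-ve and pve-kernel-helper in the first post. A rough sketch, assuming English apt output with the "kept back" heading followed by indented package names; `kept_back_packages` is a hypothetical helper name.

```bash
# Print, one per line, the packages listed under
# "The following packages have been kept back:" in apt output read from stdin.
# Hypothetical helper; assumes English (non-localized) apt messages.
kept_back_packages() {
    awk '
        /^The following packages have been kept back:/ { in_block = 1; next }
        in_block && /^  / { for (i = 1; i <= NF; i++) print $i; next }
        { in_block = 0 }
    '
}

# Example on a node (simulation only, nothing gets installed):
#   apt-get -s upgrade | kept_back_packages
```

If this prints anything, running apt-get dist-upgrade instead should let those packages be upgraded.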

ieronymous · Mar 28, 2023

cave said:
what is the state "closed"?

Oh... I meant shut down, not online. I had powered on only one node for another purpose and thought I would update it first.

ieronymous · Mar 28, 2023

Chris said:
Bring the other nodes up in order for the cluster to get quorate again. You should not have shut down the other nodes in the cluster in order to start the upgrade process. The cluster needs to be quorate for that.

Didn't know that... I figured it out afterwards.

Chris said:
There is no debate about whether to use apt-get upgrade or apt-get dist-upgrade: always use the latter,

Noted.

Thank you for all your answers and quick response in general.

So finally, how do I revert the pvecm expected 1 command back to normal mode? Apart from having all nodes online, is there anything else I should do on the nodes where I issued the command to unblock them?

Chris · Mar 29, 2023

ieronymous said:
So finally, how do I revert the pvecm expected 1 command back to normal mode? Apart from having all nodes online, is there anything else I should do on the nodes where I issued the command to unblock them?

No, you don't need to execute any additional command. As soon as the other nodes come online and corosync rejoins them to the cluster, the expected votes get updated as well. Just make sure that all nodes are online and check the status of the cluster by running pvecm status.
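To confirm this after the nodes are back, the expected votes can be read out of the status output. A small sketch, assuming the `Expected votes:` line format shown earlier in the thread; `expected_votes` is a made-up helper name.

```bash
# Print the "Expected votes" value from `pvecm status` text on stdin.
# Hypothetical helper for illustration only.
expected_votes() {
    awk '/^Expected votes:/ { print $3 }'
}

# Example on a node (commented out so the snippet has no side effects):
#   pvecm status | expected_votes
```

On this 3-node cluster it should print 3 again once all nodes have rejoined.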

elBradford · Mar 30, 2023

Chris said:
Hi,
First of all, never run apt upgrade to upgrade your PVE nodes; this is asking for trouble. Run apt-get dist-upgrade instead!

Regarding the stuck setup: I can remember a similar issue where the setup hangs with systemd-tty-ask-password-agent during upgrades when the cluster nodes have no quorum, see e.g. [0]. Please check pvecm status on the hanging node and try to temporarily give it quorum with pvecm expected 1.

[0] https://forum.proxmox.com/threads/proxmox-7-1-update-fails.108215/

TIL - I've been using proxmox for years and have missed that I need to use dist-upgrade. That fixed the issue of held-back packages.
