Problem upgrading 7.0-11 to 7.1

jsterr

Well-Known Member
Hello, this error happens on a clean 7.0-11 cluster setup when upgrading from 7.0-11 to 7.1-4. The strange thing is that it only happened on the first node, pve1; pve2 and pve3 upgraded without issues.

Code:
Failed to restart corosync.service: Transaction for corosync.service/restart is destructive (corosync.service has 'stop' job queued, but 'restart' is included in transaction).
See system logs and 'systemctl status corosync.service' for details.
Setting up samba-libs:amd64 (2:4.13.13+dfsg-1~deb11u2) ...
Setting up pve-docs (7.1-2) ...
Setting up zfsutils-linux (2.1.1-pve3) ...
Installing new version of config file /etc/zfs/zfs-functions ...
Failed to start zfs-import.target: Transaction for zfs-import.target/start is destructive (systemd-reboot.service has 'start' job queued, but 'stop' is included in transaction).
See system logs and 'systemctl status zfs-import.target' for details.
Failed to start zfs-share.service: Transaction for zfs-share.service/start is destructive (local-fs.target has 'stop' job queued, but 'start' is included in transaction).
See system logs and 'systemctl status zfs-share.service' for details.
Failed to start zfs-volumes.target: Transaction for zfs-volumes.target/start is destructive (reboot.target has 'start' job queued, but 'stop' is included in transaction).
See system logs and 'systemctl status zfs-volumes.target' for details.
Failed to start zfs.target: Transaction for zfs.target/start is destructive (local-fs.target has 'stop' job queued, but 'start' is included in transaction).
See system logs and 'systemctl status zfs.target' for details.

Nothing happens after that. I then killed dpkg and ran dpkg --configure -a, but it gets stuck on zfs. It continued after skipping zfs, but other errors happened:

Code:
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
Copying and configuring kernels on /dev/disk/by-uuid/FA8C-FFA1
        Copying kernel and creating boot-entry for 5.11.22-4-pve
        Copying kernel and creating boot-entry for 5.11.22-7-pve
        Copying kernel and creating boot-entry for 5.13.19-1-pve
Copying and configuring kernels on /dev/disk/by-uuid/FA8E-A3DB
        Copying kernel and creating boot-entry for 5.11.22-4-pve
        Copying kernel and creating boot-entry for 5.11.22-7-pve
        Copying kernel and creating boot-entry for 5.13.19-1-pve
Processing triggers for libc-bin (2.31-13+deb11u2) ...
Processing triggers for man-db (2.9.4-2) ...
Errors were encountered while processing:
 zfsutils-linux
 zfs-initramfs
 zfs-zed
 pve-manager
 proxmox-ve

Code:
root@pve1:~# systemctl status zfs.target
* zfs.target - ZFS startup target
     Loaded: loaded (/lib/systemd/system/zfs.target; enabled; vendor preset: enabled)
     Active: inactive (dead) since Wed 2021-11-17 14:15:45 CET; 14min ago

Nov 17 12:34:02 pve1 systemd[1]: Reached target ZFS startup target.
Nov 17 14:15:45 pve1 systemd[1]: Stopped target ZFS startup target.

How can I fix this?
 
I fixed it by rebooting the server and running the upgrade again with dpkg --configure -a. The only thing I noticed is that my VNC shell still says:

Code:
System is going down. Unprivileged users are not permitted to log in anymore. For technical details, see pam_nologin(8).

The other nodes don't show this message on VNC. I ran: rm /run/nologin
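
For anyone hitting the same leftover nologin banner, roughly this should clean it up and finish the half-done upgrade (an untested sketch; only remove the flag once you are sure no shutdown is actually pending anymore):

Code:
# the "System is going down" banner comes from this flag file left over from the aborted shutdown
rm -f /run/nologin

# finish configuring the packages that failed mid-upgrade
dpkg --configure -a
apt -f install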
 
Could you get the full journal (or the syslog, if the journal is not persistent and was cleared by the reboot), starting from before you started the upgrade?
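
For example, something along these lines (the timestamp is just a placeholder, pick a point shortly before the upgrade started):

Code:
# export the journal for the relevant window
journalctl --since "2021-11-17 12:00" > /root/journal-upgrade.txt

# or, if the journal is not persistent, grab the plain syslog files instead
cp /var/log/syslog /var/log/syslog.1 /root/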
 
Unfortunately I don't have it. Before this I shut down all 3 nodes, but this one, pve1, was still up. Maybe that was the reason. Is there any way to still provide some useful information for you? This is a 3-node Ceph cluster with a full-mesh network.
 
The symptoms look to me like the node was in the process of shutting down while you were doing the upgrade, which is obviously not a good combination ;) The syslog should contain the info (/var/log/syslog or the already-rotated files).
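
A quick way to confirm whether a shutdown is still queued and to fish the relevant lines out of the rotated logs (a rough sketch):

Code:
# a pending shutdown shows up here as a queued job for reboot.target / systemd-reboot.service
systemctl list-jobs

# search the current and rotated syslogs for the shutdown and the apt task
zgrep -h -e 'Stopped target' -e 'aptupdate' /var/log/syslog /var/log/syslog*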
 
This is the syslog. Yeah, it seems the host wanted to shut down but couldn't for some reason. It didn't shut down after I brought the other 2 nodes back up, so I thought everything was fine. That was the problem, I guess. Anyway, here is the syslog if you still want to take a look at it.

Thanks.
 

Attachments

  • syslog.zip (555.2 KB)
Yeah, the logs clearly show the node starting a shutdown and, while that is still going on, an 'aptupdate' task refreshing package versions, followed by the systemd errors from your first post. The following also doesn't look good and might be worth a closer look on your end as well:

Code:
Nov 17 14:09:19 pve1 systemd[1]: Startup finished in 55.667s (firmware) + 3.575s (loader) + 7.809s (kernel) + 1h 35min 20.099s (userspace) = 1h 36min 27.152s.

Node startup takes very, very long, possibly for the same reason that shutdowns take a while?
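
A sketch of how to narrow down where the boot time goes:

Code:
# per-unit startup times, slowest first
systemd-analyze blame | head -n 20

# the chain of units the default target actually had to wait for
systemd-analyze critical-chain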
 
The shutdown takes incredibly long. Any tip on how to fix this? It only happens on the last node of three: the first 2 Ceph/Proxmox nodes shut down normally, but this one takes ages; I'm at 10-15 minutes at the moment. I'm not using iSCSI. I used this procedure (rough command sketch after the list):

1. Shut down all VMs on every node.
2. Set the following flags:
# ceph osd set noout
# ceph osd set nobackfill
# ceph osd set norecover
3. Shut down the nodes only after all the VMs on all nodes have been shut down.
4. Restore the Ceph flags once all the nodes have booted up.
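
The flag handling as plain commands, a sketch that assumes you only unset the flags once all three nodes are back up and ceph -s looks healthy again:

Code:
# before shutting the nodes down
ceph osd set noout
ceph osd set nobackfill
ceph osd set norecover

# after all nodes have booted and Ceph is healthy again
ceph osd unset noout
ceph osd unset nobackfill
ceph osd unset norecover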

 
You need to look at the full shutdown log (especially the start and end times of stopping services). In a clustered setting it's possible that some service (Ceph-related?) refuses to shut down because it waits for a reply from other nodes that never comes.
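
For example (a sketch; the unit list is just a guess at the usual suspects):

Code:
# shutdown sequence of the previous boot, the tail usually covers the stop jobs
journalctl -b -1 | tail -n 300

# how long the cluster/Ceph services took to stop
journalctl -b -1 -u corosync -u pve-cluster -u 'ceph*'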
 
You need to look at the full shutdown log (especially the start and end times of stopping services). In a clustered setting it's possible that some service (Ceph-related?) refuses to shut down because it waits for a reply from other nodes that never comes.
Hi, in case this happens (a service refusing to shut down while shutting down PVE/Ceph nodes in a cluster), how should one deal with it? Is it safe to just kill / power off the node?

Background: I am currently in the process of planning the relocation of a Ceph cluster, see https://forum.proxmox.com/threads/p...node-pve-ceph-cluster-to-another-city.100986/, and stumbled upon the above "process" from @jsterr which includes the Ceph flags.

1. Shut down all VMs on every node.
2. Set the following flags:
# ceph osd set noout
# ceph osd set nobackfill
# ceph osd set norecover
3. Shut down the nodes only after all the VMs on all nodes have been shut down.
4. Restore the Ceph flags once all the nodes have booted up.

In https://forum.proxmox.com/threads/p...node-pve-ceph-cluster-to-another-city.100986/ I outlined my planned process, which so far is:

1. Shut down all VMs on every node.
3. Shut down the nodes only after all the VMs on all nodes have been shut down.
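
If it helps, a rough sketch for step 1 on each node (assumes all guests are QEMU VMs; containers would need pct instead):

Code:
# cleanly shut down every running VM on this node
for vmid in $(qm list | awk '$3 == "running" {print $1}'); do
    qm shutdown "$vmid" --timeout 180
done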
 
I can't give a blanket statement on that ;)
 
Hi, I'm having problems upgrading from version 7.0-11 to 7.1. I tried all of the above but still have the following issues:

Code:
libpve-access-control : Breaks: pve-manager (< 7.0-15) but 7.0-11 is to be installed
libpve-common-perl : Breaks: qemu-server (< 7.0-19) but 7.0-13 is to be installed
libpve-rs-perl : Breaks: pve-manager (< 7.1-11) but 7.0-11 is to be installed
E: Error, pkgProblemResolver::Resolve generated breaks, this may be caused by held packages.
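
Those 'Breaks' lines usually point to a partial upgrade (plain apt upgrade instead of dist-upgrade) or held packages; a sketch of what to check, assuming the Bullseye/PVE 7 repositories are configured correctly:

Code:
# any packages pinned on hold?
apt-mark showhold

# refresh and let apt resolve the whole dependency set in one go
apt update
apt dist-upgrade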
 
