Problem upgrading 7.0-11 to 7.1

jsterr

Hello, this error happens on a clean 7.0-11 cluster setup when upgrading from 7.0-11 to 7.1-4. The strange thing is that it only happened on the first node, pve1; pve2 and pve3 upgraded without issues.

Code:
Failed to restart corosync.service: Transaction for corosync.service/restart is destructive (corosync.service has 'stop' job queued, but 'restart' is included in transaction).
See system logs and 'systemctl status corosync.service' for details.
Setting up samba-libs:amd64 (2:4.13.13+dfsg-1~deb11u2) ...
Setting up pve-docs (7.1-2) ...
Setting up zfsutils-linux (2.1.1-pve3) ...
Installing new version of config file /etc/zfs/zfs-functions ...
Failed to start zfs-import.target: Transaction for zfs-import.target/start is destructive (systemd-reboot.service has 'start' job queued, but 'stop' is included in transaction).
See system logs and 'systemctl status zfs-import.target' for details.
Failed to start zfs-share.service: Transaction for zfs-share.service/start is destructive (local-fs.target has 'stop' job queued, but 'start' is included in transaction).
See system logs and 'systemctl status zfs-share.service' for details.
Failed to start zfs-volumes.target: Transaction for zfs-volumes.target/start is destructive (reboot.target has 'start' job queued, but 'stop' is included in transaction).
See system logs and 'systemctl status zfs-volumes.target' for details.
Failed to start zfs.target: Transaction for zfs.target/start is destructive (local-fs.target has 'stop' job queued, but 'start' is included in transaction).
See system logs and 'systemctl status zfs.target' for details.
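
The "transaction is destructive" messages mean systemd already had stop/reboot jobs queued while the package scripts tried to restart services. A minimal sketch for checking whether a shutdown is pending before upgrading (standard systemd commands, not part of the output above):

Code:
systemctl list-jobs            # lists queued jobs, e.g. "reboot.target start"
systemctl is-system-running    # reports "stopping" while a shutdown is in progress
ls /run/nologin 2>/dev/null && echo "shutdown scheduled, logins disabled"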

Nothing happens after that. I then killed dpkg and ran dpkg --configure -a, but it got stuck on zfs again. It eventually continued, skipping zfs, but other errors happened.

Code:
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
Copying and configuring kernels on /dev/disk/by-uuid/FA8C-FFA1
        Copying kernel and creating boot-entry for 5.11.22-4-pve
        Copying kernel and creating boot-entry for 5.11.22-7-pve
        Copying kernel and creating boot-entry for 5.13.19-1-pve
Copying and configuring kernels on /dev/disk/by-uuid/FA8E-A3DB
        Copying kernel and creating boot-entry for 5.11.22-4-pve
        Copying kernel and creating boot-entry for 5.11.22-7-pve
        Copying kernel and creating boot-entry for 5.13.19-1-pve
Processing triggers for libc-bin (2.31-13+deb11u2) ...
Processing triggers for man-db (2.9.4-2) ...
Errors were encountered while processing:
 zfsutils-linux
 zfs-initramfs
 zfs-zed
 pve-manager
 proxmox-ve

Code:
root@pve1:~# systemctl status zfs.target
* zfs.target - ZFS startup target
     Loaded: loaded (/lib/systemd/system/zfs.target; enabled; vendor preset: enabled)
     Active: inactive (dead) since Wed 2021-11-17 14:15:45 CET; 14min ago

Nov 17 12:34:02 pve1 systemd[1]: Reached target ZFS startup target.
Nov 17 14:15:45 pve1 systemd[1]: Stopped target ZFS startup target.

How can I fix this?
 
I fixed it by rebooting the server and running the upgrade again with dpkg --configure -a. The only thing I noticed is that my VNC shell still says:

Code:
System is going down. Unprivileged users are not permitted to log in anymore. For technical details, see pam_nologin(8).

The other nodes don't show this message on VNC. I fixed it with: rm /run/nologin
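
For reference, /run/nologin is created when a shutdown is scheduled and makes pam_nologin reject non-root logins; once a node is confirmed to stay up, it can simply be removed. A minimal sketch:

Code:
cat /run/nologin 2>/dev/null   # shows the "System is going down." message if present
rm -f /run/nologin             # re-enable logins for unprivileged users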
 
could you get the full journal (or syslog, if the journal is not persistent and was cleared by the reboot) starting before you started the upgrade?
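
A sketch of how that could be exported, assuming the journal is persistent (Storage=persistent in /etc/systemd/journald.conf); otherwise the syslog files are the fallback:

Code:
journalctl --list-boots                                     # shows which boots are still in the journal
journalctl -b -1 > pve1-previous-boot.log                   # everything from the previous boot
journalctl --since "2021-11-17 12:00" > pve1-upgrade.log    # or select by time window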
 
Unfortunately I don't have it. Before this I shut down all 3 nodes, but this one, pve1, was still up. Maybe that was the reason. Is there any way I can still provide some useful information for you? This is a 3-node Ceph cluster with a full-mesh network.
 
the symptoms look to me like the node was in the process of shutting down while you were doing the upgrade, which is obviously not a good combination ;) syslog should contain the info (/var/log/syslog or the already rotated files)
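
A minimal sketch for pulling the relevant window out of the current and rotated syslog files (rotated copies are gzip-compressed; output file names are just examples):

Code:
ls -l /var/log/syslog*                                                 # current plus rotated copies
grep 'Nov 17' /var/log/syslog /var/log/syslog.1 > pve1-syslog-nov17.log
zgrep 'Nov 17' /var/log/syslog.*.gz >> pve1-syslog-nov17.log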
 
Here is the syslog. Yeah, it seems the host wanted to shut down but couldn't for some reason. It didn't shut down even after I brought the other 2 nodes back up, so I thought everything was fine. That was probably the problem. Anyway, here's the syslog if you still want to take a look at it.

Thanks.
 

Attachments

  • syslog.zip
    555.2 KB
yeah, the logs clearly show the node starting a shutdown and, while that was still going on, an 'aptupdate' task refreshing package versions, followed by the systemd errors from your first post. also, the following doesn't look good and might be worth a closer look on your end as well:

Code:
Nov 17 14:09:19 pve1 systemd[1]: Startup finished in 55.667s (firmware) + 3.575s (loader) + 7.809s (kernel) + 1h 35min 20.099s (userspace) = 1h 36min 27.152s.

node startup takes very very long, possibly for the same reason that shutdowns take a while?
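
A sketch for narrowing down where the ~1.5 h of userspace startup time is spent, using standard systemd tooling:

Code:
systemd-analyze blame | head -n 20    # slowest units first
systemd-analyze critical-chain        # the chain of units that gated boot completion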
 
The shutdown takes incredibly long - any tip on how to fix this? It only happens on the last of the three nodes. The first 2 Ceph/Proxmox nodes shut down normally, but this one takes ages; I'm at 10-15 minutes at the moment. I'm not using iSCSI. I used this procedure:

1. Shut down all VMs on every node.
2. Set the following flags:
# ceph osd set noout
# ceph osd set nobackfill
# ceph osd set norecover
3. Shut down the nodes only after all the VMs on all nodes have been shut down.
4. Restore the Ceph flags once all the nodes are back up (see the sketch below).
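
A sketch of step 4, restoring the flags once all nodes are back up (standard Ceph CLI):

Code:
ceph osd unset noout
ceph osd unset nobackfill
ceph osd unset norecover
ceph -s    # verify the cluster reports HEALTH_OK before starting VMs again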

 
you need to look at the full shutdown log (especially the start and end times of stopping services). in a clustered setting it's possible that some service (ceph-related?) refuses to shutdown because it waits for a reply from other nodes that never comes..
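
A sketch of how to pull up the previous shutdown, assuming a persistent journal (otherwise grep the same time window out of /var/log/syslog):

Code:
journalctl -b -1 -r | less                                 # previous boot, newest entries first
journalctl -b -1 | grep -E 'Stopping|Stopped|timed out'    # service stop times and hangs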
 
Hi, in case this happens (a service refusing to shut down while shutting down PVE/Ceph nodes in a cluster), how should one deal with it? Is it safe to just kill / power off the node?



Background: I am currently in the process of planning the relocation of a Ceph cluster (see: https://forum.proxmox.com/threads/p...node-pve-ceph-cluster-to-another-city.100986/) and stumbled upon the above "process" from @jsterr, which includes the Ceph flags.

1. Shut down all VMs on every node.
2. Set the following flags:
# ceph osd set noout
# ceph osd set nobackfill
# ceph osd set norecover
3. Shut down the nodes only after all the VMs on all nodes have been shut down.
4. Restore the Ceph flags once all the nodes are back up.

In https://forum.proxmox.com/threads/p...node-pve-ceph-cluster-to-another-city.100986/ I outlined my planned process so far, which currently is (i.e. without the Ceph flag steps):

1. Shut down all VMs on every node.
3. Shut down the nodes only after all the VMs on all nodes have been shut down.
 
I can't give a blanket statement on that ;)
 
hi, I'm having problems upgrading from 7.0-11 to 7.1. I tried everything above but still have the following issues:

Code:
libpve-access-control : Breaks: pve-manager (< 7.0-15) but 7.0-11 is to be installed
libpve-common-perl : Breaks: qemu-server (< 7.0-19) but 7.0-13 is to be installed
libpve-rs-perl : Breaks: pve-manager (< 7.1-11) but 7.0-11 is to be installed
E: Error, pkgProblemResolver::Resolve generated breaks, this may be caused by held packages.
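
As a general note, not something confirmed in this thread: "Breaks: ... but ... is to be installed" during the 7.0 -> 7.1 upgrade is typically caused by running plain "apt upgrade" instead of a full/dist upgrade, or by held packages. A minimal check/fix sketch:

Code:
apt update
apt-mark showhold      # list any packages on hold
apt full-upgrade       # equivalent to "apt-get dist-upgrade", pulls in the new dependencies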
 
