Can't apt upgrade: "pve-manager is not configured yet" and "Failed to start The Proxmox VE cluster filesystem"

sidoni

Member
Aug 22, 2020
Hi,
I cannot apt upgrade my Proxmox VE anymore:

apt update && apt upgrade
Hit:1 http://ftp.debian.org/debian bullseye InRelease
Hit:2 http://security.debian.org/debian-security bullseye-security InRelease
Hit:3 http://ftp.debian.org/debian bullseye-updates InRelease
Hit:4 http://download.proxmox.com/debian/pve bullseye InRelease
Hit:5 http://download.proxmox.com/debian bullseye InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
All packages are up to date.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following packages were automatically installed and are no longer required:
libzpool4linux pve-kernel-5.11 pve-kernel-5.11.22-4-pve pve-kernel-5.11.22-7-pve pve-kernel-5.13.19-1-pve pve-kernel-5.4 pve-kernel-5.4.128-1-pve
Use 'apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
2 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n] y
Setting up pve-manager (7.2-4) ...
Job for pvestatd.service failed because the control process exited with error code.
See "systemctl status pvestatd.service" and "journalctl -xe" for details.
dpkg: error processing package pve-manager (--configure):
installed pve-manager package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of proxmox-ve:
proxmox-ve depends on pve-manager; however:
Package pve-manager is not configured yet.

dpkg: error processing package proxmox-ve (--configure):
dependency problems - leaving unconfigured
Errors were encountered while processing:
pve-manager
proxmox-ve
E: Sub-process /usr/bin/dpkg returned an error code (1)
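pvestatd needs the cluster filesystem (pmxcfs) to be running, so pve-cluster is the service to look at first; a rough first round of checks, with the dpkg retry only once the service is healthy again:

# inspect the cluster filesystem service and its log for this boot
systemctl status pve-cluster.service
journalctl -b -u pve-cluster.service --no-pager
# once pve-cluster starts again, finish the interrupted configuration
dpkg --configure -a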
Here is some additional information:
systemctl status pve-cluster.service
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2022-06-10 12:09:16 CEST; 11min ago
Process: 200697 ExecStart=/usr/bin/pmxcfs (code=exited, status=255/EXCEPTION)
CPU: 42ms

Jun 10 12:09:16 pmcitbck systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
Jun 10 12:09:16 pmcitbck systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Jun 10 12:09:16 pmcitbck systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Jun 10 12:09:16 pmcitbck systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
Jun 10 12:09:16 pmcitbck systemd[1]: Stopped The Proxmox VE cluster filesystem.
Jun 10 12:09:16 pmcitbck systemd[1]: pve-cluster.service: Start request repeated too quickly.
Jun 10 12:09:16 pmcitbck systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Jun 10 12:09:16 pmcitbck systemd[1]: Failed to start The Proxmox VE cluster filesystem.
cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.10.240 pmbck.lan pmbck

# The following lines are desirable for IPv6 capable hosts

::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
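A known reason for pmxcfs refusing to start is the node's hostname not resolving to its own address, which is why /etc/hosts matters here; a quick check (the name should resolve to 192.168.10.240 here, not 127.0.0.1):

# verify the hostname and what it resolves to
hostname
getent hosts "$(hostname)"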
neofetch
root@pmbck
OS: Proxmox VE 7.2-4 x86_64
Host: ProLiant MicroServer
Kernel: 5.13.19-5-pve
Uptime: 2 days, 18 hours, 41 mins
Packages: 1102 (dpkg)
Shell: bash 5.1.4
CPU: AMD Turion II Neo N40L (2) @ 1.500GHz
GPU: AMD ATI Mobility Radeon HD 4225/4250
Memory: 1384MiB / 1945MiB
Could you please help me to fix this issue?

Thanks for your kind help.
 
Hi,

could you maybe post the output of the commands systemctl status pvestatd.service and journalctl -xe?
 
Sorry guys for my late answer, I didn't receive any notification of your kind answers.
Hi,

could you maybe post the output of the commands systemctl status pvestatd.service and journalctl -xe?
Yes of course:

systemctl status pvestatd.service : http://paste.debian.net/hidden/b8d180f9
journalctl -xe : http://paste.debian.net/hidden/5ef2e87a/



If you really ran this command, it should be:
apt full-upgrade
Same error message:
http://paste.debian.net/hidden/c130a661/
 
OK, the linked post was mostly for the hints about hosts and hostname.
I see a different hostname in your last paste than in your opening post.
Did you perhaps (try to) rename your PVE host? Renaming it can be tricky, as other forum posts show.
There must be other services giving errors in your journal; I think you should inspect the full journal for this boot and see what the earliest errors are. If you find them, please post them, or post the journal for the whole boot.

One other hint if pve-cluster.service also doesn't start:
https://forum.proxmox.com/threads/pveproxy-dev-fuse-not-mounted-to-etc-pve.34968/#post-368567
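That linked thread boils down to checking whether pmxcfs has its fuse mount in place; for example:

# pmxcfs serves the config database as a fuse filesystem on /etc/pve
findmnt /etc/pve
# an empty or unmounted /etc/pve points back at pve-cluster failing to start
ls -la /etc/pve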
 
Hi janssensm,
No, there was no renaming of the PVE host; the different hostname in my opening post was a copy/paste mistake :)

Could you please tell me how to get a boot log journal?

Maybe a hint, found thanks to journalctl -b -u pve-cluster.service: http://paste.debian.net/hidden/47746cdb/
Especially this error message:
[database] crit: unable to set WAL mode: disk I/O error#010
This error message can be found in this other forum post:
https://forum.proxmox.com/threads/proxmox-ve-cluster-not-start-after-a-power-fail.51870/

That post suggests checking the disk(s).
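Independently of the disks, the database behind that error can be checked directly with sqlite's built-in integrity check; a sketch, assuming the sqlite3 package is installed and that pve-cluster is stopped first:

# stop the service so nothing holds the database open
systemctl stop pve-cluster.service
# run sqlite's consistency check on the cluster config database
sqlite3 /var/lib/pve-cluster/config.db 'PRAGMA integrity_check;'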

ls /dev/sd*
/dev/sda /dev/sda1 /dev/sda2 /dev/sda3 /dev/sdb /dev/sdb1 /dev/sdb2 /dev/sdb3
smartctl -a /dev/sda : http://paste.debian.net/hidden/ab56d7f7/
smartctl -a /dev/sdb : http://paste.debian.net/hidden/4510398b/

I don't see any errors in these logs. I've launched a long SMART test (smartctl -t long), so I may have more logs tomorrow.
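For reference, the long self-test and the later result check look roughly like this (standard smartmontools usage; the test runs in the background and can take hours):

# start the extended self-test on both disks
smartctl -t long /dev/sda
smartctl -t long /dev/sdb
# once it has finished, read back the self-test log
smartctl -l selftest /dev/sda
smartctl -l selftest /dev/sdb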

df -Th
Filesystem Type Size Used Avail Use% Mounted on
udev devtmpfs 942M 0 942M 0% /dev
tmpfs tmpfs 195M 1.1M 194M 1% /run
rpool/ROOT/pve-1 zfs 3.6T 222G 3.3T 7% /
tmpfs tmpfs 973M 0 973M 0% /dev/shm
tmpfs tmpfs 5.0M 0 5.0M 0% /run/lock
rpool zfs 3.3T 128K 3.3T 1% /rpool
rpool/ROOT zfs 3.3T 128K 3.3T 1% /rpool/ROOT
rpool/data zfs 3.3T 128K 3.3T 1% /rpool/data
tmpfs tmpfs 195M 0 195M 0% /run/user/0

zpool status -v
pool: rpool
state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: resilvered 11.6M in 00:00:15 with 24 errors on Sun Nov 21 13:25:12 2021
config:

NAME          STATE     READ WRITE CKSUM
rpool         DEGRADED     0     0     0
  mirror-0    DEGRADED     0     0     0
    sda3      DEGRADED     0     0   302  too many errors
    sdb3      ONLINE       0     0   302

errors: Permanent errors have been detected in the following files:

//var/lib/pve-cluster/config.db-wal
rpool/ROOT/pve-1:<0x26bf1>
rpool/ROOT/pve-1:<0x2c9fe>
Now that we've found these more specific errors, could you please tell me how to fix them? :)
 
Could you please tell me how to get a boot log journal?
That would be journalctl -b. You could also do journalctl -S 2022-01-01 to check the journal from this year, or from another date as you like, because you seem to have had a degraded pool with permanent errors since 2021-11-21.
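Concretely, something along these lines (adjust the date and unit to taste):

# only messages of priority "error" and worse from the current boot
journalctl -b -p err
# everything for the cluster service since the pool first degraded
journalctl -S 2021-11-21 -u pve-cluster.service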

Now the root cause is coming into sight: the ZFS pool and/or storage.

I hope you don't run your ZFS pool on top of a hardware RAID controller; that would be asking for trouble.
There could still be multiple causes of these errors, but I doubt there is a reliable fix, because the affected file belongs to the Proxmox system itself. Fortunately I don't speak from experience with this particular error and how to fix it; maybe another forum member or staff member does and can advise.

Perhaps you had a power failure or a bad shutdown earlier and corruption occurred. I don't know if you are using ECC RAM and/or reliable disks.
Checking your storage is indeed an important first step, so run a long SMART test on every disk.
Next, double-check your (SATA?) cables, maybe replacing them.
You could do a scrub of your pool (see the sketch at the end of this post), but a permanent error indicates that it cannot be repaired. The article linked from your zpool status output has info on that.
So you should also prepare for the possibility that your host has to be reinstalled from scratch, assuming that you back up your VMs to another place and could restore them easily after the reinstall.
But even then, you first have to make sure your hardware is reliable.
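The scrub-and-check cycle mentioned above would look roughly like this, with the pool name rpool taken from your zpool status output, and with the zpool clear step only after the hardware has been sorted out:

# start a scrub and watch its progress
zpool scrub rpool
zpool status -v rpool
# after the failing disk or cabling has been replaced,
# reset the error counters so new errors stand out
zpool clear rpool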
 
