Can't apt upgrade: "pve-manager is not configured yet" and "Failed to start The Proxmox VE cluster filesystem"

sidoni

Member
Aug 22, 2020
Hi,
I cannot apt upgrade my Proxmox VE anymore:

apt update && apt upgrade
Hit:1 http://ftp.debian.org/debian bullseye InRelease
Hit:2 http://security.debian.org/debian-security bullseye-security InRelease
Hit:3 http://ftp.debian.org/debian bullseye-updates InRelease
Hit:4 http://download.proxmox.com/debian/pve bullseye InRelease
Hit:5 http://download.proxmox.com/debian bullseye InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
All packages are up to date.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following packages were automatically installed and are no longer required:
libzpool4linux pve-kernel-5.11 pve-kernel-5.11.22-4-pve pve-kernel-5.11.22-7-pve pve-kernel-5.13.19-1-pve pve-kernel-5.4 pve-kernel-5.4.128-1-pve
Use 'apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
2 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n] y
Setting up pve-manager (7.2-4) ...
Job for pvestatd.service failed because the control process exited with error code.
See "systemctl status pvestatd.service" and "journalctl -xe" for details.
dpkg: error processing package pve-manager (--configure):
installed pve-manager package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of proxmox-ve:
proxmox-ve depends on pve-manager; however:
Package pve-manager is not configured yet.

dpkg: error processing package proxmox-ve (--configure):
dependency problems - leaving unconfigured
Errors were encountered while processing:
pve-manager
proxmox-ve
E: Sub-process /usr/bin/dpkg returned an error code (1)
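pvestatd needs the cluster filesystem (pmxcfs) to be running, so pve-cluster is the service to look at first; a rough first round of checks, with the dpkg retry only once the service is healthy again:

# inspect the cluster filesystem service and its log for this boot
systemctl status pve-cluster.service
journalctl -b -u pve-cluster.service --no-pager
# once pve-cluster starts again, finish the interrupted configuration
dpkg --configure -a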
Here is some additional information:
systemctl status pve-cluster.service
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2022-06-10 12:09:16 CEST; 11min ago
Process: 200697 ExecStart=/usr/bin/pmxcfs (code=exited, status=255/EXCEPTION)
CPU: 42ms

Jun 10 12:09:16 pmcitbck systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
Jun 10 12:09:16 pmcitbck systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Jun 10 12:09:16 pmcitbck systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Jun 10 12:09:16 pmcitbck systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
Jun 10 12:09:16 pmcitbck systemd[1]: Stopped The Proxmox VE cluster filesystem.
Jun 10 12:09:16 pmcitbck systemd[1]: pve-cluster.service: Start request repeated too quickly.
Jun 10 12:09:16 pmcitbck systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Jun 10 12:09:16 pmcitbck systemd[1]: Failed to start The Proxmox VE cluster filesystem.
cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.10.240 pmbck.lan pmbck

# The following lines are desirable for IPv6 capable hosts

::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
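A known reason for pmxcfs refusing to start is the node's hostname not resolving to its own address, which is why /etc/hosts matters here; a quick check (the name should resolve to 192.168.10.240 here, not 127.0.0.1):

# verify the hostname and what it resolves to
hostname
getent hosts "$(hostname)"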
neofetch
root@pmbck
OS: Proxmox VE 7.2-4 x86_64
Host: ProLiant MicroServer
Kernel: 5.13.19-5-pve
Uptime: 2 days, 18 hours, 41 mins
Packages: 1102 (dpkg)
Shell: bash 5.1.4
CPU: AMD Turion II Neo N40L (2) @ 1.500GHz
GPU: AMD ATI Mobility Radeon HD 4225/4250
Memory: 1384MiB / 1945MiB
Could you please help me to fix this issue?

Thanks for your kind help.
 
Hi,

could you maybe post the output of the commands systemctl status pvestatd.service and journalctl -xe?
 
Sorry guys for my late answer, I didn't receive any notification of your kind answers.
Hi,

could you maybe post the output of the commands systemctl status pvestatd.service and journalctl -xe?
Yes of course:

systemctl status pvestatd.service : http://paste.debian.net/hidden/b8d180f9
journalctl -xe : http://paste.debian.net/hidden/5ef2e87a/



If you really ran this command, it should be:
apt full-upgrade
Same error message:
http://paste.debian.net/hidden/c130a661/
 
OK, the linked post was mostly for the hints about hosts and hostname.
I see a different hostname in your last paste than in your opening post.
Did you perhaps (try to) rename your PVE host? Renaming it can be tricky, as other forum posts show.
There must be other services giving errors in your journal; I think you should inspect the full journal for this boot and see what the earliest errors are. If you find them, please post them, or post the journal for the whole boot.

One other hint if pve-cluster.service also doesn't start:
https://forum.proxmox.com/threads/pveproxy-dev-fuse-not-mounted-to-etc-pve.34968/#post-368567
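That linked thread boils down to checking whether pmxcfs has its fuse mount in place; for example:

# pmxcfs serves the config database as a fuse filesystem on /etc/pve
findmnt /etc/pve
# an empty or unmounted /etc/pve points back at pve-cluster failing to start
ls -la /etc/pve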
 
Hi janssensm,
No, there was no renaming of the PVE host; the different hostname in my opening post was a copy/paste mistake :)

Could you please tell me how to get a boot log journal?

Maybe a hint, found thanks to journalctl -b -u pve-cluster.service: http://paste.debian.net/hidden/47746cdb/
Especially this error message:
[database] crit: unable to set WAL mode: disk I/O error#010
This error message can be found in this other forum post:
https://forum.proxmox.com/threads/proxmox-ve-cluster-not-start-after-a-power-fail.51870/

That post suggests checking the disk(s).
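Independently of the disks, the database behind that error can be checked directly with sqlite's built-in integrity check; a sketch, assuming the sqlite3 package is installed and that pve-cluster is stopped first:

# stop the service so nothing holds the database open
systemctl stop pve-cluster.service
# run sqlite's consistency check on the cluster config database
sqlite3 /var/lib/pve-cluster/config.db 'PRAGMA integrity_check;'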

ls /dev/sd*
/dev/sda /dev/sda1 /dev/sda2 /dev/sda3 /dev/sdb /dev/sdb1 /dev/sdb2 /dev/sdb3
smartctl -a /dev/sda : http://paste.debian.net/hidden/ab56d7f7/
smartctl -a /dev/sdb : http://paste.debian.net/hidden/4510398b/

I don't see any errors in these logs. I've launched a long SMART test (smartctl -t long), so I may have more logs tomorrow.
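For reference, the long self-test and the later result check look roughly like this (standard smartmontools usage; the test runs in the background and can take hours):

# start the extended self-test on both disks
smartctl -t long /dev/sda
smartctl -t long /dev/sdb
# once it has finished, read back the self-test log
smartctl -l selftest /dev/sda
smartctl -l selftest /dev/sdb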

df -Th
Filesystem Type Size Used Avail Use% Mounted on
udev devtmpfs 942M 0 942M 0% /dev
tmpfs tmpfs 195M 1.1M 194M 1% /run
rpool/ROOT/pve-1 zfs 3.6T 222G 3.3T 7% /
tmpfs tmpfs 973M 0 973M 0% /dev/shm
tmpfs tmpfs 5.0M 0 5.0M 0% /run/lock
rpool zfs 3.3T 128K 3.3T 1% /rpool
rpool/ROOT zfs 3.3T 128K 3.3T 1% /rpool/ROOT
rpool/data zfs 3.3T 128K 3.3T 1% /rpool/data
tmpfs tmpfs 195M 0 195M 0% /run/user/0

zpool status -v
pool: rpool
state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: resilvered 11.6M in 00:00:15 with 24 errors on Sun Nov 21 13:25:12 2021
config:

NAME          STATE     READ WRITE CKSUM
rpool         DEGRADED     0     0     0
  mirror-0    DEGRADED     0     0     0
    sda3      DEGRADED     0     0   302  too many errors
    sdb3      ONLINE       0     0   302

errors: Permanent errors have been detected in the following files:

//var/lib/pve-cluster/config.db-wal
rpool/ROOT/pve-1:<0x26bf1>
rpool/ROOT/pve-1:<0x2c9fe>
Now that we've found these more specific errors, could you please tell me how to fix them? :)
 
Could you please tell me how to get a boot log journal?
That would be journalctl -b. You could also do journalctl -S 2022-01-01 to check the journal from this year, or from another date as you like, because you seem to have had a degraded pool with permanent errors since 2021-11-21.
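Concretely, something along these lines (adjust the date and unit to taste):

# only messages of priority "error" and worse from the current boot
journalctl -b -p err
# everything for the cluster service since the pool first degraded
journalctl -S 2021-11-21 -u pve-cluster.service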

Now the root cause is coming into sight: the ZFS pool and/or storage.

I hope you don't run your ZFS pool on top of a hardware RAID controller; that would be asking for trouble.
There could still be multiple causes of these errors, but I doubt there is a reliable fix, because the affected file belongs to the Proxmox system itself. Fortunately I don't speak from experience with this particular error and how to fix it; maybe another forum member or staff member does and can advise.

Perhaps you had a power failure or a bad shutdown earlier and corruption occurred. I don't know if you are using ECC RAM and/or reliable disks.
Checking your storage is indeed an important first step, so run a long SMART test on every disk.
Next, double-check your (SATA?) cables, maybe replacing them.
You could do a scrub of your pool (see the sketch at the end of this post), but a permanent error indicates that it cannot be repaired. The article linked from your zpool status output has info on that.
So you should also prepare for the possibility that your host has to be reinstalled from scratch, assuming that you back up your VMs to another place and could restore them easily after the reinstall.
But even then, you first have to make sure your hardware is reliable.
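The scrub-and-check cycle mentioned above would look roughly like this, with the pool name rpool taken from your zpool status output, and with the zpool clear step only after the hardware has been sorted out:

# start a scrub and watch its progress
zpool scrub rpool
zpool status -v rpool
# after the failing disk or cabling has been replaced,
# reset the error counters so new errors stand out
zpool clear rpool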
 
