Proxmox doesn't work after power outage

CrunkFX

New Member
Feb 6, 2021
Hey there,
I am completely desperate. After a power outage my Proxmox server is no longer working. I've been searching for about two weeks but can't figure out what is causing all of the problems.

systemctl --failed
Bash:
  UNIT                LOAD   ACTIVE SUB    DESCRIPTION
● pve-cluster.service loaded failed failed The Proxmox VE cluster filesystem
● pve-guests.service  loaded failed failed PVE guests
● pve-ha-crm.service  loaded failed failed PVE Cluster HA Resource Manager Daemon
● pve-ha-lrm.service  loaded failed failed PVE Local HA Resource Manager Daemon
● pvesr.service       loaded failed failed Proxmox VE replication runner
● pvestatd.service    loaded failed failed PVE Status Daemon

systemctl status pve-cluster
Code:
● pve-cluster.service - The Proxmox VE cluster filesystem
   Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
   Active: failed (Result: signal) since Sat 2021-02-06 21:27:26 CET; 20min ago

Feb 06 21:27:26 pvee systemd[1]: pve-cluster.service: Service RestartSec=100ms expired, scheduling restart.
Feb 06 21:27:26 pvee systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 5.
Feb 06 21:27:26 pvee systemd[1]: Stopped The Proxmox VE cluster filesystem.
Feb 06 21:27:26 pvee systemd[1]: pve-cluster.service: Start request repeated too quickly.
Feb 06 21:27:26 pvee systemd[1]: pve-cluster.service: Failed with result 'signal'.
Feb 06 21:27:26 pvee systemd[1]: Failed to start The Proxmox VE cluster filesystem.

systemctl status pveproxy
Code:
● pveproxy.service - PVE API Proxy Server
   Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2021-02-06 21:18:22 CET; 30min ago
 Main PID: 1109 (pveproxy)
    Tasks: 4 (limit: 4915)
   Memory: 131.9M
   CGroup: /system.slice/pveproxy.service
           ├─1109 pveproxy
           ├─2684 pveproxy worker
           ├─2685 pveproxy worker
           └─2686 pveproxy worker

Feb 06 21:49:18 pvee pveproxy[1109]: worker 2676 finished
Feb 06 21:49:18 pvee pveproxy[1109]: worker 2684 started
Feb 06 21:49:18 pvee pveproxy[2684]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1775.
Feb 06 21:49:18 pvee pveproxy[2677]: worker exit
Feb 06 21:49:18 pvee pveproxy[1109]: worker 2677 finished
Feb 06 21:49:18 pvee pveproxy[1109]: starting 2 worker(s)
Feb 06 21:49:18 pvee pveproxy[1109]: worker 2685 started
Feb 06 21:49:18 pvee pveproxy[1109]: worker 2686 started
Feb 06 21:49:18 pvee pveproxy[2685]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1775.
Feb 06 21:49:18 pvee pveproxy[2686]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1775.

I don't know how important this is, but zpool status outputs:
no pools available


All other failed services output:
Code:
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused

I don't have a cluster. It's just a single NUC that has been running for a few months.

lsblk shows
Code:
NAME                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                            8:0    0 238.5G  0 disk
├─sda1                         8:1    0  1007K  0 part
├─sda2                         8:2    0   512M  0 part /boot/efi
└─sda3                         8:3    0   238G  0 part
  ├─pve-swap                 253:0    0     7G  0 lvm  [SWAP]
  ├─pve-root                 253:1    0  59.3G  0 lvm  /
  ├─pve-data_tmeta           253:2    0   1.6G  0 lvm
  │ └─pve-data-tpool         253:4    0 152.6G  0 lvm
  │   ├─pve-data             253:5    0 152.6G  0 lvm
  │   ├─pve-vm--100--disk--0 253:6    0    50G  0 lvm
  │   ├─pve-vm--101--disk--0 253:7    0    10G  0 lvm
  │   ├─pve-vm--102--disk--0 253:8    0    10G  0 lvm
  │   ├─pve-vm--103--disk--0 253:9    0    30G  0 lvm
  │   ├─pve-vm--105--disk--0 253:10   0    20G  0 lvm
  │   ├─pve-vm--104--disk--0 253:11   0    20G  0 lvm
  │   ├─pve-vm--106--disk--0 253:12   0    20G  0 lvm
  │   ├─pve-vm--107--disk--0 253:13   0    10G  0 lvm
  │   ├─pve-vm--111--disk--0 253:14   0     4M  0 lvm
  │   ├─pve-vm--111--disk--1 253:15   0    36G  0 lvm
  │   ├─pve-vm--113--disk--0 253:16   0    20G  0 lvm
  │   └─pve-vm--108--disk--0 253:17   0     8G  0 lvm
  └─pve-data_tdata           253:3    0 152.6G  0 lvm
    └─pve-data-tpool         253:4    0 152.6G  0 lvm
      ├─pve-data             253:5    0 152.6G  0 lvm
      ├─pve-vm--100--disk--0 253:6    0    50G  0 lvm
      ├─pve-vm--101--disk--0 253:7    0    10G  0 lvm
      ├─pve-vm--102--disk--0 253:8    0    10G  0 lvm
      ├─pve-vm--103--disk--0 253:9    0    30G  0 lvm
      ├─pve-vm--105--disk--0 253:10   0    20G  0 lvm
      ├─pve-vm--104--disk--0 253:11   0    20G  0 lvm
      ├─pve-vm--106--disk--0 253:12   0    20G  0 lvm
      ├─pve-vm--107--disk--0 253:13   0    10G  0 lvm
      ├─pve-vm--111--disk--0 253:14   0     4M  0 lvm
      ├─pve-vm--111--disk--1 253:15   0    36G  0 lvm
      ├─pve-vm--113--disk--0 253:16   0    20G  0 lvm
      └─pve-vm--108--disk--0 253:17   0     8G  0 lvm

This seems to represent all the VMs I had.


Is there any way of getting this system back to life? If not, is there a way to back up the VMs? I can't reach the GUI, and vzdump doesn't work because of the "Connection refused" error.

Thanks in advance
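
For readers in the same spot: since lsblk still lists every guest volume, one possible fallback (a sketch, not something confirmed in this thread) is to copy the thin volumes off with dd while no guest is running. The helper below only prints the dd commands, it does not run them. The function names and the /mnt/backup destination are made up for illustration; the name conversion relies on the device-mapper convention of writing a hyphen inside a VG or LV name as a double hyphen.

```shell
#!/bin/sh
# Sketch only: print (do not run) dd commands that would copy each guest
# volume to a raw image file. Function names and paths are illustrative.

# Convert a device-mapper name as shown by lsblk (pve-vm--100--disk--0)
# into the usual LVM device path (/dev/pve/vm-100-disk-0).
# A single hyphen separates VG and LV; hyphens inside a name are doubled.
# '@' is used as a temporary marker because LVM names cannot contain it.
dm_to_lv_path() {
    printf '/dev/%s\n' "$(printf '%s\n' "$1" | sed -e 's/--/@/g' -e 's|-|/|' -e 's/@/-/g')"
}

# Read device-mapper names on stdin and print one dd command per guest
# disk. Anything that does not look like a vm-<id>-disk-<n> volume
# (root, swap, thin pool internals) is skipped.
print_backup_commands() {
    backup_dir=$1
    while read -r dm_name; do
        case $dm_name in
            *-vm--*--disk--*) ;;
            *) continue ;;
        esac
        lv_path=$(dm_to_lv_path "$dm_name")
        printf 'dd if=%s of=%s/%s.raw bs=4M conv=sparse status=progress\n' \
            "$lv_path" "$backup_dir" "${lv_path##*/}"
    done
}

# Demonstration with names taken from the lsblk output above.
printf '%s\n' pve-root pve-vm--100--disk--0 pve-vm--111--disk--1 |
    print_backup_commands /mnt/backup
```

On the affected host the names could come from `lsblk -ln -o NAME | sort -u | print_backup_commands /mnt/backup`. The guest configuration files live under /etc/pve, which is backed by /var/lib/pve-cluster/config.db, so that file is worth copying as well.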
 
the problem seems to be that pmxcfs cannot start

what does
Code:
journalctl -u pve-cluster -b

say?
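
If the journal shows pmxcfs being killed by a signal, one thing that may be worth ruling out after a power loss is damage to the SQLite file behind /etc/pve. This is only a possibility, not a diagnosis. A minimal sketch, assuming the default database path /var/lib/pve-cluster/config.db and that the sqlite3 command-line tool is installed (it may not be on a stock install); check_pmxcfs_db is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: read-only integrity check of the pmxcfs backing database.
# DB_PATH defaults to the usual location on a Proxmox VE node.
DB_PATH=${DB_PATH:-/var/lib/pve-cluster/config.db}

# Prints a one-line verdict and returns non-zero if the database is
# missing, cannot be checked, or fails the SQLite integrity check.
check_pmxcfs_db() {
    db=$1
    if [ ! -f "$db" ]; then
        printf 'missing: %s\n' "$db"
        return 1
    fi
    if ! command -v sqlite3 >/dev/null 2>&1; then
        printf 'sqlite3 CLI not installed, cannot check %s\n' "$db"
        return 2
    fi
    # -readonly makes sure the check itself cannot modify the file.
    result=$(sqlite3 -readonly "$db" 'PRAGMA integrity_check;' 2>&1)
    if [ "$result" = "ok" ]; then
        printf 'ok: %s\n' "$db"
    else
        printf 'damaged: %s\n%s\n' "$db" "$result"
        return 3
    fi
}

# Report the state of the default database without aborting the script.
check_pmxcfs_db "$DB_PATH" || true
```

Copy the file somewhere safe before attempting any repair on it.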
 
Hi Folks,

Was there any progress on this one? I have this exact same setup (single server, no cluster) and neither the GUI nor anything else PVE-related works after a power failure. I can still connect over SSH, but it takes a very long time to authenticate, and several commands also take a long time to produce output. Here are the details from mine:

Code:
qm list
root@server-1:~# qm list
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused
root@server-1:~#
root@server-1:~# systemctl status pve-cluster
Failed to list units: Transport endpoint is not connected
root@server-1:~#

Everything related to systemctl gives me a "Transport endpoint is not connected" error message.

To answer the previous question from Dominik, here's my output:

Code:
root@server-1:~# journalctl -u pve-cluster -b
-- Journal begins at Mon 2021-08-02 19:16:55 IST, ends at Tue 2022-04-19 01:48:04 IST. --
-- No entries --
root@pserver-1:~#

Some additional output:
Code:
root@server-1:~# pvecm status
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused
root@server-1:~#

Code:
root@server-1:~# pveversion --verbose
proxmox-ve: 7.1-1 (running kernel: 5.13.19-6-pve)
pve-manager: not correctly installed (running version: 7.1-12/b3c09de3)
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-3-pve: 5.13.19-7
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-3-pve: 5.11.22-7
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph: 16.2.7
ceph-fuse: 16.2.7
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-7
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-5
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: not correctly installed
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-7
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-6
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.2.0-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1
root@server-1:~#

This output shows that pve-manager and libpve-storage-perl are "not correctly installed". Nothing has changed in the configuration I have been using for a very long time. This was a "set and forget" install that I come back to once every quarter to run apt dist-upgrade and nothing else.

I did an apt dist-upgrade to see if this could be fixed on its own, but it doesn't seem like it. It also took a very long time and failed multiple times because every single package tried to manipulate the system services and with that, every single package failed with the "Transport endpoint is not connected" message, like below:

Code:
Setting up udev (247.3-7) ...
Failed to reload daemon: Transport endpoint is not connected
Failed to retrieve unit state: Transport endpoint is not connected

Any ideas, guys?
 
Quote:
To answer the previous question from Dominik, here's my output:

seems there was no attempt to start the pmxcfs at all

Quote:
I did an apt dist-upgrade to see if this could be fixed on its own, but it doesn't seem like it. It also took a very long time and failed multiple times because every single package tried to manipulate the system services and with that, every single package failed with the "Transport endpoint is not connected" message, like below:

can you post the output from

Code:
apt-get update
apt-get dist-upgrade
?
 
Thanks for the very quick reply.

Here's the requested output:

Code:
root@server-1:~# apt update
Hit:1 http://security.debian.org bullseye-security InRelease
Hit:2 http://ftp.ie.debian.org/debian bullseye InRelease
Get:3 http://ftp.ie.debian.org/debian bullseye-updates InRelease [39.4 kB]
Hit:4 http://download.proxmox.com/debian/pve bullseye InRelease
Hit:5 http://download.proxmox.com/debian/ceph-pacific bullseye InRelease
Fetched 39.4 kB in 9s (4,394 B/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
All packages are up to date.
W: Target Packages (pve-no-subscription/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list:6 and /etc/apt/sources.list.d/pve-no-subscription.list:1
W: Target Packages (pve-no-subscription/binary-all/Packages) is configured multiple times in /etc/apt/sources.list:6 and /etc/apt/sources.list.d/pve-no-subscription.list:1
W: Target Translations (pve-no-subscription/i18n/Translation-en_US) is configured multiple times in /etc/apt/sources.list:6 and /etc/apt/sources.list.d/pve-no-subscription.list:1
W: Target Translations (pve-no-subscription/i18n/Translation-en) is configured multiple times in /etc/apt/sources.list:6 and /etc/apt/sources.list.d/pve-no-subscription.list:1
W: Target Packages (pve-no-subscription/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list:6 and /etc/apt/sources.list.d/pve-no-subscription.list:1
W: Target Packages (pve-no-subscription/binary-all/Packages) is configured multiple times in /etc/apt/sources.list:6 and /etc/apt/sources.list.d/pve-no-subscription.list:1
W: Target Translations (pve-no-subscription/i18n/Translation-en_US) is configured multiple times in /etc/apt/sources.list:6 and /etc/apt/sources.list.d/pve-no-subscription.list:1
W: Target Translations (pve-no-subscription/i18n/Translation-en) is configured multiple times in /etc/apt/sources.list:6 and /etc/apt/sources.list.d/pve-no-subscription.list:1
root@server-1:~#
root@server-1:~# sudo apt dist-upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following package was automatically installed and is no longer required:
  pve-kernel-5.11.22-3-pve
Use 'sudo apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
4 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n] y
Setting up udev (247.3-7) ...
Failed to reload daemon: Transport endpoint is not connected
Failed to retrieve unit state: Transport endpoint is not connected
invoke-rc.d: could not determine current runlevel
Failed to restart udev.service: Transport endpoint is not connected
See system logs and 'systemctl status udev.service' for details.
invoke-rc.d: initscript udev, action "restart" failed.
Failed to get properties: Transport endpoint is not connected
dpkg: error processing package udev (--configure):
 installed udev package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of libpve-storage-perl:
 libpve-storage-perl depends on udev; however:
  Package udev is not configured yet.

dpkg: error processing package libpve-storage-perl (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of pve-kernel-helper:
 pve-kernel-helper depends on udev; however:
  Package udev is not configured yet.

dpkg: error processing package pve-kernel-helper (--configure):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of pve-manager:
 pve-manager depends on libpve-storage-perl (>= 7.0-15); however:
  Package libpve-storage-perl is not configured yet.

dpkg: error processing package pve-manager (--configure):
 dependency problems - leaving unconfigured
Processing triggers for initramfs-tools (0.140) ...
update-initramfs: Generating /boot/initrd.img-5.13.19-6-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
No /etc/kernel/proxmox-boot-uuids found, skipping ESP sync.
Errors were encountered while processing:
 udev
 libpve-storage-perl
 pve-kernel-helper
 pve-manager
E: Sub-process /usr/bin/dpkg returned an error code (1)
root@server-1:~#

It looks to me like the udev failure is cascading down. I'm not sure whether that's related to this issue, but it also blocks pve-manager from updating.
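
The cascade can be read straight out of the dpkg messages. As a rough sketch (dpkg_blockers is a made-up helper, and it assumes the English message wording shown in the output above), this filter reduces such a log to "who is blocked by whom":

```shell
#!/bin/sh
# Sketch only: summarise dpkg configuration failures from apt/dpkg output.
# Relies on the English wording of the messages quoted in this thread.
dpkg_blockers() {
    awk '
        # " pkg depends on dep (>= x); however:" -> pkg is waiting for dep
        / depends on / {
            dep = $4
            sub(/;$/, "", dep)
            print $1 " is blocked by " dep
        }
        # " installed pkg package post-installation script ..." -> root failure
        /post-installation script subprocess returned error/ {
            print $2 " failed in its own post-installation script"
        }
    '
}

# Demonstration with lines copied from the dist-upgrade output above.
dpkg_blockers <<'EOF'
dpkg: error processing package udev (--configure):
 installed udev package post-installation script subprocess returned error exit status 1
dpkg: dependency problems prevent configuration of libpve-storage-perl:
 libpve-storage-perl depends on udev; however:
  Package udev is not configured yet.
dpkg: dependency problems prevent configuration of pve-manager:
 pve-manager depends on libpve-storage-perl (>= 7.0-15); however:
  Package libpve-storage-perl is not configured yet.
EOF
```

Read this way, udev is the only package that fails on its own, and it fails because its post-installation script cannot reach systemd. `dpkg --audit` lists the half-configured packages, and `dpkg --configure -a` retries them once systemd responds again.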

Just FYI, I also ran the command the output suggests. Like every other systemctl command, it fails:

Code:
root@server-1:~# systemctl status udev.service
Failed to get properties: Transport endpoint is not connected
root@server-1:~#

BTW, another thing that surprises me a lot: not even shutdown and reboot commands are working. They just fail silently:

Code:
root@server-1:~# shutdown now
root@server-1:~#
root@server-1:~# reboot now
root@server-1:~#
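
A note for anyone stuck at this point: shutdown and reboot hand the request to systemd, so they have nothing to talk to when PID 1 is not answering. A last-resort alternative is the kernel's magic SysRq interface, which bypasses systemd entirely. The sketch below is a dry run by default and only prints the steps; emergency_reboot and sysrq_steps are hypothetical helper names, and the two-second pauses are an arbitrary choice. Running it with --yes as root reboots the machine immediately, so it is only reasonable when no guests are running.

```shell
#!/bin/sh
# Sketch only: emergency reboot through the kernel's magic SysRq interface.
# Nothing is executed unless emergency_reboot is called with "--yes".

# The individual steps: flush caches, enable SysRq, sync disks,
# remount filesystems read-only, then reboot.
sysrq_steps() {
    cat <<'EOF'
sync
echo 1 > /proc/sys/kernel/sysrq
echo s > /proc/sysrq-trigger
sleep 2
echo u > /proc/sysrq-trigger
sleep 2
echo b > /proc/sysrq-trigger
EOF
}

emergency_reboot() {
    if [ "${1:-}" = "--yes" ]; then
        # Really run the steps (requires root, reboots immediately).
        sysrq_steps | sh
    else
        echo "Dry run, these commands would be executed as root:"
        sysrq_steps
    fi
}

# Default invocation only prints the plan.
emergency_reboot
```

`systemctl --force --force reboot` is documented to reboot without going through the system manager as well and may be worth trying first.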
 
Quote:
root@server-1:~# systemctl status udev.service
Failed to get properties: Transport endpoint is not connected

ok there seems to be something quite broken, it looks like systemd or dbus is not working properly...

can you check if there is any log from the boot ? (with journalctl or from /var/log/syslog)

you could try to reinstall those packages again, but i cannot say if it helps (i am not sure if that is broken):

Code:
apt install --reinstall systemd
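
Before reinstalling anything, it may help to confirm the suspicion directly. "Transport endpoint is not connected" is the text for the ENOTCONN error, which would fit systemctl failing to reach PID 1 over its socket. A small sketch that does not depend on systemctl itself; the helper names are made up, and the two socket paths are the usual ones on a Debian-based system:

```shell
#!/bin/sh
# Sketch only: check whether systemd and D-Bus look reachable without
# calling systemctl. Helper names are illustrative.

# Name of the process running as PID 1 (expected: systemd).
pid1_name() {
    cat /proc/1/comm 2>/dev/null || echo unknown
}

# Report whether a path exists and is a UNIX socket.
socket_state() {
    if [ -S "$1" ]; then
        echo "present: $1"
    else
        echo "missing: $1"
    fi
}

echo "PID 1 is: $(pid1_name)"
# Private socket that systemctl uses to talk to systemd when run as root.
socket_state /run/systemd/private
# System message bus socket.
socket_state /run/dbus/system_bus_socket
```

journalctl reads the journal files directly, so `journalctl -b -p err` should still work for collecting boot errors even while systemctl does not.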
 
Hey Dominik, thanks a lot for the help, but I'm giving up. Reinstalling systemd also failed many times with "Transport endpoint is not connected". I'm coming back to record this here for people who might hit the same issue in the future. I wasted many hours trying everything I could, to no avail, so I'll try my luck with my weekly VM backups and reinstall from scratch.
 
