[SOLVED] Neither GUI nor ZFS pool after power surge

plexorium

Member
Nov 4, 2020
Hi,

I had a power surge at home a couple of days ago and since then my Proxmox server has stopped working. I've been reading lots of posts and documentation trying to find the cause, but I can't find anything conclusive.

My setup is 5 drives: 1 SSD, 2 individual 500 GB drives in LVM-thin, and 2 1 TB drives in a ZFS RAID1 pool. I boot Proxmox from a 32 GB USB pen drive and I have an external USB hard drive for ISOs and generic storage. The system had been working flawlessly for the last couple of months.

My problems and conclusions so far:

1. The server boots and I can log in through SSH, but there is no web GUI. The port is listening and open on the firewall. I've tried to curl it and I get:

curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 192.168.1.90:8006
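The call itself was nothing fancy, roughly this (-k just to skip certificate verification):

Code:
curl -vk https://192.168.1.90:8006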

The error looked like a problem with the SSL certificate, so I tried to regenerate the certificates following different posts and tutorials I found, like this one:

https://forum.proxmox.com/threads/proxmox-6-gui-problem-certificate-has-been-revoked.60357/
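From memory, the regeneration attempt was along these lines (double-check the exact paths against the linked thread, this is only roughly what I ran):

Code:
rm /etc/pve/local/pve-ssl.pem /etc/pve/local/pve-ssl.key
pvecm updatecerts --force
systemctl restart pveproxy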

I've also tried restarting the proxy, the daemon, etc. Looking at the Proxmox logs, every time I try to log in I get this message:

pveproxy worker[1234] general protection fault ip:7f7c3cd585f7 sp:7fffd13afd8f error:0 in libssl.so.1.1[7f7c3cd35000+4d000]

This also caught my attention after booting:

bug kernel: traps: dbus-daemon[1350] general protection fault ip:7f1baf1a5206 sp:7ffc34b7aff8 error:0 in libc-2.28.so[7f1baf12f000+148000]

2. VMs set up to autostart don't boot (apart from one LXC container that seems to come and go inconsistently). I've also tried to start them manually through SSH, but I get the same error in both cases:

org.freedesktop.DBus.Error.Disconnected: Connection is closed

3. I've also tested the drives for bad sectors and run memtest86+ to see if any of my DIMMs were dead. Everything seemed OK.
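The bad sector check was a non-destructive, read-only scan, roughly like this for each drive (/dev/sdX is just a placeholder):

Code:
# read-only scan for unreadable sectors
badblocks -sv /dev/sdX
# SMART health and error log overview
smartctl -a /dev/sdX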

4. The ZFS pool with the 2 1 TB hard drives is gone. The hard drives are detected, but the pool is not found (zpool status -v shows "no pools available").
I've tried to import it manually, as well as many other options like:

Code:
zpool import -D -f (poolname)
zpool import -a

None worked. I've also tried restarting the zfs-import-cache service, deleting and recreating the cache file, restarting the per-pool import service... nothing seems to work. The pool is not present anywhere.
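Concretely, the service-side attempts and checks were along these lines (unit names and device paths from memory, so treat them as placeholders):

Code:
# re-run the cache-based and scan-based import units
systemctl restart zfs-import-cache.service
systemctl restart zfs-import-scan.service
# check whether the disks still carry ZFS labels at all
zdb -l /dev/sdX1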

5. I've also tried booting from a Proxmox ISO in rescue mode, hoping to at least access all the data, but I face the same problems: no GUI, no zpool, etc.

I'm honestly running out of options. I'm not the kind of person who asks for help at the first problem, but I really don't know what else to check.

Going by other posts here and what is usually requested, I'm attaching the info I think will be useful (I'm running the latest version of Proxmox with all packages up to date).

Thanks in advance for any help
 

Attachments

  • journalctl.txt (117.8 KB)
  • pveversion.txt (1.1 KB)
  • services.txt (11.2 KB)
I boot proxmox from a 32GB USB pen drive
What kind of pen drive? One that is specifically built to handle a lot of writes and is basically an SSD in a pen drive form factor? Normal USB drives usually don't last long, as PVE writes quite extensively to its system disk.

Regarding the libssl and libc errors: They might be corrupted. Try to reinstall them with
Code:
apt install --reinstall libc6 libssl1.1

Regarding the zpool:
Do you see the drives if you run lsblk? Does anything show up if you run zpool import (without a pool name it only scans the available disks)?
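For example:

Code:
# list every block device the kernel currently sees
lsblk
# scan disks for importable pools without actually importing anything
zpool import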
 
If you want to run PVE from flash devices, you can run it as a live system without persistence.
I have a test setup (in a production environment, we want to move away from ESXi) running like this from an SD card with an uptime of around 300 days. PVE has an overlayfs with 1 GB of storage in RAM, and after that time it uses around 100 MB, with 11 CTs/VMs running from a ZFS RAID10 pool.
But this setup has drawbacks. If you change the configuration or update the PVE host, you must rebuild the squashfs image. You have to rebuild the image and write it to the boot SD card before any reboot if you want logs and statistics to survive.
I have a Perl script to automate this, plus some systemd timers for rebuilding the system image in RAM and writing it to the boot device. It isn't 100% complete yet, because it doesn't write out the latest image when the UPS switches to battery, but it's a work in progress. If there is any interest in this, I'll write a tutorial and share the scripts (after some cleanup, because I'm not a programmer).
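Very roughly, the rebuild step is just a mksquashfs of the running root with the volatile paths excluded; the paths and options below are only an example from my setup, adapt them to yours:

Code:
# rebuild the root image from the live system, skipping virtual/volatile trees
mksquashfs / /mnt/sdcard/pve-root.squashfs \
    -comp xz \
    -e proc sys dev run tmp mnt/sdcard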
 
Hi Aaron,

Thank you very much for your answer. I managed to fix the GUI problem by reinstalling those packages as you suggested.

After that, the dbus problem was still happening every time I tried to start a VM. I reinstalled the dbus package, restarted the server, and that fixed it, so my machines are back up and running.
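In case it helps anyone else, the dbus fix boiled down to roughly:

Code:
apt install --reinstall dbus
reboot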

About the USB stick: it's a Kingston pro model with around 150 MB/s read and write speeds. I didn't know there were pen drives built specifically for this, so it was probably my fault to just assume it would be fine. I'll look into it and find a better solution for my installation.

And the zpool... well, this is embarrassing. Since I set all this up months ago, I had completely forgotten that I eventually passed those 2 hard drives through to an OpenMediaVault VM and configured the RAID inside the VM, not in Proxmox. I first tested it directly in Proxmox (hence the ZFS traces in the log), but eventually I removed the pool and passed the drives through... Sorry for the mistake.
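For context, the passthrough was done the usual way with qm set; the VMID and disk ID below are just examples, not my real ones:

Code:
# attach a whole physical disk to the OMV VM via its stable by-id path
qm set 101 -scsi1 /dev/disk/by-id/ata-EXAMPLE_DISK_SERIAL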

So summing up, all fixed and running. Thanks again for your answer and time.
 
If you want to run PVE from flash devices, you can run it as a live system without persistence.
This is actually a very interesting approach, but in my case I think I'll move to a regular hard drive installation (after all, I use this as a home server).

Thanks!
 
