PVE kernel panic - how to recover?

dah

New Member
Mar 26, 2023
Hi,

I am freaking out a bit, because this happened when I need it the least ...
A few months ago, I turned off my last Raspberry Pi and migrated everything to PVE. For that purpose I built a server using the following (brand new) components:
- Intel NUC 12 Pro Kit NUC12WSHI70Z (Intel Core i7-1260P)
- 2x Crucial Laptop Memory, SO-DIMM, DDR4, 32GB, 3200MHz
- Transcend MTS420S (120 GB, M.2 2242) as system disk
- WD Red SN700 NVMe M.2 - 1TB as VM storage drive
Nuc.png


I installed PVE (I can't tell which version, because it is not booting up anymore, but it must have been a version I downloaded ~ mid of March) and used the Transcend SSD as system drive and put the VMs on the WD red NVMe.
Later I connected my NAS via NFS and set up a PBS as virtual machine. The NAS was mounted via NFS on the PBS server.
Then I configured a bunch of virtual machines and containers and configured nightly backups via the PBS.

Containers were backed up nightly and every VM once per week, with the backups stored on my NAS via the PBS.
The server was running continuously for a couple of months without issues. From time to time I did an apt-get update/upgrade; the last time was probably two days ago. However, there had been no updates for quite a while, so I think it might auto-update from time to time(?).

I had my NAS turned off for a few days (so no recent backups were made, which is no big issue). Today I turned it back on, and since I had configured a few things in some containers, I ran a backup job.
Approx. half an hour later I noticed that the PVE was not working anymore. I had to reboot it by pressing the power button and then I got this screen:

Kernel_Panic_small.png

What I did so far:
- rebooted --> (same thing)
- ran a memory test --> (doesn't show anything on the screen ... just a black screen ...)
- took out a memory bar at a time and booted --> (same thing)
- downloaded PVE 8.0-2 and booted in the recovery mode --> (same thing)

I guess there is no way to recover the current installation?

I guess I have to reinstall everything?
To be safe, should I order a new system SSD in case it is a hardware issue?

If I have to reinstall everything, could anyone point me to some instructions on how to recover the VMs/containers? They should be on the WD Red NVMe and in principle on my NAS as backup from the PBS.
However, I find plenty of instructions on the web on how to restore VMs and containers from backups, but no instruction on how to restore everything in case the PVE fails.
What I would have to restore is:
- The PVE installation
- The volumes on the WD Red NVMe and the VMs re-added to PVE (I don't know how; does PVE automatically recognize that there are VMs/containers on it, or do I have to import them somehow?)
- all the PVE configurations (users, backup plans, mount points etc.)
- The PBS setup ...

Are there any instructions on how to do that efficiently and without e.g. reformatting the WD Red NVMe?

This also raises the question for me why it is possible to back up all VMs/containers but not the actual PVE with all its settings. Is there a way to do that in the future?

Thanks a lot!

Best,
 
So my PVE host is the first one in history that failed and had to be reinstalled?
Nobody ever had to reinstall a PVE host and then recover from backups?
 
downloaded PVE 8.0-2 and booted in the recovery mode --> (same thing)
The recovery mode fails as well? It does not involve the local SSDs, so if this is the case, it seems you have a hardware problem. Try updating the BIOS and also booting other live Linux distros like Ubuntu LTS (which is very similar to PVE, kernel-wise).

So my PVE host is the first one in history that failed and had to be reinstalled?
With desktop components, you have more problems (judging by what I have seen here on the forums) than with server hardware. New hardware also has a lot of quirks in the beginning, so you may have to expect problems.

Nobody ever had to reinstall a PVE host and then recover from backups?
At least I never did. A Linux system is much easier to troubleshoot and fix than any other system I ever worked with.
 
Thanks for your response!

Yes, the recovery mode (with 8.0 and 7.4) fails too.

It only happens with the Transcend system SSD.
An Ubuntu live image boots up just fine. I used it to clone the disk before doing anything else.

Since it is my home server, loud and energy-hungry server hardware is not an option. To my knowledge, NUCs are widely used as Proxmox hosts.

Anyhow, I am now going to attempt a reinstallation ... What really would be helpful to know (because I don't find any information about it anywhere) is, how things can be seamlessly restored from backups (VMs, containers, etc.). Notably, I am not talking about having a running proxmox host in which I can select an earlier backup (which is the only information I can find online about recovering from backups), I am talking about:

- If I freshly install Proxmox and then attach an SSD that was previously attached to another Proxmox host (before it failed) and contains all VMs and containers, will Proxmox recognize them out of the box? Or do I have to edit some config files to have Proxmox recognize them? Or (worst case) is Proxmox going to wipe the device and delete all VMs/containers once I attach it?

- Once I have (re)added my Proxmox Backup Server (installed on a VM), which backed up all data on my NAS, is Proxmox going to automatically recover/recognize all backups?

- Since I haven't wiped the system SSD yet and can access it via a live image (e.g. ubuntu), is there any way to backup all the configurations of my proxmox host? (PVE and linux users, mount points, some config files that might be required to reconnect the VMs seamlessly?)

Thank you!
 
- If I freshly install proxmox and then attach an SSD that was previously attached to another proxmox host (before it failed) and contains all VMs and containers, will Proxmox recognize them out of the box?
No

Or do I have to edit some config files to have proxmox recognize them?
Yes. Configuration is stored on the main disk in /var/lib/pve-cluster/config.db.

Or (worst case) is proxmox going to wipe the device and delete all VMs/containers once I attach it?
If you only have one disk, yes.

- Once I have (re)added my proxmox backup server (installed on a VM), which backed up all data on my NAS, is proxmox going to automatically recover/recognize all backups?
Yes, and you can then restore them. I'm puzzled why you didn't test this beforehand? Why do backups if you don't know how to restore them in a catastrophic failure?

- Since I haven't wiped the system SSD yet and can access it via a live image (e.g. ubuntu), is there any way to backup all the configurations of my proxmox host? (PVE and linux users, mount points, some config files that might be required to reconnect the VMs seamlessly?)
Yes, if the data is readable. The main problem I see is the kind of error you got with your SSD and it seems only with this SSD. The full error message (the text before the one on your screen) would be helpful. Have you got another disk that you could try?
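
For what it's worth, pulling the configuration off the old disk from a live image could look roughly like the sketch below. It is only an illustration under assumptions: the old root filesystem is assumed to be mounted read-only somewhere, the function name `rescue_configs` is made up, and the file list is a guess at what is usually worth keeping (only config.db is actually named in this thread).

```python
import shutil
from pathlib import Path

# Files worth rescuing from the old root filesystem. Apart from config.db,
# this list is an assumption for illustration; extend it for your own setup.
CANDIDATES = [
    "var/lib/pve-cluster/config.db",  # backing store for /etc/pve
    "etc/fstab",
    "etc/hosts",
    "etc/network/interfaces",
]


def rescue_configs(old_root, dest):
    """Copy known config files from a mounted old root into dest.

    Returns the relative paths that were actually found and copied.
    """
    old_root, dest = Path(old_root), Path(dest)
    copied = []
    for rel in CANDIDATES:
        src = old_root / rel
        if not src.is_file():
            continue  # absent on this disk, skip it
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # copy2 also keeps timestamps
        copied.append(rel)
    return copied


# Example call (both paths are hypothetical):
# rescue_configs("/mnt/oldpve", "/root/pve-rescue")
```

The returned list makes it easy to see which files were actually present on the damaged disk before you wipe anything.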
 
If you only have one disk, yes.
VMs/containers (nvme0n1) and the system (sda) are on separate disks. There was another SSD (sdb) that was added to one of the lxc containers.
lsblk.png

Yes. Configuration is stored on the main disk in /var/lib/pve-cluster/config.db.
It is a binary file. I have installed PVE 8.0.2 now; the old one was 7.x. Can I copy this file over from my former installation? It seems to be an SQLite file(?).
Will this restore the entire PVE setup including the integration of VMs, containers, and mount points?




Yes, and you can then restore them. I'm puzzled why you didn't test this beforehand? Why do backups if you don't know how to restore them in a catastrophic failure?

How should I? When I set up Proxmox, I didn't know PBS. I got to know about it after the system was already up and running. In order to test it, I would have needed a second Proxmox server, or I would have had to reinstall the existing machine just to test whether I am actually able to recover, which would have put me in the same situation as I am in right now. The PBS worked and I was able to restore backups while the PVE was running and PBS was already integrated into it.

I kind of assumed that instructions for that would be out there, because this is what it is made for (I assumed). But it turns out that I was unable to find any instructions on how to reintegrate everything into a new PVE installation after a catastrophic failure, either because it is so trivial that nobody thinks that information on that is required, I haven't dug deep enough, or it never happened before. My assumption was that it must be trivial and I would figure it out once I need it ... now it is time and since I have never done that before, it raises questions on how to do that and on whether there are pitfalls that I should avoid.
I know two more people who also have set up PVE with PBS and were wondering what to do if PVE fails. They also assumed that recovering must be trivial, but they also didn't know how to do that ...


Yes, if the data is readable. The main problem I see is the kind of error you got with your SSD and it seems only with this SSD. The full error message (the text before the one on your screen) would be helpful. Have you got another disk that you could try?
I ordered a new SSD, but it hasn't arrived yet ...
I couldn't get more information. It boots and boom.


Thanks for helping me!
I appreciate it!
 
How should I? When I set up proxmox, I didn't know PBS. I got to know about it after the system was already up and running. In order to test it, I would have needed a second proxmox server or reinstall the existing machine for the sake of testing whether I am actually able to recover, which would have put me in the same situation as I am in right now. The PBS worked and I was able to restore backups when the PVE was running and PBS was already integrated into it.
Maybe you heard of this thing called virtualization ;)
From the beginning, I have virtualized PVE and PBS inside of PVE to play around with stuff, so that I can learn and not f*ck up my main systems. I've been running and testing everything therein, from clusters to hard failure scenarios like starting up from scratch, reintegrating into PBS and restoring everything.
Yes, it is not 100% the same, but it is sufficient to also test multi-disk setups, other filesystems like ZFS, even running a cluster on Ceph or ZFS-over-iSCSI. With PCIe passthrough, I was also able to simulate macOS or pass through Fibre Channel devices to have "a proper FC SAN". I've also been running other hypervisors like Hyper-V, VMware and "userspace stuff" like KVM/QEMU or VirtualBox inside of VMs. The power of virtualization is almost unlimited and everything is very easy with PVE (with the exception of PCIe passthrough, which can be VERY tricky if it works at all).


Can I copy this file over from my former installation? It seems to be an SQLite file(?).
Will this restore the entire PVE setup including the integration of VMs, containers, and mount points?
If you stop all PVE-related services beforehand, yes. This is one way to do it and it'll include everything that is inside of your /etc/pve folder. Mountpoints can be more tricky; it depends on how they were set up before. If they were configured over the GUI as storage, they'll be in it, but if they were configured via /etc/fstab, they will not.
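
A minimal sketch of that copy step, assuming the services are stopped by hand first. The helper names `db_is_healthy` and `swap_in` are made up for illustration, and the unit names in the comment are the usual PVE ones but should be checked on your own host. One more caveat, also an assumption: if the old disk holds side files such as config.db-wal next to config.db, they may belong together with the main file.

```python
import shutil
import sqlite3
from pathlib import Path

# Before calling swap_in(), stop the PVE services, for example:
#   systemctl stop pve-cluster pvedaemon pveproxy pvestatd
# and start them again afterwards (unit names to be verified locally).


def db_is_healthy(db_path):
    """Return True if SQLite's own integrity check passes."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"  # open read-only
    try:
        con = sqlite3.connect(uri, uri=True)
    except sqlite3.Error:
        return False
    try:
        row = con.execute("PRAGMA integrity_check").fetchone()
        return row is not None and row[0] == "ok"
    except sqlite3.DatabaseError:
        return False  # e.g. "file is not a database"
    finally:
        con.close()


def swap_in(old_db, live_db):
    """Replace live_db with old_db, keeping the previous file as .bak."""
    old_db, live_db = Path(old_db), Path(live_db)
    if not db_is_healthy(old_db):
        raise ValueError(f"{old_db} failed the integrity check")
    backup = live_db.with_name(live_db.name + ".bak")
    if live_db.exists():
        shutil.copy2(live_db, backup)  # keep the fresh install's database
    shutil.copy2(old_db, live_db)
    return backup


# Example call (source path is hypothetical):
# swap_in("/root/pve-rescue/config.db", "/var/lib/pve-cluster/config.db")
```

Checking the rescued file before overwriting anything matters here because the old disk is the one suspected of being faulty; the .bak copy gives you a way back to the clean install.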
 
Thank you!
I kind of assumed that it is not possible to run a PVE host as VM on another PVE host! Thus, I never thought that far ... That of course opens quite some doors ... Thanks for pointing that out!

I stopped all services, copied the old config.db over, restarted the services, and it seems like almost everything that was previously configured in the WebUI is back. Thanks a lot! What is missing is a directory on a local SSD (sdb1) that I had previously added to a container. It shows up with status unknown in the left menu that shows the server overview, and in Datacenter>Storage as ID "nextcloudSSD" with path/target "/mnt/pve/nextcloudSSD".
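
One way to narrow down a storage with status unknown is to check whether the paths of the directory storages are actually mounted on the new host. The sketch below assumes the usual layout of /etc/pve/storage.cfg (a `type: id` header line followed by indented `key value` lines); the function names are made up, and a directory storage that lives on the root disk will legitimately report False.

```python
import os


def parse_storage_cfg(text):
    """Parse storage.cfg-style text into {storage_id: {option: value}}."""
    storages = {}
    current = None
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if not line[0].isspace():
            # Header line such as "dir: nextcloudSSD"
            stype, _, sid = line.partition(":")
            current = {"type": stype.strip()}
            storages[sid.strip()] = current
        elif current is not None:
            # Indented option line such as "\tpath /mnt/pve/nextcloudSSD"
            key, _, value = line.strip().partition(" ")
            current[key] = value.strip()
    return storages


def dir_storage_mount_state(storages):
    """Map each 'dir' storage ID to whether its path is a mount point."""
    return {
        sid: os.path.ismount(opts["path"])
        for sid, opts in storages.items()
        if opts["type"] == "dir" and "path" in opts
    }


# Example call on a PVE host:
# with open("/etc/pve/storage.cfg") as fh:
#     print(dir_storage_mount_state(parse_storage_cfg(fh.read())))
```

If an entry that should sit on its own disk reports False, the storage definition came back with config.db but the mount itself did not, and the mount has to be recreated separately.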

I guess this was not properly restored, because I didn't copy /etc/pve ... I didn't do that, because the directory was empty on the mounted former disk. I browsed this forum and found that /etc/pve is mounted by the pve-cluster service, which is of course not running on my PC on which I mounted the cloned former pve system disk.
On the new host, I can see that /dev/fuse is mounted on /etc/pve, but /dev/fuse is a device that I guess is created by the pve-cluster service, so if there is no pve-cluster service (on my PC), there is no fuse and no content. Where can I find the content?

Thanks a lot!


[edit]
All fixed and working again.
Anyhow, this was helpful to add the mountpoint for the local SSD again:
https://blog.netnerds.net/2022/03/add-existing-storage-proxmox-without-wiping-it-first/

Although I haven't copied /etc/pve, all config files, including the one with the passed-through mountpoint for the nextcloudSSD (/etc/pve/lxc/104.conf), were back with all their settings. Therefore, I believe that the pve-cluster service must have pulled this out of the /var/lib/pve-cluster/config.db file. Is pve-cluster extracting this information from config.db, creating the /dev/fuse mount on /etc/pve, and filling it with all required configuration files based on config.db?
[/edit]
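
To see for yourself what a copied config.db contains without a running pve-cluster service, something like the following could be used. It is a sketch under assumptions: it expects a table named `tree` with `inode`, `parent`, `type`, `name` and `data` columns, inode 0 as the root, and type 8 marking regular files. Check the real layout first (for example with `.schema` in the sqlite3 shell) and work on a copy, not on the live file. The function name is made up.

```python
import sqlite3

ROOT_INODE = 0  # assumed inode of the filesystem root
REGULAR_FILE = 8  # assumed type code for regular files (DT_REG)


def list_pmxcfs_files(db_path):
    """Return {relative_path: size_in_bytes} for regular files in 'tree'."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT inode, parent, type, name, data FROM tree"
        ).fetchall()
    finally:
        con.close()

    by_inode = {inode: (parent, name) for inode, parent, _t, name, _d in rows}

    def full_path(inode):
        # Walk up the parent chain until the root (or a broken link).
        parts, seen = [], set()
        while inode != ROOT_INODE and inode in by_inode and inode not in seen:
            seen.add(inode)
            parent, name = by_inode[inode]
            parts.append(name)
            inode = parent
        return "/".join(reversed(parts))

    return {
        full_path(inode): len(data or b"")
        for inode, _parent, ftype, _name, data in rows
        if ftype == REGULAR_FILE and inode != ROOT_INODE
    }


# Example call on a copy of the old database (path is hypothetical):
# for path, size in sorted(list_pmxcfs_files("/root/pve-rescue/config.db").items()):
#     print(f"{size:8d}  {path}")
```

If entries such as lxc/104.conf show up in the output, the files seen under /etc/pve are stored in the database itself, which would match what was observed above after copying only config.db.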
 
I kind of assumed that it is not possible to run a PVE host as VM on another PVE host! Thus, I never thought that far ... That of course opens quite some doors ... Thanks for pointing that out!
It may be harder to get nested virtualization to work, yet if you "just" want to simulate PVE and containers inside of PVE, it is very easy. Actually, I started to play with PVE many years ago inside of virtualization, and then it was promoted to a first-class citizen in my environments and I never looked back.
 
