Proxmox won't boot. Not sure where to start

TheJM
May 3, 2019
As the title says, Proxmox won't boot; even after restarts it hangs at the line shown in the attached picture:

"A start job is running for Import ZFS pools by cache file PANIC: blkptr at 000000000af0311db DVA 0 has invalid VDEV 131072."

and doesn't get past that screen. I have a lot of valuable data on this server and honestly don't know where to start troubleshooting it.

CPU: Intel 9900k
RAM: 64GB

Proxmox runs off of two SSDs in RAID 1 (a ZFS mirror), and all of my VMs and data are on redundant ZFS pools as well.

I looked in the BIOS and both boot SSDs show up.

Like I said, I bought the community subscription for Proxmox for about $100, but I don't know how to solve this. I obviously can't boot to run commands.

Your help is appreciated!

JM

EDIT: For some odd reason, it boots now, but it shows that PANIC message very often, and I cannot get any zpool commands to run and give me the info I need. I was able to run lsblk and all of the disks show up there; I just can't check the status of the pools. Then I made a Proxmox install disk, booted to it, and tried to enter recovery mode, but I got the error message "error rescue boot unable to find boot disk automatically." Not sure why I'm getting this.
 
Hi, it seems like your cachefile got corrupted?
ZFS uses it to decide which pools to auto-import, i.e., those which have already been imported once.
Proxmox VE doesn't really use this, as it decides which pools it actively has to import by reading /etc/pve/storage.cfg.

What ZFS pools do you have on this machine?

You may either drop the cache completely: rm /etc/zfs/zpool.cache

Or set the cachefile to none for PVE-managed pools: zpool set cachefile=none <poolname>
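
For example, as concrete commands (just a sketch; the pool name tank below is a placeholder for your actual pool):
Code:
# option 1: drop the cache file completely; it gets regenerated on the next
# import for pools that still have a cachefile set
rm /etc/zfs/zpool.cache

# option 2: tell ZFS not to track this pool in the cache file at all
zpool set cachefile=none tank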
 
Hi, it seems like your cachefile got corrupted?
ZFS uses it to decide which pools to auto-import, i.e., those which have already been imported once.
Proxmox VE doesn't really use this, as it decides which pools it actively has to import by reading /etc/pve/storage.cfg.

What ZFS pools do you have on this machine?

You may either drop the cache completely: rm /etc/zfs/zpool.cache

Or set the cachefile to none for PVE-managed pools: zpool set cachefile=none <poolname>

I deleted the cache file like you suggested and rebooted, but I still get that panic message, and running zpool commands locks the system.
I have 3 pools on the server, including rpool.

I recently installed pve-kernel-4.15; not sure if that has anything to do with my error. If you look in the attached screenshot, it says '4.15.18-25-pve'.
 
I recently installed pve-kernel-4.15; not sure if that has anything to do with my error

You could try booting an older version to test that; they should still be installed, or they could be re-installed in parallel to other kernels from different ABI versions.
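
For example, roughly like this (a sketch; the exact package version below is just an example, use one that dpkg actually lists for your system):
Code:
# list the PVE kernel packages currently installed
dpkg -l 'pve-kernel-*'

# re-install a specific older kernel in parallel, if it was removed
apt install pve-kernel-4.15.18-24-pve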
 
You could try booting an older version to test that; they should still be installed, or they could be re-installed in parallel to other kernels from different ABI versions.

How do I go about doing that?

Thank you for the help btw!
 
I deleted the cache file like you suggested and rebooted, but I still get that panic message, and running zpool commands locks the system.
I have 3 pools on the server, including rpool.

Hmm, I found some similar reports for ZFS 0.7, which is in use under Proxmox VE 5.x, but none of them have a clear resolution or the like. @fabian any idea, or have you seen this already?
 
How do I go about doing that?

On boot you should see the boot menu, either GRUB or systemd-boot.
Both should also list older kernels; you can navigate to an older one with the arrow keys and just hit enter.

If you have no direct access and no IPMI/iKVM for this server, there are some methods to pre-select the next kernel before rebooting, but those take slightly more setup work; see the sketch below.
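
With GRUB, one such method is grub-reboot (a sketch; it assumes GRUB_DEFAULT=saved is set in /etc/default/grub followed by update-grub, and the menu entry title is only an example that must match your actual grub.cfg):
Code:
# inspect /boot/grub/grub.cfg for the exact entry titles, then select a
# kernel for the next boot only and reboot into it
grub-reboot 'Advanced options for Proxmox Virtual Environment GNU/Linux>Proxmox Virtual Environment GNU/Linux, with Linux 4.15.18-24-pve'
reboot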
 
I did actually do that; I booted into 4.15.18-24 and it gave me the same panic error. I'm going to try to boot into a few more and see if that helps.

I just booted into 4.15.18-12 (the earliest version I have to choose from) and it gave me the same panic error.

Should I try reinstalling 4.15.18-25 (my current version)? Maybe it didn't get fully installed?
 
IMO it seems that one ZFS version registered some writes on a non-existent VDEV, and all ZFS versions notice that and produce the error message. That would match the reports of similar error messages in ZFS on Linux GitHub issues.

Are the pools working OK for now? I.e., is this a rather cosmetic error?
 
IMO it seems that one ZFS version registered some writes on a non-existent VDEV, and all ZFS versions notice that and produce the error message. That would match the reports of similar error messages in ZFS on Linux GitHub issues.

Are the pools working OK for now? I.e., is this a rather cosmetic error?

No, they are not. I cannot start any VMs; I don't even know if the data in my zpools is still there and not corrupted.
 
This server is my main home server. It runs my home automation (openHAB), my camera system, my computer backups, Plex, and more, and it is currently dead in the water. I just don't know what could have caused this.
 
I just don't know what could have caused this.

Anything else unusual done recently?

No, they are not. I cannot start any VMs; I don't even know if the data in my zpools is still there and not corrupted.

All three? I mean, rpool obviously can get imported, as otherwise your boot wouldn't get that far.
What does zpool status output?
 
Anything else unusual done recently?



All three? I mean, rpool obviously can get imported, as otherwise your boot wouldn't get that far.
What does zpool status output?
I can't run any pool commands. When I do, the system locks: I get no output and I cannot press Ctrl-C to get out. I have to reboot to run any more commands, so I'm unable to check the status of any pools.
 
I can't run any pool commands. When I do, the system locks: I get no output and I cannot press Ctrl-C to get out. I have to reboot to run any more commands, so I'm unable to check the status of any pools.

@wolfgang sorry to bother you but would you have any advice for me?
 
You could try:
Code:
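# allow ZFS to downgrade some otherwise-fatal errors to warnings (zfs_recover tunable)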
echo 1 > /sys/module/zfs/parameters/zfs_recover
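# then try importing the pool read-only, so nothing new gets written to it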
zpool import -o readonly=on POOLNAME

It could take a while (minutes to an hour), but it should get you a read-only pool. If that hangs too, the next thing you could try is booting the PVE installer ISO in debug mode and going to the second shell; the ZFS tools are also available there, and you could try to diagnose with that.

A 6.1 ISO may be even more worth trying, as it ships ZFS 0.8, which greatly improved importing; see https://github.com/openzfs/zfs/issues/6244#issuecomment-422187850
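
If the read-only import does succeed, one possible next step (just a sketch; the pool name and paths below are placeholders) would be to mount the datasets and copy the critical data somewhere safe before attempting any repair:
Code:
# mount all datasets of the imported (read-only) pool
zfs mount -a

# copy the important data off to another disk or machine
rsync -a /POOLNAME/ /mnt/external-backup/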
 
You could try:
Code:
echo 1 > /sys/module/zfs/parameters/zfs_recover
zpool import -o readonly=on POOLNAME

It could take a while (minutes to an hour), but it should get you a read-only pool. If that hangs too, the next thing you could try is booting the PVE installer ISO in debug mode and going to the second shell; the ZFS tools are also available there, and you could try to diagnose with that.

A 6.1 ISO may be even more worth trying, as it ships ZFS 0.8, which greatly improved importing; see https://github.com/openzfs/zfs/issues/6244#issuecomment-422187850

I will try those commands and let you know how it goes.

I made a Proxmox 5.4 ISO drive last night and tried to boot into recovery mode (or whatever it's called), but on the GRUB screen of the installer it kept telling me "error rescue boot unable to find boot disk automatically." Any way of getting around that so I can troubleshoot my system from the ISO?
 
