PVE keeps shutting down and I can't figure out why

blucobalt

New Member
Jun 2, 2023
Basically what the title says. Whenever I boot it normally, everything works correctly for 3-4 minutes before it halts and resets. If I boot into single user mode, it doesn't turn off.
I've tried going into single user and checking the journals of previous boots (think `journalctl -b-1`) but I've found nothing of substance. During boot there are kernel warnings about ACPI or something but I don't think that is where this issue is coming from. What could this be? I'm at a loss on how I should approach this.
Thank you.
 
The server is stable with unraid on a 6.1.79 kernel. How can I get proxmox to run with that? I tried installing pve-kernel-6.1, but whenever I try to boot it, it fails to mount rpool because of zfs features and I get stuck in the initramfs. I tried `update-initramfs -k all -v -c`, but that seemingly didn't do anything, and booting 6.1 still fails.
 
Seeing that your server boots correctly with the older kernel you mentioned but not with the latest PVE ones, it is possible that the IOMMU default, which changed recently as described in the PVE 8.2 Roadmap wiki, may be the source of your problem:

Kernel: intel_iommu now defaults to on

The intel_iommu parameter defaults to on in the kernel 6.8 series. Enabling IOMMU can cause problems with older hardware, or systems with not up to date BIOS, due to bugs in the BIOS.

The issue can be fixed by explicitly disabling intel_iommu on the kernel commandline (intel_iommu=off) following the reference documentation.


Assuming you are using an Intel CPU, the above change may be affecting your system.

As noted in the attached reference documentation, how you add intel_iommu=off to the kernel command line depends on whether you are using GRUB or systemd-boot.
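As a rough illustration of that edit, here is a minimal sketch that works on scratch copies only. It assumes the usual Proxmox locations, `/etc/default/grub` for GRUB and `/etc/kernel/cmdline` for systemd-boot; the helper name `set_iommu_off` and the sample file contents are made up for the example.

```shell
#!/bin/sh
# Sketch: set intel_iommu=off in a kernel command line file.
# It only touches scratch copies; on a real node the files would be
# /etc/default/grub (GRUB) or /etc/kernel/cmdline (systemd-boot).

set_iommu_off() {
    file=$1
    if grep -q 'intel_iommu=' "$file"; then
        # Parameter already present (for example intel_iommu=on): rewrite its value.
        sed 's/intel_iommu=[^ "]*/intel_iommu=off/' "$file" > "$file.new"
    elif grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file"; then
        # GRUB style: append inside the quoted value.
        sed 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 intel_iommu=off"/' "$file" > "$file.new"
    else
        # systemd-boot style: the file is a single line of parameters.
        sed '1s/$/ intel_iommu=off/' "$file" > "$file.new"
    fi
    mv "$file.new" "$file"
}

# Hypothetical sample contents standing in for the real files.
workdir=$(mktemp -d)
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"\n' > "$workdir/grub"
printf 'root=ZFS=rpool/ROOT/pve-1 boot=zfs\n' > "$workdir/cmdline"

set_iommu_off "$workdir/grub"
set_iommu_off "$workdir/cmdline"

grub_result=$(cat "$workdir/grub")
cmdline_result=$(cat "$workdir/cmdline")
printf '%s\n%s\n' "$grub_result" "$cmdline_result"
rm -rf "$workdir"
```

After changing the real file, the new command line only takes effect once the boot configuration is regenerated (`update-grub` for GRUB, `proxmox-boot-tool refresh` for systemd-boot) and the node is rebooted. `cat /proc/cmdline` then shows what the running kernel actually received.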
 
Yeah, I'm on intel. This is an old workstation board from 2008(?) that's running dual x5690s. Because I originally virtualized unraid (with a gpu passed through) I'm almost certain that I already had it working with iommu back in April when I still ran this as a proxmox node. The only reason that I moved unraid to bare metal was because of this issue that I had.
Regardless, I'll try disabling iommu directly.
 
As I suspected, I already had intel_iommu=on iommu=pt on my cmdline. Specifying intel_iommu=off didn't fix the issue. To prevent having to reinstall the node (and the headaches of having to have it leave and rejoin my cluster) is there another way to install the older kernel?
Well, I guess installing the older kernel is easy. Is there a way to make it work? How can I use a new version of zfs with 6.1?
[Attached screenshot: 1721200234082.png]
This is where I'm left with 6.1. Can I include a new enough version of zfs in my initramfs to be able to import /? I run root-on-zfs on my (gentoo) desktop and it's fairly easy to setup there but I'm not sure how it's done here.

Apparently, that feature is something automatically enabled when imported by a new version of zfs. Is there a way to import it anyway ignoring the feature flag?
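For context, and hedged accordingly: as far as I know there is no supported switch that ignores an unsupported feature for a normal read-write import. The OpenZFS documentation only describes a read-only import (the `readonly=on` property at import time), and only when every unsupported feature is read-only compatible. To see which features a pool is actually using, the sketch below filters `zpool get` output for features in the `active` state. The helper name `list_active_features` and the sample data are hypothetical, and the sample does not claim to match this pool.

```shell
#!/bin/sh
# list_active_features: read "property<TAB>value" lines on stdin and print
# the names of pool features whose state is "active".
list_active_features() {
    awk -F '\t' '$1 ~ /^feature@/ && $2 == "active" { sub(/^feature@/, "", $1); print $1 }'
}

# Hypothetical sample standing in for the output of:
#   zpool get -H -o property,value all rpool
sample=$(printf 'size\t928G\nfeature@async_destroy\tenabled\nfeature@zilsaxattr\tactive\nfeature@vdev_zaps_v2\tactive\nfeature@lz4_compress\tactive\n')

active=$(printf '%s\n' "$sample" | list_active_features)
printf '%s\n' "$active"
```

On a live system with the pool imported, the same filter can be fed directly: `zpool get -H -o property,value all rpool | list_active_features`. Features listed as `enabled` but not `active` have not changed the on-disk format yet, so only the `active` ones matter to an older ZFS module.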
 
I got 6.1 booting by getting zfs-dkms from bookworm-backports. But now I'm still even getting the hard resets with 6.1. I am totally lost now. :(

[Attached screenshot: 1721203022139.png]
 
I've been doing some digging and it looks like this server's ipmi can do serial redirection. If the kernel was panicking (or outputting any other useful info) is there a way that I could see it over serial? Over video, the screen freezes then just turns black.
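One common way to do this is to pass `console=` parameters to the kernel so that its messages are mirrored to the serial port the BMC redirects. The sketch below only builds the parameter string. The port number (`ttyS1`) and baud rate (115200) are assumptions that have to match the board's serial-redirection settings, and the helper name `serial_console_args` is made up for the example.

```shell
#!/bin/sh
# serial_console_args UNIT BAUD: print kernel parameters that send console
# output to serial port ttyS<UNIT> in addition to the local display (tty0).
serial_console_args() {
    printf 'console=tty0 console=ttyS%s,%sn8' "$1" "$2"
}

# Assumed values; check the BIOS/IPMI serial redirection settings for the real ones.
args=$(serial_console_args 1 115200)
echo "$args"

# On the real node (not run here):
#   1. Append the printed parameters to the kernel command line and drop "quiet".
#   2. Regenerate the boot config (update-grub or proxmox-boot-tool refresh) and reboot.
#   3. From another machine, attach to the serial-over-LAN session, for example:
#        ipmitool -I lanplus -H <bmc-address> -U <user> sol activate
```

Whether `ipmitool ... sol activate` works depends on the BMC; boards of that age may only offer the redirection through a vendor tool or a physical serial header. If a panic is printed before the reset, it should show up in that session even when the video output has already gone black.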
 
"To prevent having to reinstall the node"
Well, you still do not know why your server is misbehaving. Nothing guarantees that the older kernel will actually get rid of your current issues.
"back in April when I still ran this as a proxmox node"
Was this in the exact same cluster situation as now?

Since we don't actually know what is wrong with the server, if I were you I'd do one of the following:

Either persist in trying to track down the actual problem.

Or:
1. Remove the node from the cluster, following this guide (do it EXACTLY as stated & it should work).
2. Install an older version as above & try & get it to work.

Edit: As I'm writing this I see you've posted an update confirming (probably) that it's not a kernel-linked problem. I guessed as much. I would bet that your network and/or cluster is where you should be looking. To confirm that, try installing PVE on that server standalone, not in the cluster, AFTER removing it from the cluster as above. If it works, you've got your answer.
 
I reinstalled the node on probably the 17th or 18th and it's been working fine - up until now. I had it set up identically to how it was before and I wasn't having any issues until just now when I tried rebooting it to debug some drives. I am getting the hard resets again. I tried pinning the 6.5 kernel that came with the installer (current most up-to-date is 6.8) with the ~2 minutes of uptime I get and that didn't fix it. I am totally at a loss again. I really don't wanna reinstall this node again. I have no idea where to go from here.
 
Honestly, I think I'm just going to give up on running proxmox on this machine - unraid is by far the biggest vm that I'm running on it (so I might as well commit the whole thing) and it's not like unraid can't run its own virtual machines. This is super frustrating but I really just want this server running again.

To be fair, this is probably a problem with the hardware that proxmox is running into. I'm still in high school, so it's not in my budget to build a machine with newer parts, and I guess I am just going to have to roll with what works.
 
