Random 6.8.4-2-pve kernel crashes

So, basically, updating to 6.8.9 fixes all that freezing and crashing?
That's easier than I thought.
Yes

- either someone backports the fixes to PVE
- or upstream Debian updates to a more recent kernel (and doesn't break it with their own "things")

The problem is fixed, at least in vanilla >= 6.8.8 / 6.8.9.

Thank you, Proxmox, for letting me find this out myself. I'll tell my friends!

(Go and tell your customers what YOU know now!)
 
We are doing fine. Our main production servers are still on vSphere 6.7/7, as we were evaluating all sorts of "jumping the sinking Broadcom ship" solutions just before the PVE 8.2 release.

A few internal servers are all on PVE 8.1; as long as they don't update themselves, it's all OK.

I'd say we really dodged a bullet by inches this time.
 
I'd say we really dodged a bullet by inches this time.
You don't "need" 6.8.x at this point.

However.

If you get a 92-core EPYC, you can't wait.

It's more about how the situation was handled than the broken kernel itself. Even Linus had to roll back a kernel he had just released.

I was absolutely left out in the rain with a gigantic problem, and nobody cared.
 
You don't "need" 6.8.x at this point.

However.

If you get a 92-core EPYC, you can't wait.

It's more about how the situation was handled than the broken kernel itself. Even Linus had to roll back a kernel he had just released.

I was absolutely left out in the rain with a gigantic problem, and nobody cared.
I don't recall any news of Linux kernel 6.5 not properly supporting high core count EPYC servers.

I could be wrong on this one; 32 cores per socket is very much enough for us, and anything higher causes heat problems.
 
I don't recall any news of Linux kernel 6.5 not properly supporting high core count EPYC servers.

I could be wrong on this one; 32 cores per socket is very much enough for us, and anything higher causes heat problems.

As mentioned before: 2x 92 cores will be the new normal. There are patches for >= 512 cores.

In other words, a possibility to easily get a custom kernel for PVE (and the fact that eventually I'll have to make that happen): that's my takeaway from this bug.
 
Can anyone summarize the current state of the kernel crashes and potential fixes?
 
Can anyone summarize the current state of the kernel crashes and potential fixes?
All PVE kernel 6.8 versions are unstable. Keep clear of those. Stay with 8.1/kernel 6.5.
No ETA for stable PVE kernel 6.8.
No tested fix available.

If necessary, do a full offline reinstall to roll back to 8.1.
Here's the link to the 8.1 ISO:
https://enterprise.proxmox.com/iso/proxmox-ve_8.1-2.iso

I think that sums it up pretty well.
And I'm also pretty sure this post should be pinned at the top of this forum, so no more users get hurt by an unstable kernel.
 
I have a root server at Hetzner (AX41-NVMe, AMD Ryzen 5 3600). I have also suffered from the instabilities of kernel 6.8.4-2: host crashes, no longer responding. According to a Hetzner technician, my server showed no video output and didn't respond to any keystrokes.
Impossible to see anything in the logs apart from a whole series of unreadable characters.

I went back to kernel version 6.5.13-5 a few days ago, which got me back to a stable server. This was the last stable kernel my server ran before upgrading to 6.8.
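For anyone who wants to roll back to an older kernel without a full reinstall: a minimal sketch using `proxmox-boot-tool`. The version string `6.5.13-5-pve` is just an example; use one from your own kernel list. The pinning commands need root on a PVE host, so they are shown commented out.

```shell
# Show which kernel is currently running
uname -r

# On a PVE host (as root), list the kernels the bootloader knows about:
# proxmox-boot-tool kernel list

# Pin a known-good 6.5 kernel so it is booted by default
# (example version string; substitute one from the list above):
# proxmox-boot-tool kernel pin 6.5.13-5-pve
# reboot
```

After a reboot, `uname -r` should report the pinned version; `proxmox-boot-tool kernel unpin` reverts to the default.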
 
I keep having reboots after these messages, reverting to kernel 6.5 didn't help

Code:
May 10 18:17:01 CRON[15203]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 10 18:17:01 CRON[15204]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 10 18:17:01 CRON[15203]: pam_unix(cron:session): session closed for user root
 
I keep having reboots after these messages, reverting to kernel 6.5 didn't help

Code:
May 10 18:17:01 CRON[15203]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 10 18:17:01 CRON[15204]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 10 18:17:01 CRON[15203]: pam_unix(cron:session): session closed for user root
How is that a problem?
 
https://en.wikipedia.org/wiki/Cron

Read the section about "@hourly".

A totally normal operation, even for the Linux that runs on a Raspberry Pi Zero.
I know very well what cron is, and I also checked that no new job has been added to the system. Just FYI, I have been running Proxmox since v4.

The coincidence is, I get a reboot every single time after those cron jobs complete.

Now, either you are willing to come up with an idea, or just stop pretending to help with Wikipedia links.
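One way to settle the cron question: `run-parts --test` prints which scripts would be executed without actually running them, and a persistent journal makes it possible to see what happened right before the reboot. A sketch (the journald step needs root, so it is commented out):

```shell
# Print the scripts run-parts would execute from cron.hourly,
# without running any of them
run-parts --test /etc/cron.hourly

# Keep logs across reboots so the messages just before a crash survive
# (requires root; systemd-journald picks up the directory automatically):
# mkdir -p /var/log/journal
# systemctl restart systemd-journald
```

With a persistent journal, `journalctl -b -1 -e` shows the tail end of the previous boot, which is usually more telling than the cron session lines themselves.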
 
in case the Proxmox ISO doesn't even boot
I would recommend switching to something else, like another Linux distro + VirtualBox headless + phpVirtualBox.

The unseen fate has decided that PVE is not for that machine, for reasons we mortals cannot understand.
 
I would recommend switching to something else, like another Linux distro + VirtualBox headless + phpVirtualBox.

The unseen fate has decided that PVE is not for that machine, for reasons we mortals cannot understand.
I've seen that very often: the PVE ISO not booting.

To be honest, I've also seen Debian being "so stable" that the default kernel doesn't boot either.

That is actually my takeaway, from the bug and from Proxmox's reaction.

We need more (pve-) kernel options. I am working on that.
 
Has anyone with kernel crashes tried disabling Hyperthreading?

I have no kernel crashes here, but I do have issues related to the scheduler; it's not working how it should (I think). Still debugging the issue.
My issue is definitely related to HT, because everything that runs on HT cores gets only about 20-30% of the performance it would have on a physical core.
The CPU is otherwise idle, so those tasks should normally land on physical cores anyway, but the scheduler somehow prioritizes the HT cores, and those are ultra slow.

I'm not sure if the issue is related, but you could test whether your freezes/crashes go away with HT disabled.
I'm still debugging the root cause here.

The affected systems are two Genoa servers (both with a 9374F + 12x 64 GB DIMMs + 8x Micron 7450 MAX).
Turning Hyperthreading off doesn't hurt me much, because I still have 32 cores per server.

Cheers
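For anyone who wants to try this without touching the BIOS: on reasonably recent x86 kernels, SMT can be checked and toggled via sysfs. A sketch (the toggle needs root, so it is commented out; the sysfs path assumes a kernel built with SMT hotplug support):

```shell
# 1 = SMT/Hyperthreading active, 0 = inactive
if [ -r /sys/devices/system/cpu/smt/active ]; then
    cat /sys/devices/system/cpu/smt/active
else
    echo "SMT sysfs interface not available on this kernel"
fi

# Disable SMT at runtime, no reboot needed (requires root):
# echo off > /sys/devices/system/cpu/smt/control

# Or disable it permanently: add "nosmt" to GRUB_CMDLINE_LINUX_DEFAULT
# in /etc/default/grub, then run update-grub and reboot.
```

The runtime toggle offlines the sibling threads immediately, so it is a quick way to A/B-test whether freezes correlate with HT before committing to a boot parameter.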
 
