Proxmox 5.2 kernel panic

membranex

Nov 6, 2018
Hi,

I've set up a fresh Proxmox 5.2 installation on an HP ProLiant DL385 G7 with 16 GB of RAM and a 6167SE Opteron on board. I've updated the HP firmware to the latest version. Unfortunately, the installation keeps running into kernel panics at random (please see the attached screenshot; sorry for the bad quality, I don't currently have a serial cable to capture a proper log, and the panic message loops). I ran memtest and got no errors (though it only ran for 8 hours). The only unusual thing about this setup is that it's a Debian install with LUKS full-disk encryption. Could this be related?

EDIT: I'm using LVM, no ZFS.
 

Attachments

  • panic.png (980.3 KB)
I've discovered that ksm-control-daemon was missing from my installation, so I installed it. I doubt this is related, though?

My disks (all refurbished) do not currently report any errors, but ADU shows some entries under "Physical Drive Error Log Entries". I recall getting some random kernel panics when I was maxing out disk usage with iozone, so this could definitely be related. Any idea how I can track this down?
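One rough way to start tracking this down is to scan the kernel log for storage-related messages. The sketch below is only an illustration: the log path `/var/log/kern.log` and the pattern list are assumptions (they depend on how logging is set up on the host), and a panic that freezes the machine may never reach the on-disk log at all.

```python
import re
from pathlib import Path

# Substrings that commonly show up in Linux kernel messages when storage
# misbehaves. The list is an assumption for illustration, not exhaustive.
SUSPECT_PATTERNS = [
    r"I/O error",
    r"blocked for more than \d+ seconds",
    r"Kernel panic",
]
_SUSPECT_RE = re.compile("|".join(SUSPECT_PATTERNS))


def suspicious_lines(lines):
    """Return the lines that match any of the suspect patterns."""
    return [line.rstrip("\n") for line in lines if _SUSPECT_RE.search(line)]


if __name__ == "__main__":
    # Assumed location; adjust for your logging setup.
    log_path = Path("/var/log/kern.log")
    try:
        with log_path.open(errors="replace") as handle:
            for hit in suspicious_lines(handle):
                print(hit)
    except OSError as exc:
        # Missing file or insufficient permissions (the log is often root-only).
        print(f"could not read {log_path}: {exc}")
```

Any hits close in time to a crash would at least point towards the disks or the controller rather than the CPU or RAM.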
 
Have you upgraded the AMD microcode to the latest version? I had the same kind of problem with Opteron 61XX a long time ago.

(Upgrade your BIOS, or "apt install amd64-microcode" from the Debian non-free repo.)
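To see which microcode revision the running kernel reports, before and after an update, something like the following should work on an x86 Linux host. It is a minimal sketch that assumes `/proc/cpuinfo` exposes a `microcode` field; the function name is made up for illustration.

```python
from pathlib import Path


def microcode_revisions(cpuinfo_text):
    """Return the set of distinct 'microcode' values found in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        revs = microcode_revisions(path.read_text())
        print("microcode revision(s):", ", ".join(sorted(revs)) or "not reported")
    else:
        print("/proc/cpuinfo is not available on this system")
```

If the reported revision does not change after installing the package and rebooting, the BIOS may already have loaded the same or a newer revision.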
 
Thanks so much for the reply! I'm getting desperate with this problem :(

So, yes: after the first panics I updated the firmware/BIOS using this patch: SP99289.exe ("Updated the AMD processor microcode to the latest version."; I can't post a link because of the forum's antispam). Some older VMs became a lot more stable with this patch (an Ubuntu 14.04 VM imported from a different KVM setup had been panicking at random every 10 minutes, even during boot; now it's all nice and quiet). Unfortunately, the host (Proxmox) still hit the panic from the screenshot above even with the BIOS patched. I'm puzzled about the BIOS update: how come there are two places where the update is applied? Is it that components like iLO might upset the CPU, and they are the ones that get the update? And is amd64-microcode for the Linux kernel? Or does each of these methods actually update the CPU firmware?

I have now additionally installed the amd64-microcode package you mentioned.

As for iozone: my earlier tests that caused panics used files that would exceed the total space on the root partition, which I suspect is why the kernel eventually gave out in that particular case. Today I ran another batch of iozone tests, this time calculating the file sizes carefully: the tests used 99% of the root partition (440 MB of free space left) and everything went smoothly twice (initial write, rewrite, read, re-read, random read and random write tests with 5600 MB files).
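The file-size calculation can be sketched roughly like this. It is only an illustration: the helper name, the reserve margin and the number of files are assumptions, not the exact procedure used for the tests above.

```python
import shutil

MIB = 1024 * 1024


def max_test_file_mib(free_bytes, num_files, reserve_mib):
    """Largest per-file size (in MiB) that still leaves reserve_mib free.

    free_bytes  -- free space on the target filesystem, in bytes
    num_files   -- how many test files exist at the same time
    reserve_mib -- free space to keep untouched, in MiB
    """
    if num_files < 1:
        raise ValueError("num_files must be at least 1")
    usable = free_bytes - reserve_mib * MIB
    if usable <= 0:
        return 0
    return usable // (num_files * MIB)


if __name__ == "__main__":
    free = shutil.disk_usage("/").free
    # Hypothetical values: one test file, keep 440 MiB free.
    size = max_test_file_mib(free, num_files=1, reserve_mib=440)
    print(f"free: {free // MIB} MiB, largest safe test file: {size} MiB")
```

The point of the reserve is simply to keep the benchmark from filling the root filesystem, which is what the earlier failing runs did.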

I also got rid of smartd. I can't upgrade the RAID controller yet (the utility provided by HPE does not work; I need to figure out how to make a bootable USB with the firmware update for the P410i), and I discovered that there is an update that fixes reboots with some SATA drives and smartd. I do have a SATA drive, but it is connected via USB; all other drives are SAS. I removed smartd anyway, since it can't see the health status directly and the HP utilities cover the same need.

I also tried increasing memory pressure to trigger a panic (although the panics happened under low system load). I maxed out RAM and load (load average ~35) and everything worked fine.

What else could I possibly check?

EDIT: one other hunch: I'm still waiting for the UPS to be shipped, and the machine is powered from a normal wall socket. Any chance it's some kind of power problem?
 
For anyone experiencing the same issue: I hope I've fixed the problem; the system has been running stable for six and a half days. I'm not sure which action fixed it, but here are the ones I suspect:

  • most likely the P410i disk controller firmware update to the latest version; the fix in the latest firmware addressed deadlocks, although in a completely different context (a SATA drive attached to the controller);
  • a kernel update: 4.15.18-28 has been running for the past 6 days.
Another factor might be that I'm not performing as many disk-intensive operations as before: the VMs are already set up, backups run once a week, etc.
 
