Random Proxmox Server Hang | No VMs | No Web-Gui

Hi @Stefan_R,

I assume the below steps are only valid for a grub bootloader scenario and not if using ZFS on EFI (systemd-boot)?

Thanks!


Alternatively, a more advanced method would be to install kdump-tools. This should provide you with a log even in case of a kernel panic. There's more to it than that (many tutorials are available online), but in general:

Code:
# apt install kdump-tools

Select no for kexec reboots
Select yes for enabling kdump-tools

# $EDITOR /etc/default/grub

Add 'nmi_watchdog=1' to the end of 'GRUB_CMDLINE_LINUX_DEFAULT'

# $EDITOR /etc/default/grub.d/kdump-tools.cfg

Change 128M to 256M at the end of the line

# update-grub
# reboot
# cat /sys/kernel/kexec_crash_loaded

should show 1 now

Next time the system crashes, it should automatically reboot after a while. You can then find a crashlog in /var/crash/<date>/dmesg
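
As a quick sanity check after the reboot, something like the sketch below could be used. It only reads state. The helper name `kdump_ready` and its optional path arguments are made up for illustration (the arguments exist so the function can be tried against test files); they are not part of kdump-tools.

```shell
#!/bin/sh
# Sketch: report whether a crash kernel is loaded and memory is reserved for it.
# kdump_ready is a hypothetical helper, not part of kdump-tools.
# $1: path to the kexec_crash_loaded flag, $2: path to the kernel command line.
kdump_ready() {
    loaded_file="${1:-/sys/kernel/kexec_crash_loaded}"
    cmdline_file="${2:-/proc/cmdline}"

    # The crash kernel is loaded when the flag file contains 1.
    if [ "$(cat "$loaded_file" 2>/dev/null)" != "1" ]; then
        echo "crash kernel not loaded"
        return 1
    fi

    # Without a crashkernel= reservation, kdump has no memory to boot into.
    if ! grep -q 'crashkernel=' "$cmdline_file"; then
        echo "no crashkernel= parameter on the kernel command line"
        return 1
    fi

    echo "kdump ready"
}
```

Calling `kdump_ready` without arguments checks the running system. Once a crash has actually happened, `ls /var/crash` should list a dated directory containing the dmesg file.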
 
My problem seems to be related to this:

https://cgit.freedesktop.org/drm-intel/commit/?id=a75d035fedbdecf83f86767aa2e4d05c8c4ffd95

Can we hope that it will be included in the Proxmox kernel one day?

The fix you mention has been mainlined in 5.3, so it will be fixed once we ship that kernel (currently the plan is to ship 5.3 with the next minor version, but no guarantees).

I assume the below steps are only valid for a grub bootloader scenario and not if using ZFS on EFI (systemd-boot)?

When using systemd-boot, just install kdump-tools as described and add the kernel command-line parameters manually:

Code:
# $EDITOR /etc/kernel/cmdline

Append 'nmi_watchdog=1 crashkernel=384M-:256M' to the end

# pve-efiboot-tool refresh
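
If you prefer to script the edit instead of opening an editor, a minimal sketch could look like this. It assumes /etc/kernel/cmdline holds all parameters on a single line, and it appends a parameter only when its key is not present yet. `add_cmdline_param` is a hypothetical helper name, not a Proxmox tool.

```shell
#!/bin/sh
# Sketch: append a kernel parameter to a single-line cmdline file if it is missing.
# add_cmdline_param is a hypothetical helper; the file defaults to /etc/kernel/cmdline.
add_cmdline_param() {
    param="$1"
    file="${2:-/etc/kernel/cmdline}"
    key="${param%%=*}"    # "crashkernel=384M-:256M" -> "crashkernel"

    # Compare the key against the part before the first "=" of every existing parameter.
    if awk -v k="$key" '{ for (i = 1; i <= NF; i++) { split($i, kv, "="); if (kv[1] == k) found = 1 } } END { exit !found }' "$file"; then
        return 0          # already present, leave the file untouched
    fi

    # Rewrite the file as one line with the new parameter at the end.
    current="$(tr -d '\n' < "$file")"
    printf '%s %s\n' "$current" "$param" > "$file"
}
```

Calling it as root once with `nmi_watchdog=1` and once with `crashkernel=384M-:256M`, followed by `pve-efiboot-tool refresh`, should give the same result as the manual steps. Running it twice does not duplicate a parameter.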
 
The fix you mention has been mainlined in 5.3, so it will be fixed once we ship that kernel (currently the plan is to ship 5.3 with the next minor version, but no guarantees).

Great news. Do you have an idea of the timeframe (1 month, 6 months, 1 year)? If you want a beta tester for the kernel, I am here :)

In the meantime I have tried some quirks and GRUB parameters, but so far no luck getting a stable system under heavy load (a simple backup randomly crashes a node!).

Can I use a "standard" kernel? For information, I don't use ZFS (but I do use Ceph).

Edit: I installed an Ubuntu 5.3 kernel. Now waiting to see whether the problem is solved.


But thanks again for all the info
 
Great news. Do you have an idea of the timeframe (1 month, 6 months, 1 year)? If you want a beta tester for the kernel, I am here
[...]
Edit: I installed an Ubuntu 5.3 kernel. Now waiting to see whether the problem is solved.

What I'll say is that I'm currently running/testing a 5.3 kernel on my workstation without issue ;)

Also, the 5.3 sources are already in our git repository, if you want to build it yourself. It is also based on an Ubuntu kernel.
 
Code:
git clone --depth 1 git://git.proxmox.com/git/pve-kernel.git
cd pve-kernel/
git submodule update --init --recursive

Work in progress for compiling the kernel with your sources.
 
The fix you mention has been mainlined in 5.3, so it will be fixed once we ship that kernel (currently the plan is to ship 5.3 with the next minor version, but no guarantees).

Thanks for the heads-up, sounds like it won't take extremely long until this is released ;).
I will disable power states on my problematic machines until this minor version is released and test if it fixes it.
 
FYI, a 5.3 kernel is now available in our pvetest repository (as the name implies: for testing only) as package pve-kernel-5.3 (has to be manually installed after adding the repo).
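
For anyone who wants to try it, adding the repository and installing the package could be sketched as below. This assumes Proxmox VE 6.x on Debian "buster" (the release current at the time of this thread); the helper name `pvetest_repo_line` and the file name pvetest.list are arbitrary choices for illustration.

```shell
#!/bin/sh
# Sketch: print the APT source line for the pvetest repository.
# Assumption: Proxmox VE 6.x on Debian "buster"; adjust the codename for other releases.
# pvetest_repo_line is a hypothetical helper name.
pvetest_repo_line() {
    codename="${1:-buster}"
    echo "deb http://download.proxmox.com/debian/pve $codename pvetest"
}

# Typical use as root (left commented out so the sketch has no side effects):
#   pvetest_repo_line > /etc/apt/sources.list.d/pvetest.list
#   apt update
#   apt install pve-kernel-5.3
```

After a reboot, `uname -r` should report the 5.3 kernel. Since pvetest is for testing only, consider removing the repository file again afterwards.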
 
FYI, a 5.3 kernel is now available in our pvetest repository (as the name implies: for testing only) as package pve-kernel-5.3 (has to be manually installed after adding the repo).

It's running at home!

Code:
Kernel Version Linux 5.3.7-1-pve #1 SMP PVE 5.3.7-1 (Wed, 23 Oct 2019 19:00:21 +0200)

Thanks again.
 
Hi everyone. I'm sorry for reviving this topic again. I believe I have a very similar issue that I can't fix no matter what I try.
Initially I thought it might be related to ZFS / RAM issues, but the issue persists after destroying the ZFS pool.

I'm currently running the most recent version of Proxmox (6.3-3) with the most recent kernel (I believe): "Linux 5.4.78-2-pve #1 SMP PVE 5.4.78-2 (Thu, 03 Dec 2020 14:26:17 +0100)"

After 3 to 5 days of uptime the server restarts by itself, and the /var/log/syslog file does indeed show something like ^@^@^@^@^@^@^@^@^@^@ right before it reboots.

I have created a new post on this forum about my issue (Constant restart | Proxmox Support Forum), but this topic might be the better place since the issues are very similar.

I have had Proxmox installed for some months now and it restarts roughly every 3 days. I'm really afraid that my VMs will get corrupted.

Do you know what I can try to fix this issue?
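
For context: a run of ^@ is how many viewers display NUL bytes, which usually end up in a log when the machine goes down before the data was written out, so they hint at an abrupt reset rather than a clean shutdown. A small sketch to check a log file for them (`count_nul_bytes` is a hypothetical helper name):

```shell
#!/bin/sh
# Sketch: count NUL bytes (shown as ^@ by many viewers) in a log file.
# count_nul_bytes is a hypothetical helper name.
count_nul_bytes() {
    file="${1:-/var/log/syslog}"
    # Delete everything except NUL, then count the bytes that are left.
    tr -cd '\000' < "$file" | wc -c | tr -d ' '
}
```

A non-zero result from `count_nul_bytes /var/log/syslog` matches the symptom described here. The last readable lines before the NUL run are the ones worth looking at.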

Thanks!
 
