Proxmox - Sudden Reboots

reg_ed

Howdy folks, I felt compelled to report this (and ask for help) as I have run out of ideas and noticed that another user is having a similar issue.

One of my nodes has begun restarting with no warning or logged info.

The system is a 6-NIC, fanless network appliance with an i7-8565U CPU, 32 GB of DDR4 RAM (2x16 GB) and a pair of 250 GB SSDs in a ZFS Raidz1 mirror.

I only noticed this issue about 3 weeks ago when I upgraded to PVE 8. I didn't notice it before that, but I have been interacting with it much more since the upgrade.

At first it seemed to be random but since then I have found a few correlations:
  • Updating the system (Updates > Upgrade) has triggered it at least three times.
  • Creating large files full of random data on the local drive regularly (but not always) triggers it (I've also tested doing this inside a CT and it still triggers the restarts).
  • Importing a large ISO (> 4 GB) via the web UI to the local drive almost always triggers it, always at the point where the system copies the file from the temp directory to the ISO store (command: /usr/bin/scp -o BatchMode=yes -p -- /var/tmp/pveupload...:/var/lib/vz/template/iso/upload.iso)
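The second trigger in the list above (writing large files of random data) can be scripted so that the load is repeatable between tests. This is only a minimal sketch, assuming Python 3 is available on the node; the function name, the chunk size and the example path are illustrative choices and do not come from the original report.

```python
import os


def write_random_file(path, total_bytes, chunk_bytes=4 * 1024 * 1024):
    """Write total_bytes of random data to path in chunks, then fsync.

    Returns the number of bytes written.
    """
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(os.urandom(n))  # random data, so compression cannot shrink it
            written += n
        f.flush()
        os.fsync(f.fileno())  # ask the kernel to push the data to the disk
    return written


# Example call (hypothetical path and size, adjust to your storage):
# write_random_file("/var/lib/vz/template/iso/random-test.bin", 8 * 1024**3)
```

Running it with a size larger than the installed RAM should exercise the same write path as the manual test, but whether it reproduces the reboot is something only the affected hardware can show.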
I have attempted to rule out hardware causes thus:
  • Removing each of the disk drives in turn, booting the system in a degraded ZFS state and causing a reboot with either solo disk using the ISO upload method. This has ruled out the two disks.
  • I have run memtest for 8 hours and found no errors.
  • I have run a CPU stress test to check the CPU and also force the system into high power draw and therefore indirectly test the power brick.
The really odd thing, or possibly a clue, is the fact that if I remove either of the two RAM sticks, I can't induce the restarts. It only seems to happen when both RAM sticks are fitted in the system. I have noticed that with either of the RAM sticks removed, the behaviour of the ISO upload is quite different. With only one stick in the machine the local disk is writing almost constantly whilst the file transfers, however with both sticks in the system, the local disk hardly blinks whilst the file is transferring and then galvanises into action when the system runs the scp copy command at the end (which is when the restart happens).
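One possible reading of the different disk activity (an assumption on my part, not something confirmed in this thread) is that with 32 GB fitted the upload is buffered in RAM and only flushed in one burst at the end. A rough way to watch this is to sample the Dirty and Writeback counters from /proc/meminfo while the upload runs. The sketch below only parses those fields; the function names are made up for illustration, and since ZFS keeps its own write buffers these counters may not show the whole picture on a ZFS root.

```python
def parse_meminfo(text):
    """Return a dict of /proc/meminfo fields (numeric value only, unit dropped)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def read_write_backlog(path="/proc/meminfo"):
    """Return (Dirty, Writeback) in kB as reported by the running kernel."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info.get("Dirty", 0), info.get("Writeback", 0)
```

Calling read_write_backlog() once a second during an upload, once with one stick and once with both, would show whether the backlog really grows much larger in the dual-stick case.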

Your thoughts and advice would be most welcome at this stage as I have completely run out of ideas and I can no longer consider this node to be stable.
 
The really odd thing, or possibly a clue, is the fact that if I remove either of the two RAM sticks, I can't induce the restarts. It only seems to happen when both RAM sticks are fitted in the system
There are lots of posts on the internet describing similar behavior with dual-channel memory. I would start by trying 2 different memory sticks (from a different manufacturer than the current ones) and see what happens.
Here is one interesting post, though there are many out there: "Crazy crash when using 2 RAM modules (32 GB)".
 
In my situation, described in this post, I have only 1 RAM module installed. I also just ran memtest and it passed. It took around 1 hour. The device got really toasty, up to 70 °C, but it didn't crash and held up all the way.

My next guess would be the NVMe, but a SMART test didn't find any errors. The NVMe is new too. The system is new in general.

My last guess is that it is just a botched device: something wrong on the motherboard or with some other component that is soldered onto the board. :( Not sure what that
 
Hello, I have been having the same issue since the weekend with my Hetzner server, where PVE 8.1.4 is installed with the latest updates. Or at least that is when I noticed it.
The server is randomly rebooting and I have no clue why. The system had been running fine for nearly a year.
 
Google translate helps me.


Hello, I have the same problem. I have a device that had been running for half a year without a fault. The first crash was on March 6 and the second on March 12. I have the latest updates. The server is randomly rebooting and I don't know why.
 
Change the guest VM CPU type to something other than host and see if the crashes go away. My experience documented here.
 
I set it up. I'll let you know if there's a random reboot.
Running a stress test tool like Aida64 simultaneously in multiple guest VMs can help quickly reproduce otherwise random reboots within minutes.
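To find out which guests are still set to the host CPU type, the VM config files can be scanned for their cpu line. The sketch below assumes the usual Proxmox layout, where each VM has a text config (typically under /etc/pve/qemu-server/) with a line such as "cpu: host"; the function name is hypothetical and the parsing is deliberately simple.

```python
def guest_cpu_type(config_text):
    """Return the CPU type from the main section of a VM config, or None.

    Only lines before the first "[section]" header are read, on the
    assumption that later sections hold snapshot copies of the config.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            break
        if line.startswith("cpu:"):
            value = line[len("cpu:"):].strip()
            first = value.split(",")[0]
            # The type may be written as "host" or as "cputype=host".
            if first.startswith("cputype="):
                return first[len("cputype="):]
            return first if "=" not in first else None
    return None
```

A result of None means no explicit type was found, in which case the guest runs with whatever default the installed PVE version applies. The type itself can then be changed in the GUI or with `qm set <vmid> --cpu <type>`.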
 
Same issue here. The server was running for over 1.5 years without issue. Yesterday it shut down without reason; there were no recent system modifications.
Hardware: ASRock AMD DeskMini X300 -> AMD Ryzen 7 5700G
Proxmox always up to date.

Code:
Mar 21 21:56:01 Zeus CRON[1792132]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Mar 21 21:56:01 Zeus CRON[1792133]: (root) CMD (/usr/local/bin/chk_storage.sh >/dev/null 2>&1)
Mar 21 21:56:04 Zeus CRON[1792132]: pam_unix(cron:session): session closed for user root
Mar 21 21:57:01 Zeus CRON[1792630]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Mar 21 21:57:01 Zeus CRON[1792631]: (root) CMD (/usr/local/bin/chk_storage.sh >/dev/null 2>&1)
Mar 21 21:57:04 Zeus CRON[1792630]: pam_unix(cron:session): session closed for user root
Mar 21 21:58:01 Zeus CRON[1793126]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Mar 21 21:58:01 Zeus CRON[1793127]: (root) CMD (/usr/local/bin/chk_storage.sh >/dev/null 2>&1)
Mar 21 21:58:05 Zeus CRON[1793126]: pam_unix(cron:session): session closed for user root
-- Reboot --
Mar 21 22:06:33 Zeus kernel: Linux version 6.5.13-1-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.5.13-1 (2024-02-05T13:50Z) ()
Mar 21 22:06:33 Zeus kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.5.13-1-pve root=/dev/mapper/pve-root ro quiet
Mar 21 22:06:33 Zeus kernel: KERNEL supported cpus:
Mar 21 22:06:33 Zeus kernel:   Intel GenuineIntel
Mar 21 22:06:33 Zeus kernel:   AMD AuthenticAMD
Mar 21 22:06:33 Zeus kernel:   Hygon HygonGenuine
Mar 21 22:06:33 Zeus kernel:   Centaur CentaurHauls
Mar 21 22:06:33 Zeus kernel:   zhaoxin   Shanghai

2024-03-22_08h11_02.png
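The log above goes straight from routine cron entries to the "-- Reboot --" marker, with no shutdown messages in between. That pattern can be checked for automatically in a saved journal export. The following is a rough sketch only: the function name is made up, and the list of strings treated as signs of an orderly shutdown is an assumption that may need adjusting for your system.

```python
# Substrings assumed to indicate an orderly shutdown (adjust as needed).
SHUTDOWN_HINTS = ("Shutting down", "systemd-shutdown", "Journal stopped")


def unclean_reboots(journal_text, lookback=20):
    """Return the last log line before each reboot that shows no shutdown hint.

    Looks at up to `lookback` lines before every "-- Reboot --" marker.
    """
    lines = journal_text.splitlines()
    suspects = []
    for i, line in enumerate(lines):
        if line.strip() != "-- Reboot --":
            continue
        before = lines[max(0, i - lookback):i]
        if not any(hint in prev for prev in before for hint in SHUTDOWN_HINTS):
            suspects.append(before[-1] if before else "")
    return suspects
```

Each returned line gives the timestamp of the last sign of life before a hard reset, which helps when comparing crash times against scheduled jobs or load peaks.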
 
Yesterday shutdown without reason
It appears you (just?) did the latest kernel update to 6.5.13-3-pve. Did this sudden shutdown occur after that or before? Do you always reboot after kernel updates?

From the image of your GUI I can see the latest kernel mentioned above, but the log output after the restart shows an older kernel:
Mar 21 22:06:33 Zeus kernel: Linux version 6.5.13-1-pve (build@proxmox)
Maybe you updated afterwards?
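The comparison made here (kernel shown in the boot log versus the newest installed kernel) can be done mechanically when checking several nodes. This is a small sketch under the assumption that versions follow the X.Y.Z-N-pve pattern seen in this thread; the function names are illustrative only.

```python
import re

# Matches versions such as 6.5.13-1-pve (pattern assumed from the log above).
KERNEL_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)-(\d+)-pve")


def pve_kernel_version(text):
    """Extract the first X.Y.Z-N-pve version in text as a tuple of ints, or None."""
    m = KERNEL_RE.search(text)
    return tuple(int(g) for g in m.groups()) if m else None


def booted_older_than(boot_line, installed):
    """Return True if the booted kernel is older than the installed one.

    Returns None when either version cannot be parsed.
    """
    booted = pve_kernel_version(boot_line)
    latest = pve_kernel_version(installed)
    if booted is None or latest is None:
        return None
    return booted < latest
```

For the log line quoted above and an installed version of 6.5.13-3-pve, booted_older_than returns True, which matches the observation that the crash happened on the older kernel.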
 
Yes, I updated afterwards, and I always reboot after a kernel update.
There was NO kernel update involved last night, and my server did not come up again. I had to switch it on with the physical power button.
I will observe the situation and keep this post updated.
 
Change the guest VM CPU type to something other than host and see if the crashes go away. My experience documented here.
This didn't work for me. My problem seems to be a different one.

My crashes were with kernel 6.5.13-1.

I reinstalled Proxmox (an older version) and now have kernel 6.5.11-4.

8 days so far with no random reboots.
 