Proxmox sudden -- Reboot --

Well the case is still warm.

Did more testing, and now I'm not sure it's a hardware issue. The device was unplugged from power and network for 2 days. Yesterday I started stress tests using s-tui: no crash. Ran iperf3 on all NIC ports: no crash. Left the device idling with containers running overnight: 11h, no crash.

Not sure what to do now. Anyone got any ideas?

I still haven't plugged it back into my network, because it was flooding (DDoSing) the network while it was having its seizure. I'll try to use it over the weekend in production mode to see if network traffic crashes it. Other than that, I don't know what else to do.
 
I too think it is not a hardware issue.

I have the exact same problem with an ASRock Rack B650D4U mobo. I replaced the motherboard, RAM sticks and NVMe separately with new ones, and the problem persists. Nothing in the dmesg or journalctl -b -1 logs.
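
When `journalctl -b -1` shows nothing suspicious, it can still help to look at exactly where the previous boot's journal stops: a log that simply ends, with no shutdown messages, points at a hard reset rather than an orderly reboot. A minimal sketch, assuming the journal is persistent (`/var/log/journal` exists) and that the input comes from `journalctl -b -1 -o json --no-pager`; the helper name `last_entries` is made up for illustration:

```python
import json
from datetime import datetime, timezone


def last_entries(json_lines, count=5):
    """Return the last `count` (timestamp, message) pairs from journal JSON lines.

    Each input line is expected to be one JSON object as printed by
    `journalctl -o json`, which carries the wall-clock time in microseconds
    in the `__REALTIME_TIMESTAMP` field.
    """
    entries = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        micros = int(record["__REALTIME_TIMESTAMP"])
        when = datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)
        message = record.get("MESSAGE", "")
        # journalctl emits non-text messages as arrays of bytes; keep them readable.
        if not isinstance(message, str):
            message = repr(message)
        entries.append((when, message))
    return entries[-count:]
```

Called as `last_entries(sys.stdin)` in a small script fed by `journalctl -b -1 -o json --no-pager`, the final timestamps give the approximate moment of the reset, which can then be lined up with the screen recording.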

Next I am considering taking XCP-ng for a spin.

Recorded the screen at the moment of the crash, and Proxmox does not log anything:

MomentOfCrash.gif
MomentOfCrashProxmox.gif

At the time of crash:
AtTheTimeOfCrash.png
 
Well, I really lack the knowledge to say for sure whether it's a software or hardware issue. All I can do is provide my observations.

UPDATE:
So yesterday, after the device had been up for about a whole day without a network connection, I connected it to the network and started deploying it into production mode. It worked for an hour or two and then it rebooted. :(

Next I reset the CMOS as per the manufacturer's recommendations, moved the RAM into the other slot, AND switched to different eth ports on the device (it has 6x RJ45 sockets in total). Left the device plugged into the network, deployed in production mode overnight. So far 14h uptime.

During the day I was updating containers, running backups, deploying new ones, and using it over VPN (WireGuard) from a remote location. And it is still up.

So atm my hunch is bad eth ports. If it stays up for a week on these ports, I will retry the previous ports on the device.
 
Check if changing the guest VM's CPU type to qemu64 or something other than host helps.
 
Check if changing the guest VM's CPU type to qemu64 or something other than host helps.
brugh..

The device was set up with the same containers, not VMs, about 2 months ago, and it worked with no problems for that time.

So far the uptime is 1d 11h since I switched to different eth ports on the device, and all the network traffic is going through only 2 ports.

So I'm thinking the eth ports are the culprit that crashes the device. Another thing I noticed: when it is in the crashing state, my whole network goes into DDoS mode and connecting to LAN services takes a really long time.
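
One way to check the bad-port hunch is to compare the kernel's per-interface error and drop counters between the old and new ports. A rough sketch that reads the standard Linux counters under `/sys/class/net/<iface>/statistics/`; the helper name `read_nic_counters` and the interface names in the note below are placeholders, not taken from this device:

```python
from pathlib import Path

# Counter files the Linux kernel exposes for every network interface.
COUNTERS = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped")


def read_nic_counters(iface, sysfs_root="/sys/class/net"):
    """Return {counter_name: value} for one interface.

    `sysfs_root` is a parameter only so the function can be pointed at a
    copy of the tree; on a live host the default is the real location.
    """
    stats_dir = Path(sysfs_root) / iface / "statistics"
    return {name: int((stats_dir / name).read_text()) for name in COUNTERS}
```

For example, comparing `read_nic_counters("enp1s0")` with `read_nic_counters("enp2s0")` after a day of traffic: counters that keep climbing only on the suspect ports would support the port theory, while clean counters everywhere would not rule it out.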
 

Well, at least for me, it's back to stable now. It's been a week with no crashes.

All I did was:
1) formatted the device
2) reinstalled Proxmox
3) was still crashing
4) Reset CMOS, moved the RAM into the other RAM socket (I have 2), and plugged it into my router using a different eth socket on the device

So far it's been running, backing up, and idling with no problems.
 

Hello, I have been having the same issues since the weekend with my Hetzner server, where PVE 8.1.4 is installed with the latest updates. Or at least this is when I noticed it.
The server is randomly rebooting and I have no clue why. The system has been running fine for nearly a year.
Same issue here. When we ask for a hardware replacement the servers stop rebooting, but hardware doesn't seem to be the core of the issue: stress tests inside a VM run fine, and the server only reboots under real-world load.

Sometimes it reboots only once in a month or more, sometimes 3-5x a day. Very, very strange...

So far we have tried kernels 6.8.8-4-pve and 6.5.13-5-pve; both have the same issue. We even tried editing the GRUB command line, for example:
GRUB_CMDLINE_LINUX_DEFAULT="consoleblank=0 nomodeset noapic pci=assign-busses apicmaintimer idle=poll reboot=cold,hard"
or
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash processor.max_cstate=1 idle=nomwait"
Neither has made any difference.
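
Before concluding that these parameters make no difference, it may be worth confirming they actually reached the running kernel. On Proxmox hosts that boot via systemd-boot (typically ZFS root on UEFI) the command line lives in `/etc/kernel/cmdline` and is applied with `proxmox-boot-tool refresh`, so edits to `/etc/default/grub` have no effect there. A small sketch that compares a wanted parameter string against the contents of `/proc/cmdline`; the function names are illustrative:

```python
import shlex


def parse_cmdline(text):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in shlex.split(text):
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(wanted, active):
    """Return the sorted names from `wanted` that are absent from `active`
    or present there with a different value."""
    want = parse_cmdline(wanted)
    have = parse_cmdline(active)
    absent = object()  # sentinel that never equals a real value
    return sorted(name for name, value in want.items()
                  if have.get(name, absent) != value)
```

With `active = open("/proc/cmdline").read()` on the host, an empty list from `missing_params(wanted, active)` means every wanted parameter is live; anything listed was never applied to the current boot.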

At this point we are willing to try anything, since we have a large number of servers with the same issue. Do you have any more recommendations on what to try?
 
At this point we are willing to try anything
Well I'll try & offer "anything"!

Since your problem doesn't appear to be linked to the latest kernels (you see it even with 6.5), possibly it is linked to the newer QEMU version 9. IDK your update situation.

Maybe start here to learn about this.
 

I just checked one of the servers that was rebooting yesterday, and it's using `pve-qemu-kvm: 8.1.5-6`. We have currently downgraded that server to kernel 6.5.11-4, since there were some reports of no random reboots on that version.

When we get a server that is rebooting at least 3x a day, we will upgrade it to QEMU v9 to see if that fixes it...

Once I have results from multiple servers testing 6.5.11-4 and QEMU v9, I will follow up.
 
Here are more things that we have tried:
  • QEMU-KVM v8 and v9: the system is still rebooting.
  • i440fx vs q35: the system is still rebooting.
  • Changing the CPU type from host to x86-64-v4: the system is still rebooting.
  • Reducing CPU usage to max. 20%
After all these tests, we concluded that it must be a hardware fault, so we asked Hetzner to perform a hardware check. Five hours later, all tests passed. :)

Even though all tests passed, I asked them to replace all components except the SSD. They recommended first changing the RAM from Micron to Samsung and updating to the latest BIOS. The system was still rebooting.

After that, they replaced all components except the SSDs, and the server has been stable since then.

So now you have to wonder: if all hardware checks are passing and the system stops rebooting after a hardware change, where do you point fingers and where do you keep digging further?
 
I have the same issue here with a Dell R930. I have tested the RAM, tested with the cards removed, and updated to the latest BIOS and firmware. I even reinstalled PVE 8.2.1 on Friday, and it was rebooting again on Monday. I have checked OMSA and it shows no hardware issues, so I'm not sure what to do next.
 
