Proxmox sudden -- Reboot --

andzejsp · Mar 8, 2024

Well the case is still warm.

Did more testing, and now I'm not sure if it's a hardware issue. The device was unplugged from power and network for 2 days. Yesterday I started stress tests using s-tui: no crash. Ran iperf3 on all NIC ports: no crash. Left the device idling overnight with containers running: 11 h, no crash.

Not sure what to do now. Does anyone have any ideas?

I still haven't plugged it into my network, because it was DDoSing my network while it was having a seizure. I'll try to use it in production over the weekend to see whether network traffic crashes it. Other than that, I don't know what else to do.

ProxyUser · Mar 8, 2024

I too think it is not a hardware issue.

I have the exact same problem with an ASRock Rack B650D4U motherboard. I replaced the motherboard, RAM sticks and NVMe drive separately with new ones, and the problem persists. Nothing shows up in dmesg or in the `journalctl -b -1` logs.
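When `journalctl -b -1` shows nothing obvious, one rough way to tell an abrupt reset from an orderly reboot is to look at how the previous boot's log ends. The sketch below is a heuristic, not a Proxmox tool: it assumes that an orderly shutdown leaves systemd-journald's "Journal stopped" message near the end of that boot's log, while a hard reset, watchdog reset or power loss cuts the log off without it. The marker list and the helper names are illustrative assumptions.

```python
from __future__ import annotations

import subprocess

# Heuristic assumption: an orderly shutdown usually logs this systemd-journald
# message near the end of the boot; an abrupt reset ends the log without it.
# Extend the tuple if your systemd version logs other shutdown messages.
CLEAN_SHUTDOWN_MARKERS = ("Journal stopped",)


def ended_cleanly(tail_lines: list[str]) -> bool:
    """Return True if any of the given log lines contains a clean-shutdown marker."""
    return any(
        marker in line for line in tail_lines for marker in CLEAN_SHUTDOWN_MARKERS
    )


def previous_boot_tail(count: int = 50) -> list[str]:
    """Fetch the last `count` journal entries of the previous boot (-b -1)."""
    result = subprocess.run(
        ["journalctl", "-b", "-1", "-n", str(count), "--no-pager"],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if no previous boot is recorded
    )
    return result.stdout.splitlines()
```

Run as root on the host, e.g. `print(ended_cleanly(previous_boot_tail()))`. A `False` result only says the log was cut off, which matches "nothing in the logs"; it does not say whether the cause is hardware or software.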

Next I am considering taking XCP-ng for a spin.

I recorded the screen at the moment of the crash, and Proxmox does not log anything:

At the time of the crash:

andzejsp · Mar 9, 2024

Well, I really lack the knowledge to say for sure whether it's a software or a hardware issue. All I can do is provide my observations.

UPDATE:
So yesterday, after the device had been up for about a whole day without a network connection, I connected it to the network and started deploying it into production. It ran for an hour or two and then rebooted.

Next I reset the CMOS as per the manufacturer's recommendations, moved the RAM into the other slot, AND used different Ethernet ports on the device (it has 6 RJ45 sockets in total). I left the device plugged into the network, deployed in production, overnight. Uptime so far: 14 h.

During the day I was updating containers, running backup jobs, deploying new ones, and using it over VPN (WireGuard) from a remote location. It is still up.

So at the moment my hunch is bad Ethernet ports. If it stays up for a week on these ports, I will retry the previous ports on the device.

ProxyUser · Mar 10, 2024

Check whether changing the guest VM's CPU type to qemu64, or anything other than host, helps.

andzejsp · Mar 10, 2024

ProxyUser said:
Check whether changing the guest VM's CPU type to qemu64, or anything other than host, helps.

brugh..

The device was set up with the same containers (not VMs) about 2 months ago, and it worked without problems the whole time.

So far the uptime is 1d 11h since I switched to different Ethernet ports on the device, with all network traffic going through only 2 ports.

So I'm thinking the Ethernet ports are the culprit that crashes the device. I also noticed that when it is in the crashing state, my whole network behaves as if it were under a DDoS, and connecting to LAN services takes a really long time.

andzejsp · Mar 15, 2024

Well, at least for me it's stable again. It's been a week with no crashes.

All I did was:
1) Formatted the device
2) Reinstalled Proxmox
3) It was still crashing
4) Reset the CMOS, moved the RAM into the other RAM socket (I have 2), and connected it to my router through a different Ethernet socket on the device

So far it has been running, backing up, and idling with no problems.

Dorex · Aug 12, 2024

eeeeb said:
Hello, I am having the same issues since the weekend with my Hetzner server where PVE 8.1.4 is installed with the latest updates. Or at least this is when I noticed it.
The server is randomly rebooting and I have no clue to why. The system has been running fine for nearly a year.

Same issue here. When we ask for a hardware replacement, the servers stop rebooting, but the hardware doesn't seem to be the core of the issue: when we run stress tests inside a VM the server is fine, and it only reboots under real-world load.

Sometimes it reboots only once in a month or more, sometimes 3-5 times a day. Very, very strange...

So far we have tried kernels 6.8.8-4-pve and 6.5.13-5-pve, and both have the same issue. We even tried editing the GRUB kernel command line, for example:
`GRUB_CMDLINE_LINUX_DEFAULT="consoleblank=0 nomodeset noapic pci=assign-busses apicmaintimer idle=poll reboot=cold,hard"`
or
`GRUB_CMDLINE_LINUX_DEFAULT="quiet splash processor.max_cstate=1 idle=nomwait"`
None of them made any difference.
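Before concluding that GRUB_CMDLINE_LINUX_DEFAULT changes made no difference, it is worth confirming they actually reached the running kernel: on GRUB installs the edit only takes effect after `update-grub` and a reboot, and on Proxmox VE hosts that boot via systemd-boot the command line is read from `/etc/kernel/cmdline` and applied with `proxmox-boot-tool refresh`, so GRUB edits there do nothing. The sketch below compares the wanted parameters with `/proc/cmdline`. The helper names are hypothetical, and the parser naively splits on whitespace (no quote handling), which is enough for the parameters above.

```python
from __future__ import annotations

from pathlib import Path


def parse_cmdline(cmdline: str) -> dict[str, str | None]:
    """Split a kernel command line into {param: value}; bare flags map to None.

    Naive: splits on whitespace and ignores quoting.
    """
    params: dict[str, str | None] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")  # split on the first '=' only
        params[key] = value if sep else None
    return params


def missing_params(wanted: str, active: str) -> list[str]:
    """Return the wanted parameters that are absent or set to a different value."""
    have = parse_cmdline(active)
    missing = []
    for key, value in parse_cmdline(wanted).items():
        if key not in have or have[key] != value:
            missing.append(key if value is None else f"{key}={value}")
    return missing


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():  # only present on Linux
        wanted = "quiet splash processor.max_cstate=1 idle=nomwait"
        print(missing_params(wanted, proc_cmdline.read_text()) or "all active")
```

An empty result means every requested parameter is active; anything listed never made it into the booted kernel.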

At this point we are willing to try anything, since we have a large number of servers with the same issue. Do you have any other recommendations for what to try?

gfngfn256 · Aug 12, 2024

Dorex said:
At this point we are willing to try anything

Well I'll try & offer "anything"!

Since your problem doesn't appear to be linked to the latest kernels (you are seeing it even with 6.5), it may be linked to the newer QEMU version 9. I don't know your update situation.

Maybe start here to learn about this.

Dorex · Aug 12, 2024

gfngfn256 said:
Well I'll try & offer "anything"!

Since your problem doesn't appear to be linked to the latest kernels (you are seeing it even with 6.5), it may be linked to the newer QEMU version 9. I don't know your update situation.

Maybe start here to learn about this.

I just checked one of the servers that was rebooting yesterday, and it is currently using `pve-qemu-kvm: 8.1.5-6`. We downgraded that server to kernel 6.5.11-4, since there were some reports of no random reboots on that version.

When we get a server that reboots at least 3x a day, we will upgrade it to QEMU v9 to see if that fixes it...

Once I have results from multiple servers testing 6.5.11-4 and QEMU v9, I will follow up.
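With many servers on different kernel and QEMU combinations, diffing package versions between a rebooting host and a stable one helps keep the comparison honest. A minimal sketch, assuming each host's `pveversion -v` output (lines of the form `package: version`, as in `pve-qemu-kvm: 8.1.5-6`) has been saved as text; the function names are hypothetical.

```python
from __future__ import annotations


def parse_pveversion(text: str) -> dict[str, str]:
    """Map package name -> version string from `pveversion -v` style output."""
    versions: dict[str, str] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")  # split on the first ':' only
        if sep and rest.strip():
            versions[name.strip()] = rest.strip()
    return versions


def diff_versions(
    a: dict[str, str], b: dict[str, str]
) -> dict[str, tuple[str | None, str | None]]:
    """Return {package: (version_in_a, version_in_b)} for packages that differ."""
    return {
        pkg: (a.get(pkg), b.get(pkg))
        for pkg in sorted(a.keys() | b.keys())
        if a.get(pkg) != b.get(pkg)
    }
```

For example, `diff_versions(parse_pveversion(rebooting_txt), parse_pveversion(stable_txt))` lists only the packages that differ, with `None` where a package is missing on one side.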

Dorex · Aug 14, 2024

Here are more things we have tried:

QEMU-KVM v8 and v9: the system is still rebooting.
i440fx vs. q35: the system is still rebooting.
Changing the CPU type from host to x86-64-v4: the system is still rebooting.
Reducing CPU usage to a maximum of 20%.

After all these tests, we concluded that it must be a hardware fault, so we asked Hetzner to perform a hardware check. Five hours later, all tests had passed.

Even though all tests passed, I asked them to replace all components except the SSDs. They recommended first switching the RAM from Micron to Samsung and updating to the latest BIOS. The system was still rebooting.

After that, they replaced all components except the SSDs, and the server has been stable since then.

So now you have to wonder: if all hardware checks pass and the system stops rebooting after a hardware change, where do you point fingers, and where do you keep digging?

gfngfn256 · Aug 15, 2024

Dorex said:
after a hardware change, where do you point fingers

I guess it comes down to the PSU, CPU, MB or NW. If these had been changed one by one, with testing in between, you'd have a better idea.

CrawfordHulk · Aug 19, 2024

I have the same issue here with a Dell R930. I have tested the RAM, tested with the cards removed, and updated to the latest BIOS and firmware. I even reinstalled PVE 8.2.1 on Friday, and it was rebooting again on Monday. I have checked OMSA and it shows no hardware issues, so I'm not sure what to do next.

sub2o5 · Oct 9, 2024

Same issue here. Any solutions or updates so far?

ShaMAD · Jun 10, 2025

Same issue here. Any solutions or updates so far?

sub2o5 · Jun 10, 2025

In my case I trashed the ASRock Rack board and went with a Supermicro H13SAE-MF.
No further problems so far!
