mystery reboots

Have you tested the RAM? You can use memtest86+ from the Proxmox ISO. I just mention this as a few years back I had a similar issue with a Linux box (not running proxmox) and finally decided to test the RAM. Turns out one module was bad even though it was bought brand new a few months earlier.
Yes sir, ran it all day and overnight, no issues. "You can use memtest86+ from the Proxmox ISO": yes, did that. I do that first thing, because I have been using computers since the BBC Micro, lol, so testing RAM in Pentium 1s and Celerons was a must.
 
Have you tried intel_idle.max_cstate=1 (or eventually even 0)?
Well, never tried that. This processor is from when Z790 was new, so no BIOS fix was there. Now I do have the latest 129 microcode, but if the CPU was damaged before that, there is no way to know. I was thinking of getting a new processor, but then 15th gen is around the corner, so I'd rather wait. Well, let me put it at C-state 1 and try it.
 
I had some 12th gen inexplicably freezing on some kernel, and limiting the C states miraculously stopped it. True, it was not rebooting, but that was not PVE, where there are the watchdogs. And if something does not leave any logs, that may simply mean it was frozen before the watchdog expired, so nothing got flushed.
 
Yeah right, there is no BSOD in Linux to create a crash report :)

Edit /etc/default/grub:

put intel_idle.max_cstate=1 at the end of GRUB_CMDLINE_LINUX_DEFAULT

update-grub
shutdown -r now

done
let's see now
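
The steps above can be sketched as a small helper. This is only a sketch, assuming a GRUB-booted host with GNU sed; `add_kernel_param` is a made-up helper name, not a Proxmox tool. On hosts that boot via systemd-boot (for example UEFI with a ZFS root), the kernel command line lives in /etc/kernel/cmdline and is applied with `proxmox-boot-tool refresh` instead of `update-grub`.

```shell
# Hypothetical helper: append a kernel parameter to the
# GRUB_CMDLINE_LINUX_DEFAULT line of a grub defaults file,
# unless that line already contains it.
# Usage: add_kernel_param FILE PARAM
add_kernel_param() {
    file=$1
    param=$2
    # Nothing to do if the parameter is already on the line
    # (plain substring match, so max_cstate=1 also matches max_cstate=10).
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Keep the existing value and add the new parameter before the closing quote.
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"|GRUB_CMDLINE_LINUX_DEFAULT=\"\1 ${param}\"|" "$file"
}

# Example, as root (try it on a copy of the file first):
#   add_kernel_param /etc/default/grub intel_idle.max_cstate=1
#   update-grub
#   shutdown -r now
# After the reboot, `cat /proc/cmdline` should show the parameter.
```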
 
There is kdump [1], there's quite a few guides out there [2].

But my issue is that on PVE kdump is useless for me, because I am not going to be building the debug kernel myself every time [3].

But if your CPU falls asleep like Snow White, I am afraid nothing on the screen would help you, not even capturing the video output (you typically get a stack trace on the screen even when the drives were not flushed).

[1] https://www.kernel.org/doc/Documentation/kdump/kdump.txt
[2] https://www.cyberciti.biz/faq/how-to-on-enable-kernel-crash-dump-on-debian-linux/
[3] https://forum.proxmox.com/threads/where-to-get-dbg-kernel.141686/#post-634966
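
For anyone who does follow the kdump guides, here is a rough way to check whether a crash kernel is actually armed. It is a sketch only: `kdump_status` is a made-up helper name, and the two file arguments exist just so it can be pointed at test files. On Debian the tooling comes from the kdump-tools package, and `kdump-config show` gives a fuller picture.

```shell
# Hypothetical helper: report whether memory is reserved for a crash kernel
# and whether a crash kernel is currently loaded.
# Usage: kdump_status [CMDLINE_FILE] [LOADED_FILE]
kdump_status() {
    cmdline_file=${1:-/proc/cmdline}
    loaded_file=${2:-/sys/kernel/kexec_crash_loaded}
    # kdump needs a crashkernel= reservation on the kernel command line.
    if grep -q 'crashkernel=' "$cmdline_file" 2>/dev/null; then
        echo "crashkernel: set on command line"
    else
        echo "crashkernel: not set"
    fi
    # The kernel exposes 1 here once a crash kernel has been loaded via kexec.
    if [ "$(cat "$loaded_file" 2>/dev/null)" = "1" ]; then
        echo "crash kernel: loaded"
    else
        echo "crash kernel: not loaded"
    fi
}

# Example: kdump_status
```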
 
Also, if you want to be sure it's not the softdog rebooting your machine and rather see it frozen, you may put:

options softdog soft_noboot=1

into:

/etc/modprobe.d/softdog.conf
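
To confirm the option took effect after the next reboot, the softdog line in the kernel log can be checked, since it prints soft_noboot when the module initializes. A minimal sketch, assuming the message keeps the "softdog: initialized. soft_noboot=..." format; `softdog_noboot_from_log` is a made-up helper name.

```shell
# Hypothetical helper: read kernel log text on stdin and print the
# soft_noboot value from the last "softdog: initialized" line.
softdog_noboot_from_log() {
    sed -n 's/.*softdog: initialized\. soft_noboot=\([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Example, as root:
#   echo 'options softdog soft_noboot=1' > /etc/modprobe.d/softdog.conf
#   (reboot)
#   journalctl -k -b | softdog_noboot_from_log
```

If this prints 1, an expired softdog should only log the event instead of rebooting the machine.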
 
Sep 07 15:41:27 Prox1 watchdog-mux[1231]: Watchdog driver 'Software Watchdog', version 0
Sep 07 15:41:27 Prox1 kernel: softdog: initialized. soft_noboot=0 soft_margin=60 sec soft_panic=0 (nowayout=0)
Sep 07 15:41:27 Prox1 kernel: softdog: soft_reboot_cmd=<not set> soft_active_on_boot=0

Yeah, I got that. I saw this in many of my logs over the course of months, but I never looked into it because I was focused on the hardware and wiring.

Sometimes a loose neutral can cause spikes on the voltage line, or a relay can, like in my APC UPS where it cuts the neutral too in battery mode. If that relay is chattery, even on the neutral, then I have seen one out of 10 systems randomly reboot. But since that is rooted out now, I am exploring this.

So for me, right now I have done:
# intel_idle.max_cstate=1
If this does not work, I will try:
# intel_idle.max_cstate=0
If this also does not work, then I will remove the above and put:
# options softdog soft_noboot=1

If it was softdoggie doing it, then it means the system was not frozen and it could have made some log.
And thanks to everyone for the forum help and suggestions.
 
Sep 07 15:41:27 Prox1 watchdog-mux[1231]: Watchdog driver 'Software Watchdog', version 0
Sep 07 15:41:27 Prox1 kernel: softdog: initialized. soft_noboot=0 soft_margin=60 sec soft_panic=0 (nowayout=0)
Sep 07 15:41:27 Prox1 kernel: softdog: soft_reboot_cmd=<not set> soft_active_on_boot=0

This is totally fine, this is just loading the module, it is essentially active on every PVE install.

Yeah, I got that. I saw this in many of my logs over the course of months, but I never looked into it because I was focused on the hardware and wiring.

Sometimes a loose neutral can cause spikes on the voltage line, or a relay can, like in my APC UPS where it cuts the neutral too in battery mode. If that relay is chattery, even on the neutral, then I have seen one out of 10 systems randomly reboot. But since that is rooted out now, I am exploring this.

I see.

So for me, right now I have done:
# intel_idle.max_cstate=1
If this does not work, I will try:
# intel_idle.max_cstate=0

The first one limits the C state, the second basically prevents using the driver.

If this also does not work, then I will remove the above and put:
# options softdog soft_noboot=1

So this one is completely independent from my point of view, i.e. you can put it there even now.

If it was softdoggie doing it, then it means the system was not frozen and it could have made some log.

That's not entirely true. You can have the system frozen to the point that it cannot flush the log onto disk, but the softdog still manages to reboot it. If you instead came back to the system frozen (hours later), you would at least see a trace on the screen showing where it went belly up.

And thanks to everyone for the forum help and suggestions.

Cheers!
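
To see which of the two intel_idle settings is actually in effect after a reboot, the cpuidle entries in sysfs can be listed. A small sketch, assuming the usual sysfs layout; `show_cpuidle` is a made-up helper name and its argument exists only so it can be pointed at a test directory.

```shell
# Hypothetical helper: print the active cpuidle driver and the idle states
# the kernel exposes for cpu0.
# Usage: show_cpuidle [SYSFS_CPU_DIR]
show_cpuidle() {
    base=${1:-/sys/devices/system/cpu}
    if [ -r "$base/cpuidle/current_driver" ]; then
        echo "driver: $(cat "$base/cpuidle/current_driver")"
    else
        echo "driver: unknown"
    fi
    # Each stateN directory is one idle state; its name file holds e.g. POLL or C1E.
    for state in "$base"/cpu0/cpuidle/state*; do
        [ -r "$state/name" ] || continue
        echo "$(basename "$state"): $(cat "$state/name")"
    done
}

# Example: show_cpuidle
```

With max_cstate=1 the deeper states should no longer be listed, and with max_cstate=0 the driver line should no longer read intel_idle.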
 
Besides the Intel C-state setting, I found that one of my VMs, a Windows VM running Agent DVR as an NVR, had been changed from host CPU to default after reloading Proxmox.
Way before, I had added a nvfix entry that sets the CPU to host, and that line was still there, so the VM was still using the host CPU. Changed that too.
5 days now, still running. Let's see.
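
A quick way to check which CPU settings a VM really ends up with is to look at both places in its config. This is a sketch under assumptions: `vm_cpu_settings` is a made-up helper name, the VMID 100 is hypothetical, and it is only a guess that the "nvfix entry" above is a raw `args:` line in the VM config.

```shell
# Hypothetical helper: print the lines of a VM config file that can influence
# the CPU model: the `cpu:` option and any raw `args:` passed on to QEMU.
# Usage: vm_cpu_settings CONFIG_FILE
vm_cpu_settings() {
    grep -E '^(cpu|args):' "$1"
}

# Example (100 is a hypothetical VMID):
#   vm_cpu_settings /etc/pve/qemu-server/100.conf
#   qm set 100 --cpu host    # set the CPU type explicitly
```

If nothing is printed, the VM has no explicit CPU type and no extra arguments, so it runs with the default.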
 