Random Restarting

Did you try the new kernel?

I'm sure that all the tips and settings described here with "zfs set" and "vm.swappiness" are good to know and useful. But they are neither the cause of nor the solution to the real "random restarting" problem.
My affected server was a factory-new Supermicro server with two CPUs and 64 GB RAM, running only a small Windows Server 2008 VM for testing, so there was almost no load...
Maybe it is the watchdog, maybe the BIOS, maybe the hardware or the kernel; I do not know.
 
No,
Our server came back today from the warranty workshop, with the statement that the problem is solved by BIOS adjustments.
I do not believe that yet. In the next days I will test again and report, if there is something to report.
The server was there for over eight weeks! We have been busy with this problem for almost a whole year! Our first random restart was in April 2016. But now I will try it with the latest kernel.
 
Hello,

Has anyone found a solution to this?

I've been experiencing random reboots on Proxmox 4.x, and there's nothing relevant in the logs.

It started some months ago with a Supermicro X10-SLL-F with Xeon E3-1230v3 and 24GB ECC RAM. The reboots have no pattern, as it could run fine for weeks and then reboot, or it could reboot every few hours or even minutes.

I first suspected RAM but there were no memory errors and a run of memtest86 for 4+ hours showed nothing either. Then I replaced the power supply (Seasonic S12-II 430W), but the reboots persisted. Finally I replaced the motherboard, CPU and RAM with a similar system I had around (Gigabyte Q87M-D2H with i5-4570 and 16GB RAM non-ECC), but the reboots were still present.

I always kept the install up to date, especially when a new kernel was available, but that did not seem to make any difference. Finally I upgraded to Proxmox 5, but again, the reboots are still there.

I don't use ZFS. I tried with a "stock" install, then I used a custom install on top of Debian 9 with no LVM. Same problem.

It's not a low memory issue either, as I have another system with a Celeron J1900 and 8GB of RAM that hosts 3 VMs (pfsense, RockStor, a Win10 install for a Security Cam DVR software) that has been up for two years (reboots only for kernel upgrade), and it does not have random reboot problems.
 
Sorry @Glock24, but the problem we were all having involved ZFS. Yours is something different, and you should probably open a new thread.

What sort of storage system are you using? Same disk(s), RAID controller, or other aspects? What sort of error is it? Hard crash, kernel panic, or something else? Does SMART look good on the drive(s)?

And last and least, since it shouldn't happen: maybe there's something in one of your VMs which is causing the crash.
 
Hi,

our Proxmox 4 system has been running very stably for about 6 months.
We updated the BIOS of the Supermicro motherboard and the onboard LSI hardware RAID controller and (the main solution) installed the intel-microcode package:
Code:
 apt install intel-microcode
Have a look on Google for: "intel bug microcode debian linux skylake kaby lake"
and also https://www.thomas-krenn.com/de/wik..._all_CPUs_entered_broadcast_exception_handler
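
To see whether a microcode update is actually in effect, one option is to compare the revision the kernel reports before and after installing the package and rebooting. The following is only a minimal sketch: `microcode_revision` is a hypothetical helper name, and it assumes an x86 Linux host where /proc/cpuinfo contains a "microcode" line.

```sh
# Hypothetical helper: print the first microcode revision found in
# /proc/cpuinfo-style text read from stdin.
microcode_revision() {
    awk -F': ' '/^microcode/ { print $2; exit }'
}

# Report the revision of the running system, if /proc/cpuinfo is readable.
if [ -r /proc/cpuinfo ]; then
    microcode_revision < /proc/cpuinfo
fi
```

If the value is the same before and after the update, the package probably did not ship a newer revision for that CPU. `dmesg | grep microcode` shows related kernel messages as well.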

In the next weeks I will upgrade to Proxmox 5. I think you should try that, too(?)

regards,
maxprox
 
@joshin I'm using an Intel SSD 320 80GB and 3 WD RED 2TB disks. The Intel 320 is used for the host system and has the root partition of the 2 VMs that I run. One VM runs a mail server with 2 of the 2TB disks in RAID 1 (mdadm) for /home. The other VM has the other 2TB disk and is used for backups. Both VMs run debian 8. All disks are healthy.

@ktecho the host system has no exposed services to the Internet, I doubt there are any attacks, but I'll install fail2ban just to be sure. One of the VMs runs a mail server, and it has always had fail2ban. That one gets bruteforce attacks all the time from spammers trying to steal credentials, but I highly doubt that would put a considerable load on the system as my Internet connection is only 10Mbit/s

@maxprox The host system did not have the intel-microcode package installed. I'll install it and see if that fixes the reboots. Both motherboards I used have up-to-date BIOSes, but they are not affected by that Intel bug, as they are both Haswell chips, not Skylake or Kaby Lake. I upgraded to Proxmox 5 hoping the reboots would go away, but they persist. I'll report back after the microcode package is installed.
 
Hello,

Since my last post I did the following:

- Configure fail2ban in the host system
- Install the intel-microcode package
- Set vm.swappiness = 1
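
For readers who want to reproduce the swappiness step, here is a minimal sketch of one way to make the setting persistent. The helper name `write_swappiness_conf` and the default file name `99-swappiness.conf` are assumptions for illustration, not something taken from this thread.

```sh
# Hypothetical helper: write a sysctl drop-in file that pins vm.swappiness.
# $1 = value, $2 = target file (the default path below is an assumed name).
write_swappiness_conf() {
    value=$1
    target=${2:-/etc/sysctl.d/99-swappiness.conf}
    case $value in
        ''|*[!0-9]*)
            echo "swappiness must be a non-negative integer" >&2
            return 1
            ;;
    esac
    printf 'vm.swappiness = %s\n' "$value" > "$target"
}

# Show the value the running kernel currently uses (Linux only).
if [ -r /proc/sys/vm/swappiness ]; then
    cat /proc/sys/vm/swappiness
fi
```

Run as root, `write_swappiness_conf 1` followed by `sysctl --system` should load the new value. `sysctl -w vm.swappiness=1` changes it immediately, but only until the next reboot.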

After this the server ran without issues for two days. Then yesterday there was a reboot, and this morning it was rebooting constantly: exactly 10 minutes after booting it rebooted again, several times in a row. After that the reboots continued, but again at random intervals.

There's nothing in the fail2ban logs, there are no suspicious cron jobs. I found this in the syslog before the last reboot:

Code:
Sep  5 10:04:47 pve1 ntpd[958]: receive: Unexpected origin timestamp 0xdd5949a0.2e300ce1 does not match aorg 0000000000.00000000 from server@64.6.144.6 xmt 0xdd59499f.481df356

and then this

Code:
Sep  5 10:10:40 pve1 rrdcached[936]: queue_thread_main: rrd_update_r (/var/lib/rrdcached/db/pve2-vm/100) failed with status -1. (/var/lib/rrdcached/db/pve2-vm/100: illegal attempt to update using time 1504627701 when last update time is 1504627750 (minimum one second step))

Any ideas?
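
As a small aid for reading the rrdcached message above: the two numbers in it are seconds since the epoch, so the size of the backwards step can be worked out directly. This sketch only does that arithmetic with the values copied from the log. It does not explain the reboots.

```sh
# Timestamps copied from the rrdcached log line (seconds since the epoch).
last_update=1504627750   # time of the last accepted update
attempted=1504627701     # time of the rejected update

# How far behind the last update the rejected one was.
echo "rejected update is $(( last_update - attempted )) seconds older than the last one"
```

On a system with GNU date, `date -d @1504627701` converts such a value to a readable local time for comparison with the syslog timestamps.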
 
Hello,

I had random lockups on a hosted server that stopped happening after I disabled swap on ZFS (Proxmox configures it like this by default).

Yesterday I got a fresh new server and installed the latest proxmox-ve_5.1-722cc488-1; the installation went smoothly.
I then set up a simple stress test with stress-ng and could reproduce the hangs with little effort: the moment the system needed to use swap space, the hang would happen in less than a minute.
Changing the swap configuration didn't help (settings below and other variations tried):
zfs set primarycache=none rpool/swap
zfs set secondarycache=none rpool/swap
zfs set compression=off rpool/swap
zfs set sync=disabled rpool/swap
zfs set checksum=off rpool/swap

Disabling swap did help, and the machine would not crash.

I did try ZRAM configurations as an alternative to running without swap, and in general the machine held up, except when using 50% of RAM for ZRAM and stressing it a little above the memory limits.

I can still use the machine for testing for a few days; I'm open to trying something if you give me a hint.
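
A minimal sketch of how swap could be kept disabled across reboots, assuming swap is activated through /etc/fstab. The helper `comment_out_swap` and the sample fstab lines are made up for illustration and may not match a real Proxmox install.

```sh
# Hypothetical helper: print an fstab with every active swap entry commented out.
# Reads the file(s) given as arguments, or stdin when called without arguments.
comment_out_swap() {
    awk '$1 !~ /^#/ && $3 == "swap" { print "#" $0; next } { print }' "$@"
}

# Demo on a made-up fstab; only the swap line gets a leading "#".
printf '%s\n' \
    '/dev/pve/root / ext4 errors=remount-ro 0 1' \
    '/dev/zvol/rpool/swap none swap sw 0 0' | comment_out_swap
```

`swapoff -a` turns off all active swap right away, but only until the next boot. To make it stick, the filtered output would replace /etc/fstab after keeping a backup copy.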

Update:
I applied these changes and it helped solve the issue:
https://forum.proxmox.com/threads/f...sts-during-high-io-on-host.30702/#post-159242
 
To reactivate this older topic:

My machine also suffered from random reboots for quite a long time. Exchanging hardware, even building a completely new machine, and trying a lot of kernel versions didn't help. The final solution came from a hint from some members: when RAM is so crowded that ZFS doesn't get all the RAM it needs, the system restarts without any note in the logs or anywhere else. This seems to be some kind of "safety feature", in the hope that after the reboot the machine has enough RAM. Pretty strange and annoying to me. Since I reduced the ARC to a size such that all VMs together, plus the ARC and some space for the OS, do not exceed the physical RAM of the machine, both machines work well. Even very high CPU and HDD workloads are no problem any more.
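
The sizing rule described above (VMs + ARC + OS reserve must stay below physical RAM) can be sketched as a small calculation. The helper name `arc_max_bytes` and the numbers 64, 40 and 8 GiB are purely illustrative assumptions, not values from this thread. `zfs_arc_max` is the ZFS module parameter that caps the ARC, given in bytes.

```sh
# Hypothetical helper: compute an ARC cap in bytes.
# Arguments, all in GiB: total RAM, RAM assigned to all VMs, reserve for the host OS.
arc_max_bytes() {
    total_gib=$1
    vm_gib=$2
    os_gib=$3
    arc_gib=$(( total_gib - vm_gib - os_gib ))
    if [ "$arc_gib" -lt 1 ]; then
        echo "no RAM left for the ARC with these numbers" >&2
        return 1
    fi
    echo $(( arc_gib * 1024 * 1024 * 1024 ))
}

# Illustrative example: 64 GiB host, 40 GiB for VMs, 8 GiB for the OS -> 16 GiB ARC.
echo "options zfs zfs_arc_max=$(arc_max_bytes 64 40 8)"
```

The printed line is the kind of entry that usually goes into /etc/modprobe.d/zfs.conf. When the root filesystem is on ZFS, running `update-initramfs -u` and rebooting is typically needed before it takes effect.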

The full topic discussed in German can be found here:
https://forum.proxmox.com/threads/zufällige-und-unkontrollierte-neustarts.33911/
 
