Microserver Gen8 reboots under high load

OH24

Hey guys,

Since the first Proxmox 4 release we've had problems with reboots, especially during nightly backup jobs and periods of high load. Most of the affected servers are HP MicroServer Gen8 machines with a Xeon E3-1220L v2, 16 GB RAM, 2x 4 TB HDDs in a ZFS RAID, and the newest available BIOS. These issues still exist with the newest 4.4 releases.
I know there have been a lot of "random reboot" threads in this forum over the last year, and we've tried a lot of the suggestions, but couldn't really find stable settings.

What we’ve tried:
- blacklisted hpwdt (the default since 4.1 anyway, as far as I know)
- disabled HP ASR in the BIOS
- tried different vm.swappiness settings in /etc/sysctl.conf (see the sketch below)
- disabled swap entirely
- tried different ZFS module settings for zfs_arc_max/zfs_arc_min
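
To give an idea, the swappiness/swap experiments looked roughly like this; the exact values varied between attempts, so treat it as an illustration rather than our final config:
Code:
# /etc/sysctl.conf - make the kernel less eager to swap
vm.swappiness = 10

# apply without rebooting
sysctl -p

# for the "disable swap" attempt we simply turned it off at runtime
swapoff -a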


Has anyone out there found stable settings? Is it a ZFS/low-memory problem, an HP-specific problem, both, or something else?
Thanks a lot.
 
16 GB of RAM on a ZFS host seems low to me. How many VMs are on this host, and how much RAM do they have allocated?
 
Hey adamb,
it varies, because we run this setup for different customers with different numbers of VMs and systems.
It ranges from one Windows VM with 4 GB RAM, to 3 Linux VMs with 7 GB RAM in total, to 6 Linux VMs with a total of 11 GB RAM.
 
You're not running dedup, correct? Do any of them reboot more often than others? Does it happen during heavy sequential IO?

We run a lot of ZFS here, and 99% of the time when we have issues it's due to memory. ZFS is just so hungry sometimes.
 
How do I check whether dedup is enabled? Yes, it could be that the servers with more VMs and higher VM RAM usage reboot more often; we've especially seen it after backup tasks.
So are there any ZFS settings you use for more stability? Any advice on what percentage of RAM should at most be allocated to the VMs?
 
This should show you if dedup is enabled.

Code:
zfs get all | grep dedup

When it does reboot/crash, is anything being logged in the iLO?

Well, that makes sense, as a backup task is most likely sequential. Does the Proxmox OS itself reside on ZFS too?

We don't run any VM environments on ZFS, but we do run a lot of very large filesystem servers for backup purposes. Typically we end up just throwing more RAM at them. Am I correct in thinking that you have roughly 4 TB in total of ZFS storage?
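
If you want to see how much memory the ARC is actually grabbing while a backup runs, something like this should work on ZFS on Linux (field names can differ a bit between versions, so take it as a rough sketch):
Code:
# current ARC size and its configured limits, in bytes
awk '$1 == "size" || $1 == "c_min" || $1 == "c_max"' /proc/spl/kstat/zfs/arcstats

# or, if installed, a human-readable summary (may be called arc_summary.py on older versions)
arc_summary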
 
I had this behavior myself. The culprit in my case was ZFS using up too much RAM, so the hosts ran into timing problems and rebooted.
I solved it by creating a config file at /etc/modprobe.d/zfs.conf containing:
Code:
## values in bytes, for reference:
## 2GB=2147483648
## 4GB=4294967296
## 8GB=8589934592
## 12GB=12884901888
## 24GB=25769803776
options zfs zfs_arc_min=8589934592
options zfs zfs_arc_max=25769803776

Before that, my hosts rebooted around once per week due to high load during backups. Since the change, my hosts have been running stable for 50+ days.

P.S.: If you work with a cluster, make sure corosync isn't the culprit. If you push too much backup traffic over the same network, corosync messages can sometimes be delayed so much that the node thinks there's an error and tries to resolve it by rebooting.
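A quick way to rule that out is something like the following; exact options can differ with your corosync version, so treat it as a sketch:
Code:
# check the state of the corosync rings
corosync-cfgtool -s

# look for token/retransmit trouble around the time of a reboot
journalctl -u corosync | grep -iE 'retransmit|token|totem'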
 
Thanks guys for your answers!

No, dedup is "off":
Code:
rpool             dedup  off  default
rpool/ROOT        dedup  off  default
rpool/ROOT/pve-1  dedup  off  default
rpool/swap        dedup  off  default

I found nothing helpful in the iLO so far, but I will recheck. Yes, Proxmox itself sits directly on ZFS too; it's the default ZFS RAID1 setup from the installation wizard. 4 TB of total ZFS storage is right. Sadly, 16 GB is the maximum amount of RAM possible in this MicroServer.

No cluster is used for these systems. Is a # update-initramfs -u needed for the changes to take effect? I found that in another thread. I'll limit zfs_arc_max to 4294967296 and we'll see over the upcoming weeks. A 4 GB ARC limit alongside 7 GB of total VM RAM should be enough on 16 GB of system RAM?
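
For the record, this is roughly what I'm planning to apply; I'm assuming the rebuild is needed because the ZFS module options are read from the initramfs when root is on ZFS, so please correct me if that's wrong:
Code:
# /etc/modprobe.d/zfs.conf - cap the ARC at 4 GB (value in bytes)
options zfs zfs_arc_max=4294967296

# rebuild the initramfs so the option is picked up at boot, then reboot
update-initramfs -u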
 
My usual baseline comment here would be: why on earth complicate a simple, fairly small server config with ZFS? You lack sufficient RAM, CPU and disk (no SSD for ZFS cache/ZIL etc.) to really use ZFS in the way where (I think) it brings the greatest benefit. I would do a stock install with no ZFS, ideally hardware RAID, but if not, just Linux software RAID (custom install via the Debian installer, then add Proxmox after the fact) and be done with it.

Then the filesystem stays out of the way, it 'just works', and life goes on, nice and boring and reliable...

But I do know some people are highly attracted to the 'idea' of ZFS.


Tim
 
Hey Tim,
thanks for your opinion, I will consider it for the next small Proxmox setups without hardware RAID.
Sadly there's no mdadm option in the Proxmox guided installation, and I wanted a "clean" installation.

The zfs_arc_max parameter seems to have stabilized the systems. Big thanks to you, iffi, so far!
 
Thanks for reporting back to the thread; glad to hear it was possible to make your config more solid. I agree that mdadm Linux RAID via a 'custom install' feels complex compared to any flavor of the 'appliance install' of Proxmox, especially if you are not familiar with doing Linux software RAID installs. So, very glad to hear this config is now working out well.

Tim
 
