[SOLVED] ZFS swap crashes system

LnxBil

Famous Member
Feb 21, 2015
5,549
630
133
Germany
Hi all,

I have run into stability problems when swapping on ZFS (swap on a zvol). On kernel 4.2.6-1-pve the system panics repeatedly within minutes, with alternating errors:
  • swapper/5: page allocation failure
  • General Protection fault (multiple times)
  • Blocked process kswapd
Swapping exclusively to zram, or to another block device that is not backed by ZFS, works like a charm.

Has anyone else encountered this problem?

Best,
LnxBil
 

windinternet

Member
Oct 8, 2015
159
9
18
@LnxBil,

Yes, I can definitely say we have seen problems with swap on a ZFS zvol in Proxmox 4.0. We also reproduced them on a virtual machine with the 4.2.6-1-pve kernel.

On 4.0 machines we are seeing sudden reboots that disappear when we turn swap off. Sometimes these reboots are preceded by page allocation failures in the syslog, or by SLUB debugging messages with the 4.2.6-1-pve build, but usually the machine just blinks off.

A bit of testing on a virtual machine leads us to think that stability can be improved by applying the settings below.
I must stress, however, that this is ONLY an indication based on testing on a virtual machine. It has NOT been deployed on a production machine yet.

Code:
zfs set primarycache=none rpool/swap
zfs set secondarycache=none rpool/swap
zfs set compression=off rpool/swap
zfs set sync=disabled rpool/swap
zfs set checksum=off rpool/swap
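
A minimal sketch of applying the five settings above in one go. The helper name swap_zvol_props and the RUN switch are made up for illustration and are not part of ZFS or Proxmox. By default the helper only prints the commands, so nothing changes until it is run with RUN=1 as root.

```shell
#!/bin/sh
# Hypothetical helper: print (or, with RUN=1, execute) the property changes
# from the post above for the given dataset.
swap_zvol_props() {
    dataset="$1"
    for prop in primarycache=none secondarycache=none compression=off \
                sync=disabled checksum=off; do
        if [ "${RUN:-0}" = "1" ]; then
            zfs set "$prop" "$dataset"
        else
            echo "zfs set $prop $dataset"
        fi
    done
}

# Dry run: prints the five "zfs set" lines for rpool/swap.
swap_zvol_props rpool/swap

# After a real run, the values can be read back with:
#   zfs get primarycache,secondarycache,compression,sync,checksum rpool/swap
```

Printing first makes it easy to review the commands, or to swap in a different dataset name, before touching a live pool.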

Regards,
Gerrit
 

Nemesiz

Well-Known Member
Jan 16, 2009
678
47
48
Lithuania
Instead of the settings above, I suggest using:

Code:
zfs set primarycache=metadata rpool/swap
zfs set secondarycache=metadata rpool/swap
zfs set compression=off rpool/swap
zfs set sync=disabled rpool/swap
zfs set checksum=on rpool/swap
 

Erk

Active Member
Dec 11, 2009
165
4
38
I put vm.swappiness = 0 in /etc/sysctl.conf on all my machines so that the kernel doesn't try to use swap unless it's really out of RAM.
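
A small sketch for checking which value the running kernel actually uses, since an entry in /etc/sysctl.conf only takes effect at boot or after a reload. The function name current_swappiness is made up for illustration. Note that a swappiness of 0 does not switch swap off; it only makes the kernel very reluctant to use it.

```shell
#!/bin/sh
# Read the value the running kernel is using (the file is world-readable on Linux).
current_swappiness() {
    cat /proc/sys/vm/swappiness 2>/dev/null || echo "unknown"
}

echo "vm.swappiness = $(current_swappiness)"

# To load the settings from /etc/sysctl.conf without rebooting (as root):
#   sysctl -p
# To change the value for the running system only (as root):
#   sysctl -w vm.swappiness=0
```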
 

windinternet

Some further info:

I can also get the 2.6.32 kernel to hang if I start 9 concurrent untar jobs. With all ZFS features disabled on the swap zvol, the 4.2 kernels seem about as stable as the 2.6 kernel (with the features enabled). I do not think you lose anything by disabling those features on a swap volume: checksumming of swap pages is already present in the kernel itself, and the other features seem superfluous for swap.

@Erk, disabling swapping avoids the crashes, but it robs the kernel of the opportunity to move dead-wood pages out of RAM, and so costs your system performance. Not immediately, but after the point where swapping would normally set in, which can take days or even a week.
 

windinternet

IMHO, the *best* thing would be for the Proxmox installer to simply reserve a swap partition at the start of the disk and keep it outside of ZFS. Putting a filesystem between memory and the swap disk is risky.
 

Erk

windinternet said:
IMHO, the *best* thing would be if the Proxmox installer would just reserve a swap partition at the start of disk and keep it outside of ZFS. Putting a filesystem between memory and swap disk is risky.

Not really. People have been using swapfiles for decades, typically when they realize they didn't allocate sufficient space when the drive was partitioned. Mac OS X uses swapfiles in /var/vm.
 

windinternet

@Erk,

Usually it is OK and works correctly, but only as long as your swapfile is indeed contiguous and fully block-allocated. Since the 2.6 kernel, Linux records the blocks in use at swapon time and tries to bypass the filesystem. However, as the differences in stability from enabling or disabling ZFS features on the zvol pretty much prove, this bypass does not fully apply to a zvol, probably because ZFS is also active at the block level.
 

LnxBil

Is anyone using zram for swapping? I played around with it and am now deploying it for all my VMs. I use a package from Ubuntu for this (zram-config), and it works flawlessly on Jessie-based VMs (no need to backport it, just install it).
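
For readers who want to see roughly what a zram swap setup involves, here is a hedged sketch of configuring a single zram device by hand. It is not what zram-config does internally; the helper name zram_size_bytes and the 25% sizing are assumptions chosen for illustration. The script only prints the commands, which would then have to be run as root.

```shell
#!/bin/sh
# Size in bytes for a zram device: PERCENT ($2) of MEM_KB ($1).
# Both parameters are illustrative, not taken from any package.
zram_size_bytes() {
    echo $(( $1 * 1024 * $2 / 100 ))
}

# Total RAM in kB; falls back to 1 GiB if /proc/meminfo cannot be read.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
size=$(zram_size_bytes "${mem_kb:-1048576}" 25)

# Print the commands instead of running them; review, then run as root.
cat <<EOF
modprobe zram
echo $size > /sys/block/zram0/disksize
mkswap /dev/zram0
swapon -p 100 /dev/zram0
EOF
```

The high priority passed to swapon means the zram device is filled before any other swap area that may still be active.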
 

windinternet

No experience, but from what I read about it, it should be great if you have a lot of assigned RAM hanging around in VMs that are hardly used.
 

iffi

New Member
Feb 3, 2016
7
0
1
32
First, a big thanks. I thought I was the only one with these sudden crashes and was searching for a hardware issue.
But I've got a question:

This thread is marked as solved, but what exactly solves the problem?
  • Using a dedicated swap partition only avoids the problem
  • Disabling the swap partition... you can't seriously call this a solution...
  • Do the settings from post #3 solve the problem?
 

LnxBil

iffi said:
Disabling the swap partition... you can't seriously call this a solution...

Why? Hopefully, your server is never swapping heavily. I'm going with zram-config from Ubuntu, which adds a compressed, RAM-based swap device (zram), or I use a dedicated non-ZFS system installation and ZFS only for the data disks (mostly in external shelves).
 

windinternet

If I look at servers that have been running for a long time, they usually accumulate about three quarters of a gigabyte of cruft that gets parked in swap and is never used. So switching off swapping is indeed not completely without disadvantage, because over the course of a couple of days or weeks it leads to a slightly decreasing amount of available RAM. Compared to the usual amounts of RAM, this is not a large sacrifice.

Using another swap partition or zram completely solves the problem for swap. It would be nice if the installer did that itself.
 

windinternet

An important note on running with swap disabled: it may lead to unexpected OOM kills.

I can't absolutely positively confirm this, but based on a recent experience it seems as if Linux runs with an implicit vm.overcommit_memory=2 setting if you turn swap completely off. And because the default vm.overcommit_ratio is 50%, Linux refuses to commit more than 50% of physical memory outside of the kernel or the ZFS cache. The memory policy showed as 0, but the OOM killer was invoked at 50% committed, while the ARC cache was sitting on the other 50% of memory.

It may have been the ARC cache not giving up memory quickly enough, but the CommitLimit in /proc/meminfo actually showed 50% of physical memory as the limit.
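
A short sketch of how the CommitLimit value mentioned above comes about. According to the kernel's overcommit documentation, it is computed as (RAM minus huge pages) * overcommit_ratio / 100 + swap, and it is only enforced when vm.overcommit_memory is 2, although /proc/meminfo displays it in every mode. With swap off and the default ratio of 50 it therefore reads as half of RAM even in mode 0, so the value alone does not show which policy is active. The function name commit_limit_kb is made up for illustration.

```shell
#!/bin/sh
# CommitLimit in kB, following the formula in the kernel documentation:
#   (MemTotal - huge pages) * overcommit_ratio / 100 + SwapTotal
# Arguments: MemTotal_kB SwapTotal_kB ratio [hugepages_kB]
commit_limit_kb() {
    mem_kb=$1; swap_kb=$2; ratio=$3; huge_kb=${4:-0}
    echo $(( (mem_kb - huge_kb) * ratio / 100 + swap_kb ))
}

# 16 GiB of RAM, no swap, default ratio of 50: half of RAM.
commit_limit_kb 16777216 0 50
# -> 8388608

# Compare with what the running kernel reports and which policy is set.
grep -E '^(MemTotal|SwapTotal|CommitLimit|Committed_AS):' /proc/meminfo 2>/dev/null || true
cat /proc/sys/vm/overcommit_memory /proc/sys/vm/overcommit_ratio 2>/dev/null || true
```

If overcommit_memory reads 0 while Committed_AS is far above CommitLimit, the limit is not being enforced, which would point toward the ARC explanation for the OOM kills.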
 

deludi

New Member
Oct 14, 2013
26
0
1
Hi all,

Yesterday I spent the whole day dealing with random crashes of a newly installed Proxmox 4.1 hypervisor (Xeon E3-1245 v2, 16GB, 2 x 480GB Samsung enterprise SSDs in RAID 1). The machine was installed a couple of weeks earlier and had no problems while the VMs were residing on the local ZFS RAID 1 storage.
Yesterday morning I configured a local ZFS volume because we want to use pve-zsync for snapshotting.
I made the following changes:

1. created a zfs volume from prompt
2. added the zfs volume from gui
3. set the max zfs use to 2GB in zfs.conf (zfs_arc_max=)
4. set swappiness to 10
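
For step 3, a hedged sketch of what a 2GB ARC limit looks like as a module option. zfs_arc_max takes a value in bytes. The path /etc/modprobe.d/zfs.conf is an assumption (the post only says "zfs.conf"), and the helper name arc_max_line is made up for illustration.

```shell
#!/bin/sh
# Convert a size in GiB into the byte value zfs_arc_max expects
# and print the matching modprobe option line.
arc_max_line() {
    echo "options zfs zfs_arc_max=$(( $1 * 1024 * 1024 * 1024 ))"
}

arc_max_line 2
# -> options zfs zfs_arc_max=2147483648

# Assumed location for the line above: /etc/modprobe.d/zfs.conf
# With root on ZFS, the initramfs usually has to be rebuilt before the
# option is picked up at boot (as root):
#   update-initramfs -u
# The value currently in effect can be read from:
#   cat /sys/module/zfs/parameters/zfs_arc_max
```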

extra:

a. Downloaded the pve-zsync .deb and installed the package manually with "dpkg -i packagename".
I did this because using the non-subscription repo is not advised.

b. I set up the email address in zed.rc.

The strange thing is that I have another Proxmox 4.1 machine with almost the same config running without issues.
The only difference is that the ZFS memory size on that hypervisor is set to min 2GB and max 4GB.
That machine only has 1 Win 7 VM with 8GB allocated (actual memory usage of the VM is between 3 and 4GB in Windows).

I read that setting the swappiness to 0 is not advised.
Is anyone already running with these settings?

Code:
zfs set primarycache=metadata rpool/swap
zfs set secondarycache=metadata rpool/swap
zfs set compression=off rpool/swap
zfs set sync=disabled rpool/swap
zfs set checksum=on rpool/swap

The hypervisors only have 2 SSDs on SATA600 ports (the other 4 are SATA300 ports), so I prefer a software solution over adding 2 extra SSDs.

Thank you for your input.
 

windinternet

Do you run the machine with swap on ZFS? That will be unstable under low memory even if you turn off every feature on rpool/swap. It just became more pronounced on newer kernels. The arc_max is not a hard limit, because a hard limit would make machines with ZFS prone to complete lockups under heavy IO; it is just a best-effort limit on the cache.

Because the swap is a zvol, the same might apply to any zvol, used for any purpose, under low memory. I am interested to see whether you would still have problems if you disabled swap (but take care of the vm.overcommit_ratio as explained earlier).
 
