Hello everyone,
Using swap on ZFS is bad, everyone knows that. But how bad? We can put this to the test!
Let's suppose that a PVE node has 32GB of RAM.
Let's suppose that no VM is running on it. If your ARC is empty, you have plenty of memory available (30GB-31GB).
Install the stress tool (apt install stress).
Create an unprivileged user and log into that account.
Run the stress command to allocate a little more memory than is available; this should force pages to be swapped out.
$ stress -m 4 --vm-keep --vm-bytes 7700M
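The 7700M figure fits a host with roughly 30GB free. As a rough sketch for other hosts, the per-worker size could be derived from MemAvailable in /proc/meminfo. The helper name per_worker_mb and the 5% overshoot are assumptions made for illustration, not part of the original test.

```shell
#!/bin/sh
# per_worker_mb AVAILABLE_KB WORKERS OVERSHOOT_PERCENT
# Prints the per-worker allocation in MB so that all workers together
# ask for OVERSHOOT_PERCENT more than the available memory.
per_worker_mb() {
    avail_kb=$1
    workers=$2
    overshoot=$3
    echo $(( avail_kb * (100 + overshoot) / 100 / 1024 / workers ))
}

# Only print the command; running it is left to the reader.
if [ -r /proc/meminfo ]; then
    avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    echo "stress -m 4 --vm-keep --vm-bytes $(per_worker_mb "$avail_kb" 4 5)M"
fi
```

For example, with 30GB available (31457280 kB), 4 workers and 5% overshoot, per_worker_mb prints 8064.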
And now the interesting part: after a few seconds the machine does a hard reset. The reset leaves NOTHING in the log files, on the console, etc.
For me this is a serious security problem. Anyone can allocate a lot of memory fast. This should lead to swapping and OOM kills, not unexpected host resets!
I was able to trigger this issue on a few different hosts (some of them were VMs). I was using either a clean Proxmox 5.2 install from the ISO or a machine upgraded to the most recent public version (free repo).
ARC size – seems to be irrelevant (here limited to 8GB)
ZFS transaction group (txg) timeout was set to 2s/5s
Swap on zfs (rpool/swap) was tested with sync always/disabled, compression zle/off
Swappiness - 0 / 10 / 60
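For anyone who wants to repeat the test, the settings listed above could be applied roughly like this. This is only a sketch: it picks one value from each pair, assumes the timeout mentioned is the zfs_txg_timeout module parameter, and the run wrapper is a hypothetical helper that only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root to actually apply them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Limit the ARC to 8 GiB (the value is in bytes).
run sh -c 'echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max'
# Transaction group timeout in seconds (assumed to be the timeout meant above).
run sh -c 'echo 5 > /sys/module/zfs/parameters/zfs_txg_timeout'
# Properties on the swap zvol.
run zfs set sync=always rpool/swap
run zfs set compression=zle rpool/swap
# Kernel swappiness.
run sysctl -w vm.swappiness=10
```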
The main problem is that if you want to use ZFS or software RAID, you are forced to use this filesystem, and the installer will always create swap on rpool/swap. So the default configuration seems to be very dangerous!
If you turn off swap, the host will survive (in some way: processes get OOM-killed). When swap is on a regular disk, everything works as expected.
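To check whether a host is affected, one could look for active swap devices that are zvols. A minimal sketch, assuming zvols show up in /proc/swaps under their /dev/zdN device names; zvol_swaps is a hypothetical helper name.

```shell
#!/bin/sh
# zvol_swaps: reads /proc/swaps-formatted text on stdin and prints
# every swap device that looks like a ZFS zvol (/dev/zdN).
zvol_swaps() {
    awk 'NR > 1 && $1 ~ /^\/dev\/zd[0-9]+/ {print $1}'
}

if [ -r /proc/swaps ]; then
    zvol_swaps < /proc/swaps
fi
```

If this prints a device, swap on that host can be turned off as root with swapoff followed by that device name, or with swapoff -a for all swap.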
Maybe someone knows how to tune swap on ZFS to avoid such crashes?
Is there a way to move the swap partition outside rpool and put it directly on the disks, like the boot partitions? (I know that ZFS likes to have exclusive access to the disks, but this idea is worth trying.)
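If a spare partition outside the pool exists, the switch itself could look roughly like the sketch below. The partition /dev/sdX9 is a placeholder, the run and fstab_swap_line helpers are hypothetical, and whether the installer's disk layout leaves room for such a partition is not something established here.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 and run as root to actually apply them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Placeholder partition; replace with a real, unused one.
SWAP_PART=${SWAP_PART:-/dev/sdX9}

# Prints an /etc/fstab line that enables swap on the given device.
fstab_swap_line() {
    printf '%s none swap sw 0 0\n' "$1"
}

run swapoff /dev/zvol/rpool/swap   # stop swapping to the zvol
run mkswap "$SWAP_PART"            # format the plain partition as swap
run swapon "$SWAP_PART"            # start using it

echo "# line to add to /etc/fstab (the rpool/swap entry has to go):"
fstab_swap_line "$SWAP_PART"
```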