Official solution for SWAP in PM 5.* and ZFS?

Discussion in 'Proxmox VE: Installation and configuration' started by mailinglists, Jan 7, 2019.

  1. mailinglists

    mailinglists Active Member

    Joined:
    Mar 14, 2012
    Messages:
    356
    Likes Received:
    33
    Hi,

now that we are aware of the problem with SWAP on ZVOLs, I wonder: what is the official and supported way to get SWAP in PM 5.3+ when using ZFS?

I have installed a few nodes with 5.2 and 5.1 and have SWAP on ZVOLs.
Should I disable swap there? According to the 5.3 installer, I should.
Should I create a SW RAID 1 partition with mdadm and put SWAP there?
But that configuration will not be officially supported anymore, right?
     
  2. mailinglists

    mailinglists Active Member

    Joined:
    Mar 14, 2012
    Messages:
    356
    Likes Received:
    33
Or we could use mirrored LVM devices.

I really want an official opinion on how to do it in this use case.
     
  3. LnxBil

    LnxBil Well-Known Member

    Joined:
    Feb 21, 2015
    Messages:
    3,695
    Likes Received:
    329
Why do you want to have SWAP on disks at all? You can use zram if you really need swap space. It is faster and does not require disk space.
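For reference, a minimal zram swap setup might look like the sketch below (assumptions on my side: a single zram device, a 4 GiB size, the lz4 compressor, and root access; on newer Debian the zram-tools package automates the same steps):

```shell
# Sketch: create a compressed swap device in RAM (requires root).
# Device count, compressor, and the 4G size are illustrative values.
modprobe zram num_devices=1
echo lz4 > /sys/block/zram0/comp_algorithm   # pick a fast compressor
echo 4G  > /sys/block/zram0/disksize         # uncompressed capacity
mkswap /dev/zram0
swapon -p 100 /dev/zram0   # higher priority than any disk-backed swap
```

The high `swapon` priority makes the kernel prefer the RAM-backed device over any remaining disk swap.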
     
  4. mailinglists

    mailinglists Active Member

    Joined:
    Mar 14, 2012
    Messages:
    356
    Likes Received:
    33
    Will look into zram. Thank you for the hint.

I need swap space for the same reasons we have always used SWAP, and for the same reasons SWAP was invented.
Basically, when there is high memory pressure (for various reasons: we can not add more RAM, app problems, etc.) we want to avoid invoking the kernel's OOM-killing logic. While swapping drastically reduces speed, at least it allows us to shut down gracefully. Also, seldom-used parts of memory can get swapped out so we can use RAM where it is needed most.

    I still wonder what is the official position on the matter.
All questions I asked still stand unanswered.
    Hopefully we will get an official response. :)
     
  5. LnxBil

    LnxBil Well-Known Member

    Joined:
    Feb 21, 2015
    Messages:
    3,695
    Likes Received:
    329
    Sure, but you most certainly don't want to have a ZFS without any ARC (no RAM).
     
  6. mailinglists

    mailinglists Active Member

    Joined:
    Mar 14, 2012
    Messages:
    356
    Likes Received:
    33
Of course not, but I do not understand what you are hinting at with your comment.
     
  7. LnxBil

    LnxBil Well-Known Member

    Joined:
    Feb 21, 2015
    Messages:
    3,695
    Likes Received:
    329
Normally, your ARC will be purged down to its minimum before you start to swap, so you will lose ZFS performance before actually swapping out. This means that by the time you swap, ZFS performance is already reduced. If you then swap to ZFS, it'll be even slower.

You can test this by monitoring your ARC while starting memory-intensive applications.
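One way to watch this (a sketch, assuming ZFS on Linux, which exposes ARC statistics under /proc/spl/kstat/zfs/arcstats) is to poll the current and target ARC size while the workload runs:

```shell
# Print current ARC size ("size") and target size ("c") every 5 seconds.
# Values in arcstats are bytes; the awk converts them to MiB.
while sleep 5; do
    awk '/^size |^c /{printf "%s %.0f MiB  ", $1, $3/1048576} END{print ""}' \
        /proc/spl/kstat/zfs/arcstats
done
```

Watching "size" shrink toward zfs_arc_min under memory pressure shows the purge happening before the system touches swap.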
     
  8. mailinglists

    mailinglists Active Member

    Joined:
    Mar 14, 2012
    Messages:
    356
    Likes Received:
    33
Now I understand what you are saying.
I always set zfs_arc_min and zfs_arc_max, so I know how much RAM ZFS will use and how performant it will be.
SWAP is sometimes (depending on the sysctl swappiness setting) populated with less-used pages even before the system runs out of RAM.
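For anyone wanting to do the same, the ARC limits go into /etc/modprobe.d/zfs.conf (a sketch; the values are in bytes, and the 4/8 GiB figures below are illustrative, not a recommendation):

```shell
# /etc/modprobe.d/zfs.conf -- example ARC limits, illustrative values
options zfs zfs_arc_min=4294967296   # 4 GiB
options zfs zfs_arc_max=8589934592   # 8 GiB
# On root-on-ZFS systems, rebuild the initramfs afterwards so the
# options take effect at boot:
#   update-initramfs -u -k all
```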

Anyway, we went a bit off topic, and an official answer is still needed.
I wonder why they ignore the question...

In the meantime, I will start by enabling swap on MD RAID mirrored devices on new installs, and will probably remove SWAP from ZVOLs on all older installs, replacing it with the same MD RAID. Luckily I have SSD disks for cache and log, which have lots of free space, and I will just put SWAP there.
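A sketch of such an MD RAID swap setup (device names, partition numbers, and the assumption that spare partitions already exist on both SSDs are mine):

```shell
# Mirror two SSD partitions and use the result as swap (requires root).
mdadm --create /dev/md/swap --level=1 --raid-devices=2 /dev/sdc3 /dev/sdd3
mkswap /dev/md/swap
swapon /dev/md/swap
# Make it persistent across reboots:
echo '/dev/md/swap none swap sw 0 0' >> /etc/fstab
```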
     
  9. Stoiko Ivanov

    Stoiko Ivanov Proxmox Staff Member
    Staff Member

    Joined:
    May 2, 2018
    Messages:
    1,198
    Likes Received:
    102
    Giving an "official" recommendation for handling Swap with ZFS depends on your needs and acceptance of potential downtime:

* You can install enough (ECC) memory in your system, which is generally a good idea given that ZFS performance is very tightly linked to having enough memory available. However, for some users this comes at a prohibitive cost.

* You can create swapspace directly on a block device/partition and use that (the installer has been adapted to make it possible to leave empty space on a disk). The downside here is that you lose the swapspace, and the data it contains, if the disk it resides on breaks - which will most likely lead to a crash/downtime. I personally would probably use a fast enterprise SSD, monitor its wearout, and live with the risk of downtime when it fails.

    So YMMV as to the best approach w.r.t. swapping in general (apart from trying to avoid it if possible at all).

Apart from that, it happens that best practices change (e.g. not using ZVOLs for swap anymore, because of ZoL code changes) - and posts giving official recommendations still get quoted a lot despite being completely out of date ;)

    Hope that helps!
     
  10. mailinglists

    mailinglists Active Member

    Joined:
    Mar 14, 2012
    Messages:
    356
    Likes Received:
    33
This is what I just did:
    Code:
    root@p28:/var/log# cat /etc/fstab | grep swap
    /dev/md/swap none swap sw 0 0
    root@p28:/var/log# cat /proc/mdstat
    Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
    md127 : active raid1 sdd3[1] sdc3[0]
          8380416 blocks super 1.2 [2/2] [UU]
    
Beforehand I installed mdadm. sdc and sdd are Intel DC SSDs.

    This is how I avoid downtime in case one of the SSDs fails before wearing out.
    I could do it with LVM mirroring also.
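An LVM-mirror variant might look roughly like this (a sketch; the volume group name vgswap and the 8G size are my assumptions):

```shell
# Mirrored swap LV across two physical volumes (requires root).
pvcreate /dev/sdc3 /dev/sdd3
vgcreate vgswap /dev/sdc3 /dev/sdd3
lvcreate --type raid1 -m 1 -L 8G -n swap vgswap
mkswap /dev/vgswap/swap
swapon /dev/vgswap/swap
```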

    @Stoiko Ivanov
Why would you not rather "officially" recommend putting SWAP on a SW RAID mirror like I did, or on a mirrored LVM?
Am I missing something, and am I on a path to future problems because I am using SW RAID?
     
    #10 mailinglists, Jan 10, 2019
    Last edited: Jan 10, 2019
  11. LnxBil

    LnxBil Well-Known Member

    Joined:
    Feb 21, 2015
    Messages:
    3,695
    Likes Received:
    329
That is easy: mdadm is officially not supported. There are two supported RAID options: ZFS and hardware RAID.
     
  12. Stoiko Ivanov

    Stoiko Ivanov Proxmox Staff Member
    Staff Member

    Joined:
    May 2, 2018
    Messages:
    1,198
    Likes Received:
    102
    This sums it up more or less.

Using mdraid (mdadm)/LVM-mirror/dmraid adds yet another layer of complexity, which might lead to problems (see https://bugzilla.kernel.org/show_bug.cgi?id=99171#c5 for a case where this can lead to guest-disk corruption), and adds complexity when it comes to replacing a broken disk.

Currently I'm not aware that this particular problem would affect using mdraid as swap; however, I have not read all the code paths regarding swapping in the kernel.
     
  13. Kurgan

    Kurgan New Member

    Joined:
    Apr 27, 2018
    Messages:
    18
    Likes Received:
    1
    mailinglists likes this.
  14. mailinglists

    mailinglists Active Member

    Joined:
    Mar 14, 2012
    Messages:
    356
    Likes Received:
    33
I know :-/, I battled with ZFS along the way, and it works fine for us now.
We needed replication and differential backups, which only ZFS brings.
     
  15. LnxBil

    LnxBil Well-Known Member

    Joined:
    Feb 21, 2015
    Messages:
    3,695
    Likes Received:
    329
The thing is... do you need an "officially supported" setup? You wrote that you have used mdadm for decades, so you probably won't need any help with that. I personally don't care what is supported and what is not, but this is my personal opinion. I'm able to fix my own bugs, and I assume you are too. As I wrote recently in your other thread, try to use it on bigger machines (mainly with more disks) and it'll grow on you. The features outweigh any possible slowness by far.
     
  16. mailinglists

    mailinglists Active Member

    Joined:
    Mar 14, 2012
    Messages:
    356
    Likes Received:
    33
LnxBil, Kurgan wrote that about using mdadm for decades, but it is true for me also.
    I am happy with ZFS. :)
     
  17. Kurgan

    Kurgan New Member

    Joined:
    Apr 27, 2018
    Messages:
    18
    Likes Received:
    1
I understand that, performance apart, you managed to "tame" ZFS so that it does not crash the host by eventually eating all of the available RAM.

    What did you do?

    I mean, what's your tuning procedure for a freshly installed PVE with ZFS? (I assume you are using RAIDZ-1 as disk configuration)

    I have tried (in older versions of PVE)

    zfs set primarycache=metadata rpool/swap
    zfs set logbias=throughput rpool/swap

    These settings should be default as of PVE 5, if I remember correctly.

I have also tried setting swappiness to 1 or 0 on the host to prevent a (supposed) runaway condition where using swap causes ZFS to allocate more RAM, thus making the issue worse (you try to swap out memory and end up using more RAM instead of less). This is an issue I have read about here on this forum.
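Concretely, that tuning (a sketch; the file name under /etc/sysctl.d is my choice) amounts to:

```shell
# Apply the low-swappiness setting immediately (requires root) ...
sysctl vm.swappiness=1
# ... and persist it across reboots:
echo 'vm.swappiness = 1' > /etc/sysctl.d/99-swappiness.conf
```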

I have not yet tried setting options zfs zfs_arc_max and zfs_arc_min in /etc/modprobe.d/zfs.conf. I also don't know what values should be used (how to calculate them based on disk space and available RAM).
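A common starting point (an assumption on my side, not an official rule: ZFS on Linux defaults the ARC maximum to roughly half of physical RAM) is to derive the byte values from RAM size, e.g.:

```shell
# Print modprobe options capping the ARC at half of RAM (min at 1/8),
# a rule-of-thumb starting point -- reduce further if VMs need the RAM.
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
arc_max=$(( total_kb * 1024 / 2 ))
arc_min=$(( total_kb * 1024 / 8 ))
echo "options zfs zfs_arc_max=$arc_max"
echo "options zfs zfs_arc_min=$arc_min"
```

The two echoed lines are what would go into /etc/modprobe.d/zfs.conf.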
     
  18. LnxBil

    LnxBil Well-Known Member

    Joined:
    Feb 21, 2015
    Messages:
    3,695
    Likes Received:
    329
    Honestly: Nothing.

I'm using ZFS on over 10 systems now, ranging from a Raspberry Pi, laptop/desktop systems, and multiple LUKS-encrypted single-node PVE servers on the internet, up to two-figure-TB pools - all without any crashes due to ZFS memory problems in recent years. They're all rock solid, and not all of them use the PVE ZFS; some use the ZFS included in Debian Stretch and stretch-backports.

On some systems I tune the zfs_arc_max and min settings to give MORE memory to ZFS, because "only" half is often not enough for a big ZFS-only system (without virtualization).

One side question: did you enable deduplication? The only time I experienced massive OOM was with deduplication enabled. Be aware that once enabled, it cannot be disabled completely - only by recreating the pool.
     
  19. Kurgan

    Kurgan New Member

    Joined:
    Apr 27, 2018
    Messages:
    18
    Likes Received:
    1
I am really baffled. I have not enabled dedup. I just installed PVE from its ISO image, setting up the disks as RAIDZ-1. I am running 5 servers in 5 different environments. No clustering, no "fanciness" at all - just simple single servers with local storage, on different hardware, with 16 to 64 GB RAM. All of the servers using ZFS experience issues with RAM management. The ones that use LVM on hardware RAID (installed from the PVE ISO), or MD RAID 1 with simple ext4 (installed with plain Debian 9 and then the PVE repositories), have never shown any issues at all with RAM management, OOM, etc.

I have just checked on one of my servers: dedup is off, while compression is on. And this is the default PVE installation, because the only thing I have done is:

    zfs set primarycache=metadata rpool/swap
    zfs set logbias=throughput rpool/swap

And I did that following the OOM reboot issues, not before.
     
  20. mailinglists

    mailinglists Active Member

    Joined:
    Mar 14, 2012
    Messages:
    356
    Likes Received:
    33
Kurgan, ZFS is slow for us too when used on just a few HDDs (fast enough only with SSDs or many HDDs), but stable once the ARC max is limited and the host does not SWAP too much, which one can set up using the swappiness sysctl parameter. But I moved away from SWAP on ZVOLs to SWAP on MD RAID, where I actually need SWAP.
     