Installation: root + swap on SSD, data on ZFS

moose

Member
Nov 19, 2017
Bavarian Alps
Hi folks,

I've been running a PVE server for several years now without any problems.

Now the disk space for my LVM is running out and the HDDs have been running for nearly 5 years, so I decided to replace them with bigger ones and switch to ZFS at the same time because of its advantages.

Because I want to keep the downtime as short as possible, I'm looking for answers to my three questions about a ZFS-related installation, especially regarding the PVE installer, before I start rebuilding the system.

So far I have read a couple of sources:

https://pve.proxmox.com/wiki/Storage:_ZFS
https://pve.proxmox.com/wiki/ZFS_on_Linux
https://pve.proxmox.com/wiki/ZFS:_Tips_and_Tricks
https://pve.proxmox.com/wiki/Installation


and these two books:

FreeBSD Mastery: ZFS
FreeBSD Mastery: Advanced ZFS


My three questions all concern installing ZFS with the PVE installer.

First let me explain what I want to do:

I have a system with one SSD and three HDDs. I'd like to install PVE so that it uses the SSD for root and swap and the HDDs as a ZFS RAID-Z1 pool used for the VM disks only.

  1. My first question is whether this can all be done from within the PVE installer, or whether I have to install PVE completely to the SSD with the PVE installer first and then add the HDDs as ZFS RAID-Z1 afterwards from the PVE GUI (or on the command line). In that case I think I'd have to manually remove the data part (LVM) that the PVE installer creates on my SSD. I know it's possible to install root and data on the same ZFS pool, but that isn't what I want to do. I want to perform a 'split' installation as described above.

    And in https://pve.proxmox.com/wiki/System_Requirements I read

    OS storage: Use ... RAID ... or non-RAID with ZFS
    VM storage: For local storage, use ... ZFS and Ceph.


    So I think it should be possible to split the OS part and the VM storage part of PVE. But how can this be done during installation with the PVE installer?

  2. My second question is related to the calculation of RAM used for/by ZFS.

    In https://pve.proxmox.com/wiki/ZFS_on_Linux I read:

    ZFS uses 50 % of the host memory for ARC by default. Allocating enough memory for the ARC is crucial for IO performance, so reduce it with caution. As a general rule of thumb, allocate at least 2 GiB Base + 1 GiB/TiB-Storage.

    Somewhere else (sadly I can't recall the exact location) I read about 4 GiB Base + 1 GiB/TiB-Storage.

    Sadly, I couldn't find out how the 1 GiB/TiB-Storage is calculated. Let's take an example: 3 disks with 4 TB each give 12 TB. But if formatted as RAID-Z1, only 8 TB are available for storing data. And there is a recommendation not to fill the pool above 80%, i.e. 6.4 TB.

    So my question is whether the 1 GiB/TiB-Storage relates to the 12 TB, 8 TB or 6.4 TB in this example.

  3. My third question is related to storing VM disks on ZFS: is there an additional LVM layer built on top of one single ZFS dataset, or does PVE use a separate ZFS dataset for each VM disk?
Any hints/tips/recommendations are greatly appreciated!

Have a nice weekend, guys!

Greetinx

moose
 
  1. My first question is whether this can all be done from within the PVE installer, or whether I have to install PVE completely to the SSD with the PVE installer first and then add the HDDs as ZFS RAID-Z1 afterwards from the PVE GUI (or on the command line). In that case I think I'd have to manually remove the data part (LVM) that the PVE installer creates on my SSD. I know it's possible to install root and data on the same ZFS pool, but that isn't what I want to do. I want to perform a 'split' installation as described above.

    And in https://pve.proxmox.com/wiki/System_Requirements I read

    OS storage: Use ... RAID ... or non-RAID with ZFS
    VM storage: For local storage, use ... ZFS and Ceph.


    So I think it should be possible to split the OS part and the VM storage part of PVE. But how can this be done during installation with the PVE installer?
First install PVE to the SSD only. You can create a ZFS pool with the HDDs later using the CLI or the WebUI. Root needs only about 16-32 GB if you don't want to store ISOs/templates/backups there, so there might be plenty of space left. If you don't change the defaults in the installer (see "Advanced LVM Configuration Options" at https://pve.proxmox.com/wiki/Installation), most of your SSD will only be usable to store VMs/LXCs.
  2. My second question is related to the calculation of RAM used for/by ZFS.

    In https://pve.proxmox.com/wiki/ZFS_on_Linux I read:

    ZFS uses 50 % of the host memory for ARC by default. Allocating enough memory for the ARC is crucial for IO performance, so reduce it with caution. As a general rule of thumb, allocate at least 2 GiB Base + 1 GiB/TiB-Storage.

    Somewhere else (sadly I can't recall the exact location) I read about 4 GiB Base + 1 GiB/TiB-Storage.

    Sadly, I couldn't find out how the 1 GiB/TiB-Storage is calculated. Let's take an example: 3 disks with 4 TB each give 12 TB. But if formatted as RAID-Z1, only 8 TB are available for storing data. And there is a recommendation not to fill the pool above 80%, i.e. 6.4 TB.

    So my question is whether the 1 GiB/TiB-Storage relates to the 12 TB, 8 TB or 6.4 TB in this example.
Usually it refers to "raw capacity", which is your 12 TB. So 4 GB RAM + 12 GB RAM = 16 GB RAM. But this is just a rule of thumb. The more RAM you allow ZFS to use, the faster your pool will be. It will probably also work fine with just 8 GB RAM for the ARC. You can limit the ARC size at any time as described here: https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage
You could start with 16 GB RAM for the ARC and then run arc_summary to monitor your ARC hit rates and the available dnode/metadata cache sizes. Then decrease the ARC size until you see the hit rates suddenly drop or the dnode/metadata caches run out of space.
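To make the arithmetic concrete, here is a minimal Python sketch of the rule of thumb. The function names are made up for illustration. It assumes the rule counts raw capacity, and that the ARC limit is set through the zfs_arc_max module option (a value in bytes) as on the linked wiki page. Note that 12 TB (decimal) is a bit less than 12 TiB, so the strict result is slightly below the 16 GB you get by treating TB and TiB as the same.

```python
GIB = 1024 ** 3   # bytes per GiB
TIB = 1024 ** 4   # bytes per TiB
TB = 1000 ** 4    # bytes per (decimal) TB, as printed on disk labels


def arc_rule_of_thumb_gib(raw_capacity_tib, base_gib=2.0, gib_per_tib=1.0):
    """Rule of thumb from the thread: base + 1 GiB per TiB of raw capacity."""
    return base_gib + gib_per_tib * raw_capacity_tib


def modprobe_line(arc_max_gib):
    """Format a zfs_arc_max option line; the value is given in bytes."""
    return f"options zfs zfs_arc_max={int(arc_max_gib * GIB)}"


# Three 4 TB disks: raw capacity converted from decimal TB to TiB.
raw_tib = 3 * 4 * TB / TIB

print(f"raw capacity: {raw_tib:.2f} TiB")
print(f"2 GiB base:   {arc_rule_of_thumb_gib(raw_tib, base_gib=2):.1f} GiB")
print(f"4 GiB base:   {arc_rule_of_thumb_gib(raw_tib, base_gib=4):.1f} GiB")

# Treating 12 TB as 12 TiB, as in the reply above, gives 4 + 12 = 16.
print(arc_rule_of_thumb_gib(12, base_gib=4))

# Example limit of 8 GiB, the lower value mentioned above.
print(modprobe_line(8))
```

Either way it is only a starting point; watching the hit rates in arc_summary tells you more than the formula does.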
  3. My third question is related to storing VM disks on ZFS: is there an additional LVM layer built on top of one single ZFS dataset, or does PVE use a separate ZFS dataset for each VM disk?
There is no LVM. VMs will use zvols and not datasets. When using raidz1 with zvols, don't forget to increase the volblocksize before creating your first VM. Otherwise you will waste a lot of capacity due to padding overhead. With 3 disks in raidz1 you probably need to increase the default volblocksize from 8K to 16K (Datacenter -> Storage -> YourZFSPool -> Edit -> Block Size: 16K). Otherwise you would lose 33% of your raw capacity to parity + 17% of your raw capacity to padding overhead.
I would recommend reading this: https://www.delphix.com/blog/delphi...or-how-i-learned-stop-worrying-and-love-raidz
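As a rough illustration of where the padding comes from, here is a small Python sketch of the usual raidz allocation arithmetic. It assumes 4K sectors (ashift=12), which is not stated above, and the function name is made up for this example. Each block gets parity sectors per stripe row, and the allocation is then rounded up to a multiple of (parity + 1) sectors.

```python
import math


def raidz_sectors(volblocksize, ndisks, nparity=1, ashift=12):
    """Return (data, parity, padding) sectors allocated for one block.

    Assumes the common raidz allocation scheme; ashift=12 (4K sectors)
    is an assumption, check your pool's actual ashift.
    """
    sector = 1 << ashift
    data = math.ceil(volblocksize / sector)
    # Parity sectors are added for every stripe row of data sectors.
    parity = nparity * math.ceil(data / (ndisks - nparity))
    total = data + parity
    # Allocations are rounded up to a multiple of (nparity + 1) sectors.
    padded = math.ceil(total / (nparity + 1)) * (nparity + 1)
    return data, parity, padded - total


for vbs in (8192, 16384):
    data, parity, pad = raidz_sectors(vbs, ndisks=3)
    total = data + parity + pad
    print(f"{vbs // 1024}K: data={data} parity={parity} padding={pad} "
          f"usable={data / total:.0%}")
```

Under these assumptions an 8K block occupies 4 sectors for 2 sectors of data (50% usable in total), while a 16K block occupies 6 sectors for 4 sectors of data (about 67% usable), which is the plain 3-disk raidz1 parity cost with no extra padding.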
 
Hello @Dunuin,

many thanks for your great support: extremely fast (despite the weekend), highly qualified and rich in detail! I think many companies offering paid 4th-level platinum support could learn a lot from you ;)

Thanks also for your additional hints on partitioning the SSD and on the zvol block size, which are extremely valuable for me. I'll follow your recommendation and read the article you mentioned!

And yes, you're totally right, dataset is the wrong term, zvol is the right one! I mixed up the two although I meant the same thing as you, a block device for a PVE VM disk. As far as I know ZFS has (at least) these types of datasets: filesystems, volumes, snapshots, clones, and bookmarks. So I wasn't precise enough in my wording, sorry!

Again, a big THANK YOU, @Dunuin!

Have a nice evening,

moose
 