HDSIZE of NVMe system disk during PVE installation with ZFS

Garry

Hello All,

I have two 256 GB NVMe M.2 SSDs in my server and want to use them as a mirrored system disk.
For the mirror I want to try ZFS RAID1.

I am a newbie in Linux and ZFS, so I'm unsure what disk size I should set during the PVE installation.
I mean the "Hard disk options" stage of the PVE installer.
I can give PVE all 256 GB (the entire disk) or less - for example, give PVE only 156 GB and keep the remaining 100 GB for some other purpose.
What do you recommend?

I also need advice on setting the advanced options:
  1. ashift
  2. compress
  3. checksum
  4. copies
  5. hdsize (this is what I was asking about at the beginning)
 
Hello All,

I have two 256 GB NVMe M.2 SSDs in my server and want to use them as a mirrored system disk.
For the mirror I want to try ZFS RAID1.

I am a newbie in Linux and ZFS, so I'm unsure what disk size I should set during the PVE installation.
I mean the "Hard disk options" stage of the PVE installer.
I can give PVE all 256 GB (the entire disk) or less - for example, give PVE only 156 GB and keep the remaining 100 GB for some other purpose.
What do you recommend?
Your pool will be thin provisioned and it can store VMs/LXCs and normal files. So I don't see a point in not using the full capacity.
I also need advice on setting the advanced options:
  1. ashift
  2. compress
  3. checksum
  4. copies
  5. hdsize (this is what I was asking about at the beginning)
Defaults should be fine.

When using ZFS, keep in mind that 20% of your pool's capacity should be kept free or it will get slow. So if you use 250 GB for your ZFS pool, you shouldn't store more than 200 GB of data on it.
 
So I don't see a point in not using the full capacity.
So what capacity of the 256 GB disk should I use?

Defaults should be fine.

The default hdsize is the entire disk (as far as I remember).

When using ZFS, keep in mind that 20% of your pool's capacity should be kept free or it will get slow. So if you use 250 GB for your ZFS pool, you shouldn't store more than 200 GB of data on it.
Is there any option to keep 20% of the pool capacity free automatically, or should I watch it manually?
 
So what capacity of the 256 GB disk should I use?
The default hdsize is the entire disk (as far as I remember).
Entire disk.
Is there any option to keep 20% of the pool capacity free automatically, or should I watch it manually?
After filling it up more than 80% the pool will become slower and slower, above 90% the pool will switch into panic mode making it even slower, and if the pool gets completely filled up it will stop working.
I personally set a quota of 90% of the pool's size, so no matter what happens 10% will always be free and can't be filled up even by accident. In addition to that I monitor the usage, and as soon as it exceeds 80% of the total pool capacity (which will be shown as 89%, because with a 90% quota ZFS will show you this 90% of the total capacity as 100%) I look at what I can remove to bring it below 80% again.
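To keep an eye on that, something like this works (a minimal sketch, assuming the default pool name rpool that the PVE installer creates):
Code:
root@pve:~# zpool list rpool           # overall size, allocated and free space of the pool
root@pve:~# zfs list -r -o space rpool # per-dataset breakdown of used and available space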
 
Maybe it's better to leave some free space on the SSD?
Why waste capacity? I would only leave some space unpartitioned if you already know what you want to partition it with after installation.
Also, I read that it is not recommended to use a ZFS pool for swap.
For example, we could put swap on that unpartitioned space.
That's right. But Proxmox will already create several partitions on that SSD and only one partition is used for ZFS. I'm not sure if the PVE installer will already create a swap partition for you; when choosing LVM it will. If you want to make sure, you could leave some GB free for that. You didn't tell us how much RAM you've got, but in general 2-8 GB for swap should be more than enough.
How to do this?
Look at the quota commands: https://docs.oracle.com/cd/E23824_01/html/821-1448/gazvb.html
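For the 90% approach mentioned above, a minimal sketch could look like this (the pool name rpool and the 207G value, roughly 90% of a 230 GB pool, are assumptions; adjust to your actual pool size):
Code:
root@pve:~# zfs set quota=207G rpool   # cap the pool's root dataset at ~90% of its capacity
root@pve:~# zfs get quota rpool        # verify the quota is in place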
 
You didn't tell us how much RAM you've got, but in general 2-8 GB for swap should be more than enough.
Now I have 16 GB of RAM.
But in the future I will have to add disks to the host and build storage for data. The capacity of that storage will grow gradually and reach maybe about 15-20 TB (right now I only have a single 3 TB SATA disk for it). If I use ZFS for this storage too, the RAM should be increased.
How much RAM is necessary for 20 TB of ZFS storage to work fast enough?

Also, as far as I know it is not recommended to fill an SSD entirely because it will work slowly (at the physical level).
 
Now I have 16 GB of RAM.
But in the future I will have to add disks to the host and build storage for data. The capacity of that storage will grow gradually and reach maybe about 15-20 TB (right now I only have a single 3 TB SATA disk for it). If I use ZFS for this storage too, the RAM should be increased.
How much RAM is necessary for 20 TB of ZFS storage to work fast enough?
Hard to tell, you need to test that. I would say 8 GB should be fine.
Also, as far as I know it is not recommended to fill an SSD entirely because it will work slowly (at the physical level).
That depends on the SSD. That's true for consumer SSDs, where empty space is used for SLC caching to compensate for the slow QLC/TLC NAND and the spare area isn't that big. But enterprise SSDs should perform better when full.
But all SSDs have garbage collection and TRIM, so partitioned but empty space will be fine. You don't need that space to be unallocated.
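As an illustration of the TRIM point (pool name rpool assumed), ZFS can trim the pool either on demand or continuously:
Code:
root@pve:~# zpool trim rpool              # run a one-off TRIM over the pool
root@pve:~# zpool set autotrim=on rpool   # let ZFS issue TRIM commands automatically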
 
Hard to tell, you need to test that. I would say 8 GB should be fine.

That depends on the SSD. That's true for consumer SSDs, where empty space is used for SLC caching to compensate for the slow QLC/TLC NAND and the spare area isn't that big. But enterprise SSDs should perform better when full.
But all SSDs have garbage collection and TRIM, so partitioned but empty space will be fine. You don't need that space to be unallocated.

I have simple consumer NVMe M.2 disks.

Do I understand correctly that you recommend specifying in the installation the capacity of the entire disk minus 8 GB for the future swap partition? So hdsize = 256 - 8 = 248 GB?
 
I have simple consumer NVMe M.2 disks.

Do I understand correctly that you recommend specifying in the installation the capacity of the entire disk minus 8 GB for the future swap partition? So hdsize = 256 - 8 = 248 GB?
If you just want swap, yes.
 
Linux doesn't require swap, but it's useful to have a little bit to prevent processes from getting killed by the OOM killer in case you run out of RAM. I usually set the swappiness very low so swap is really only used to prevent OOM but not to free up RAM in normal operation, so the SSDs live longer. If you've only got 16 GB of RAM, 2 GB of swap would be totally sufficient for that.
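A minimal sketch of lowering the swappiness (the value 1 and the file name are just examples):
Code:
root@pve:~# sysctl vm.swappiness=1                                   # apply immediately
root@pve:~# echo "vm.swappiness=1" > /etc/sysctl.d/swappiness.conf   # persist across reboots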
 
Entire disk.

During the installation process, the local storage is automatically created, which is located on the same partition as the operating system itself and is mounted at /var/lib/vz. It is intended for storing disk images, virtual machines and containers.

I have read a recommendation not to fill up this local storage: leave enough space there for the normal operation of the OS, and create separate storage for the virtual machines either in an unoccupied area of the hard drives (which must be left free during installation), on a separate hard drive, or on a RAID array.

What do you think about this option?

And the question that follows from this: if we want to go with that option, what is the minimum size to allocate for the PVE installer? (In order to subsequently deploy storage for virtual machines on the unallocated part of the system disk(s).)
 
During the installation process, the local storage is automatically created, which is located on the same partition as the operating system itself and is mounted at /var/lib/vz. It is intended for storing disk images, virtual machines and containers.

I have read a recommendation not to fill up this local storage: leave enough space there for the normal operation of the OS, and create separate storage for the virtual machines either in an unoccupied area of the hard drives (which must be left free during installation), on a separate hard drive, or on a RAID array.

What do you think about this option?

And the question that follows from this: if we want to go with that option, what is the minimum size to allocate for the PVE installer? (In order to subsequently deploy storage for virtual machines on the unallocated part of the system disk(s).)
You are using ZFS, which is thin provisioned. If you've got a 250 GB ZFS partition, these 250 GB will be shared by the dataset that stores your root fs (the Proxmox OS) and the dataset that stores your guests. No need to create empty areas or other partitions. A dedicated pool on dedicated disks for your guests is always nice to have, but it really isn't needed.
 
You are using ZFS, which is thin provisioned. If you've got a 250 GB ZFS partition, these 250 GB will be shared by the dataset that stores your root fs (the Proxmox OS) and the dataset that stores your guests. No need to create empty areas or other partitions. A dedicated pool on dedicated disks for your guests is always nice to have, but it really isn't needed.
I think this option can prevent the PVE host from stopping to respond when VMs fill up the system disk too much.

My 256 GB SSD disk = 238 GiB.
I have set hdsize = 230 GiB during the PVE installation in the ZFS advanced options.
(I want to leave about 8 GiB unallocated, just in case.)

And now I have this situation with the disk sizes:

Code:
root@pve:~# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
nvme0n1     259:0    0 238.5G  0 disk
├─nvme0n1p1 259:1    0  1007K  0 part
├─nvme0n1p2 259:2    0   512M  0 part
└─nvme0n1p3 259:3    0 229.5G  0 part
nvme1n1     259:4    0 238.5G  0 disk
├─nvme1n1p1 259:5    0  1007K  0 part
├─nvme1n1p2 259:6    0   512M  0 part
└─nvme1n1p3 259:7    0 229.5G  0 part

available disk space:
Code:
root@pve:~# df -h
Filesystem        Size  Used Avail Use% Mounted on
udev              7.8G     0  7.8G   0% /dev
tmpfs             1.6G  1.3M  1.6G   1% /run
rpool/ROOT/pve-1  221G  2.8G  219G   2% /
tmpfs             7.8G   49M  7.8G   1% /dev/shm
tmpfs             5.0M     0  5.0M   0% /run/lock
rpool             219G  128K  219G   1% /rpool
rpool/ROOT        219G  128K  219G   1% /rpool/ROOT
rpool/data        219G  128K  219G   1% /rpool/data
/dev/fuse         128M   16K  128M   1% /etc/pve
tmpfs             1.6G     0  1.6G   0% /run/user/0

Code:
root@pve:~# fdisk -l
Disk /dev/nvme0n1: 238.47 GiB, 256060514304 bytes, 500118192 sectors
Disk model: NE-256                                 
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: F73F4ECD-7205-47DA-A77F-9E91E960F91F

Device           Start       End   Sectors   Size Type
/dev/nvme0n1p1      34      2047      2014  1007K BIOS boot
/dev/nvme0n1p2    2048   1050623   1048576   512M EFI System
/dev/nvme0n1p3 1050624 482344960 481294337 229.5G Solaris /usr & Apple ZFS
....

As I understand it, no swap partition was created?
 
I think this option can prevent the PVE host from stopping to respond when VMs fill up the system disk too much.
If you fear your VMs will completely fill up your pool so the PVE OS runs out of space, you can use ZFS to set a quota. For example zfs set quota=164G rpool/data and zfs set quota=20G rpool/ROOT so VMs can't use more than 164 GB and root can't use more than 20 GB. 20% of your pool will then always stay free for the best ZFS performance.
As I understand it, no swap partition was created?
Yep, looks like there is no swap. So you can use fdisk or parted to create one (or two, one on each disk) yourself and then add it as a swap partition via fstab.
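A rough sketch of that, assuming the new swap partition ends up as /dev/nvme0n1p4 with about 4 GiB (the device names, offsets and size are assumptions; adjust them to your layout):
Code:
root@pve:~# parted /dev/nvme0n1 mkpart swap linux-swap 230GiB 234GiB   # create the partition in the free space
root@pve:~# mkswap /dev/nvme0n1p4                                      # format it as swap
root@pve:~# echo '/dev/nvme0n1p4 none swap sw 0 0' >> /etc/fstab       # enable at boot (UUID= from blkid is more robust)
root@pve:~# swapon -a                                                  # enable it now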
 
I'm not entirely clear about the structure of the ZFS file system.

As I see it, ZFS is on /dev/nvme0n1p3 + /dev/nvme1n1p3 only, and it works there as a mirror.

Do /dev/nvme0n1p1 + /dev/nvme1n1p1 (the BIOS boot partitions) and /dev/nvme0n1p2 + /dev/nvme1n1p2 (the EFI partitions) work as a mirror too?
Are those partitions protected against the failure of one of the disks?

If you fear your VMs will completely fill up your pool so the PVE OS runs out of space, you can use ZFS to set a quota. For example zfs set quota=164G rpool/data and zfs set quota=20G rpool/ROOT so VMs can't use more than 164 GB and root can't use more than 20 GB. 20% of your pool will then always stay free for the best ZFS performance.

How did you calculate the 164G and 20G numbers?

I see that rpool/ROOT and rpool/data both show the same size of 219G.

Yep, looks like there is no swap. So you can use fdisk or parted to create one (or two, one on each disk) yourself and then add it as a swap partition via fstab.
Since the PVE installer didn't create swap automatically, does that mean we don't have to have swap in PVE?
In what cases is it recommended to create swap for the system?
 
I'm not entirely clear about the structure of the ZFS file system.

As I see it, ZFS is on /dev/nvme0n1p3 + /dev/nvme1n1p3 only, and it works there as a mirror.

Do /dev/nvme0n1p1 + /dev/nvme1n1p1 (the BIOS boot partitions) and /dev/nvme0n1p2 + /dev/nvme1n1p2 (the EFI partitions) work as a mirror too?
Are those partitions protected against the failure of one of the disks?
They aren't mirrored. These are single partitions that are kept in sync by proxmox-boot-tool. If you ever need to replace a disk, you need to clone the partition table from the healthy disk to the new one, tell proxmox-boot-tool to sync over the bootloader, and only then tell ZFS to replace the failed ZFS partition. It's explained in the wiki. But as both disks contain the same bootloaders, your server should still be able to boot if either of the disks fails.
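Roughly, that procedure looks like the sketch below (device names are placeholders for the healthy and the new disk; follow the wiki for the exact steps on your system):
Code:
root@pve:~# sgdisk /dev/nvme0n1 -R /dev/nvme1n1     # copy the partition table from the healthy disk to the new disk
root@pve:~# sgdisk -G /dev/nvme1n1                  # randomize the GUIDs on the new disk
root@pve:~# proxmox-boot-tool format /dev/nvme1n1p2 # prepare the new ESP
root@pve:~# proxmox-boot-tool init /dev/nvme1n1p2   # sync the bootloader onto it
root@pve:~# zpool replace -f rpool <old zfs partition> /dev/nvme1n1p3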
How did you calculate the 164G and 20G numbers?
I see that rpool/ROOT and rpool/data both show the same size of 219G.
ZFS pools shouldn't be filled up more than 80%, because the further you go over 80% the slower your pool will get, until it finally switches into panic mode at 90%, where it gets even slower, until the pool finally fails. This is because ZFS uses copy-on-write and therefore always needs a lot of empty space to operate.
I calculated with your 238 GB. If your ZFS partition is only 219 GB and 20% should be kept free, you've only got 175 GB for actual data. So if you want to reserve 20 GB for PVE + ISOs/templates (here PVE uses 10 GB right now without any ISOs or templates), there would be 155 GB for guests.
Since the PVE installer didn't create swap automatically, does that mean we don't have to have swap in PVE?
In what cases is it recommended to create swap for the system?
See my previous answer:
Linux doesn't require swap, but it's useful to have a little bit to prevent processes from getting killed by the OOM killer in case you run out of RAM. I usually set the swappiness very low so swap is really only used to prevent OOM but not to free up RAM in normal operation, so the SSDs live longer. If you've only got 16 GB of RAM, 2 GB of swap would be totally sufficient for that.
 
