Hello.
I have two similar machines in a cluster (each with a 4-core Intel CPU, 16GB RAM, a 500GB HDD boot drive, a 1TB HDD, and two 512GB SSDs).
Each system has the two SSDs in a ZFS mirror and the single HDD as a single-disk ZFS pool, both created using the Proxmox web GUI.
So there are two ZFS storage pools on each node:
- Zpool name: hdd; a single 1TB hard drive;
- Zpool name: ssd; two 512GB solid state drives in a mirror.
In total I have 3 containers and 4 VMs running on the cluster.
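For completeness, I believe the corresponding storage definitions in /etc/pve/storage.cfg look roughly like this (typed from memory since everything was set up through the web GUI, so treat it as a sketch; I haven't changed anything there by hand):
Code:
zfspool: hdd
        pool hdd
        content images,rootdir

zfspool: ssd
        pool ssd
        content images,rootdir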
The problem:
After enabling replication for a VM with a 512G volume (of which the VM uses 326G) on the hdd zpool, the volume's usage jumped to 880G and apparently filled the hdd zpool.
I am having trouble understanding why this is happening and why it did not happen before replication was enabled.
Code:
root@pve2500k:~# zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
hdd    928G   328G   600G        -         -     0%    35%  1.00x  ONLINE  -
ssd    476G  74.4G   402G        -         -     0%    15%  1.00x  ONLINE  -
root@pve2500k:~# zfs list
NAME                    USED  AVAIL  REFER  MOUNTPOINT
hdd                     873G  25.5G   112K  /hdd
hdd/subvol-100-disk-0  1.17G  6.83G  1.17G  /hdd/subvol-100-disk-0
hdd/subvol-101-disk-0   712M  7.30G   712M  /hdd/subvol-101-disk-0
hdd/subvol-102-disk-0  6.16M  1018M  6.16M  /hdd/subvol-102-disk-0
hdd/vm-103-disk-0      34.0G  58.5G   980M  -
hdd/vm-104-disk-0       838G   538G   326G  -
ssd                     239G   222G    96K  /ssd
ssd/vm-104-disk-0      97.4G   296G  23.1G  -
ssd/vm-105-disk-0      74.6G   271G  25.1G  -
ssd/vm-106-disk-0      67.4G   263G  26.2G  -
root@pve2500k:~# zfs get all hdd/vm-104-disk-0
NAME PROPERTY VALUE SOURCE
hdd/vm-104-disk-0 type volume -
hdd/vm-104-disk-0 creation Mon Apr 6 9:58 2020 -
hdd/vm-104-disk-0 used 838G -
hdd/vm-104-disk-0 available 538G -
hdd/vm-104-disk-0 referenced 326G -
hdd/vm-104-disk-0 compressratio 1.02x -
hdd/vm-104-disk-0 reservation none default
hdd/vm-104-disk-0 volsize 512G local
hdd/vm-104-disk-0 volblocksize 8K default
hdd/vm-104-disk-0 checksum on default
hdd/vm-104-disk-0 compression on inherited from hdd
hdd/vm-104-disk-0 readonly off default
hdd/vm-104-disk-0 createtxg 486 -
hdd/vm-104-disk-0 copies 1 default
hdd/vm-104-disk-0 refreservation 512G local
hdd/vm-104-disk-0 guid 17088501247693601542 -
hdd/vm-104-disk-0 primarycache all default
hdd/vm-104-disk-0 secondarycache all default
hdd/vm-104-disk-0 usedbysnapshots 0B -
hdd/vm-104-disk-0 usedbydataset 326G -
hdd/vm-104-disk-0 usedbychildren 0B -
hdd/vm-104-disk-0 usedbyrefreservation 512G -
hdd/vm-104-disk-0 logbias latency default
hdd/vm-104-disk-0 objsetid 395 -
hdd/vm-104-disk-0 dedup off default
hdd/vm-104-disk-0 mlslabel none default
hdd/vm-104-disk-0 sync standard default
hdd/vm-104-disk-0 refcompressratio 1.02x -
hdd/vm-104-disk-0 written 0 -
hdd/vm-104-disk-0 logicalused 333G -
hdd/vm-104-disk-0 logicalreferenced 333G -
hdd/vm-104-disk-0 volmode default default
hdd/vm-104-disk-0 snapshot_limit none default
hdd/vm-104-disk-0 snapshot_count none default
hdd/vm-104-disk-0 snapdev hidden default
hdd/vm-104-disk-0 context none default
hdd/vm-104-disk-0 fscontext none default
hdd/vm-104-disk-0 defcontext none default
hdd/vm-104-disk-0 rootcontext none default
hdd/vm-104-disk-0 redundant_metadata all default
hdd/vm-104-disk-0 encryption off default
hdd/vm-104-disk-0 keylocation none default
hdd/vm-104-disk-0 keyformat none default
hdd/vm-104-disk-0 pbkdf2iters 0 default
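In case it helps, these are the commands I was planning to run next to break the usage down a bit more (just a sketch; I haven't gone through the output carefully yet):
Code:
# per-dataset breakdown of used space (dataset, snapshots, refreservation, children)
zfs list -o space -r hdd

# check whether replication left any snapshots behind on this zvol
zfs list -t snapshot -r hdd/vm-104-disk-0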
I have seen a number of threads here on the forum, and my guess would be that the default 8k volblocksize causes a lot of overhead, but then I don't understand why this excess usage didn't show up before replication was enabled.
Also, the ZFS optimization guide suggests using a volblocksize that matches the VM's allocation block size, which in my case would be 4k, but from what I've read, using 8k should just cost a bit of write performance rather than cause additional space overhead, right?
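If changing the volblocksize does turn out to be the answer, I assume the way to do it in Proxmox would be something like the command below, and that it would only apply to newly created disks rather than existing zvols (please correct me if I'm wrong):
Code:
# set the default volblocksize for new zvols created on this storage
pvesm set hdd --blocksize 4k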
What could I look into to find out whether this is normal behavior, and if it isn't, how can I reclaim the space and prevent it from happening again?
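My current guess for getting the space back would be something like the following, but I have no idea whether dropping the reservation is actually safe here or whether it just masks the real problem:
Code:
# make the zvol thin-provisioned so it only accounts for blocks actually written
zfs set refreservation=none hdd/vm-104-disk-0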
Thanks!