Strange ZFS VM disk usage after enabling replication

lmentz

Hello.

I have two similar machines in a cluster (each with a 4-core Intel CPU, 16GB RAM, a 500G HDD boot drive, a 1T HDD, and two 512G SSDs).
On each system the two SSDs form a ZFS mirror and the single HDD is a single-disk ZFS pool, both created through the Proxmox web GUI.
Therefore, two ZFS storage pools on each node:
- Zpool name: hdd; consisting of a single 1T hard drive;
- Zpool name: ssd; consisting of two 512G solid state drives on a mirror.
In total I have 3 containers and 4 VMs running on the cluster.

The problem:
After enabling replication for a VM that has a 512G volume (of which the VM uses 326G) on the hdd zpool, the volume jumped to 880G and apparently filled the hdd zpool.
I am having trouble understanding why this happens, and why it did not happen before replication was enabled.

Code:
root@pve2500k:~# zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
hdd    928G   328G   600G        -         -     0%    35%  1.00x    ONLINE  -
ssd    476G  74.4G   402G        -         -     0%    15%  1.00x    ONLINE  -

root@pve2500k:~# zfs list
NAME                    USED  AVAIL     REFER  MOUNTPOINT
hdd                     873G  25.5G      112K  /hdd
hdd/subvol-100-disk-0  1.17G  6.83G     1.17G  /hdd/subvol-100-disk-0
hdd/subvol-101-disk-0   712M  7.30G      712M  /hdd/subvol-101-disk-0
hdd/subvol-102-disk-0  6.16M  1018M     6.16M  /hdd/subvol-102-disk-0
hdd/vm-103-disk-0      34.0G  58.5G      980M  -
hdd/vm-104-disk-0       838G   538G      326G  -
ssd                     239G   222G       96K  /ssd
ssd/vm-104-disk-0      97.4G   296G     23.1G  -
ssd/vm-105-disk-0      74.6G   271G     25.1G  -
ssd/vm-106-disk-0      67.4G   263G     26.2G  -

root@pve2500k:~# zfs get all hdd/vm-104-disk-0 
NAME               PROPERTY              VALUE                  SOURCE
hdd/vm-104-disk-0  type                  volume                 -
hdd/vm-104-disk-0  creation              Mon Apr  6  9:58 2020  -
hdd/vm-104-disk-0  used                  838G                   -
hdd/vm-104-disk-0  available             538G                   -
hdd/vm-104-disk-0  referenced            326G                   -
hdd/vm-104-disk-0  compressratio         1.02x                  -
hdd/vm-104-disk-0  reservation           none                   default
hdd/vm-104-disk-0  volsize               512G                   local
hdd/vm-104-disk-0  volblocksize          8K                     default
hdd/vm-104-disk-0  checksum              on                     default
hdd/vm-104-disk-0  compression           on                     inherited from hdd
hdd/vm-104-disk-0  readonly              off                    default
hdd/vm-104-disk-0  createtxg             486                    -
hdd/vm-104-disk-0  copies                1                      default
hdd/vm-104-disk-0  refreservation        512G                   local
hdd/vm-104-disk-0  guid                  17088501247693601542   -
hdd/vm-104-disk-0  primarycache          all                    default
hdd/vm-104-disk-0  secondarycache        all                    default
hdd/vm-104-disk-0  usedbysnapshots       0B                     -
hdd/vm-104-disk-0  usedbydataset         326G                   -
hdd/vm-104-disk-0  usedbychildren        0B                     -
hdd/vm-104-disk-0  usedbyrefreservation  512G                   -
hdd/vm-104-disk-0  logbias               latency                default
hdd/vm-104-disk-0  objsetid              395                    -
hdd/vm-104-disk-0  dedup                 off                    default
hdd/vm-104-disk-0  mlslabel              none                   default
hdd/vm-104-disk-0  sync                  standard               default
hdd/vm-104-disk-0  refcompressratio      1.02x                  -
hdd/vm-104-disk-0  written               0                      -
hdd/vm-104-disk-0  logicalused           333G                   -
hdd/vm-104-disk-0  logicalreferenced     333G                   -
hdd/vm-104-disk-0  volmode               default                default
hdd/vm-104-disk-0  snapshot_limit        none                   default
hdd/vm-104-disk-0  snapshot_count        none                   default
hdd/vm-104-disk-0  snapdev               hidden                 default
hdd/vm-104-disk-0  context               none                   default
hdd/vm-104-disk-0  fscontext             none                   default
hdd/vm-104-disk-0  defcontext            none                   default
hdd/vm-104-disk-0  rootcontext           none                   default
hdd/vm-104-disk-0  redundant_metadata    all                    default
hdd/vm-104-disk-0  encryption            off                    default
hdd/vm-104-disk-0  keylocation           none                   default
hdd/vm-104-disk-0  keyformat             none                   default
hdd/vm-104-disk-0  pbkdf2iters           0                      default

I have seen a number of threads here on the forum, and my guess would be that the default 8k volblocksize causes a lot of overhead, but then I don't understand why this excessive usage didn't show up before replication was enabled.
Also, the ZFS optimization guides suggest using a volblocksize matching the VM's allocation block size, which in my case would be 4k, but from what I read using 8k should only cost a bit of write performance rather than cause additional space overhead, right?
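
For reference, this is how I checked the block size side of things (just a sketch; as far as I understand, volblocksize is fixed when the zvol is created, and the storage-level default is the optional "blocksize" setting of the zfspool storage):

Code:
# volblocksize of the existing zvol (read-only after creation)
zfs get volblocksize hdd/vm-104-disk-0

# default block size Proxmox would use for new zvols on this storage,
# if a "blocksize" option is set for it in the storage config
grep -A 5 'zfspool: hdd' /etc/pve/storage.cfg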

What could I look into to find out if this is normal behavior, and if not, how could I restore the space and prevent this from happening again?

Thanks!
 
this is just how snapshots of non-thin (thick provisioned) zvols work in ZFS: once the first snapshot exists, the snapshot keeps the currently used data around, while the zvol's refreservation still reserves its full volsize on top of that, so both are counted. later snapshots only consume additional space for the changes made since the previous snapshot.
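
the accounting in your own "zfs get all" output above shows exactly that split (a quick sketch of the arithmetic, using the values you posted):

Code:
# usedbydataset          326G   (data actually written by the VM)
# usedbyrefreservation   512G   (the full volsize, reserved on top)
# usedbysnapshots          0B   (no extra snapshot delta yet)
# used                   838G   which is roughly 326G + 512G
zfs get used,usedbydataset,usedbyrefreservation,usedbysnapshots hdd/vm-104-disk-0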
 
Thank you, I think I get it now.
So if I use thin-provisioning on the ZFS storage then I would not see this behavior, is that correct?
I will try thin provisioning and see whether it's an acceptable solution for us, given the possible performance hit.
 
thin provisioning does not make any performance difference for ZFS. it is a bit risky, though, since with thin provisioning you can actually run out of space in ugly ways (e.g., a guest sees a 50G disk, but after writing 20G it gets I/O errors because the underlying pool is full).

the difference is really only on the logical level. with thick provisioning, ZFS treats a 50G zvol as something it must always be able to write 50G to, and reserves that space up front (the refreservation you see in your output above). with thin provisioning, ZFS only allocates what is actually used and hopes there is still enough space later on when more data comes in. in both cases the physically used space for the data itself is the same, and in both cases new data gets written to newly allocated blocks; with thick provisioning the space is simply reserved beforehand.
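
in practice this boils down to the refreservation property. a minimal sketch of both directions (assuming the zfspool storage named "hdd" from your setup, and that the zfspool "sparse" option is what you want to toggle):

Code:
# make new disks on this storage thin provisioned ("sparse 1" in /etc/pve/storage.cfg)
pvesm set hdd --sparse 1

# an existing thick zvol can be switched to thin by dropping its reservation
# (this also removes the guarantee that the guest can always fill its disk)
zfs set refreservation=none hdd/vm-104-disk-0

# and back to thick by reserving the full volsize again
zfs set refreservation=512G hdd/vm-104-disk-0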
 
