Hi,
On a machine with 6x4 TB HDDs I installed PVE 6.4 (up to date), choosing RAIDZ2 (ashift left at the default of 12); this should leave 4x4 TB = 16 TB, i.e. about 14 TiB, usable.
Code:
# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 1 days 00:35:33 with 0 errors on Mon Sep 13 00:59:36 2021
config:

        NAME                                               STATE     READ WRITE CKSUM
        rpool                                              ONLINE       0     0     0
          raidz2-0                                         ONLINE       0     0     0
            ata-HGST_HDN724040ALE640_PK2334PBKYN8HT-part3  ONLINE       0     0     0
            ata-HGST_HDN724040ALE640_PK2334PCG0XTRB-part3  ONLINE       0     0     0
            ata-HGST_HDN724040ALE640_PK2334PCG1H6TB-part3  ONLINE       0     0     0
            ata-HGST_HDN724040ALE640_PK2334PBKYBWET-part3  ONLINE       0     0     0
            ata-HGST_HDN724040ALE640_PK2334PBKY3SAT-part3  ONLINE       0     0     0
            ata-HGST_HDN724040ALE640_PK2334PBKV0PRT-part3  ONLINE       0     0     0
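Just as a quick sanity check on my usable-space expectation (assuming my TB-to-TiB conversion is right):
Code:
# 4 data disks x 4 TB = 16 TB of raw data capacity, which in TiB is roughly:
echo "scale=2; 16 * 10^12 / 2^40" | bc
14.55
# so I expected something like 14.5 TiB; the ~14.1 TiB that zfs list shows
# below is, I assume, what remains after the partition layout and ZFS's own
# internal reservation.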
On this PVE I created a single VM running Debian with its default ext4 filesystem and fstrim, and gave it a 10000 GB (9.77 TiB) disk with discard checked.
No snapshots or anything else, just this one VM running, and it was fine for a while.
But when the VM reached 7.05 TiB of data (as reported by df -h), it suddenly started getting I/O errors reported by PVE, and after checking various things I noticed PVE reported the 14.1 TiB ZFS pool as completely full (!).
I cleaned up a few things, which got it back to a (very small) 4.12 GiB free:
Code:
# zfs list -t all -o space
NAME                      AVAIL  USED   USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
rpool                     4.12G  14.1T  0B        208K    0B             14.1T
rpool/ROOT                4.12G  2.00G  0B        192K    0B             2.00G
rpool/ROOT/pve-1          4.12G  2.00G  0B        2.00G   0B             0B
rpool/data                4.12G  14.1T  0B        192K    0B             14.1T
rpool/data/vm-100-disk-0  4.12G  14.1T  0B        14.1T   0B             0B
zfs get all (see below) says 14.1 TiB is allocated for this 9.77 TiB disk while "only" 7.05 TiB is effectively used, meaning I lost exactly half of the usable disk space to the default PVE ZFS parameters.
From what I have read about ZFS, it looks like the volblocksize parameter (8K by default) can be tuned on zvols to achieve better disk utilization (and recordsize on datasets).
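If I understood what I read correctly, the back-of-the-envelope arithmetic for my layout would be roughly the following (please correct me if I got this wrong):
Code:
# ashift=12        -> every allocation is made of 4K sectors
# volblocksize=8K  -> each zvol block is 2 data sectors
# RAIDZ2           -> plus 2 parity sectors per block            = 4 sectors
# RAIDZ padding    -> allocations rounded up to a multiple of
#                     (parity + 1) = 3                           = 6 sectors = 24K raw
# 24K raw x 4/6 (the ideal data/total ratio of a 6-disk RAIDZ2)  = 16K accounted
# => 16K of "used" for every 8K actually written, i.e. exactly 2x,
#    which would match my 7.05T logicalused vs 14.1T used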
Maybe I missed other important parameters.
However, I'm no ZFS expert, so what would you recommend in my case?
And what if I upgrade to an 8 x 8 TB disk RAIDZ2?
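In case it helps to know what I had in mind: would something along these lines be the right way to fix it? This is untested; the storage name, disk slot and 16k value are just my guesses based on the pvesm/qm docs.
Code:
# raise the default block size used for newly created zvols on the ZFS storage
# ("local-zfs" is what the default installation calls it here, 16k is just an example)
pvesm set local-zfs --blocksize 16k
# then recreate the zvol so it picks up the new volblocksize (it apparently
# cannot be changed on an existing zvol), e.g. by moving the disk away and back:
qm move_disk 100 scsi0 some-other-storage
qm move_disk 100 scsi0 local-zfs --delete 1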
Code:
# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.140-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-6
pve-kernel-helper: 6.4-6
pve-kernel-5.4.140-1-pve: 5.4.140-1
pve-kernel-5.4.128-1-pve: 5.4.128-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-1
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.5-pve1~bpo10+1
Code:
# zfs get all rpool/data/vm-100-disk-0
NAME                      PROPERTY              VALUE                  SOURCE
rpool/data/vm-100-disk-0  type                  volume                 -
rpool/data/vm-100-disk-0  creation              Fri Feb 19 10:02 2021  -
rpool/data/vm-100-disk-0  used                  14.1T                  -
rpool/data/vm-100-disk-0  available             4.12G                  -
rpool/data/vm-100-disk-0  referenced            14.1T                  -
rpool/data/vm-100-disk-0  compressratio         1.00x                  -
rpool/data/vm-100-disk-0  reservation           none                   default
rpool/data/vm-100-disk-0  volsize               9.77T                  local
rpool/data/vm-100-disk-0  volblocksize          8K                     default
rpool/data/vm-100-disk-0  checksum              on                     default
rpool/data/vm-100-disk-0  compression           on                     inherited from rpool
rpool/data/vm-100-disk-0  readonly              off                    default
rpool/data/vm-100-disk-0  createtxg             423                    -
rpool/data/vm-100-disk-0  copies                1                      default
rpool/data/vm-100-disk-0  refreservation        none                   default
rpool/data/vm-100-disk-0  guid                  17557223982948174465   -
rpool/data/vm-100-disk-0  primarycache          all                    default
rpool/data/vm-100-disk-0  secondarycache        all                    default
rpool/data/vm-100-disk-0  usedbysnapshots       0B                     -
rpool/data/vm-100-disk-0  usedbydataset         14.1T                  -
rpool/data/vm-100-disk-0  usedbychildren        0B                     -
rpool/data/vm-100-disk-0  usedbyrefreservation  0B                     -
rpool/data/vm-100-disk-0  logbias               latency                default
rpool/data/vm-100-disk-0  objsetid              145                    -
rpool/data/vm-100-disk-0  dedup                 off                    default
rpool/data/vm-100-disk-0  mlslabel              none                   default
rpool/data/vm-100-disk-0  sync                  standard               inherited from rpool
rpool/data/vm-100-disk-0  refcompressratio      1.00x                  -
rpool/data/vm-100-disk-0  written               14.1T                  -
rpool/data/vm-100-disk-0  logicalused           7.05T                  -
rpool/data/vm-100-disk-0  logicalreferenced     7.05T                  -
rpool/data/vm-100-disk-0  volmode               default                default
rpool/data/vm-100-disk-0  snapshot_limit        none                   default
rpool/data/vm-100-disk-0  snapshot_count        none                   default
rpool/data/vm-100-disk-0  snapdev               hidden                 default
rpool/data/vm-100-disk-0  context               none                   default
rpool/data/vm-100-disk-0  fscontext             none                   default
rpool/data/vm-100-disk-0  defcontext            none                   default
rpool/data/vm-100-disk-0  rootcontext           none                   default
rpool/data/vm-100-disk-0  redundant_metadata    all                    default
rpool/data/vm-100-disk-0  encryption            off                    default
rpool/data/vm-100-disk-0  keylocation           none                   default
rpool/data/vm-100-disk-0  keyformat             none                   default
rpool/data/vm-100-disk-0  pbkdf2iters           0                      default