root on ZFS full - because the guest disk is too huge

udo

Hi all,
I'm trying ZFS on a temp server, to which I'm saving my home fileserver data before a new install.

The PVE server has 6x 1TB disks in raidz2, and the fileserver VM was filled with rsync.

After some time (the fileserver holds approx. 1.7TB of data) the PVE host's root filesystem filled up to 100%, and now I need some help...

The strange thing is that the VM should only use around 2.6TB:
Code:
cat /etc/pve/qemu-server/310.conf
balloon: 0
boot: c
bootdisk: scsi0
cores: 2
ide2: local:iso/debian-stretch-DI-rc1-amd64-netinst.iso,media=cdrom,size=296M
memory: 2048
name: fileserver
net0: virtio=CA:6B:34:63:6A:91,bridge=vmbr0
numa: 0
ostype: l26
scsi0: local-zfs:vm-310-disk-1,size=6G
scsi1: local-zfs:vm-310-disk-2,size=2600G
scsihw: virtio-scsi-pci
smbios1: uuid=14c8f391-40a0-4a9d-ac49-f2dedb5810c0
sockets: 1
The zpool doesn't have any free space left:
Code:
root@pve-temp:~# zfs list -t all -r -o space
NAME  AVAIL  USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
rpool  0  3.51T  0  192K  0  3.51T
rpool/ROOT  0  2.54G  0  192K  0  2.54G
rpool/ROOT/pve-1  0  2.54G  0  2.54G  0  0
rpool/data  0  3.50T  0  192K  0  3.50T
rpool/data/vm-310-disk-1  0  3.15G  0  3.15G  0  0
rpool/data/vm-310-disk-2  0  3.50T  0  3.50T  0  0
rpool/swap  7.33G  7.44G  0  110M  7.33G  0

root@pve-temp:~# df -h
Filesystem  Size  Used Avail Use% Mounted on
udev  10M  0  10M  0% /dev
tmpfs  1.6G  11M  1.6G  1% /run
rpool/ROOT/pve-1  2.6G  2.6G  0 100% /
tmpfs  3.9G  46M  3.9G  2% /dev/shm
tmpfs  5.0M  0  5.0M  0% /run/lock
tmpfs  3.9G  0  3.9G  0% /sys/fs/cgroup
rpool  128K  128K  0 100% /rpool
rpool/ROOT  128K  128K  0 100% /rpool/ROOT
rpool/data  128K  128K  0 100% /rpool/data
/dev/fuse  30M  16K  30M  1% /etc/pve

root@pve-temp:~# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

   NAME  STATE  READ WRITE CKSUM
   rpool  ONLINE  0  0  0
    raidz2-0  ONLINE  0  0  0
    sda2  ONLINE  0  0  0
    sdb2  ONLINE  0  0  0
    sdc2  ONLINE  0  0  0
    sdd2  ONLINE  0  0  0
    sde2  ONLINE  0  0  0
    sdf2  ONLINE  0  0  0

Code:
pveversion -v
proxmox-ve: 4.4-78 (running kernel: 4.4.35-2-pve)
pve-manager: 4.4-5 (running version: 4.4-5/c43015a5)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.35-2-pve: 4.4.35-78
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-102
pve-firmware: 1.1-10
libpve-common-perl: 4.0-85
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-71
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-1
pve-qemu-kvm: 2.7.1-1
pve-container: 1.0-90
pve-firewall: 2.0-33
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.6-5
lxcfs: 2.0.5-pve2
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80
Why does the guest use more space than it is allowed to, and why at all, when only 1.7TB is actually in use??

Any hints?

Udo
 
Have you used thick provisioning (i.e. not actively enabled thin provisioning) for your pool? If so, check the refreservation property on your "big" dataset/zvol.
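
(For reference: something like the following should show whether the zvol carries a reservation and whether the storage is thin provisioned; the dataset and storage names are taken from the output above.)
Code:
# check whether the big zvol is thick provisioned (refreservation other than "none")
zfs get refreservation,volsize,used rpool/data/vm-310-disk-2
# check whether the PVE storage has the "sparse" (thin provisioning) option set
grep -A 5 'zfspool: local-zfs' /etc/pve/storage.cfg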

The space usage always includes parity, so more space is used physically than is visible logically. This is strange, but I also stumbled over this.
 
Have you used thick provisioning (i.e. not actively enabled thin provisioning) for your pool? If so, check the refreservation property on your "big" dataset/zvol.
Hi,
I changed nothing from the default installation - thin provisioning was enabled.
The space usage always includes parity, so more space is used physically than is visible logically. This is strange, but I also stumbled over this.
Which parity?? With six 1TB HDDs (approx. 930GB each) and raidz2 I have 4 * 930GB for data, which is roughly the 3.5TB shown as storage space.
Why should the data be doubled within that 3.5TB?

ZFS and I just don't become friends... every time I try ZFS, something weird happens.

Udo
 

raidz-2, ashift=12 and the default blocksize for zvols don't mix well. see this recent thread: https://forum.proxmox.com/threads/zfs-pool-not-showing-correct-usage.31111/#post-155425 for why, and set the blocksize option on the storage on the PVE side to an appropriate value. you will need to recopy all your data to see the space savings (but you can use "move disk" in PVE to do this live).
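
(Rough sketch of what that could look like from the CLI, assuming the storage is named local-zfs as in the VM config above; the same setting is available in the GUI. The background, briefly: with ashift=12 the default 8K volblocksize spans only two 4K data sectors, so raidz2 has to add two parity sectors plus padding for every block, which roughly explains the doubled usage compared to the expected 4-data/2-parity ratio.)
Code:
# set a larger volblocksize for newly created zvols on this storage
pvesm set local-zfs --blocksize 32k
# verify the setting; only disks created after this change get the new
# volblocksize, existing ones have to be recreated or moved
grep -A 5 'zfspool: local-zfs' /etc/pve/storage.cfg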
 
Hi Fabian,
thanks for the hint.

I removed the big disk, added the blocksize:
Code:
zfspool: local-zfs
   pool rpool/data
   sparse
   blocksize 32k
   content images,rootdir
recreated the disk, and now it looks better:
Code:
# before:
root@pve-temp:~# zfs list -o name,used,refer,volsize,volblocksize,written -r rpool
NAME  USED  REFER  VOLSIZE  VOLBLOCK  WRITTEN
rpool/data/vm-310-disk-2  3.50T  3.50T  2.54T  8K  3.50T

# after:
rpool/data/vm-310-disk-2  5.48G  5.48G  2.54T  32K  5.48G
Unfortunately the system (the PVE host) reboots during the rsync inside the guest after approx. 8 minutes...
8GB of host memory (the VM uses 2GB - only one VM is running) doesn't look like enough. Do I have to tune ZFS memory usage?!
Not really convincing for me.

There is nothing special in the logs:
Code:
Feb  1 11:25:36 pve-temp pvedaemon[14772]: starting vnc proxy UPID:pve-temp:000039B4:00898857:5891B7A0:vncproxy:310:root@pam:
Feb  1 11:25:36 pve-temp pvedaemon[575]: <root@pam> starting task UPID:pve-temp:000039B4:00898857:5891B7A0:vncproxy:310:root@pam:
Feb  1 11:25:38 pve-temp systemd-timesyncd[2050]: interval/delta/delay/jitter/drift 2048s/+0.001s/0.025s/0.000s/+17ppm (ignored)
Feb  1 11:25:38 pve-temp kernel: [90136.068050] kvm [14740]: vcpu0 unhandled rdmsr: 0xc001100d
Feb  1 11:25:38 pve-temp kernel: [90136.204653] kvm [14740]: vcpu1 unhandled rdmsr: 0xc001100d
Feb  1 11:27:46 pve-temp pvedaemon[4379]: <root@pam> successful auth for user 'root@pam'
Feb  1 11:32:48 pve-temp kernel: [90566.071358] kvm [14740]: vcpu0 unhandled rdmsr: 0xc001100d
Feb  1 11:32:48 pve-temp kernel: [90566.208726] kvm [14740]: vcpu1 unhandled rdmsr: 0xc001100d
Feb  1 11:37:38 pve-temp rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="2489" x-info="http://www.rsyslog.com"] start
Feb  1 11:37:38 pve-temp systemd-modules-load[1510]: Module 'fuse' is builtin
Feb  1 11:37:38 pve-temp systemd[1]: Mounted Huge Pages File System.
Feb  1 11:37:38 pve-temp systemd[1]: Mounted POSIX Message Queue File System.
Feb  1 11:37:38 pve-temp systemd[1]: Mounted Debug File System.
Feb  1 11:37:38 pve-temp systemd[1]: Started Create Static Device Nodes in /dev.
Feb  1 11:37:38 pve-temp kernel: [  0.000000] Initializing cgroup subsys cpuset
Feb  1 11:37:38 pve-temp systemd[1]: Starting udev Kernel Device Manager...
Feb  1 11:37:38 pve-temp kernel: [  0.000000] Initializing cgroup subsys cpu
Feb  1 11:37:38 pve-temp kernel: [  0.000000] Initializing cgroup subsys cpuacct
Feb  1 11:37:38 pve-temp kernel: [  0.000000] Linux version 4.4.35-2-pve (root@nora) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Mon Jan 9 10:21:44 CET 2017 ()
Feb  1 11:37:38 pve-temp kernel: [  0.000000] Command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-4.4.35-2-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet
Feb  1 11:37:38 pve-temp kernel: [  0.000000] KERNEL supported cpus:
Feb  1 11:37:38 pve-temp systemd-modules-load[1510]: Inserted module 'vhost_net'
Feb  1 11:37:38 pve-temp kernel: [  0.000000]  Intel GenuineIntel
Feb  1 11:37:38 pve-temp kernel: [  0.000000]  AMD AuthenticAMD
Feb  1 11:37:38 pve-temp kernel: [  0.000000]  Centaur CentaurHauls
Feb  1 11:37:38 pve-temp kernel: [  0.000000] tseg: 00bdf00000
Feb  1 11:37:38 pve-temp kernel: [  0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
...
Udo
 
Unfortunately the system (the PVE host) reboots during the rsync inside the guest after approx. 8 minutes...
8GB of host memory (the VM uses 2GB - only one VM is running) doesn't look like enough. Do I have to tune ZFS memory usage?!
Not really convincing for me.
Udo

yes, 6GB (or 4? if you have not changed the default 50% ARC limit) is on the rather low end for your pool.

you can change the ARC to be only used for metadata, but this will decrease performance (you are relying 100% on VM internal caching then), so it should only be seen as a last resort for low-memory machines.
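
(For completeness, roughly what the two options mentioned above look like; the 2 GiB ARC limit below is just an example value, not a recommendation.)
Code:
# limit the ARC via a module option (value in bytes, here 2 GiB);
# note this overwrites an existing /etc/modprobe.d/zfs.conf
echo "options zfs zfs_arc_max=2147483648" > /etc/modprobe.d/zfs.conf
update-initramfs -u   # then reboot
# or, as a last resort, cache only metadata for the VM datasets
zfs set primarycache=metadata rpool/data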
 
raidz-2, ashift=12 and the default blocksize for zvols don't mix well

Yes, that's true, but why is ashift=12 in use at all? It should only be used on big (>= 2 TB) hard disks and should match the underlying sector size. For smaller ones (e.g. 1 TB here) you can use ashift=9 and keep all the nice and shiny features.

If you have the wrong blocksize, you will not benefit from things like compression at all: if you compress a 4K block down to e.g. 2.37 KB on an ashift=12 device (4K internal block size in ZFS), it does not matter that it is compressed, it will still occupy a full 4K block; with 512B blocks, you would only need 2.5 KB (i.e. 5 blocks) to store the data.

I had the same problem when I sent/received an ashift=9 pool to an ashift=12 pool and ended up with twice the amount of space used as before. That was horrible!
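
(As an aside, a quick way to see what the disks report and which ashift the pool actually uses - a sketch; zdb -C needs the pool to be present in the cache file.)
Code:
# physical vs. logical sector size reported by the disks
lsblk -o NAME,PHY-SEC,LOG-SEC
# ashift actually used by the pool's vdevs
zdb -C rpool | grep ashift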
 

you can choose the ashift in the installer since 4.4 if you know you want 9 (or 13 ;)) instead of 12.

the problem (and the original reason for defaulting to 12) was that lots of disks in the past lied about their sector size, and you would see performance issues when writing with ashift=9 (because the disk actually had to read-modify-write 4k for each 512b you wanted to write). ZFS nowadays has an internal list to override some of the known offenders, and AFAICT there are fewer problematic devices in the wild these days (and physical 512b disks are almost extinct), so maybe we could switch to defaulting to ashift=0 (i.e., autodetection) unless something is specified explicitly in the advanced settings?
 
I thought the ashift possibilities are hardcoded, aren't they?

ZFS itself allows 9-13, but basically only 9, 12 and 13 make any sense (512b, 4k and 8k respectively). the PVE installer defaults to 12, but allows 9-13 (in the advanced settings when selecting ZFS). once set, it can never be changed for a given zpool.
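
(For anyone creating a pool manually instead of via the installer, a sketch - "tank" and the device paths below are placeholders, not the disks from this thread.)
Code:
# create a raidz2 pool with an explicit ashift (cannot be changed later);
# pool name and device paths are placeholders
zpool create -o ashift=12 tank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
# check which value was actually used
zdb -C tank | grep ashift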
 