[SOLVED] Where is the space in ZFS?

tuxillo

Renowned Member
Mar 2, 2010
Hi,

I have two ZFS pools, one for regular operation and another one for backups.

Code:
    root@pve01:~# zpool status |grep -w pool
      pool: rbackup
      pool: rpool

I have two containers that take up a lot of space in rpool, and the backup to the rbackup pool can't succeed; I don't know why.
The thing is that at some point the root filesystem fills up and I can't find where the space went. The whole instance becomes unusable:

Code:
root@pve01:~# zfs list
NAME                           USED  AVAIL     REFER  MOUNTPOINT
rbackup                        545G  1.22T      545G  /rbackup
rpool                         1.76T  2.81M      104K  /rpool
rpool/ROOT                     763G  2.81M       96K  /rpool/ROOT
rpool/ROOT/pve-1               763G  2.81M      763G  /
rpool/data                    1.01T  2.81M      120K  /rpool/data
rpool/data/subvol-100-disk-0   796M  2.81M      796M  /rpool/data/subvol-100-disk-0
rpool/data/subvol-101-disk-0   325G  2.81M      325G  /rpool/data/subvol-101-disk-0
rpool/data/subvol-102-disk-0   663G  2.81M      663G  /rpool/data/subvol-102-disk-0
rpool/data/subvol-103-disk-0  10.1G  2.81M     10.1G  /rpool/data/subvol-103-disk-0
rpool/data/subvol-103-disk-1  2.06G  2.81M     2.06G  /rpool/data/subvol-103-disk-1
rpool/data/subvol-103-disk-2   454M  2.81M      454M  /rpool/data/subvol-103-disk-2
rpool/data/subvol-103-disk-3  7.80G  2.81M     7.80G  /rpool/data/subvol-103-disk-3
rpool/data/subvol-104-disk-0  22.9G  2.81M     22.9G  /rpool/data/subvol-104-disk-0
rpool/data/subvol-105-disk-0   943M  2.81M      943M  /rpool/data/subvol-105-disk-0
rpool/data/subvol-106-disk-0  1011M  2.81M     1011M  /rpool/data/subvol-106-disk-0
rpool/data/subvol-110-disk-0  1.07G  2.81M     1.07G  /rpool/data/subvol-110-disk-0

After removing the leftover vzdump snapshots, I start gaining space back hours or even days later, but it never fully recovers.
There are no snapshots, du -hsx does not report any extra space usage in /, and so on; I've checked the usual suspects.
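
For reference, these are roughly the checks I mean by "the usual suspects" (the exact invocations here are just illustrative):

Code:
# any snapshots holding space?
zfs list -t snapshot -o name,used -s used
zfs get -r usedbysnapshots rpool
# per-directory usage on the root filesystem only (-x stays on one filesystem)
du -hsx /* 2>/dev/null | sort -h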

Any hint will be appreciated.

Thanks.
 
Your rpool is completely full, as it seems.

Please post the output of "df -h".

Then install ncdu: "apt install ncdu".

Scan for drive space usage with "ncdu /", take a look at where the huge files are, and post them here.

If you can't install ncdu due to not enough space, delete some files like old logs under /var/log.
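
For reference, the whole sequence is just this (adding -x keeps ncdu on the root filesystem instead of descending into other mounts):

Code:
apt install ncdu   # small curses-based disk usage browser
ncdu -x /          # scan /, stay on one filesystem, then browse the biggest directories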
 
As far as I know the correct way of listing the ZFS datasets is with `zfs list`. Anyway, here's the df -h output:

Code:
root@pve01:~# df -h
Filesystem                    Size  Used Avail Use% Mounted on
udev                          7.8G     0  7.8G   0% /dev
tmpfs                         1.6G   57M  1.6G   4% /run
rpool/ROOT/pve-1              764G  764G  3.4M 100% /
tmpfs                         7.8G   60M  7.8G   1% /dev/shm
tmpfs                         5.0M     0  5.0M   0% /run/lock
tmpfs                         7.8G     0  7.8G   0% /sys/fs/cgroup
rpool                         3.5M  128K  3.4M   4% /rpool
rpool/ROOT                    3.5M  128K  3.4M   4% /rpool/ROOT
rpool/data                    3.5M  128K  3.4M   4% /rpool/data
rpool/data/subvol-102-disk-0  663G  663G  3.4M 100% /rpool/data/subvol-102-disk-0
rpool/data/subvol-101-disk-0  326G  326G  3.4M 100% /rpool/data/subvol-101-disk-0
rpool/data/subvol-100-disk-0  800M  797M  3.4M 100% /rpool/data/subvol-100-disk-0
rpool/data/subvol-103-disk-0   11G   11G  3.4M 100% /rpool/data/subvol-103-disk-0
rpool/data/subvol-103-disk-2  458M  455M  3.4M 100% /rpool/data/subvol-103-disk-2
rpool/data/subvol-103-disk-3  7.8G  7.8G  3.4M 100% /rpool/data/subvol-103-disk-3
rpool/data/subvol-103-disk-1  2.1G  2.1G  3.4M 100% /rpool/data/subvol-103-disk-1
rpool/data/subvol-106-disk-0 1015M 1012M  3.4M 100% /rpool/data/subvol-106-disk-0
rpool/data/subvol-104-disk-0   23G   23G  3.4M 100% /rpool/data/subvol-104-disk-0
rpool/data/subvol-110-disk-0  1.1G  1.1G  3.4M 100% /rpool/data/subvol-110-disk-0
rpool/data/subvol-105-disk-0  947M  944M  3.4M 100% /rpool/data/subvol-105-disk-0
/dev/fuse                      30M   20K   30M   1% /etc/pve


Here's also the output of ncdu -x / (without -x it won't complete):

Code:
--- / ------------------------------------------------------------------
  761.2 GiB [##########] /rbackup
    1.3 GiB [          ] /usr
  328.0 MiB [          ] /var
  217.6 MiB [          ] /boot
    3.6 MiB [          ] /etc
   66.5 KiB [          ] /root
   29.0 KiB [          ] /tmp
    1.5 KiB [          ] /mnt
@ 512.0   B [          ]  libx32
@ 512.0   B [          ]  lib64
@ 512.0   B [          ]  lib32
@ 512.0   B [          ]  sbin
@ 512.0   B [          ]  lib
@ 512.0   B [          ]  bin
e 512.0   B [          ] /srv
e 512.0   B [          ] /opt
e 512.0   B [          ] /media
e 512.0   B [          ] /home
>   0.0   B [          ] /sys
>   0.0   B [          ] /run
>   0.0   B [          ] /rpool
>   0.0   B [          ] /proc
>   0.0   B [          ] /dev

As I have mentioned before, I have already checked the usual suspects. Here's also a list of all ZFS datasets and snapshots:

Code:
root@pve01:~# zfs list -t all
NAME                           USED  AVAIL     REFER  MOUNTPOINT
rbackup                        545G  1.22T      545G  /rbackup
rpool                         1.76T  3.45M      104K  /rpool
rpool/ROOT                     763G  3.45M       96K  /rpool/ROOT
rpool/ROOT/pve-1               763G  3.45M      763G  /
rpool/data                    1.01T  3.45M      120K  /rpool/data
rpool/data/subvol-100-disk-0   796M  3.45M      796M  /rpool/data/subvol-100-disk-0
rpool/data/subvol-101-disk-0   325G  3.45M      325G  /rpool/data/subvol-101-disk-0
rpool/data/subvol-102-disk-0   663G  3.45M      663G  /rpool/data/subvol-102-disk-0
rpool/data/subvol-103-disk-0  10.1G  3.45M     10.1G  /rpool/data/subvol-103-disk-0
rpool/data/subvol-103-disk-1  2.06G  3.45M     2.06G  /rpool/data/subvol-103-disk-1
rpool/data/subvol-103-disk-2   454M  3.45M      454M  /rpool/data/subvol-103-disk-2
rpool/data/subvol-103-disk-3  7.80G  3.45M     7.80G  /rpool/data/subvol-103-disk-3
rpool/data/subvol-104-disk-0  22.9G  3.45M     22.9G  /rpool/data/subvol-104-disk-0
rpool/data/subvol-105-disk-0   943M  3.45M      943M  /rpool/data/subvol-105-disk-0
rpool/data/subvol-106-disk-0  1011M  3.45M     1011M  /rpool/data/subvol-106-disk-0
rpool/data/subvol-110-disk-0  1.07G  3.45M     1.07G  /rpool/data/subvol-110-disk-0

The space is being used by `rpool/ROOT/pve-1`, but it doesn't show up anywhere.
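
To show what I'm comparing, this is roughly how I'm looking at it, dataset-level accounting on one side and the actual files on the other (the property list is just what I'd check, not output I'm posting):

Code:
# where does ZFS say the space of the root dataset goes?
zfs get used,usedbydataset,usedbysnapshots,usedbychildren,referenced rpool/ROOT/pve-1
# and what do the files on / actually add up to? (-x stays on the root filesystem)
du -hsx /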
 
There are constant writes to the filesystem, iotop -Pa reports:

Code:
Total DISK READ:         0.00 B/s | Total DISK WRITE:       931.18 B/s
Current DISK READ:       0.00 B/s | Current DISK WRITE:       2.38 M/s
  PID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                              
  530 be/4 root          0.00 B      0.00 B  0.00 % 23.27 % [txg_sync]
33107 be/4 www-data      0.00 B   1024.00 B  0.00 % 11.87 % pveproxy worker
51973 be/4 www-data      0.00 B      0.00 B  0.00 %  9.50 % pveproxy worker
41104 be/4 www-data      0.00 B   1024.00 B  0.00 %  2.18 % pveproxy worker

And zpool iostat also shows activity; I have no clue what's causing it:

Code:
root@pve01:~# zpool iostat rpool 1
              capacity     operations     bandwidth 
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
rpool       1.76T  58.0G      1    447  20.5K  3.04M
rpool       1.76T  58.0G      0    375      0  2.54M
rpool       1.76T  58.0G      0    441      0  2.84M
rpool       1.76T  58.0G      0    387      0  2.58M
rpool       1.76T  58.0G      0      0      0      0
rpool       1.76T  58.0G      0    391      0  2.58M
rpool       1.76T  58.0G      0    401      0  2.61M
rpool       1.76T  58.0G      0    387      0  2.58M
rpool       1.76T  58.0G      0      0      0      0
rpool       1.76T  58.0G      0    391      0  2.58M
rpool       1.76T  58.0G      0    499      0  3.40M
rpool       1.76T  58.0G      0      0      0      0
rpool       1.76T  58.0G      0    405      0  2.61M
rpool       1.76T  58.0G      0    393      0  2.58M
rpool       1.76T  58.0G      0      0      0      0
rpool       1.76T  58.0G      0      0      0      0
rpool       1.76T  58.0G      0    376      0  2.58M
rpool       1.76T  58.0G      0    778      0  5.16M
rpool       1.76T  58.0G      0      0      0      0
rpool       1.76T  58.0G      0      0      0      0
rpool       1.76T  58.0G      0      0      0      0
rpool       1.76T  58.0G      0      0      0      0
rpool       1.76T  58.0G      0    379      0  2.55M
rpool       1.76T  58.0G      0    416      0  2.57M
 
The system is rendered completely unusable right now, lots of messages in dmesg:

Code:
[  363.312850] INFO: task pvesr:24346 blocked for more than 120 seconds.
[  363.312893]       Tainted: P           OE     5.4.41-1-pve #1
[  363.312921] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  363.312959] pvesr           D    0 24346      1 0x00000000
[  363.312962] Call Trace:
[  363.312973]  __schedule+0x2e6/0x6f0
[  363.312979]  ? filename_parentat.isra.57.part.58+0xf7/0x180
[  363.312982]  schedule+0x33/0xa0
[  363.312987]  rwsem_down_write_slowpath+0x2ed/0x4a0
[  363.312990]  down_write+0x3d/0x40
[  363.312993]  filename_create+0x8e/0x180
[  363.312997]  do_mkdirat+0x59/0x110
[  363.313000]  __x64_sys_mkdir+0x1b/0x20
[  363.313004]  do_syscall_64+0x57/0x190
[  363.313007]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  363.313010] RIP: 0033:0x7f87f7a120d7
[  363.313017] Code: Bad RIP value.
[  363.313018] RSP: 002b:00007fff8455c4e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
[  363.313020] RAX: ffffffffffffffda RBX: 000055dd21e9f260 RCX: 00007f87f7a120d7
[  363.313022] RDX: 000055dd2154b1f4 RSI: 00000000000001ff RDI: 000055dd25ea1340
[  363.313023] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000006
[  363.313024] R10: 0000000000000000 R11: 0000000000000246 R12: 000055dd23256048
[  363.313025] R13: 000055dd25ea1340 R14: 000055dd25b0c268 R15: 00000000000001ff
[  484.139687] INFO: task pvesr:24346 blocked for more than 241 seconds.
[  484.139735]       Tainted: P           OE     5.4.41-1-pve #1
[  484.139763] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  484.139800] pvesr           D    0 24346      1 0x00000000

I had to destroy a ZFS dataset to get back 1 GB. The containers won't start and the tasks in the web interface just hang. At this point the server is rendered useless.
 
As already mentioned, your storage is full; there is no space left.

Move /rbackup onto another hard drive.
 
Well, /rbackup is a completely different pool. Thanks for taking the time to reply, but I would appreciate it if you actually paid attention to the information that has already been shared.

The problem is that / is full and the space does not show up in files or snapshots anywhere.
 
After your previous message I checked my rbackup pool:

Code:
root@pve01:~# zfs get mountpoint rbackup
NAME     PROPERTY    VALUE       SOURCE
rbackup  mountpoint  /rbackup    local

But indeed df shows that /rbackup is not mounted; I don't know why, because the mountpoint was already set for the rbackup pool.
My guess is that the rbackup pool got unmounted at some point, data then went into the /rbackup directory on the root filesystem, and on the next reboot the rbackup pool could not be mounted there since the directory was not empty.
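
If that's what happened, I think it can be confirmed and cleaned up roughly like this (a sketch, I haven't run the cleanup yet):

Code:
zfs get mounted,mountpoint rbackup   # "mounted  no" would confirm the pool is not mounted
findmnt /rbackup                     # shows which filesystem currently backs /rbackup
# if /rbackup is just a directory on rpool/ROOT/pve-1: move or remove its contents,
# then mount the pool again
zfs mount rbackup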
 
My guess is that the rbackup pool got unmounted at some point, data then went into the /rbackup directory on the root filesystem, and on the next reboot the rbackup pool could not be mounted there since the directory was not empty.
To avoid that you can configure the storage to not auto-create the directory and to mark it as a mountpoint. I'm not sure though if it can be done via the GUI or needs to be done via the CLI / config file; best check the docs on storages and what options can be set.
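
If I remember the option names correctly, for a directory storage entry in /etc/pve/storage.cfg it would look roughly like this: "mkdir 0" stops the directory from being auto-created and "is_mountpoint 1" makes the storage refuse to use the path unless a filesystem is actually mounted there (the storage name is just an example, and the option names are from memory, so please verify against the storage documentation):

Code:
dir: rbackup
        path /rbackup
        content backup
        mkdir 0
        is_mountpoint 1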
 
