[SOLVED] rpool full, not sure why

[TL;DR: Cannot ls or du /etc/pve on one host; the other host in the cluster is fine. Almost all space on rpool is used, but I can't see what is using it and there's no obvious reason for it; rpool should hold only about 3GB. Most backups are not on rpool.]


Hi!

I tried to update one of my Proxmox 6.2-11 machines today and there was not enough space for apt to do it.

Code:
# zfs list -o space,refquota,quota,volsize
NAME                             AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD  REFQUOTA  QUOTA  VOLSIZE
rpool                             106M   449G        0B    208K             0B       449G      none   none        -
rpool/ROOT                        106M   440G        0B    192K             0B       440G      none   none        -
rpool/ROOT/pve-1                  106M   440G        0B    440G             0B         0B      none   none        -
rpool/data                        106M  1.35G        0B    192K             0B      1.35G      none   none        -
rpool/data/subvol-111-disk-1      106M  1.35G        0B   1.35G             0B         0B       10G   none        -
rpool/swap                       8.61G  8.50G        0B    112K          8.50G         0B         -      -       8G
rpool/zsync-bu                    106M   192K        0B    192K             0B         0B      none   none        -

I have other disks as well, but that's the rpool. When I try du -h on various directories under / I can't find anything using significant space. du -h /etc will not complete; it hangs while working through /etc/pve/<subfolder>.
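Side note: as I understand it, du's -x (--one-file-system) flag keeps it on a single filesystem, which should avoid descending into the /etc/pve fuse mount, /proc, /sys and so on. Something like this (a sketch, untested on this box) should show only what actually lives on the root filesystem:

Code:
# -x stays on the root filesystem and skips other mounts (including the
# /etc/pve fuse mount); -d1 limits the listing to top-level directories.
du -xh -d1 / 2>/dev/null | sort -h | tail -20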

This machine is only used for backups and is in a corosync cluster with my main machine.

Thanks for any suggestions on how to clear up or reclaim some space! I have no idea what might be using this space on rpool.

ps- rpool on the main machine has plenty of space and it has some VMs stored on it!

Code:
# zfs list -o space,refquota,quota,volsize
NAME                            AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD  REFQUOTA  QUOTA  VOLSIZE
rpool                            217G   245G        0B    208K             0B       245G      none   none        -
rpool/ROOT                       217G  53.1G        0B    192K             0B      53.1G      none   none        -
rpool/ROOT/pve-1                 217G  53.1G        0B   53.1G             0B         0B      none   none        -
rpool/data                       217G   178G        0B    208K             0B       178G      none   none        -
rpool/data/subvol-102-disk-1    7.08G   946M        0B    946M             0B         0B        8G   none        -
rpool/data/subvol-106-disk-0    5.27G   752M        0B    752M             0B         0B        6G   none        -
rpool/data/subvol-111-disk-1    8.62G  1.63G      259M   1.38G             0B         0B       10G   none        -
rpool/data/vm-100-disk-1         217G  9.79G     4.55G   5.24G             0B         0B         -      -      12G
rpool/data/vm-103-disk-1         217G  7.62G     4.13G   3.49G             0B         0B         -      -       4G
rpool/data/vm-103-disk-3         217G  13.3G      197M   13.1G             0B         0B         -      -      20G
rpool/data/vm-104-disk-0         217G  7.97G     2.81M   7.97G             0B         0B         -      -      15G
rpool/data/vm-104-disk-1         217G   136G     92.3G   43.4G             0B         0B         -      -      84G
rpool/swap                       217G  13.1G        0B   13.1G             0B         0B         -      -       8G

pps- If I go to /etc/pve/nodes/pveb, where pveb is the name of the node in question, I cannot ls that directory. Or du it.

Code:
drwxr-xr-x 2 root www-data 0 Jun 23  2018 pveb
root@pveb:/etc/pve/nodes# ll pveb
(hangs)

If I go to my main Proxmox box, pvea, I can list it:

Code:
root@pvea:/etc/pve/nodes# ll pveb
total 2.0K
-rw-r----- 1 root www-data   83 Sep 26 05:59 lrm_status
-rw-r----- 1 root www-data    0 Sep 21 05:59 lrm_status.tmp.10568
-rw-r----- 1 root www-data   83 Sep 19 06:25 lrm_status.tmp.14668
-rw-r----- 1 root www-data    0 Sep 17 06:51 lrm_status.tmp.18195
drwxr-xr-x 2 root www-data    0 Jun 23  2018 lxc
drwxr-xr-x 2 root www-data    0 Jun 23  2018 openvz
drwx------ 2 root www-data    0 Jun 23  2018 priv
-rw-r----- 1 root www-data 1.7K Jun 23  2018 pve-ssl.key
-rw-r----- 1 root www-data 1.7K Jul 17 05:46 pve-ssl.pem
drwxr-xr-x 2 root www-data    0 Jun 23  2018 qemu-server

Code:
root@pvea:/etc/pve/nodes# du -h pveb
0    pveb/priv
0    pveb/openvz
0    pveb/qemu-server
0    pveb/lxc
2.0K    pveb
 
You should check why your "rpool/ROOT/pve-1" uses 440G of data on one server while the other uses only 53.1G.
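If du on / can't account for it, one possibility is data hidden underneath a mountpoint. A rough way to check (the /mnt/rootcheck path is just an example) is to bind-mount / somewhere else, so that directories normally covered by other mounts become visible:

Code:
mkdir -p /mnt/rootcheck
mount --bind / /mnt/rootcheck
# Only the root filesystem is visible through the bind mount, including
# anything that normally sits underneath other mountpoints.
du -xh -d1 /mnt/rootcheck | sort -h | tail -20
umount /mnt/rootcheck
rmdir /mnt/rootcheck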
 
Check the /var/log directory first.
Sometimes the journals get very big or logrotate doesn't work.
Depending on how long the system has been running, you also might have a lot of space eaten up by old modules in the /lib/modules directory.
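A quick way to look at both (the vacuum size below is just an example value, adjust as needed):

Code:
# How much space the systemd journal uses, and how to trim it back:
journalctl --disk-usage
journalctl --vacuum-size=200M
# Module directories left behind by old kernels, and installed kernel packages:
du -sh /lib/modules/*
dpkg -l 'pve-kernel-*'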

I was annoyed by this and created a script for cleanup:
https://forum.proxmox.com/threads/lib-modules-filling-up-hard-drive.39199/

HTH

Yes, I checked things like that... all of the root-level folders:

Code:
root@pveb:/# du -hs /bin
9.5M    /bin
root@pveb:/# du -hs /boot
159M    /boot
root@pveb:/# du -hs /dev
63M    /dev
root@pveb:/# du -hs /etc
^C (hangs)
root@pveb:/# du -hs /home
43K    /home
root@pveb:/# du -hs /lib
595M    /lib
root@pveb:/# du -hs /lib64
1.0K    /lib64
root@pveb:/# du -hs /media
512    /media
root@pveb:/# du -hs /mnt
512    /mnt
root@pveb:/# du -hs /opt
512    /opt
root@pveb:/# du -hs /reds
3.5K    /reds
root@pveb:/# du -hs /root
132K    /root
root@pveb:/# du -hs /rpool
2.5K    /rpool
root@pveb:/# du -hs /run
19M    /run
root@pveb:/# du -hs /sbin
9.0M    /sbin
root@pveb:/# du -hs /srv
512    /srv
root@pveb:/# du -hs /sys
0    /sys
root@pveb:/# du -hs /usr
1.1G    /usr
root@pveb:/# du -hs /var
120M    /var
 
/etc/pve is a FUSE filesystem ... I suspect something has gone awry with it, as evidenced by the inability to ls or du /etc/pve. On the main machine I can ls and du /etc/pve without issue.
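Since /etc/pve is provided by pmxcfs (the pve-cluster service), I suppose checking that service and cluster quorum would be the next step, something along these lines (on PVE 6.x):

Code:
# State of the pmxcfs service, its recent log output, and quorum status.
systemctl status pve-cluster
journalctl -u pve-cluster --since "1 hour ago"
pvecm status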
 
Ok, not sure why I didn't spot this before. In short, I had a disk mounted at /internal4g that I was copying data to incrementally. That disk went away, so the backups started writing into a plain directory called /internal4g, which of course sits on the rpool root disk. I guess this is a danger in the way Linux mounts disks; maybe there's a way around it that I don't know about yet!

Sorry for any false implication that it was a Proxmox issue when it was not.
 
Slight update to that: internal4g is a ZFS dataset set to mount at the root level, i.e. at /internal4g. I guess at some point it did not come up, and a folder structure was created at /internal4g on the root disk, probably by my backup script. From then on the ZFS dataset couldn't mount there, so the root disk filled up!
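For anyone hitting the same thing, one simple guard seems to be having the backup script refuse to write unless the path really is a mountpoint (the dataset name below is just what mine happens to be called):

Code:
# Abort unless /internal4g is actually a mounted filesystem.
if ! mountpoint -q /internal4g; then
    echo "/internal4g is not mounted, skipping backup" >&2
    exit 1
fi
# Or ask ZFS directly (dataset name is specific to my setup):
# [ "$(zfs get -H -o value mounted internal4g)" = "yes" ] || exit 1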