[SOLVED] rpool full, not sure why

[TLDR: Cannot ls or du /etc/pve on one host; the other host in the cluster is fine. Almost all space on rpool is used, but I can't see where it's going; rpool should only hold about 3GB, and most backups are not on rpool.]


Hi!

I tried to update one of my Proxmox 6.2-11 machines today and there was not enough space for apt to do it.

Code:
# zfs list -o space,refquota,quota,volsize
NAME                             AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD  REFQUOTA  QUOTA  VOLSIZE
rpool                             106M   449G        0B    208K             0B       449G      none   none        -
rpool/ROOT                        106M   440G        0B    192K             0B       440G      none   none        -
rpool/ROOT/pve-1                  106M   440G        0B    440G             0B         0B      none   none        -
rpool/data                        106M  1.35G        0B    192K             0B      1.35G      none   none        -
rpool/data/subvol-111-disk-1      106M  1.35G        0B   1.35G             0B         0B       10G   none        -
rpool/swap                       8.61G  8.50G        0B    112K          8.50G         0B         -      -       8G
rpool/zsync-bu                    106M   192K        0B    192K             0B         0B      none   none        -

I have other disks as well, but that's the rpool. When I run du -h on various directories under / I can't find anything using significant space. du -h /etc will not complete; it hangs while working through /etc/pve/<subfolder>.

This machine is only used as a backup and is a member of corosync with my main machine.

Thanks for any suggestions on how to clear up or reclaim some space! I have no idea what might be using this space on rpool.

ps- rpool on the main machine has plenty of space and it has some VMs stored on it!

Code:
# zfs list -o space,refquota,quota,volsize
NAME                            AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD  REFQUOTA  QUOTA  VOLSIZE
rpool                            217G   245G        0B    208K             0B       245G      none   none        -
rpool/ROOT                       217G  53.1G        0B    192K             0B      53.1G      none   none        -
rpool/ROOT/pve-1                 217G  53.1G        0B   53.1G             0B         0B      none   none        -
rpool/data                       217G   178G        0B    208K             0B       178G      none   none        -
rpool/data/subvol-102-disk-1    7.08G   946M        0B    946M             0B         0B        8G   none        -
rpool/data/subvol-106-disk-0    5.27G   752M        0B    752M             0B         0B        6G   none        -
rpool/data/subvol-111-disk-1    8.62G  1.63G      259M   1.38G             0B         0B       10G   none        -
rpool/data/vm-100-disk-1         217G  9.79G     4.55G   5.24G             0B         0B         -      -      12G
rpool/data/vm-103-disk-1         217G  7.62G     4.13G   3.49G             0B         0B         -      -       4G
rpool/data/vm-103-disk-3         217G  13.3G      197M   13.1G             0B         0B         -      -      20G
rpool/data/vm-104-disk-0         217G  7.97G     2.81M   7.97G             0B         0B         -      -      15G
rpool/data/vm-104-disk-1         217G   136G     92.3G   43.4G             0B         0B         -      -      84G
rpool/swap                       217G  13.1G        0B   13.1G             0B         0B         -      -       8G

pps- If I go to /etc/pve/nodes/pveb, where pveb is the name of the node in question, I cannot ls that directory, or du it.

Code:
drwxr-xr-x 2 root www-data 0 Jun 23  2018 pveb
root@pveb:/etc/pve/nodes# ll pveb
hangs

If I go to my main Proxmox box, pvea, I can list it:

Code:
root@pvea:/etc/pve/nodes# ll pveb
total 2.0K
-rw-r----- 1 root www-data   83 Sep 26 05:59 lrm_status
-rw-r----- 1 root www-data    0 Sep 21 05:59 lrm_status.tmp.10568
-rw-r----- 1 root www-data   83 Sep 19 06:25 lrm_status.tmp.14668
-rw-r----- 1 root www-data    0 Sep 17 06:51 lrm_status.tmp.18195
drwxr-xr-x 2 root www-data    0 Jun 23  2018 lxc
drwxr-xr-x 2 root www-data    0 Jun 23  2018 openvz
drwx------ 2 root www-data    0 Jun 23  2018 priv
-rw-r----- 1 root www-data 1.7K Jun 23  2018 pve-ssl.key
-rw-r----- 1 root www-data 1.7K Jul 17 05:46 pve-ssl.pem
drwxr-xr-x 2 root www-data    0 Jun 23  2018 qemu-server

Code:
root@pvea:/etc/pve/nodes# du -h pveb
0    pveb/priv
0    pveb/openvz
0    pveb/qemu-server
0    pveb/lxc
2.0K    pveb
 
You should check why your "rpool/ROOT/pve-1" uses "440G" of data on one server while the other uses only "53.1G".
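
For example, a rough way to narrow that down without du wandering into /etc/pve or other pools (just a sketch; -x keeps du on the root filesystem, adjust the depth and paths as you like):

Code:
# stay on one filesystem (-x) and list the biggest top-level consumers
du -xh --max-depth=1 / 2>/dev/null | sort -h | tail -20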
 
Check the /var/log directory first.
Sometimes the journals get very big or logrotate doesn't work.
Depending on how long the system has been running, you also might have a lot of space eaten up by old modules in the /lib/modules directory.

I was annoyed by this and created a script for cleanup:
https://forum.proxmox.com/threads/lib-modules-filling-up-hard-drive.39199/
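
For example, to get a quick read on the journal size and on old kernel module trees (a sketch; journalctl's vacuum options are standard systemd, and the exact kernel versions will differ on your system):

Code:
# how much space the systemd journal uses, and trim it if it has grown too big
journalctl --disk-usage
journalctl --vacuum-size=200M

# module trees left behind by old kernels vs. the kernel currently running
du -sh /lib/modules/*
uname -r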

HTH

Yes, I checked things like that... all of the root level folders:

Code:
root@pveb:/# du -hs /bin
9.5M    /bin
root@pveb:/# du -hs /boot
159M    /boot
root@pveb:/# du -hs /dev
63M    /dev
root@pveb:/# du -hs /etc
^C (hangs)
root@pveb:/# du -hs /home
43K    /home
root@pveb:/# du -hs /lib
595M    /lib
root@pveb:/# du -hs /lib64
1.0K    /lib64
root@pveb:/# du -hs /media
512    /media
root@pveb:/# du -hs /mnt
512    /mnt
root@pveb:/# du -hs /opt
512    /opt
root@pveb:/# du -hs /reds
3.5K    /reds
root@pveb:/# du -hs /root
132K    /root
root@pveb:/# du -hs /rpool
2.5K    /rpool
root@pveb:/# du -hs /run
19M    /run
root@pveb:/# du -hs /sbin
9.0M    /sbin
root@pveb:/# du -hs /srv
512    /srv
root@pveb:/# du -hs /sys
0    /sys
root@pveb:/# du -hs /usr
1.1G    /usr
root@pveb:/# du -hs /var
120M    /var
 
/etc/pve is a FUSE filesystem ... I suspect something has gone awry with that, as evidenced by the inability to ls or du /etc/pve. On the main machine I can ls and du /etc/pve just fine.
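
If I need to dig into that side of it, this is roughly where I'd look (a sketch; /etc/pve is mounted by pmxcfs, which runs under the pve-cluster service):

Code:
# check the service that provides the /etc/pve fuse mount
systemctl status pve-cluster
journalctl -u pve-cluster -n 50
pvecm status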
 
Ok, not sure why I didn't spot this before. In short, I had a disk mounted at /internal4g that I was copying data to incrementally. That disk went away, so the backups started landing in a plain directory at /internal4g, which is of course on the rpool / disk. I guess this is a danger in the way Linux handles mount points; maybe there's a way around it that I don't know about yet!

Sorry for any false implication that this was a Proxmox issue when it was not.
 

Slight update to that -- internal4g is a ZFS dataset set to mount at the root level, so /internal4g. I guess at some point it did not come up and a folder structure was created at /internal4g (on the root disk), probably by my backup script. From then on the ZFS dataset couldn't mount there, so the root disk filled up!
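
In case anyone else hits this, this is roughly the check I've added (a sketch; the guard line is my own addition to the backup script, nothing Proxmox-specific):

Code:
# confirm the dataset is really mounted where it should be
zfs get mounted,mountpoint internal4g
mountpoint /internal4g

# in the backup script: refuse to write unless the target is an actual mountpoint
mountpoint -q /internal4g || { echo "/internal4g is not mounted, aborting" >&2; exit 1; }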
 
