[SOLVED] No space left on device, vm disk filled partition. How to free up space/proceed?

ixM

Member
Oct 7, 2021
Hello everyone,

I'm in a bit of a pickle. One of our VMs suddenly stopped working. I restarted the VM and it worked fine for a bit but then crashed again (the VM gets a yellow warning label in the Proxmox GUI, showing an I/O error or something of the sort).

It turns out that the Proxmox disk is full:

Code:
root@server:~# df -h
Filesystem        Size  Used Avail Use% Mounted on
udev               16G     0   16G   0% /dev
tmpfs             3.2G  1.5M  3.2G   1% /run
rpool/ROOT/pve-1  1.7G  1.7G     0 100% /
tmpfs              16G     0   16G   0% /dev/shm
tmpfs             5.0M     0  5.0M   0% /run/lock
efivarfs          192K  108K   80K  58% /sys/firmware/efi/efivars
rdata             861G  4.5G  857G   1% /rdata
rpool             256K  256K     0 100% /rpool
rpool/var-lib-vz  256K  256K     0 100% /var/lib/vz
rpool/ROOT        256K  256K     0 100% /rpool/ROOT
tmpfs             3.2G     0  3.2G   0% /run/user/0
rpool/data        256K  256K     0 100% /rpool/data

Digging further, I found that during the installation of our VM it was misconfigured: a VM disk that should have been put in the rdata ZFS pool was instead put in the pool Proxmox itself is installed on (rpool). I don't know how that is possible, but it was configured with a size of 800 GB and placed on a 220 GB disk...

Code:
root@server:~# zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
rdata   888G  4.41G   884G        -         -     0%     0%  1.00x    ONLINE  -
rpool   220G   213G  6.77G        -         -    83%    96%  1.00x    ONLINE  -

Code:
root@server:~# zfs list    
NAME                         USED  AVAIL  REFER  MOUNTPOINT
rdata                       4.41G   856G  4.41G  /rdata
rpool                        213G     0B   208K  /rpool
rpool/ROOT                  1.60G     0B   192K  /rpool/ROOT
rpool/ROOT/pve-1            1.60G     0B  1.60G  /
rpool/data                   212G     0B   192K  /rpool/data
rpool/data/base-101-disk-0  6.32G     0B  6.32G  -
rpool/data/vm-100-disk-0    24.7G     0B  27.4G  -
rpool/data/vm-100-disk-1     181G     0B   181G  -
rpool/var-lib-vz             240K     0B   240K  /var/lib/vz

How can I proceed? ncdu doesn't reveal anything of a remotely useful size to move or delete, and all qm functions fail because the underlying process hasn't started due to the full disk.

Can I move the large disk manually to rdata? Where is this VM disk file located? /rpool/ is all but empty:

Code:
root@server:~# du -hs /rpool/*
512     /rpool/data
1.0K    /rpool/ROOT

Many thanks in advance for your help!

Best regards,


Maximilien
 
this can happen ;)

Since there is no room for further writes, you have to "create" some. The only way to do that is to overwrite an existing file with a smaller one. Find a large candidate (e.g. in /var/log) and overwrite it like so:

dd if=/dev/zero bs=4k count=1 of=/path/to/your/file

That should give you a toehold, and you can then proceed to delete other stuff normally.
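
If you need help spotting a candidate, something like this should list the biggest items under /var/log (it only reads, so it's safe even with the pool full):

Code:
du -ah /var/log | sort -rh | head -n 20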

Bear in mind that you only have 1.6GB available for your system. PVE needs more than that and it'll happen again VERY quickly.
 
The most striking thing about your setup is (as mentioned by alexskysilk) the total root size of 1.7G! That is most unusual for ANY installation. Plain logging and apt cache updates alone will quickly fill ALL that space. Not to mention that a simple mistake in storage selection (as has happened in your case) or a missing mount device will almost immediately render your system inoperable.

I wonder how you got to this unusual setup in the first place.
 
Hi guys,


Thanks for your fast replies. I don't know how this system was created, but earlier it showed the root partition as 2.5 GB. I deleted an 800 MB vzdump file and rebooted the server. The VM started again, and soon enough the disk was full again and showed a size of 1.7 GB, which is why I think it's related to this large VM disk that shouldn't be there.

Is it possible to prevent the VM from starting without using the qm tool or the interface? If I free up a few MB of logs, that's the last chance I'll get at freeing up space, and if the problem really is the VM, restarting the server will just put it back in the same situation.

Thanks a bunch!


Maximilien
 
I wonder how you got to this unusual setup in the first place.
Not hard to understand. He's using rpool for both the system and the payload. Fairly simple to overprovision and then overrun.
rpool/data/base-101-disk-0  6.32G     0B  6.32G  -
rpool/data/vm-100-disk-0    24.7G     0B  27.4G  -
rpool/data/vm-100-disk-1     181G     0B   181G  -
 
Is it possible to prevent the VM from starting without using the qm tool or the interface?
It may be possible to edit the kernel command line at boot to mask pve-guests.service with systemd.mask=pve-guests .
Since that only applies to that one boot, once you have finally corrected the disk space issue you simply reboot and everything should run again as usual.

Edit: I just found someone who has already suggested something similar here.
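
Roughly what that looks like (spelling out the full unit name; the exact kernel path and existing arguments will differ on your system, so treat this only as a guide to where the parameter goes):

Code:
# at the boot menu, press e on the PVE entry, find the line starting with "linux"
# and append the parameter at the end, leaving the existing arguments untouched:
linux <existing kernel path> root=ZFS=rpool/ROOT/pve-1 ro quiet systemd.mask=pve-guests.service
# then boot with Ctrl-x (GRUB) or Enter (systemd-boot); the change is for this boot only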
 
Not hard to understand. He's using rpool for both the system and the payload. Fairly simple to overprovision and then overrun.
He is showing a total allocated size of 1.7G in his df -h output. That is what I referred to as not understandable: why would someone set up PVE like that?
Maybe I'm missing something about ZFS (which I don't use).
 
In the OP's case, he set up the entirety of his only disk(s) as a single ZFS pool and is using it for both his root partition AND his virtual disk space. Since ZFS zvols deployed by PVE are thin-provisioned by default, it's possible to overprovision (request more than is physically available), and as the VM disks grow they can and will consume all available space.

Edit: to make this post useful, I'll also mention that you can use reservations and quotas to protect your root space from being overrun; see https://docs.oracle.com/cd/E19253-01/819-5461/gbdbb/index.html for more information.
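
For example, something along these lines (the sizes are purely illustrative; pick values that fit your pool):

Code:
# guarantee the root dataset some breathing room, even if guest disks grow
zfs set reservation=8G rpool/ROOT
# and/or cap how much the guest-disk dataset may consume
zfs set quota=190G rpool/data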
 
In the OP's case, he set up the entirety of his only disk(s) as a single ZFS pool and is using it for both his root partition AND his virtual disk space. Since ZFS zvols deployed by PVE are thin-provisioned by default, it's possible to overprovision (request more than is physically available), and as the VM disks grow they can and will consume all available space.
This I understood already. BUT on a regular, freshly set-up PVE installation you will already be using ~3G before any logs, updates, etc. So how, in his case, is the complete allotted size only 1.7G?
 
Yes, I didn't set up this server, and it's really unfortunate that this mistake was made, as another ZFS pool exists precisely for the purpose of storing this VM disk...

Is it possible to move the problematic VM disk without using the qm commands?

I cannot even find this file/partition. zfs list shows the VM disks, but they are nowhere to be found using ls, ncdu, ...

Am I shit out of luck, and is the only solution to reinstall Proxmox and lose the VM and its content?
 
Have you already tried my above suggestion (kernel command line)?
I'd missed it; I just tried it and it looks like it's working.

Rebooted, and the VM stayed stopped. Now it looks like I'll be able to move the disk to the other zpool's storage.

In 20 minutes, I'll know if I can breathe a sigh of relief or not :-D Fingers crossed!

Thank you so much !!!

(will update later)

Maximilien
 
So! It worked!

What I did was:
- disabled the pve-guests service: systemctl disable pve-guests.service
- moved the content of /usr/share to the other zpool: mv /usr/share /rdata/
- mkdir /usr/share
- added a bind mount to fstab: /rdata/share /usr/share none defaults,bind 0 0
- rebooted the server
- with the freed-up space, Proxmox was able to start; I then moved the disk to the rdata zpool using the interface
- deleted the "unused disk" from the VM configuration
- removed the bind mount from fstab (I tried to stop all services and unmount it, but although lsof showed nothing using /usr/share, it wouldn't unmount)
- rebooted Proxmox (lots of services were failing because /usr/share was now empty, including networking)
- using our KVM, I copied all the necessary data back to /usr/share and rebooted once again
- Proxmox started
- re-enabled pve-guests.service
- rebooted once again
- and everything was working

The last steps were a bit sketchy but it worked in the end.
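
For anyone who lands here later, here is roughly what those steps look like as commands (written up afterwards, so double-check the paths on your own system before copying anything):

Code:
# keep guests from autostarting on the next boot
systemctl disable pve-guests.service

# move /usr/share to the roomier pool and bind-mount it back in place
mv /usr/share /rdata/
mkdir /usr/share
echo '/rdata/share /usr/share none defaults,bind 0 0' >> /etc/fstab
reboot

# ... move the big VM disk to rdata (I used the GUI) and remove the "unused disk"
# entry from the VM config, then take the bind-mount line back out of /etc/fstab ...

# after the next reboot, copy the data back into the now-empty /usr/share and re-enable guests
cp -a /rdata/share/. /usr/share/
systemctl enable pve-guests.service
reboot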


Thanks so much for the help!
 