Proxmox local storage usage went to 100% in a few minutes

knutp

New Member
Mar 17, 2022
I moved one VM from one host to another and the VM immediately became unresponsive. After a few more minutes the Proxmox host also became unresponsive and restarted itself. When it came back up, the local storage usage had gone up to 100%.

1647509322199.png

1647509616810.png

The VM in question has its qcow file on external file storage, so it should not take up any local space.

df -h gives this:
Code:
Filesystem                        Size  Used Avail Use% Mounted on
udev                              252G     0  252G   0% /dev
tmpfs                              51G  122M   51G   1% /run
/dev/mapper/pve-root               94G   94G     0 100% /
tmpfs                             252G   60M  252G   1% /dev/shm
tmpfs                             5.0M     0  5.0M   0% /run/lock
/dev/fuse                         128M   64K  128M   1% /etc/pve
ip-addr:/mnt/ssd                  705G  243G  462G  35% /mnt/pve/stor3_ssd
ip-addr:/mnt/stor1                2.4T  1.6T  803G  67% /mnt/pve/stor3
ip-addr:/mnt/stor2_1/stor2        3.1T  123G  3.0T   4% /mnt/pve/stor2
tmpfs                              51G     0   51G   0% /run/user/0

I am now unable to run du -a / | sort -n -r | head -n 20 to check where the largest files and folders are, because there is no free space left.
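When the root filesystem is completely full, one thing that can get in the way is that sort has nowhere to write its temporary files. A minimal sketch, assuming GNU coreutils and that /dev/shm (a tmpfs in the df output above) still has free space: -x keeps du on the root filesystem so the NFS mounts are skipped, and -T points sort's temporary directory at RAM. The function name largest_entries and the SORT_TMPDIR variable are made up for this example.

```shell
# largest_entries [DIR] [COUNT]
# List the COUNT largest files and directories under DIR (sizes in KiB).
# Hypothetical helper, not part of Proxmox or coreutils.
largest_entries() {
    dir="${1:-/}"
    count="${2:-20}"
    # Where sort may spill temporary files; /dev/shm is an assumption
    # based on the tmpfs shown in the df output.
    tmp="${SORT_TMPDIR:-/dev/shm}"

    # -x: stay on one filesystem (skips NFS mounts under /mnt/pve)
    # -a: include files, not only directories
    du -x -a "$dir" 2>/dev/null | sort -n -r -T "$tmp" | head -n "$count"
}
```

Calling largest_entries / 20 would be the equivalent of the original command, limited to the root filesystem.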
 
Your / is full, see the df output in your post above.

Most likely you've restored the VM to the local storage, which by default points to /var/lib/vz/ (as pointed out by @LnxBil above).

You can try going to your VM in the GUI: in the "Hardware" tab select your disk and press "Move disk" (also choose to remove the original disk, to free up the space).

hope this helps
 
I did an apt autoremove to get just a bit of disk space back, and could then check which folders were the biggest. It was actually loads of GB in kern.log, messages and syslog (29 GB in each).
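Once the oversized logs are known, the space can be reclaimed by emptying the files rather than deleting them, since the logging daemon usually still holds them open and a deleted file would keep occupying space until the daemon is restarted. A rough sketch, assuming the three file names from this post; the function name shrink_logs and the 1 GiB default threshold are arbitrary choices for illustration.

```shell
# shrink_logs [DIR] [LIMIT_KB]
# Empty kern.log, messages and syslog in DIR when they exceed LIMIT_KB.
# Hypothetical helper; the threshold is an arbitrary assumption.
shrink_logs() {
    log_dir="${1:-/var/log}"
    limit_kb="${2:-1048576}"   # 1 GiB expressed in KiB

    for name in kern.log messages syslog; do
        f="$log_dir/$name"
        [ -f "$f" ] || continue

        # du -k reports the space used on disk in KiB
        size_kb=$(du -k "$f" | cut -f1)
        if [ "$size_kb" -gt "$limit_kb" ]; then
            echo "truncating $f (${size_kb} KiB)"
            # Truncate in place so the file and its inode stay valid
            : > "$f"
        fi
    done
}
```

The old log contents are lost this way, so anything needed for diagnosis should be copied to other storage first.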
 
it was actually loads of GB in kern.log , messages and syslog (29 GB in each)
well then something weird might have happened in that case, normally those files shouldn't get that big (since they're rotated)..

do you see any relevant errors in journalctl -xe output?
 
Unfortunately the journal only starts at the boot that followed the host becoming unresponsive.

I tried a live migration from one host to another; they have exactly the same hardware. The VM was Windows 10 with an 8-core kvm CPU, 8 GB RAM, virtio storage (on NFS) and an e1000 NIC. I have done this loads of times before and it has never been a problem.

It is quite strange to see that the log files exploded to 87 GB within 20 minutes
 
Unfortunately it starts at the boot it did after the host went unresponsive.
that's unfortunate. you can enable persistent journaling in case it happens another time: mkdir -p /var/log/journal
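A small sketch of what enabling persistent journaling could look like when the journal size should also be capped, so that the journal itself cannot fill the root filesystem. Storage= and SystemMaxUse= are journald.conf options and journald.conf.d is the drop-in directory; the function name, the file name persistent.conf, the 1G limit and the optional root prefix are assumptions made for this example.

```shell
# write_journald_dropin [ROOT] [MAX_USE]
# Create /var/log/journal and a journald drop-in that caps its size.
# ROOT is an optional path prefix for trying this out in a scratch
# directory; leave it empty to act on the real filesystem.
write_journald_dropin() {
    root="${1:-}"
    max_use="${2:-1G}"   # arbitrary cap chosen for illustration

    mkdir -p "$root/var/log/journal" "$root/etc/systemd/journald.conf.d"
    cat > "$root/etc/systemd/journald.conf.d/persistent.conf" <<EOF
[Journal]
Storage=persistent
SystemMaxUse=$max_use
EOF
}

# On a real host the change takes effect after:
#   systemctl restart systemd-journald
```

After that, journalctl --list-boots should show earlier boots too, so the messages from before a crash remain available.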

It is quite strange to see that the log files exploded to 87 GB within 20 minutes
yes very strange, keep an eye on the logs
 
From your screenshot you are running kernel 5.13.19-5 (5.13.19-12).

With 5.13.19-11/12 there was a problem with Windows-VMs, resulting in massive log spamming. [1] [2]

The problem got solved with 5.13.19-13. [3]

So I would suggest to update your PVE-host and reboot for the new kernel to take effect.
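To check whether a host is still on an affected kernel package, the version can be compared against 5.13.19-13. A minimal sketch, assuming GNU sort with -V (version sort); the function names and the FIXED_VERSION variable are made up here, and the version string has to be supplied by hand, for example from the kernel lines printed by pveversion -v.

```shell
# First package version containing the fix, according to the post above
FIXED_VERSION="5.13.19-13"

# version_lt A B: succeeds when version A sorts strictly before version B
version_lt() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# check_kernel_version VERSION: report whether VERSION predates the fix
check_kernel_version() {
    if version_lt "$1" "$FIXED_VERSION"; then
        echo "kernel package $1 is older than $FIXED_VERSION: update and reboot"
    else
        echo "kernel package $1 already includes the fix"
    fi
}
```

For the version in the screenshot, check_kernel_version 5.13.19-12 reports that an update is needed. The update itself is typically apt update followed by apt dist-upgrade and a reboot.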

[1] https://forum.proxmox.com/threads/dirtypipe-cve-2022-0847-fix-for-proxmox-ve.106096/#post-456663
[2] https://forum.proxmox.com/threads/5...er-of-a-few-min-win10-vm-will-not-boot.106110
[3] https://forum.proxmox.com/threads/dirtypipe-cve-2022-0847-fix-for-proxmox-ve.106096/#post-456768
 
