Hello,
For a few days now we've had a strange issue on a standalone PVE7 host: VMs hang and stop responding (host unreachable). The root cause seems to be that the host's "/" is filling up.
Some environment details:
- Debian Bullseye pve-manager/7.4-17/513c62be
Code:
Filesystem           Size  Used  Avail  Use%  Mounted on
udev                  32G     0    32G    0%  /dev
tmpfs                6.3G  1.4M   6.3G    1%  /run
rpool/ROOT/pve-1     6.4G  3.3G   3.1G   52%  /
tmpfs                 32G   46M    32G    1%  /dev/shm
tmpfs                5.0M     0   5.0M    0%  /run/lock
rpool                1.4G  256K   1.4G    1%  /rpool
rpool/data           1.4G  256K   1.4G    1%  /rpool/data
rpool/ROOT           1.4G  256K   1.4G    1%  /rpool/ROOT
rpool/pve-container  1.4G  256K   1.4G    1%  /rpool/pve-container
/dev/fuse            128M   24K   128M    1%  /etc/pve
tmpfs                6.3G     0   6.3G    0%  /run/user/1003
tmpfs                6.3G     0   6.3G    0%  /run/user/1024
Code:
# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 01:24:24 with 0 errors on Sun Dec 10 01:48:25 2023
config:

        NAME                              STATE     READ WRITE CKSUM
        rpool                             ONLINE       0     0     0
          raidz1-0                        ONLINE       0     0     0
            wwn-0x5002538f4121a940-part2  ONLINE       0     0     0
            wwn-0x5002538f4121a93d-part2  ONLINE       0     0     0
            wwn-0x5002538f4121a947-part2  ONLINE       0     0     0
            wwn-0x5002538f31332a6f-part2  ONLINE       0     0     0
          raidz1-1                        ONLINE       0     0     0
            wwn-0x5002538f4121a945-part2  ONLINE       0     0     0
            wwn-0x5002538f4121a94b-part2  ONLINE       0     0     0
            wwn-0x5002538f41203130-part2  ONLINE       0     0     0
            wwn-0x5002538f43464634-part2  ONLINE       0     0     0

errors: No known data errors

# zpool list
NAME    SIZE  ALLOC  FREE  CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
rpool  7.25T  7.02T  239G        -         -   82%  96%  1.00x  ONLINE  -

# zfs list
NAME                                USED  AVAIL  REFER  MOUNTPOINT
rpool                              5.14T  1.39G   140K  /rpool
rpool/ROOT                         5.00G  1.39G   140K  /rpool/ROOT
rpool/ROOT/pve-1                   3.29G  3.10G  3.29G  /
rpool/data                          140K  1.39G   140K  /rpool/data
rpool/pve-container                5.13T  1.39G   140K  /rpool/pve-container
rpool/pve-container/vm-100-disk-1  10.3G  2.49G  9.21G  -
rpool/pve-container/vm-100-disk-2  1.07T  1.39G  1.07T  -
rpool/pve-container/vm-101-disk-0  11.4G  1.39G  11.4G  -
rpool/pve-container/vm-101-disk-1  10.3G  7.39G  4.32G  -
rpool/pve-container/vm-103-disk-0  48.9G  21.5G  22.8G  -
rpool/pve-container/vm-103-disk-1  52.0G  5.75G  32.1G  -
rpool/pve-container/vm-104-disk-0  25.8G  5.52G  21.6G  -
rpool/pve-container/vm-105-disk-0  25.8G  8.28G  18.9G  -
rpool/pve-container/vm-105-disk-1  3.88T  1.39G  3.88T  -
rpool/swap                         9.45G  1.39G  9.45G  -
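Adding up the USED column above for the zvols plus swap (my own rough arithmetic, taking 1T as 1024G) shows the guests account for essentially all of the pool's 5.14T:

```python
# USED figures copied from the `zfs list` output above (1T taken as 1024G).
used_g = {
    "vm-100-disk-1": 10.3,
    "vm-100-disk-2": 1.07 * 1024,
    "vm-101-disk-0": 11.4,
    "vm-101-disk-1": 10.3,
    "vm-103-disk-0": 48.9,
    "vm-103-disk-1": 52.0,
    "vm-104-disk-0": 25.8,
    "vm-105-disk-0": 25.8,
    "vm-105-disk-1": 3.88 * 1024,
    "swap": 9.45,
}

total_t = sum(used_g.values()) / 1024        # ~5.14T, matching rpool's USED
biggest = max(used_g, key=used_g.get)        # the largest single consumer
print(f"zvols + swap use ~{total_t:.2f}T; biggest consumer: {biggest}")
```

So the two big zvols (vm-100-disk-2 at 1.07T and vm-105-disk-1 at 3.88T) are where almost all of the space went.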
I really have no idea why this happens.
I deleted an unused VM, which gave us 12GB back, as I could see yesterday.
I've checked the monitoring graphs, and I can see that the free space drops from ~400GB on 2024-01-01 to less than 10GB. Maybe our syslog VMs did something, or our Debian repository, but the real question for me is: how can that be? The VM disks have a limited amount of space assigned, so how can they fill up the rootfs of the host itself?
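My current understanding (a sketch of the accounting, happy to be corrected): all ZFS datasets and zvols draw from the same pool-wide free space, so a zvol's size only caps what the guest can write into that one volume, not how full the pool as a whole can get, and sparse ("thin provisioned") zvols reserve nothing up front. You can see that in the output above, where every dataset without a reservation of its own reports the same AVAIL:

```python
# AVAIL values copied from the `zfs list` output above. AVAIL is the
# pool-wide free space (minus reservations/quotas), not a per-disk
# allocation, which is why they all bottom out at the same figure.
avail = {
    "rpool": "1.39G",
    "rpool/data": "1.39G",
    "rpool/pve-container": "1.39G",
    "rpool/pve-container/vm-100-disk-2": "1.39G",
    "rpool/pve-container/vm-105-disk-1": "1.39G",
    "rpool/swap": "1.39G",
}

shared = set(avail.values())
print(shared)  # a single shared value, not one value per dataset
```

If that understanding is right, once the zvols fill the pool, "/" starves along with everything else.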
I've tried setting a reservation on rpool/ROOT/pve-1:
Code:
root@fc-r02-pmox-06:[/etc/cron.daily]: zfs get reservation rpool/ROOT/pve-1
NAME              PROPERTY     VALUE  SOURCE
rpool/ROOT/pve-1  reservation  5G     local
but without success. This morning (07:35) the VMs were down again. Small note: not all of them, but all of the bigger VMs that actually do work, like the Debian repo / syslog / Puppet master hosts. Our jump host was always fine.
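For what it's worth, the reservation does seem to be in effect: it would explain why rpool/ROOT/pve-1 shows 3.10G AVAIL while every other dataset shows 1.39G. But by my rough arithmetic (figures copied from the outputs above) it only guarantees about 1.7G of headroom beyond what "/" already uses:

```python
# Figures copied from the outputs above (GiB as printed by zfs).
reservation = 5.00   # `zfs get reservation rpool/ROOT/pve-1`
used = 3.29          # USED/REFER of rpool/ROOT/pve-1
pool_avail = 1.39    # AVAIL reported by the datasets without reservations

headroom = reservation - used        # space guaranteed on top of current USED
root_avail = headroom + pool_avail   # matches the 3.10G AVAIL shown for pve-1
print(f"guaranteed headroom: ~{headroom:.2f}G, root AVAIL: ~{root_avail:.2f}G")
```

So if logs or apt metadata on "/" grow by a couple of GB, that 5G reservation is exhausted anyway.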
So, I'm a bit out of ideas.
Any suggestions?