Today I was trying to back up one of my VMs before migrating it to a new node, and I mistakenly assumed the node it was on had enough local storage to complete the backup (all of the other nodes in the cluster have 2TB or more of local storage, but this node only had 512GB for some reason). The VM had a 256GB disk, so the node ran out of space on rpool and the backup job hung. At that point I didn't know why the job had hung. The node wouldn't let me reboot because the backup job was in an uninterruptible sleep, so I disconnected the power from the machine and restarted it.
Now the machine is unable to connect to the cluster. I looked at the logs for the pve-cluster service, and that is when I realized the disk was full.
Code:
Feb 14 14:25:29 pve2 systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
Feb 14 14:25:29 pve2 pmxcfs[1412]: [main] notice: resolved node name 'pve2' to '192.168.50.3' for default node IP address
Feb 14 14:25:29 pve2 pmxcfs[1412]: [main] notice: resolved node name 'pve2' to '192.168.50.3' for default node IP address
Feb 14 14:25:29 pve2 pmxcfs[1412]: [database] crit: chmod failed: No space left on device
Feb 14 14:25:29 pve2 pmxcfs[1412]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Feb 14 14:25:29 pve2 pmxcfs[1412]: [main] notice: exit proxmox configuration filesystem (-1)
Feb 14 14:25:29 pve2 pmxcfs[1412]: [database] crit: chmod failed: No space left on device
Feb 14 14:25:29 pve2 pmxcfs[1412]: [main] crit: memdb_open failed - unable to open database '/var/lib/pve-cluster/config.db'
Feb 14 14:25:29 pve2 pmxcfs[1412]: [main] notice: exit proxmox configuration filesystem (-1)
Feb 14 14:25:29 pve2 systemd[1]: pve-cluster.service: Control process exited, code=exited, status=255/EXCEPTION
Feb 14 14:25:29 pve2 systemd[1]: pve-cluster.service: Failed with result 'exit-code'.
Feb 14 14:25:29 pve2 systemd[1]: Failed to start pve-cluster.service - The Proxmox VE cluster filesystem.
Feb 14 14:25:29 pve2 systemd[1]: pve-cluster.service: Scheduled restart job, restart counter is at 4.
Feb 14 14:25:29 pve2 systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
I'm not really sure how to proceed at this point. I am pretty sure all my configuration files are fine. I think I just need to delete the files from the failed backup and then I should be able to reconnect, but I am no ZFS guru. Here is the output of some commands showing my storage situation:
Code:
root@pve2:~# df -h
Filesystem        Size  Used Avail Use% Mounted on
udev               32G     0   32G   0% /dev
tmpfs             6.3G  1.3M  6.3G   1% /run
rpool/ROOT/pve-1  268G  268G     0 100% /
tmpfs              32G     0   32G   0% /dev/shm
tmpfs             5.0M     0  5.0M   0% /run/lock
efivarfs          128K  8.3K  115K   7% /sys/firmware/efi/efivars
rpool             128K  128K     0 100% /rpool
rpool/ROOT        128K  128K     0 100% /rpool/ROOT
rpool/data        128K  128K     0 100% /rpool/data
rpool/vms         128K  128K     0 100% /rpool/vms
tmpfs             6.3G     0  6.3G   0% /run/user/0
root@pve2:~#
root@pve2:~# zfs list -o space,refquota,quota,volsize
NAME                     AVAIL  USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD  REFQUOTA  QUOTA  VOLSIZE
rpool                       0B  435G        0B    104K             0B       435G      none   none        -
rpool/ROOT                  0B  268G        0B     96K             0B       268G      none   none        -
rpool/ROOT/pve-1            0B  268G        0B    268G             0B         0B      none   none        -
rpool/data                  0B   96K        0B     96K             0B         0B      none   none        -
rpool/vms                   0B  167G        0B     96K             0B       167G      none   none        -
rpool/vms/vm-102-disk-0     0B  167G        0B    167G             0B         0B         -      -     256G
As you can see, there are 0 bytes available in the pool. However, when I navigate to any of those locations (e.g. /rpool/vms/), there are no files there. How can I get to the files that I need to clear out?
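For what it's worth, here is a quick tally of the numbers in the zfs list output, as a rough Python sketch. The values are copied by hand from that output and the dictionary and function names are just illustrative, so treat it as a sanity check and not a diagnosis. It suggests the root dataset and the VM disk together account for all of the 435G, with the larger part in rpool/ROOT/pve-1, which df shows mounted at / and not under /rpool.

```python
# USED column values (in G, as printed) copied by hand from the
# `zfs list -o space` output above. Names here are illustrative only.
ZFS_USED_G = {
    "rpool": 435,
    "rpool/ROOT/pve-1": 268,
    "rpool/vms/vm-102-disk-0": 167,
}


def tally(used):
    """Return (sum of the two leaf datasets, pool total, root share in %)."""
    leaves = used["rpool/ROOT/pve-1"] + used["rpool/vms/vm-102-disk-0"]
    total = used["rpool"]
    root_share = round(100 * used["rpool/ROOT/pve-1"] / total)
    return leaves, total, root_share


leaves, total, root_share = tally(ZFS_USED_G)
print(f"leaf datasets: {leaves}G of {total}G pool usage")
print(f"root filesystem share: {root_share}%")
```

If that reading is right, whatever filled the pool during the backup would be somewhere on the root filesystem, not in the (nearly empty) /rpool/* mount points.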
Thanks.