[SOLVED] Need help: I/O error, ZFS pool full. Do I need to buy new disks to access the data?

vistorvistor

Member
Dec 27, 2021
Hey guys, would love some help getting back on my feet without being ruined here! :)

I did a large file sync to one of my VMs the other day and it died with an I/O error.
I can no longer boot it or mount the disk because the virtual disk is overprovisioned (yes, I know, my bad).
The only solution I can think of is to buy a couple of new hard drives to expand the pool. Is this really the only solution?

Background:
Code:
1x SSD ZFS mirror for VMs named "root".
1x Raidz1 3x 12TB WD RED for backups and storage - named "storage".

3 VMs using SSD only.
1 VM using the SSD for its root partition, but also with a very large virtual disk using the entirety of the storage pool (25TB).

In other words, the VM sees one big disk and the ZFS side is handled by Proxmox.
I accidentally created the VM's virtual hard disk bigger than the actual storage pool, and during an unexpectedly large backup job to the VM the data outgrew the ZFS pool, resulting in Proxmox giving me an I/O error.
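
For anyone checking a similar setup (my addition, using the pool name from this thread; the vm-XXX-disk-X volume name is just the typical Proxmox pattern, adjust to yours), the mismatch between zvol size and pool capacity is easy to spot:

Code:
# pool capacity vs. space actually allocated
zpool list storage
# volsize and usage of every zvol on the pool
zfs list -t volume -r -o name,volsize,used,avail storage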

So now I can't boot the VM; it just says I/O error. Same when I boot from an Ubuntu live ISO. If I detach the large hard drive, the system boots fine. Then I attach the disk and dmesg finds the "new drive sdb", but whenever I try to mount it (in the hope of deleting files) the system freezes with an I/O error.

The only solution I can think of is to buy two (mirror) or three (raidz1) new disks, add them to the pool, and expand it. But there has to be another way... right? Right? :)
 
The only solution I can think of is to buy a couple of new hard drives to expand the pool. Is this really the only solution?
You could back up a VM/LXC, destroy it, and see if ZFS can recover and free up some space (not guaranteed: ZFS uses copy-on-write, so you need some free space even to be able to delete something). Then start your VM (if it can even start... completely filling up a pool usually means some amount of data loss/corruption...), delete stuff, and restore that VM/LXC. If the pool can't recover you would need to add more disks. But the VM might still be damaged, in which case you might want to restore a recent backup.
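A minimal sketch of that backup-and-destroy route, assuming a hypothetical guest ID 101 and a backup storage named "backupstore" (both placeholders, adjust to your setup; the dump path also varies):

Code:
# back up the VM, then destroy it to (hopefully) let ZFS free some space
vzdump 101 --storage backupstore --mode stop
qm destroy 101
# after cleaning up, restore it from the dump
qmrestore /mnt/pve/backupstore/dump/vzdump-qemu-101-*.vma.zst 101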

I accidentally created the VM's virtual hard disk bigger than the actual storage pool, and during an unexpectedly large backup job to the VM the data outgrew the ZFS pool, resulting in Proxmox giving me an I/O error.
That's what quotas are for. With proper quotas set, this couldn't have happened by accident... and once your pool reaches 80% usage you should buy more disks anyway...
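
For example (a sketch, assuming a hypothetical dataset storage/backups; ZFS quotas apply per dataset, so keep your data in datasets rather than on the pool root):

Code:
# cap the dataset well below the pool's capacity
zfs set quota=10T storage/backups
# verify
zfs get quota storage/backups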
 
After some digging I set the discard flag to off and managed to boot the VM temporarily with spa_slop_shift set to 8.
VM actually seems fine, which is great.

I removed a lot of data from the VM's OS hoping that Proxmox would pick up on it, but that didn't really help.
Setting spa_slop_shift back to 5 brings back the I/O error.

Is there any way for Proxmox to detect that there is now a lot of free space, or is the only good solution here to buy more disks?
 
I removed a lot of data from the VM's OS hoping that Proxmox would pick up on it, but that didn't really help.
Without discard enabled, ZFS won't free up space, as the guest OS can't tell ZFS what's deleted.
Is there any way for Proxmox to detect that there is now a lot of free space, or is the only good solution here to buy more disks?
That's what discard is for. You could try a zpool trim.
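
For the pool-level route (the guest-side fstrim route is what ends up working below), something like this on the Proxmox host:

Code:
# ask ZFS to trim freed blocks on the whole pool
zpool trim storage
# watch the trim progress
zpool status -t storage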
 
I turned on discard and ran fstrim in Ubuntu (since that's where trim is needed in my case, i.e. inside the actual VM) and Proxmox once again reports 10TB free. spa_slop_shift is back to 5 and things are running great!

Thanks a lot for the help and pointers here; I thought I was completely screwed!

So for anyone else having this problem:

On the Proxmox host:
Code:
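# raise spa_slop_shift from its default of 5: less space stays reserved (1/2^shift of the pool), so the full pool becomes writable again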
echo 8 > /sys/module/zfs/parameters/spa_slop_shift
On the VM:
- Turn on discard for the problematic hard disk if it's not already on (this can also be done from the host, see the sketch after this recipe)
- Start VM
- Clean up your shit
- sudo systemctl start fstrim.service

On the Proxmox host:
Code:
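# restore the default reservation once space has been freed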
echo 5 > /sys/module/zfs/parameters/spa_slop_shift
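
If you prefer to flip discard from the host (my addition; VM ID 100, disk scsi1, and the storage ID are placeholders, the volume spec must match your actual disk):

Code:
qm set 100 --scsi1 storage:vm-100-disk-1,discard=on

Afterwards you can confirm the freed space on the host:

Code:
zpool list
zfs list -o space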
 
