VM Status: io-error ZFS full

wowkisame

May 20, 2022
Hello,

Since yesterday, one of my VMs has had the status io-error. I can't find a way to free up space on it. When I check my ZFS pools:
root@pve:~# zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
rpool 222G 115G 107G - - 68% 51% 1.00x ONLINE -
stockage 43.6T 42.3T 1.36T - - 10% 96% 1.00x ONLINE -

ZFS list:
root@pve:~# zfs list
NAME USED AVAIL REFER MOUNTPOINT
rpool 115G 100G 104K /rpool
rpool/ROOT 4.34G 100G 96K /rpool/ROOT
rpool/ROOT/pve-1 4.34G 100G 4.34G /
rpool/data 110G 100G 96K /rpool/data
rpool/data/vm-100-disk-0 1.74G 100G 1.74G -
rpool/data/vm-101-disk-0 108G 100G 108G -
rpool/data/vm-105-disk-0 56K 100G 56K -
stockage 30.7T 0B 140K /stockage
stockage/vm-101-disk-0 30.7T 0B 30.7T -


I think the free space shown by zpool is used for raidz parity, so ZFS has no usable space left.

I can boot a live Linux in the VM, but when I try to mount the filesystem, the VM freezes and Proxmox shows the status io-error.

I don't know whether it is possible to mount the filesystem directly on the Proxmox host and do some cleanup there.
Any other ideas?

Thanks !
 
I think the free space shown by zpool is used for raidz parity, so ZFS has no usable space left.
No, the free space is unusable, therefore you cannot store anything new on it.

First, I'd look for any refreservation values, then I'd see if there are any snapshots that can be deleted, then I'd temporarily send/receive one dataset to another pool and delete the source ... and do your maintenance stuff.
 
Hi LnxBil,
First of all, thanks for your time.

Unfortunately, there are no snapshots, no reservation and no quota (maybe there are other commands I should run?):
Code:
root@pve:~# zfs get reservation stockage
NAME      PROPERTY     VALUE   SOURCE
stockage  reservation  none    default

root@pve:~# zfs get quota stockage
NAME      PROPERTY  VALUE  SOURCE
stockage  quota     none   default

root@pve:~# zfs get snapshot_count stockage
NAME      PROPERTY        VALUE    SOURCE
stockage  snapshot_count  none     default

root@pve:~# zfs list -t snapshot
no datasets available

send/receive one dataset to another pool and delete the source
Could you explain this part, please?

Of course I will create an alert for storage next time :( ...
 
You had better set a quota too, not just an alarm. In general, it's recommended to always keep around 20% of your ZFS pool free, because the pool becomes slower as it fills up. I always set a quota of 90% for the entire pool and set an alarm when 80% is exceeded, so I can clean up and bring usage back under 80%. That way you can't run into a situation where your pool gets so full that it stops working.
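As a rough sketch of the 90% quota / 80% alarm idea: the two helper functions below are made up for illustration (pct_used and quota_bytes are not ZFS tools), and they assume the byte values come from zfs get -Hp. The commented lines show how they might be wired to the stockage pool from this thread. Treat it as something to adapt, not a tested setup.

```shell
# Percentage of a dataset's space that is in use, from byte counts.
pct_used() {
    used=$1; avail=$2
    echo $(( used * 100 / (used + avail) ))
}

# Byte value for a quota covering PCT percent of the total (used + avail).
quota_bytes() {
    used=$1; avail=$2; pct=$3
    echo $(( (used + avail) / 100 * pct ))
}

# Possible wiring (assumption: run as root, pool named "stockage"):
#   used=$(zfs get -Hp -o value used stockage)
#   avail=$(zfs get -Hp -o value available stockage)
#   zfs set quota="$(quota_bytes "$used" "$avail" 90)" stockage
#   [ "$(pct_used "$used" "$avail")" -ge 80 ] && echo "stockage is over 80% full"
```

With the quota on the pool's root dataset, writes should start failing at 90% instead of at 100%, which leaves some room to clean up.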

For replication examples see here: https://docs.oracle.com/cd/E18752_01/html/819-5461/gbchx.html

And you only checked "reservation" but not "refreservation".

Output of zfs list -o space stockage or even zfs list -o space would be useful too. Then you also see snapshot and refreservation space usage.
 
Hi Dunuin,

Thanks for your time and your answer.

Code:
root@pve:~# zfs get refreservation stockage
NAME      PROPERTY        VALUE      SOURCE
stockage  refreservation  none       default

root@pve:~# zfs list -o space stockage
NAME      AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
stockage  1.96M  30.7T        0B    140K             0B      30.7T

root@pve:~# zfs list -o space
NAME                      AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
rpool                      100G   115G        0B    104K             0B       115G
rpool/ROOT                 100G  4.34G        0B     96K             0B      4.34G
rpool/ROOT/pve-1           100G  4.34G        0B   4.34G             0B         0B
rpool/data                 100G   110G        0B     96K             0B       110G
rpool/data/vm-100-disk-0   100G  1.74G        0B   1.74G             0B         0B
rpool/data/vm-101-disk-0   100G   108G        0B    108G             0B         0B
rpool/data/vm-105-disk-0   100G    56K        0B     56K             0B         0B
stockage                  1.96M  30.7T        0B    140K             0B      30.7T
stockage/vm-101-disk-0    1.96M  30.7T        0B   30.7T             0B         0B

I don't understand the "Sending and Receiving ZFS Data" part. Can I send only a part of stockage to rpool/data, for example?

Thanks.
 
I don't understand the "Sending and Receiving ZFS Data" part. Can I send only a part of stockage to rpool/data, for example?
You need to send full datasets or zvols. For example, you could use "zfs send" to copy a zvol like your stockage/vm-101-disk-0 from the stockage pool to your rpool pool, or even to another server running a ZFS pool. But that isn't really helpful here, because you use just one big zvol that fills the entire pool... at least unless you have another ZFS pool with more than 31 TB of free space.
But I also don't see other options if refreservation and snapshot space usage are already zero and you don't have the space to temporarily move the 30.7 TB zvol to some other, bigger ZFS pool.
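For reference, the send/receive copy described above might look roughly like this. The function name copy_zvol, the @migrate snapshot name and the target bigpool/vm-101-disk-0 are hypothetical, and the target pool would need room for the whole zvol. On a pool that is already completely full, even creating the snapshot may fail. By default the sketch only prints the commands; it runs them only with DRY_RUN=0.

```shell
# Sketch: copy one zvol to another pool via snapshot + zfs send/receive.
copy_zvol() {
    src=$1                  # e.g. stockage/vm-101-disk-0
    dst=$2                  # e.g. bigpool/vm-101-disk-0 (hypothetical target)
    snap="${src}@migrate"   # hypothetical snapshot name

    # Default is a dry run that only shows what would be executed.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "zfs snapshot $snap"
        echo "zfs send $snap | zfs receive $dst"
        return 0
    fi

    # Real run: snapshot first, then stream the snapshot into the target.
    zfs snapshot "$snap" && zfs send "$snap" | zfs receive "$dst"
}

# Usage (prints the two commands, changes nothing):
#   copy_zvol stockage/vm-101-disk-0 bigpool/vm-101-disk-0
```

Only after checking the copy on the target would you destroy the source to free the space.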
 
Dunuin,

I don't have 31+ TB to copy my volume to... So I don't have any way to clean up or shrink the volume :(
 
Oh :(
Maybe someone else has other ideas?

Thanks.
I am sorry, but having a plan for such a situation is something you should have done "before" running into issues.
You will have to move the data to another disk... ZFS really does not like being filled up to 100%.
 
The problem is that you have only one zvol stored on your pool stockage without any snapshots or any kind of reservation and you ran out of space. The send/receive route is not a viable option, because you only have one volume.

The problem with freeing up used space is that the cleanup itself probably needs to write, which is not a real option. And, as you've already described, you ran out of space and the guest noticed it, so you already have a filesystem with errors on it. As you described, you cannot mount the disk in a live system due to the space problem, so cleaning up from inside the guest may not help, or maybe it will.

Can you describe further what you see inside of your guest with e.g. a live Linux? Which OS is it, and what is the partition layout? Is there maybe a configured swap or recovery partition? You only need to free (more precisely: block discard) some stuff in order to get your main/big partition mounted again.
 
Hi LnxBil,

In the live system: only one ext4 partition, with no swap or other partition that I could delete.
All my critical data on this partition is backed up, so I think I will delete this partition and start again with a quota ... :(
 
Another approach would be to temporarily add a disk to your zpool, do your cleanup and remove it. That should be the last resort. Be aware that removing a top-level vdev again only works if the pool contains no raidz vdevs.

All my critical data on this partition is backed up, so I think I will delete this partition and start again with a quota ... :(
A quota will not help to clean up unused stuff. You should run fstrim daily, so that blocks deleted inside your filesystem are also freed on the storage outside.
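A possible way to get a daily fstrim inside a systemd-based guest is sketched below. The helper name fstrim_daily_dropin and the file name daily.conf are made up for illustration, and the sketch assumes the guest ships the fstrim.timer unit from util-linux (which runs weekly by default). The trim only reaches the zvol if the Discard option is enabled on the virtual disk in Proxmox.

```shell
# Print a systemd drop-in that switches fstrim.timer from weekly to daily.
# The empty OnCalendar= line clears the schedule inherited from the unit.
fstrim_daily_dropin() {
    cat <<'EOF'
[Timer]
OnCalendar=
OnCalendar=daily
EOF
}

# Possible wiring inside the guest (assumption: run as root):
#   mkdir -p /etc/systemd/system/fstrim.timer.d
#   fstrim_daily_dropin > /etc/systemd/system/fstrim.timer.d/daily.conf
#   systemctl daemon-reload
#   systemctl enable --now fstrim.timer
#   fstrim -av    # one-off trim of all mounted filesystems that support it
```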