Garbage collection does not start

frankz

Renowned Member
Nov 16, 2020
Hello everyone, I realized that I have filled up the space on a datastore, so garbage collection does not start!

2023-12-26T10:06:27+01:00: starting garbage collection on store ZFS_STORAGE
2023-12-26T10:06:27+01:00: Start GC phase1 (mark used chunks)
2023-12-26T10:06:27+01:00: TASK ERROR: update atime failed for chunk/file "/mnt/datastore/ZFS_STORAGE/.chunks/294f/294f0417a3e12d7396e3a31d3ce7cbbbc7b2d332480bf1f3af970d8a59a90843" - ENOSPC: No space left on device
 
Then you are screwed. A ZFS pool should NEVER be filled up to 100%. Next time, set a ZFS quota on the datastore's dataset so the pool can't be completely filled by accident. For best performance the pool shouldn't be more than 80% full anyway, so it doesn't hurt to set a 90% quota so that 10% is always kept free. In such a situation you could then temporarily increase the quota from 90% to 95% to get enough space to run the GC, and later decrease it again to 90%.
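A rough sketch of that quota approach, assuming the datastore sits directly on the pool root dataset ZFS_STORAGE (as the path in your log suggests) and a 1 TB pool; adjust names and sizes to your setup:

zfs set quota=900G ZFS_STORAGE        # ~90% of the pool, keeps ~10% free
zfs get quota,used,available ZFS_STORAGE
# in an emergency, raise it temporarily so the GC can run, then lower it again:
zfs set quota=950G ZFS_STORAGE
zfs set quota=900G ZFS_STORAGE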

Steps you could try:
1.) disable all backup jobs or set the datastore to maintenance mode so freed-up space won't be filled up again
2.) in case your datastore and system share the same pool, you could try to delete some unneeded files like logs to gain space
If you can't delete anything, there are these options:
A.) destroy your datastore, lose all your backups and start from scratch
B.) buy more disks and extend your pool so it gets some additional space, which lets you run the GC to free things up
C.) move the whole datastore folder to another, bigger storage you mount on your PBS, run the GC, then move the datastore folder back
D.) move some chunks to another storage and symlink them back (see the sketch below)
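A minimal sketch of option D, assuming a bigger storage is mounted at /mnt/other (hypothetical path) and using the chunk folder from your error message; only move a handful of the .chunks subfolders, and move them back once the GC has freed space:

mv /mnt/datastore/ZFS_STORAGE/.chunks/294f /mnt/other/294f
ln -s /mnt/other/294f /mnt/datastore/ZFS_STORAGE/.chunks/294f
# after the GC has freed space: remove the symlink and move the folder back
rm /mnt/datastore/ZFS_STORAGE/.chunks/294f
mv /mnt/other/294f /mnt/datastore/ZFS_STORAGE/.chunks/294f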
 
Thank you. If I put the datastore into maintenance mode, how do I then delete the logs to recover space?
 
Use the "rm" command. They are in "/var/log". There is also journalctl --vacuum-size=10M to delete old journald logs.
 
Then there isn't much you can do except get a bigger disk. Without available space there is no way to remove backups non-destructively.
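If you go the bigger-disk route on a mirrored pool, the usual way is to replace the disks one by one with larger ones; pool and device names below are placeholders:

zpool set autoexpand=on ZFS_STORAGE
zpool replace ZFS_STORAGE /dev/sdb /dev/sdd   # repeat per mirror member, waiting for each resilver to finish
zpool online -e ZFS_STORAGE /dev/sdd          # let the pool grow to the new disk size
zpool list ZFS_STORAGE                        # check the new capacity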
 
Thank you anyway, but for the moment I have deleted some chunks and the garbage collection has started. We will see later.....
 
Yes, the problem is that all chunks are deduplicated. So by deleting 1000 random chunks to free up ~2GB of space and then running the GC and a full re-verify, you could end up with many or even all backups no longer working, as each backup snapshot may now be missing some chunks.
So don't forget to re-verify ALL backup snapshots so you don't think backups are fine while they are not.
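If you prefer the CLI over the web UI, GC and a full verify of the datastore can be started roughly like this (datastore name taken from your log; check proxmox-backup-manager help on your version for the exact syntax):

proxmox-backup-manager garbage-collection start ZFS_STORAGE
proxmox-backup-manager verify ZFS_STORAGE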
 
Thank you, you were perfectly right! I wanted to try, but without success, so I removed the entire datastore. Now everything works. I believe this kind of behaviour, for a system like Proxmox, has to be considered a serious gap. Thank you anyway for your kindness in answering. Finally, I ask you: if I update to the latest version now, having a version 7 cluster, could there be problems?
 
I believe this kind of behaviour, for a system like Proxmox, has to be considered a serious gap.
In my opinion, that is a user error. The admin has to make sure there is proper monitoring with notifications in case the storage is slowly running out of space. And to make sure there are quotas set, so it is impossible, even by accident, to brick that pool by filling it up.

But yes, it would be nice if PBS created datastores with a predefined quota and offered options in the web UI to set quotas and notifications, so there is a useful default preventing these situations. It would also help people who don't know how to administrate ZFS via the CLI.
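Until something like that exists, you can get roughly the same effect by hand when creating a datastore; dataset name, quota and path below are just examples, assuming the pool is mounted under /mnt/datastore/ZFS_STORAGE:

zfs create ZFS_STORAGE/backup
zfs set quota=900G ZFS_STORAGE/backup
proxmox-backup-manager datastore create backup /mnt/datastore/ZFS_STORAGE/backup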

Finally, I ask you: if I update to the latest version now, having a version 7 cluster, could there be problems?
PBS3 is backwards compatible with PVE7. So yes, upgrading should work.
 
All done! The upgrade worked.