Garbage collection does not start

frankz

Renowned Member
Nov 16, 2020
Hello everyone, I realized that I have filled up the space on a datastore, so garbage collection does not start!

2023-12-26T10:06:27+01:00: starting garbage collection on store ZFS_STORAGE
2023-12-26T10:06:27+01:00: Start GC phase1 (mark used chunks)
2023-12-26T10:06:27+01:00: TASK ERROR: update atime failed for chunk/file "/mnt/datastore/ZFS_STORAGE/.chunks/294f/294f0417a3e12d7396e3a31d3ce7cbbbc7b2d332480bf1f3af970d8a59a90843" - ENOSPC: No space left on device
 
Then you are screwed. A ZFS pool should NEVER be filled up to 100%. Next time, set a ZFS quota on the datastore's dataset so the pool can't be completely filled by accident. For best performance the pool shouldn't be more than 80% full anyway, so it doesn't hurt to set a 90% quota so that 10% is always kept free. In such a situation you could then temporarily increase the quota from 90% to 95% to get enough space to run the GC, and later decrease it again to 90%.
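A rough sketch of that quota approach, assuming the datastore sits directly on the pool root dataset ZFS_STORAGE (as the path in your log suggests) and a 1 TB pool; adjust names and sizes to your setup:

zfs set quota=900G ZFS_STORAGE        # ~90% of the pool, keeps ~10% free
zfs get quota,used,available ZFS_STORAGE
# in an emergency, raise it temporarily so the GC can run, then lower it again:
zfs set quota=950G ZFS_STORAGE
zfs set quota=900G ZFS_STORAGE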

Steps you could try:
1.) disable all backup jobs or set the datastore to maintenance mode so freed-up space won't be filled up again
2.) in case your datastore and system share the same pool, you could try to delete some unneeded files like logs to gain space
If you can't delete anything, there are these options:
A.) destroy your datastore, lose all your backups and start from scratch
B.) buy more disks and extend your pool so it gets some additional space, which lets you run the GC to free things up
C.) move the whole datastore folder to another, bigger storage you mount on your PBS, run the GC, then move the datastore folder back
D.) move some chunks to another storage and symlink them back (see the sketch below)
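A minimal sketch of option D, assuming a bigger storage is mounted at /mnt/other (hypothetical path) and using the chunk folder from your error message; only move a handful of the .chunks subfolders, and move them back once the GC has freed space:

mv /mnt/datastore/ZFS_STORAGE/.chunks/294f /mnt/other/294f
ln -s /mnt/other/294f /mnt/datastore/ZFS_STORAGE/.chunks/294f
# after the GC has freed space: remove the symlink and move the folder back
rm /mnt/datastore/ZFS_STORAGE/.chunks/294f
mv /mnt/other/294f /mnt/datastore/ZFS_STORAGE/.chunks/294f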
 
Thank you. If I put the datastore into maintenance mode, how do I then delete the logs to recover space?
 
Use the "rm" command. They are in "/var/log". There is also journalctl --vacuum-size=10M to delete old journald logs.
 
Then there isn't much you can do except get a bigger disk. Without available space there is no way to remove backups non-destructively.
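If you go the bigger-disk route on a mirrored pool, the usual way is to replace the disks one by one with larger ones; pool and device names below are placeholders:

zpool set autoexpand=on ZFS_STORAGE
zpool replace ZFS_STORAGE /dev/sdb /dev/sdd   # repeat per mirror member, waiting for each resilver to finish
zpool online -e ZFS_STORAGE /dev/sdd          # let the pool grow to the new disk size
zpool list ZFS_STORAGE                        # check the new capacity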
 
Thank you anyway, but for the moment I have deleted some chunks and the garbage collection has started. We will see later.....
 
Yes, the problem is that all chunks are deduplicated. So by deleting 1000 random chunks to free up ~2GB of space and then running the GC and a full re-verify, you could end up with many or even all backups no longer working, as each backup snapshot may now be missing some chunks.
So don't forget to re-verify ALL backup snapshots so you don't think backups are fine while they are not.
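If you prefer the CLI over the web UI, GC and a full verify of the datastore can be started roughly like this (datastore name taken from your log; check proxmox-backup-manager help on your version for the exact syntax):

proxmox-backup-manager garbage-collection start ZFS_STORAGE
proxmox-backup-manager verify ZFS_STORAGE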
 
Thank you, you were perfectly right! I wanted to try, but without success, so I removed the entire datastore. Now everything works. I believe this kind of behaviour, for a system like Proxmox, has to be considered a serious gap. Thank you anyway for your kindness in answering. Finally, I ask you: if I update to the latest version now, having a version 7 cluster, could there be problems?
 
I believe this kind of behaviour, for a system like Proxmox, has to be considered a serious gap.
In my opinion, that is a user error. The admin has to make sure there is proper monitoring with notifications in case the storage is slowly running out of space. And to make sure there are quotas set, so it is impossible, even by accident, to brick that pool by filling it up.

But yes, it would be nice if PBS created datastores with a predefined quota and offered options in the web UI to set quotas and notifications, so there is a useful default preventing these situations. It would also help people who don't know how to administrate ZFS via the CLI.
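Until something like that exists, you can get roughly the same effect by hand when creating a datastore; dataset name, quota and path below are just examples, assuming the pool is mounted under /mnt/datastore/ZFS_STORAGE:

zfs create ZFS_STORAGE/backup
zfs set quota=900G ZFS_STORAGE/backup
proxmox-backup-manager datastore create backup /mnt/datastore/ZFS_STORAGE/backup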

Finally, I ask you: if I update to the latest version now, having a version 7 cluster, could there be problems?
PBS3 is backwards compatible with PVE7. So yes, upgrading should work.
 
All done! The upgrade worked.