PBS completely full disk

bea

Hello.

I have a remote PBS with 2 disks as a ZFS mirror. It all worked great, but the disks got completely full, so it is not operable anymore. I can access it through the GUI and SSH, but I can only run a few commands. I have deleted almost every log and most of its backups and then waited for more than 24 hours, but no difference.

What else could I do?

I don't mind losing every backup it has.

Thank you
 
Thank you.

I understood that it runs automatically 24 hours and 5 minutes after deletion. But anyway, when trying to run it manually it gives me an unable to start garbage collection job on datastore mydatastore - ENOSPC: No space left on device (400) error
 
I see you posted that you don't mind losing every backup.
Ok. Here's how to do that.

Go to the shell console in PBS or an SSH session.

This shows you the ZFS datasets on your PBS server.
zfs list

Use the info to build the following command. This will delete the dataset.
zfs destroy bea-backup-or-whatever-you-called-it

This resets PBS's info about the dataset.
mv /etc/proxmox-backup/datastore.cfg /etc/proxmox-backup/datastore.bak
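Moving the config aside only takes effect once the PBS daemons re-read it; a reboot works, but restarting the services should be enough. A sketch, assuming the standard unit names (`proxmox-backup` for the API daemon, `proxmox-backup-proxy` for the proxy):

```
# Restart the PBS daemons so they pick up the changed datastore.cfg
# (unit names are an assumption; check with: systemctl list-units 'proxmox-backup*')
systemctl restart proxmox-backup proxmox-backup-proxy
```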


Perhaps more folks will comment here. Listen to them too.
Search the forum. This is not an uncommon problem. There are longer, more complicated answers than the one I just supplied.

You don't want this to happen again.
After your recovery, set a reservation on your root dataset so if you fill up again, root will still work.

zfs set reservation=10g rpool/ROOT
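It's worth confirming the reservation actually took effect. A quick check, assuming the pool layout shown later in this thread (`rpool/ROOT`):

```
# Verify the reservation was applied to the root dataset
zfs get reservation rpool/ROOT
# See how the reservation affects used/available space
zfs list -o name,used,avail,reservation rpool/ROOT
```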
 
Thank you.

I don't understand. I think I know what a datastore is, but what is a dataset?

I don't mind losing all the data or even the configuration, as long as it becomes the healthy PBS it was again. If it were not remote, I would have reinstalled PBS from scratch (and then configured a quota). But as it is remote, I don't know how to clean everything up so that I have a healthy PBS again.

What should I destroy here?


Bash:
root@remote-pbs-:~# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool             1.76T     0B  1.75T  /rpool
rpool/ROOT        2.36G     0B    96K  /rpool/ROOT
rpool/ROOT/pbs-1  2.36G     0B  2.36G  /
 
Thank you.

On step 1, when trying to set the read-only maintenance mode through the GUI on the datastore, I get the following error:

mkstemp "/etc/proxmox-backup/datastore.tmp_XXXXXX" failed: ENOSPC: No space left on device (400)

Is there any workaround to accomplish this first step or should I forget it and go on for the second step?
 
The PBS has a sync job that worries me: it connects daily to another PBS to pull data. I cannot remove that sync job from the GUI (I get the no-space-left error). Is there any way to disable it (or kill it) from the command line?

I guess that step 1 would make it.
 
By the way, I did step 2. I am waiting for those 24 hours to pass (but I'm afraid that sync job will spoil the party).
 
No difference. I guess I'll have to go there physically and reinstall everything from scratch.

And then configure a quota.

Is it on the roadmap to have a default quota enabled, or any other mechanism to avoid this? If not, where should I file an issue or a suggestion?

Thanks
 
The procedure I described has worked 100% of the times I've had to use it. I would need a lot more detail on the server storage config, the exact actions taken, and the logs of the GC task to provide any other hint about what could be going on here.

To disable the sync job you can edit or delete /etc/proxmox-backup/sync.cfg. If the datastore is on the same drives as the OS, you can also try to vacuum the journald logs to recover some space: journalctl --vacuum-time=1d.
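Renaming the file aside is enough to disable the job, since PBS only acts on jobs it finds in sync.cfg, and it keeps the config around in case it is needed later. A minimal sketch that works on a scratch copy so it is safe to try anywhere (the real path on the server is /etc/proxmox-backup/sync.cfg; the job id and content here are made up, modeled on the schedule line quoted later in the thread):

```shell
# Scratch copy standing in for /etc/proxmox-backup/sync.cfg
cfg=./sync.cfg
# Example entry; "my-pull-job" is a hypothetical job id
printf 'sync: my-pull-job\n\tschedule *:0/30\n' > "$cfg"

# Move the config aside instead of deleting it, so it can be restored later
mv "$cfg" "$cfg.disabled"
```

On the real server, restore the job later by moving the file back.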

There have been some bugs reported about this [1], but for now the reply has been along the lines of "the admin should take care of not letting the drives get too full by some other means". I fully agree, but given how hard it can be to recover from a full disk without data loss, PBS could really provide some mechanism to automatically reserve some space and deny writes once some threshold is reached.

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=5376
 
Yeeeees, it does work! Thank you!!
Or at least that's what it seems now.
I did journalctl --vacuum-time=1h
I moved more than 2GB to an external drive.
I renamed /etc/proxmox-backup/sync.cfg
(which had a line saying schedule *:0/30, which I think means every 30 minutes!)
I did not know which service I had to restart for the sync config change, so I rebooted.
And now the GC is running! It is finding those expected errors you mentioned.
So things seem to go now in the right direction.
I will wait and proceed with the rest of the steps and then configure the quota.
Thank you again!
 