PBS completely full disk

bea

Hello.

I have a remote PBS with 2 disks as a ZFS mirror. It all worked great, but the disks got completely full, so it is not operable anymore. I can access it through the GUI and SSH, but I can only run a few commands. I have deleted almost every log and most of its backups and then waited for more than 24 hours, but no difference.

What else could I do?

I don't mind losing every backup it has.

Thank you
 
Thank you.

I understood that it runs automatically 24 hours and 5 minutes after deletion. But anyway, when trying to run it manually it gives me an unable to start garbage collection job on datastore mydatastore - ENOSPC: No space left on device (400) error
 
I see you posted that you don't mind losing every backup.
Ok. Here's how to do that.

Go to the shell console in PBS or an SSH session.

This shows you the ZFS datasets on your PBS server.
zfs list

Use the info to build the following command. This will delete the dataset.
zfs destroy bea-backup-or-whatever-you-called-it

This resets PBS's info about the dataset.
mv /etc/proxmox-backup/datastore.cfg /etc/proxmox-backup/datastore.bak
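Moving the config aside only takes effect once the PBS daemons re-read it; a reboot works, but restarting the services should be enough. A sketch, assuming the standard unit names (`proxmox-backup` for the API daemon, `proxmox-backup-proxy` for the proxy):

```
# Restart the PBS daemons so they pick up the changed datastore.cfg
# (unit names are an assumption; check with: systemctl list-units 'proxmox-backup*')
systemctl restart proxmox-backup proxmox-backup-proxy
```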


Perhaps more folks will comment here. Listen to them too.
Search the forum. This is not an uncommon problem. There are longer, more complicated answers than the one I just supplied.

You don't want this to happen again.
After your recovery, set a reservation on your root dataset so if you fill up again, root will still work.

zfs set reservation=10g rpool/ROOT
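It's worth confirming the reservation actually took effect. A quick check, assuming the pool layout shown later in this thread (`rpool/ROOT`):

```
# Verify the reservation was applied to the root dataset
zfs get reservation rpool/ROOT
# See how the reservation affects used/available space
zfs list -o name,used,avail,reservation rpool/ROOT
```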
 
Thank you.

I don't understand. I think I know what a datastore is, but what is a dataset?

I don't mind losing all the data or even the configuration, as long as it becomes the healthy PBS it was again. If it were not remote, I would have reinstalled PBS from scratch (and then configured a quota). But as it is remote, I don't know how to clean everything up so that I have a healthy PBS again.

What should I destroy here?


Bash:
root@remote-pbs-:~# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
rpool             1.76T     0B  1.75T  /rpool
rpool/ROOT        2.36G     0B    96K  /rpool/ROOT
rpool/ROOT/pbs-1  2.36G     0B  2.36G  /
 
Thank you.

On step 1, when trying to set the read-only maintenance mode through the GUI on the datastore, I get the following error:

mkstemp "/etc/proxmox-backup/datastore.tmp_XXXXXX" failed: ENOSPC: No space left on device (400)

Is there any workaround to accomplish this first step or should I forget it and go on for the second step?
 
The PBS has a sync job that worries me: it connects daily to another PBS to pull data. I cannot remove that sync job from the GUI (I get the no-space-left error). Is there any way to disable it (or kill it) from the command line?

I guess that step 1 would make it.
 
By the way, I did step 2. I am waiting for those 24 hours to pass (but I'm afraid that sync job will spoil the party).
 
No difference. I guess I'll have to go there physically and reinstall everything from scratch.

And then configure a quota.

Is it on the roadmap to have a default quota enabled, or any other mechanism to avoid this? If not, where should I file an issue or a suggestion?

Thanks
 
The procedure I described has worked 100% of the times I've had to use it. I would need a lot more detail on the server storage config, the exact actions taken, and the logs of the GC task to provide any other hint about what could be going on here.

To disable the sync job you can edit or delete /etc/proxmox-backup/sync.cfg. If the datastore is on the same drives as the OS, you can also try to vacuum the journald logs to recover some space: journalctl --vacuum-time=1d.
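Renaming the file aside is enough to disable the job, since PBS only acts on jobs it finds in sync.cfg, and it keeps the config around in case it is needed later. A minimal sketch that works on a scratch copy so it is safe to try anywhere (the real path on the server is /etc/proxmox-backup/sync.cfg; the job id and content here are made up, modeled on the schedule line quoted later in the thread):

```shell
# Scratch copy standing in for /etc/proxmox-backup/sync.cfg
cfg=./sync.cfg
# Example entry; "my-pull-job" is a hypothetical job id
printf 'sync: my-pull-job\n\tschedule *:0/30\n' > "$cfg"

# Move the config aside instead of deleting it, so it can be restored later
mv "$cfg" "$cfg.disabled"
```

On the real server, restore the job later by moving the file back.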

There have been some bugs reported about this [1], but for now the reply has been along the lines of "the admin should take care of not letting the drives get too full by some other means". I fully agree, but given how hard it can be to recover from a full disk without data loss, PBS could really provide some mechanism to automatically reserve some space and deny writes once some threshold is reached.

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=5376
 
Yeeeees, it does work! Thank you!!
Or at least that's what it seems now.
I did journalctl --vacuum-time=1h
I moved more than 2GB to an external drive.
I renamed /etc/proxmox-backup/sync.cfg
(which had a line saying schedule *:0/30, which I think means every 30 minutes!)
I did not know which service I had to restart for the sync config change, so I rebooted.
And now the GC is running! It is finding those expected errors you mentioned.
So things seem to go now in the right direction.
I will wait and proceed with the rest of the steps and then configure the quota.
Thank you again!
 