When a PBS datastore is created on a CIFS mount, it is impossible to delete backups or prune. PBS is installed on a separate host from PVE; the versions in use are:
Code:
PBS Version:
proxmox-backup: 1.0-4 (running kernel: 5.4.78-1-pve)
proxmox-backup-server: 1.0.5-1 (running version: 1.0.5)
pve-kernel-5.4: 6.3-2
pve-kernel-helper: 6.3-2
pve-kernel-5.4.78-1-pve: 5.4.78-1
ifupdown2: not correctly installed
libjs-extjs: 6.0.1-10
proxmox-backup-docs: 1.0.4-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-xtermjs: 4.7.0-3
smartmontools: 7.1-pve2
zfsutils-linux: 0.8.5-pve1
PVE Version:
proxmox-ve: 6.3-1 (running kernel: 5.4.78-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-2
pve-kernel-helper: 6.3-2
pve-kernel-5.4.78-1-pve: 5.4.78-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: not correctly installed
ifupdown2: 3.0.0-1+pve3
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-1
libpve-common-perl: 6.3-1
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-2
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
I created a CIFS systemd automount:
Code:
# cat /etc/systemd/system/mnt-storage.mount
[Unit]
Description=CIFS mount from Hetzner Storagebox
[Mount]
What=//1.2.3.4/backup/pbs/
Where=/mnt/storage
Options=username=u123456,password=123456,rw,uid=34,noforceuid,gid=34,noforcegid
Type=cifs
[Install]
WantedBy=multi-user.target
Code:
# cat mnt-storage.automount
[Unit]
Description=Automount /mnt/storage
After=network-online.target
Wants=network-online.target
[Automount]
Where=/mnt/storage
TimeoutIdleSec=10min
[Install]
WantedBy=multi-user.target
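The units are then enabled roughly like this (the mount itself is triggered on first access to /mnt/storage):
Code:
# pick up the new unit files and enable the automount
systemctl daemon-reload
systemctl enable --now mnt-storage.automount
# verify both units
systemctl status mnt-storage.automount mnt-storage.mount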
Initializing the datastore takes about 10 minutes; the result:
Code:
# proxmox-backup-manager datastore show storage
┌────────────────┬──────────────┐
│ Name │ Value │
╞════════════════╪══════════════╡
│ name │ storage │
├────────────────┼──────────────┤
│ path │ /mnt/storage │
├────────────────┼──────────────┤
│ comment │ │
├────────────────┼──────────────┤
│ gc-schedule │ 2:30 │
├────────────────┼──────────────┤
│ keep-daily │ 7 │
├────────────────┼──────────────┤
│ keep-monthly │ 6 │
├────────────────┼──────────────┤
│ keep-weekly │ 4 │
├────────────────┼──────────────┤
│ prune-schedule │ 1:30 │
└────────────────┴──────────────┘
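For reference, the datastore was created along these lines (a sketch; the schedules and retention shown above can also be set afterwards or via the GUI):
Code:
# create the datastore on the CIFS mount (this is the step that takes ~10 minutes here)
proxmox-backup-manager datastore create storage /mnt/storage
# set GC/prune schedules and retention to the values shown above
proxmox-backup-manager datastore update storage --gc-schedule '2:30' --prune-schedule '1:30' \
    --keep-daily 7 --keep-weekly 4 --keep-monthly 6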
I add PBS as storage to PVE and run a test backup:
Code:
INFO: starting new backup job: vzdump 114 --node test --storage pbs --mode snapshot --remove 0
INFO: Starting Backup of VM 114 (qemu)
INFO: Backup started at 2020-12-03 17:57:39
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: meet
INFO: include disk 'scsi0' 'thin:vm-114-disk-0' 5G
INFO: creating Proxmox Backup Server archive 'vm/114/2020-12-03T16:57:39Z'
INFO: starting kvm to execute backup task
INFO: enabling encryption
INFO: started backup task 'bc9c9871-460b-462b-9181-c1de9f63acbf'
INFO: scsi0: dirty-bitmap status: created new
INFO: 20% (1.0 GiB of 5.0 GiB) in 3s, read: 341.3 MiB/s, write: 176.0 MiB/s
INFO: 21% (1.1 GiB of 5.0 GiB) in 6s, read: 26.7 MiB/s, write: 25.3 MiB/s
<...>
INFO: 97% (4.9 GiB of 5.0 GiB) in 2m 7s, read: 26.7 MiB/s, write: 25.3 MiB/s
INFO: 99% (5.0 GiB of 5.0 GiB) in 2m 10s, read: 32.0 MiB/s, write: 32.0 MiB/s
INFO: 100% (5.0 GiB of 5.0 GiB) in 2m 13s, read: 14.7 MiB/s, write: 14.7 MiB/s
INFO: backup is sparse: 1.14 GiB (22%) total zero data
INFO: backup was done incrementally, reused 1.14 GiB (22%)
INFO: transferred 5.00 GiB in 143 seconds (35.8 MiB/s)
INFO: stopping kvm after backup task
INFO: Finished Backup of VM 114 (00:02:25)
INFO: Backup finished at 2020-12-03 18:00:04
INFO: Backup job finished successfully
TASK OK
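For completeness, adding the PBS datastore as storage on the PVE side looks roughly like this (server, credentials and fingerprint are placeholders):
Code:
# add the PBS datastore as storage "pbs" on PVE (placeholder host/credentials)
pvesm add pbs pbs --server pbs.example.com --datastore storage \
    --username root@pam --password 'secret' \
    --fingerprint 'AA:BB:...:ZZ'
# a client-side encryption key was also configured for this storage,
# hence "INFO: enabling encryption" in the backup log above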
Now, when trying to remove a backup from either PVE or PBS, the following critical issue happens:
Code:
proxmox-backup-client failed: Error: removing backup snapshot "/mnt/storage/vm/114/2020-12-03T16:57:39Z" failed - Directory not empty (os error 39) at /usr/share/perl5/PVE/API2/Storage/Content.pm line 458. (500)
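(For context, the removal on the PVE side ends up calling proxmox-backup-client against the datastore; a roughly equivalent manual call, run on the PBS host itself with an assumed repository name, would be:)
Code:
# repository root@pam@localhost:storage is an assumption; snapshot path taken from the error above
proxmox-backup-client forget "vm/114/2020-12-03T16:57:39Z" --repository root@pam@localhost:storage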
The following is logged on PBS:
Code:
Dec 3 18:26:47 pbs proxmox-backup-proxy[746]: DELETE /api2/json/admin/datastore/storage/snapshots?backup-id=114&backup-time=1607014659&backup-type=vm: 400 Bad Request: [client [::ffff:1.2.3.5]:46804] removing backup snapshot "/mnt/storage/vm/114/2020-12-03T16:57:39Z" failed - Directory not empty (os error 39)
Dec 3 18:26:48 pbs proxmox-backup-proxy[746]: error during snapshot file listing: 'unable to load blob '"/mnt/storage/vm/114/2020-12-03T16:57:39Z/index.json.blob"' - No such file or directory (os error 2)'
Inspecting the directory /mnt/storage/vm/114/2020-12-03T16:57:39Z shows that the directory is now actually empty:
Code:
# cd /mnt/storage/vm/114/2020-12-03T16:57:39Z/
# ls -al
total 0
drwxr-xr-x 2 backup backup 0 Dec 3 18:26 .
drwxr-xr-x 2 backup backup 0 Dec 3 17:57 ..
I am able to rmdir the directory manually, which verifies that there are no locks preventing it from being deleted:
Code:
# rmdir /mnt/storage/vm/114/2020-12-03T16:57:39Z/
#
While this removes the backup from both the PBS and PVE GUIs, it leaves the entire backup group in a fatally inconsistent state, because any further backups are still created incrementally:
Code:
INFO: starting new backup job: vzdump 114 --storage pbs --node test --mode snapshot --remove 0
INFO: Starting Backup of VM 114 (qemu)
INFO: Backup started at 2020-12-03 18:36:26
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: meet
INFO: include disk 'scsi0' 'thin:vm-114-disk-0' 5G
INFO: creating Proxmox Backup Server archive 'vm/114/2020-12-03T17:36:26Z'
INFO: starting kvm to execute backup task
INFO: enabling encryption
INFO: started backup task '80c3acc3-59b2-48b0-ae71-c7907936211d'
INFO: scsi0: dirty-bitmap status: created new
INFO: 33% (1.7 GiB of 5.0 GiB) in 3s, read: 566.7 MiB/s, write: 373.3 MiB/s
INFO: 53% (2.7 GiB of 5.0 GiB) in 6s, read: 350.7 MiB/s, write: 324.0 MiB/s
INFO: 80% (4.0 GiB of 5.0 GiB) in 9s, read: 457.3 MiB/s, write: 340.0 MiB/s
INFO: 100% (5.0 GiB of 5.0 GiB) in 12s, read: 332.0 MiB/s, write: 278.7 MiB/s
INFO: backup is sparse: 1.14 GiB (22%) total zero data
INFO: backup was done incrementally, reused 1.14 GiB (22%)
INFO: transferred 5.00 GiB in 12 seconds (426.7 MiB/s)
INFO: stopping kvm after backup task
INFO: Finished Backup of VM 114 (00:00:14)
INFO: Backup finished at 2020-12-03 18:36:40
INFO: Backup job finished successfully
TASK OK
After this, trying to run a garbage collection leads to yet another error. I had to reboot PBS to recover from this lock; remounting the CIFS storage didn't help:
Code:
2020-12-03T18:45:08+01:00: starting garbage collection on store storage
2020-12-03T18:45:08+01:00: TASK ERROR: unable to get exclusive lock - EACCES: Permission denied
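(Garbage collection was started manually for these tests, roughly like this:)
Code:
# start GC on the datastore and check its status
proxmox-backup-manager garbage-collection start storage
proxmox-backup-manager garbage-collection status storage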
Running a successful garbage collection after the PBS reboot still does NOT fix the broken state for any VM where a prune/delete was attempted: further backups are still created incrementally even though no full backup exists anymore. It also looks like GC is not actually deleting anything:
Code:
2020-12-03T18:52:58+01:00: starting garbage collection on store storage
2020-12-03T18:52:58+01:00: Start GC phase1 (mark used chunks)
2020-12-03T18:52:58+01:00: Start GC phase2 (sweep unused chunks)
2020-12-03T18:53:01+01:00: percentage done: phase2 1% (processed 10 chunks)
2020-12-03T18:53:03+01:00: percentage done: phase2 2% (processed 21 chunks)
<...>
2020-12-03T18:56:43+01:00: percentage done: phase2 98% (processed 966 chunks)
2020-12-03T18:56:45+01:00: percentage done: phase2 99% (processed 976 chunks)
2020-12-03T18:56:47+01:00: Removed garbage: 0 B
2020-12-03T18:56:47+01:00: Removed chunks: 0
2020-12-03T18:56:47+01:00: Pending removals: 969.33 MiB (in 988 chunks)
2020-12-03T18:56:47+01:00: Original data usage: 0 B
2020-12-03T18:56:47+01:00: On-Disk chunks: 0
2020-12-03T18:56:47+01:00: Deduplication factor: 1.00
2020-12-03T18:56:47+01:00: TASK OK
Pending removals: 969.33 MiB. So why is it not actually being removed? How do I recover from this situation?
CIFS is the only option for this storage. Not being able to delete/prune on CIFS mounts is a critical bug that would make this product unusable in our environment.
Please advise.