Been using PBS on several installations without any issues so far. On one of my PBS installations I was having issues with spinning disks (using ZFS), so I went on to add a special device to the pool in order to speed up metadata processing (GC is metadata heavy).
To describe this better: this particular machine has PBS installed alongside PVE. Storage is ZFS on 4 SAS spinning drives in RAID-10. This is the storage:
Bash:
# zpool status pool16
  pool: pool16
 state: ONLINE
  scan: resilvered 5.30M in 00:00:01 with 0 errors on Mon Sep 30 17:01:36 2024
config:

        NAME                                             STATE     READ WRITE CKSUM
        pool16                                           ONLINE       0     0     0
          mirror-0                                       ONLINE       0     0     0
            scsi-35000cca2a0140e94                       ONLINE       0     0     0
            scsi-35000cca2a0140f48                       ONLINE       0     0     0
          mirror-1                                       ONLINE       0     0     0
            scsi-35000cca2a0141a30                       ONLINE       0     0     0
            scsi-35000cca2a014014c                       ONLINE       0     0     0
        special
          mirror-2                                       ONLINE       0     0     0
            ata-SSDSC2KG240G8R_PHYG9485038L240AGN-part1  ONLINE       0     0     0
            ata-SSDSC2KG240G8R_PHYG9485044L240AGN-part1  ONLINE       0     0     0

errors: No known data errors
And this is the ZFS dataset that I keep my PBS storage on:
Bash:
# zfs get all pool16/pbs-storage
NAME PROPERTY VALUE SOURCE
pool16/pbs-storage type filesystem -
pool16/pbs-storage creation Sat Nov 25 13:54 2023 -
pool16/pbs-storage used 806G -
pool16/pbs-storage available 7.21T -
pool16/pbs-storage referenced 806G -
pool16/pbs-storage compressratio 1.20x -
pool16/pbs-storage mounted yes -
pool16/pbs-storage quota none default
pool16/pbs-storage reservation none default
pool16/pbs-storage recordsize 1M local
pool16/pbs-storage mountpoint /pool16/pbs-storage default
pool16/pbs-storage sharenfs off default
pool16/pbs-storage checksum on default
pool16/pbs-storage compression lz4 local
pool16/pbs-storage atime on local
pool16/pbs-storage devices on default
pool16/pbs-storage exec on default
pool16/pbs-storage setuid on default
pool16/pbs-storage readonly off default
pool16/pbs-storage zoned off default
pool16/pbs-storage snapdir hidden default
pool16/pbs-storage aclmode discard default
pool16/pbs-storage aclinherit restricted default
pool16/pbs-storage createtxg 17184 -
pool16/pbs-storage canmount on default
pool16/pbs-storage xattr sa local
pool16/pbs-storage copies 1 default
pool16/pbs-storage version 5 -
pool16/pbs-storage utf8only off -
pool16/pbs-storage normalization none -
pool16/pbs-storage casesensitivity sensitive -
pool16/pbs-storage vscan off default
pool16/pbs-storage nbmand off default
pool16/pbs-storage sharesmb off default
pool16/pbs-storage refquota 8T local
pool16/pbs-storage refreservation none default
pool16/pbs-storage guid 13360060514386266996 -
pool16/pbs-storage primarycache metadata local
pool16/pbs-storage secondarycache none inherited from pool16
pool16/pbs-storage usedbysnapshots 0B -
pool16/pbs-storage usedbydataset 806G -
pool16/pbs-storage usedbychildren 0B -
pool16/pbs-storage usedbyrefreservation 0B -
pool16/pbs-storage logbias latency default
pool16/pbs-storage objsetid 11253 -
pool16/pbs-storage dedup off default
pool16/pbs-storage mlslabel none default
pool16/pbs-storage sync standard default
pool16/pbs-storage dnodesize auto local
pool16/pbs-storage refcompressratio 1.20x -
pool16/pbs-storage written 806G -
pool16/pbs-storage logicalused 972G -
pool16/pbs-storage logicalreferenced 972G -
pool16/pbs-storage volmode default default
pool16/pbs-storage filesystem_limit none default
pool16/pbs-storage snapshot_limit none default
pool16/pbs-storage filesystem_count none default
pool16/pbs-storage snapshot_count none default
pool16/pbs-storage snapdev hidden default
pool16/pbs-storage acltype off default
pool16/pbs-storage context none default
pool16/pbs-storage fscontext none default
pool16/pbs-storage defcontext none default
pool16/pbs-storage rootcontext none default
pool16/pbs-storage relatime on local
pool16/pbs-storage redundant_metadata most local
pool16/pbs-storage overlay on default
pool16/pbs-storage encryption off default
pool16/pbs-storage keylocation none default
pool16/pbs-storage keyformat none default
pool16/pbs-storage pbkdf2iters 0 default
pool16/pbs-storage special_small_blocks 0 default
pool16/pbs-storage prefetch all default
This ZFS pool didn't have a special device until last week, when I added two Intel DC-series SSDs and attached them as a mirrored special vdev.
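For anyone wanting to do the same, adding the mirrored special vdev was essentially one command along these lines (a sketch - verify the device paths under /dev/disk/by-id/ on your own system before running anything like this):
Bash:
# attach a mirrored special vdev built from the first SSD partitions
# (device names below are from my pool and are shown for illustration only)
zpool add pool16 special mirror \
    /dev/disk/by-id/ata-SSDSC2KG240G8R_PHYG9485038L240AGN-part1 \
    /dev/disk/by-id/ata-SSDSC2KG240G8R_PHYG9485044L240AGN-part1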
Then I went on to do... well, I used a script to "rebalance" the contents of the PBS dataset, rewriting the files in place so that their metadata ended up on the SSDs. I made sure proxmox-backup.service, proxmox-backup-proxy.service and proxmox-backup-daily-update.service were stopped before I started the script. I should probably have stopped cron as well, but didn't. As far as I could tell, all went fine, although it took a VERY long time. My SSDs were filled with metadata and zpool status was good. Then I started those services back up.
Backups were all there and verification was running fine - all looked good. I made NO CHANGES to backup schedules, retention, sync jobs, pruning, GC, etc. I noticed GC was running MUCH faster than before: what used to take hours was now taking minutes.
I used this script to rebalance the files in the dataset: https://github.com/markusressel/zfs-inplace-rebalancing. I had already used it in the past, albeit not on PBS datastore files but on my file share dir, and it did the job without any issues. Chunks are just files, so... I used it here as well.
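For completeness, the whole rebalancing run boiled down to something like this (script name taken from the repo above; treat it as a sketch rather than an exact transcript of what I typed):
Bash:
# stop PBS so no chunks change while the files are rewritten in place
systemctl stop proxmox-backup.service proxmox-backup-proxy.service proxmox-backup-daily-update.service

# rewrite every file so the new copies get their metadata onto the special vdev
./zfs-inplace-rebalancing.sh /pool16/pbs-storage

# bring PBS back up afterwards
systemctl start proxmox-backup.service proxmox-backup-proxy.service proxmox-backup-daily-update.service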
I then noticed that while verify jobs were running, my PBS storage was getting a lot of writes. I figured the verify job must be updating something, and my best guess was that it was marking something inside the "ns" subdir of the PBS datastore, because that's the only place such writes could be happening. To try to speed things up on my PBS I did the following. Since I hadn't used the whole SSD space for the special device (I partitioned the SSDs and gave one partition to ZFS), I made another small partition (10 GB) on each and built another zpool mirror on top of them. This one:
Bash:
# zpool status small_sdd_mirror
  pool: small_sdd_mirror
 state: ONLINE
config:

        NAME                                             STATE     READ WRITE CKSUM
        small_sdd_mirror                                 ONLINE       0     0     0
          mirror-0                                       ONLINE       0     0     0
            ata-SSDSC2KG240G8R_PHYG9485038L240AGN-part9  ONLINE       0     0     0
            ata-SSDSC2KG240G8R_PHYG9485044L240AGN-part9  ONLINE       0     0     0

errors: No known data errors
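Creating that pool was a one-liner along these lines (again a sketch - the partition paths are the ones from my system):
Bash:
# 10 GB mirror on the spare SSD partitions
zpool create small_sdd_mirror mirror \
    /dev/disk/by-id/ata-SSDSC2KG240G8R_PHYG9485038L240AGN-part9 \
    /dev/disk/by-id/ata-SSDSC2KG240G8R_PHYG9485044L240AGN-part9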
And on top of it I created a new ZFS dataset that looks like this:
Bash:
# zfs get all small_sdd_mirror/pbs-ns
NAME PROPERTY VALUE SOURCE
small_sdd_mirror/pbs-ns type filesystem -
small_sdd_mirror/pbs-ns creation Wed Oct 2 17:25 2024 -
small_sdd_mirror/pbs-ns used 11.2M -
small_sdd_mirror/pbs-ns available 9.19G -
small_sdd_mirror/pbs-ns referenced 11.2M -
small_sdd_mirror/pbs-ns compressratio 1.71x -
small_sdd_mirror/pbs-ns mounted yes -
small_sdd_mirror/pbs-ns quota none default
small_sdd_mirror/pbs-ns reservation none default
small_sdd_mirror/pbs-ns recordsize 1M local
small_sdd_mirror/pbs-ns mountpoint /small_sdd_mirror/pbs-ns default
small_sdd_mirror/pbs-ns sharenfs off default
small_sdd_mirror/pbs-ns checksum on default
small_sdd_mirror/pbs-ns compression on default
small_sdd_mirror/pbs-ns atime on default
small_sdd_mirror/pbs-ns devices on default
small_sdd_mirror/pbs-ns exec on default
small_sdd_mirror/pbs-ns setuid on default
small_sdd_mirror/pbs-ns readonly off default
small_sdd_mirror/pbs-ns zoned off default
small_sdd_mirror/pbs-ns snapdir hidden default
small_sdd_mirror/pbs-ns aclmode discard default
small_sdd_mirror/pbs-ns aclinherit restricted default
small_sdd_mirror/pbs-ns createtxg 67 -
small_sdd_mirror/pbs-ns canmount on default
small_sdd_mirror/pbs-ns xattr on default
small_sdd_mirror/pbs-ns copies 1 default
small_sdd_mirror/pbs-ns version 5 -
small_sdd_mirror/pbs-ns utf8only off -
small_sdd_mirror/pbs-ns normalization none -
small_sdd_mirror/pbs-ns casesensitivity sensitive -
small_sdd_mirror/pbs-ns vscan off default
small_sdd_mirror/pbs-ns nbmand off default
small_sdd_mirror/pbs-ns sharesmb off default
small_sdd_mirror/pbs-ns refquota none default
small_sdd_mirror/pbs-ns refreservation none default
small_sdd_mirror/pbs-ns guid 8725473865594968389 -
small_sdd_mirror/pbs-ns primarycache all default
small_sdd_mirror/pbs-ns secondarycache all default
small_sdd_mirror/pbs-ns usedbysnapshots 0B -
small_sdd_mirror/pbs-ns usedbydataset 11.2M -
small_sdd_mirror/pbs-ns usedbychildren 0B -
small_sdd_mirror/pbs-ns usedbyrefreservation 0B -
small_sdd_mirror/pbs-ns logbias latency default
small_sdd_mirror/pbs-ns objsetid 643 -
small_sdd_mirror/pbs-ns dedup off default
small_sdd_mirror/pbs-ns mlslabel none default
small_sdd_mirror/pbs-ns sync standard default
small_sdd_mirror/pbs-ns dnodesize legacy default
small_sdd_mirror/pbs-ns refcompressratio 1.71x -
small_sdd_mirror/pbs-ns written 11.2M -
small_sdd_mirror/pbs-ns logicalused 17.9M -
small_sdd_mirror/pbs-ns logicalreferenced 17.9M -
small_sdd_mirror/pbs-ns volmode default default
small_sdd_mirror/pbs-ns filesystem_limit none default
small_sdd_mirror/pbs-ns snapshot_limit none default
small_sdd_mirror/pbs-ns filesystem_count none default
small_sdd_mirror/pbs-ns snapshot_count none default
small_sdd_mirror/pbs-ns snapdev hidden default
small_sdd_mirror/pbs-ns acltype off default
small_sdd_mirror/pbs-ns context none default
small_sdd_mirror/pbs-ns fscontext none default
small_sdd_mirror/pbs-ns defcontext none default
small_sdd_mirror/pbs-ns rootcontext none default
small_sdd_mirror/pbs-ns relatime on default
small_sdd_mirror/pbs-ns redundant_metadata all default
small_sdd_mirror/pbs-ns overlay on default
small_sdd_mirror/pbs-ns encryption off default
small_sdd_mirror/pbs-ns keylocation none default
small_sdd_mirror/pbs-ns keyformat none default
small_sdd_mirror/pbs-ns pbkdf2iters 0 default
small_sdd_mirror/pbs-ns special_small_blocks 0 default
small_sdd_mirror/pbs-ns prefetch all default
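As the output shows, only recordsize is set locally, so the dataset was created with something as simple as:
Bash:
# small dataset to hold the PBS "ns" tree, everything else left at defaults
zfs create -o recordsize=1M small_sdd_mirror/pbs-ns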
I then (roughly as sketched below):
- moved the contents of the "ns" subdirectory from the spinning disks to this SSD pool
- deleted the now-empty ns subdir
- and made a symlink from the pbs-storage/ns path to this SSD dataset
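Those three steps looked roughly like this (a sketch - PBS was stopped for the move, and ownership/permissions were preserved with cp -a):
Bash:
# stop PBS so nothing touches the namespace tree mid-move
systemctl stop proxmox-backup.service proxmox-backup-proxy.service

# copy the ns tree to the SSD dataset, then swap the original directory for a symlink
cp -a /pool16/pbs-storage/ns/. /small_sdd_mirror/pbs-ns/
rm -rf /pool16/pbs-storage/ns
ln -s /small_sdd_mirror/pbs-ns /pool16/pbs-storage/ns

systemctl start proxmox-backup.service proxmox-backup-proxy.service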