Backup datastores

Dennis Ayala

I am having an issue with backups and restores on multiple datastores with different Backup Servers. The only thing they have in common is that the backup servers are all on version 2.3-3.

Ever since I either upgraded to 2.3-3 or installed a new server directly on 2.3-3, my backups have not been reliable. For example, I just backed up this VM to a fairly new datastore:

Code:
INFO: starting new backup job: vzdump 1003 --mode snapshot --node sjuppmprx03p --storage PBSV1-TEMP --remove 0
INFO: Starting Backup of VM 1003 (qemu)
INFO: Backup started at 2023-02-28 23:39:58
INFO: status = running
INFO: VM Name: PROPHET
INFO: include disk 'ide0' 'IBM-SAS-900G-PROD-01:vm-1003-disk-1' 139396M
INFO: include disk 'ide1' 'IBM-SAS-900G-PROD-01:vm-1003-disk-0' 597408M
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/1003/2023-03-01T03:39:58Z'
INFO: skipping guest-agent 'fs-freeze', agent configured but not running?
INFO: started backup task '030e7eb9-aaeb-4595-be3c-9b443702f6e2'
INFO: resuming VM again
INFO: ide0: dirty-bitmap status: created new
INFO: ide1: dirty-bitmap status: created new
INFO:   0% (344.0 MiB of 719.5 GiB) in 3s, read: 114.7 MiB/s, write: 86.7 MiB/s
INFO:   1% (7.3 GiB of 719.5 GiB) in 1m 13s, read: 101.5 MiB/s, write: 101.3 MiB/s
INFO:   2% (14.5 GiB of 719.5 GiB) in 2m 28s, read: 98.4 MiB/s, write: 96.1 MiB/s
INFO:   3% (21.6 GiB of 719.5 GiB) in 3m 36s, read: 107.3 MiB/s, write: 107.3 MiB/s
INFO:   4% (28.8 GiB of 719.5 GiB) in 4m 47s, read: 104.5 MiB/s, write: 104.5 MiB/s
INFO:   5% (36.0 GiB of 719.5 GiB) in 5m 47s, read: 122.2 MiB/s, write: 122.2 MiB/s
INFO:   6% (43.3 GiB of 719.5 GiB) in 6m 46s, read: 126.0 MiB/s, write: 126.0 MiB/s
INFO:   7% (50.5 GiB of 719.5 GiB) in 7m 44s, read: 126.8 MiB/s, write: 126.8 MiB/s
INFO:   8% (57.6 GiB of 719.5 GiB) in 8m 42s, read: 126.1 MiB/s, write: 126.1 MiB/s
INFO:   9% (64.9 GiB of 719.5 GiB) in 9m 42s, read: 124.1 MiB/s, write: 124.1 MiB/s
INFO:  10% (72.0 GiB of 719.5 GiB) in 10m 40s, read: 126.1 MiB/s, write: 126.1 MiB/s
INFO:  11% (79.2 GiB of 719.5 GiB) in 11m 38s, read: 127.4 MiB/s, write: 127.4 MiB/s
INFO:  12% (86.4 GiB of 719.5 GiB) in 12m 36s, read: 126.9 MiB/s, write: 126.9 MiB/s
INFO:  13% (93.6 GiB of 719.5 GiB) in 13m 34s, read: 126.5 MiB/s, write: 126.5 MiB/s
INFO:  14% (100.8 GiB of 719.5 GiB) in 14m 39s, read: 113.5 MiB/s, write: 113.5 MiB/s
INFO:  15% (107.9 GiB of 719.5 GiB) in 15m 41s, read: 118.1 MiB/s, write: 118.1 MiB/s
INFO:  16% (115.2 GiB of 719.5 GiB) in 16m 39s, read: 127.5 MiB/s, write: 127.5 MiB/s
INFO:  17% (122.4 GiB of 719.5 GiB) in 17m 37s, read: 127.2 MiB/s, write: 127.2 MiB/s
INFO:  18% (129.6 GiB of 719.5 GiB) in 18m 35s, read: 127.0 MiB/s, write: 127.0 MiB/s
INFO:  19% (136.8 GiB of 719.5 GiB) in 19m 34s, read: 126.2 MiB/s, write: 126.2 MiB/s
INFO:  20% (144.0 GiB of 719.5 GiB) in 20m 36s, read: 117.9 MiB/s, write: 117.9 MiB/s
INFO:  21% (151.2 GiB of 719.5 GiB) in 21m 49s, read: 101.1 MiB/s, write: 101.1 MiB/s
INFO:  22% (158.4 GiB of 719.5 GiB) in 23m 15s, read: 85.8 MiB/s, write: 85.8 MiB/s
INFO:  23% (165.6 GiB of 719.5 GiB) in 24m 45s, read: 81.7 MiB/s, write: 81.7 MiB/s
INFO:  24% (172.7 GiB of 719.5 GiB) in 26m 21s, read: 76.5 MiB/s, write: 76.5 MiB/s
INFO:  25% (180.0 GiB of 719.5 GiB) in 27m 43s, read: 90.6 MiB/s, write: 90.6 MiB/s
INFO:  26% (187.2 GiB of 719.5 GiB) in 28m 47s, read: 114.9 MiB/s, write: 114.9 MiB/s
INFO:  27% (194.4 GiB of 719.5 GiB) in 29m 53s, read: 111.7 MiB/s, write: 111.7 MiB/s
INFO:  28% (201.6 GiB of 719.5 GiB) in 30m 53s, read: 122.9 MiB/s, write: 122.9 MiB/s
INFO:  29% (208.7 GiB of 719.5 GiB) in 31m 51s, read: 125.5 MiB/s, write: 125.5 MiB/s
INFO:  30% (215.9 GiB of 719.5 GiB) in 33m 1s, read: 105.5 MiB/s, write: 105.5 MiB/s
INFO:  31% (223.1 GiB of 719.5 GiB) in 34m 11s, read: 105.4 MiB/s, write: 105.4 MiB/s
INFO:  32% (230.3 GiB of 719.5 GiB) in 35m 10s, read: 125.4 MiB/s, write: 125.4 MiB/s
INFO:  33% (237.5 GiB of 719.5 GiB) in 36m 7s, read: 129.0 MiB/s, write: 129.0 MiB/s
INFO:  34% (244.7 GiB of 719.5 GiB) in 37m 31s, read: 87.4 MiB/s, write: 87.4 MiB/s
INFO:  35% (251.9 GiB of 719.5 GiB) in 38m 31s, read: 122.8 MiB/s, write: 122.8 MiB/s
INFO:  36% (259.1 GiB of 719.5 GiB) in 40m, read: 83.3 MiB/s, write: 83.3 MiB/s
INFO:  37% (266.3 GiB of 719.5 GiB) in 41m 15s, read: 98.1 MiB/s, write: 98.1 MiB/s
INFO:  38% (273.5 GiB of 719.5 GiB) in 42m 14s, read: 124.7 MiB/s, write: 124.7 MiB/s
INFO:  39% (280.7 GiB of 719.5 GiB) in 43m 29s, read: 99.0 MiB/s, write: 99.0 MiB/s
INFO:  40% (287.9 GiB of 719.5 GiB) in 44m 33s, read: 114.2 MiB/s, write: 114.2 MiB/s
INFO:  41% (295.1 GiB of 719.5 GiB) in 45m 46s, read: 101.0 MiB/s, write: 101.0 MiB/s
INFO:  42% (302.3 GiB of 719.5 GiB) in 47m 2s, read: 97.7 MiB/s, write: 97.7 MiB/s
INFO:  43% (309.5 GiB of 719.5 GiB) in 48m, read: 126.7 MiB/s, write: 126.7 MiB/s
INFO:  44% (316.6 GiB of 719.5 GiB) in 48m 57s, read: 127.6 MiB/s, write: 127.6 MiB/s
INFO:  45% (323.8 GiB of 719.5 GiB) in 49m 56s, read: 125.2 MiB/s, write: 125.2 MiB/s
INFO:  46% (331.1 GiB of 719.5 GiB) in 50m 58s, read: 119.7 MiB/s, write: 119.7 MiB/s
INFO:  47% (338.2 GiB of 719.5 GiB) in 52m 7s, read: 105.6 MiB/s, write: 104.9 MiB/s
INFO:  48% (345.4 GiB of 719.5 GiB) in 53m 17s, read: 105.6 MiB/s, write: 105.6 MiB/s
INFO:  49% (352.6 GiB of 719.5 GiB) in 54m 31s, read: 99.4 MiB/s, write: 99.2 MiB/s
INFO:  50% (359.8 GiB of 719.5 GiB) in 55m 36s, read: 114.1 MiB/s, write: 102.6 MiB/s
INFO:  51% (367.1 GiB of 719.5 GiB) in 56m 35s, read: 125.6 MiB/s, write: 125.6 MiB/s
INFO:  52% (374.2 GiB of 719.5 GiB) in 57m 41s, read: 110.2 MiB/s, write: 110.2 MiB/s
INFO:  53% (381.4 GiB of 719.5 GiB) in 58m 50s, read: 107.5 MiB/s, write: 107.5 MiB/s
INFO:  54% (388.6 GiB of 719.5 GiB) in 59m 57s, read: 109.9 MiB/s, write: 109.9 MiB/s
INFO:  55% (395.8 GiB of 719.5 GiB) in 1h 1m 27s, read: 82.4 MiB/s, write: 82.4 MiB/s
INFO:  56% (403.0 GiB of 719.5 GiB) in 1h 2m 39s, read: 101.7 MiB/s, write: 101.7 MiB/s
INFO:  57% (410.2 GiB of 719.5 GiB) in 1h 3m 47s, read: 108.6 MiB/s, write: 108.6 MiB/s
INFO:  58% (417.4 GiB of 719.5 GiB) in 1h 4m 56s, read: 106.1 MiB/s, write: 106.1 MiB/s
INFO:  59% (424.6 GiB of 719.5 GiB) in 1h 6m 9s, read: 101.9 MiB/s, write: 101.9 MiB/s
INFO:  60% (431.8 GiB of 719.5 GiB) in 1h 7m 22s, read: 100.7 MiB/s, write: 100.7 MiB/s
INFO:  61% (438.9 GiB of 719.5 GiB) in 1h 8m 38s, read: 96.1 MiB/s, write: 96.1 MiB/s
INFO:  62% (446.2 GiB of 719.5 GiB) in 1h 9m 41s, read: 118.3 MiB/s, write: 118.3 MiB/s
INFO:  63% (453.4 GiB of 719.5 GiB) in 1h 10m 43s, read: 118.8 MiB/s, write: 118.8 MiB/s
INFO:  64% (460.6 GiB of 719.5 GiB) in 1h 11m 43s, read: 122.4 MiB/s, write: 122.4 MiB/s
INFO:  65% (467.8 GiB of 719.5 GiB) in 1h 12m 43s, read: 123.8 MiB/s, write: 123.8 MiB/s
INFO:  66% (474.9 GiB of 719.5 GiB) in 1h 13m 41s, read: 125.0 MiB/s, write: 125.0 MiB/s
INFO:  67% (482.1 GiB of 719.5 GiB) in 1h 14m 44s, read: 117.4 MiB/s, write: 117.4 MiB/s
INFO:  68% (489.3 GiB of 719.5 GiB) in 1h 15m 43s, read: 124.7 MiB/s, write: 124.7 MiB/s
INFO:  69% (496.5 GiB of 719.5 GiB) in 1h 16m 42s, read: 125.3 MiB/s, write: 125.3 MiB/s
INFO:  70% (503.7 GiB of 719.5 GiB) in 1h 17m 41s, read: 124.9 MiB/s, write: 124.9 MiB/s
INFO:  71% (510.9 GiB of 719.5 GiB) in 1h 18m 42s, read: 120.0 MiB/s, write: 120.0 MiB/s
INFO:  72% (518.1 GiB of 719.5 GiB) in 1h 19m 42s, read: 124.1 MiB/s, write: 124.1 MiB/s
INFO:  73% (525.3 GiB of 719.5 GiB) in 1h 21m 52s, read: 56.6 MiB/s, write: 56.6 MiB/s
INFO:  74% (532.5 GiB of 719.5 GiB) in 1h 23m 46s, read: 64.2 MiB/s, write: 64.2 MiB/s
INFO:  75% (539.7 GiB of 719.5 GiB) in 1h 25m 37s, read: 66.2 MiB/s, write: 66.2 MiB/s
INFO:  76% (547.0 GiB of 719.5 GiB) in 1h 26m 54s, read: 97.1 MiB/s, write: 97.1 MiB/s
INFO:  77% (554.1 GiB of 719.5 GiB) in 1h 28m, read: 110.1 MiB/s, write: 110.1 MiB/s
INFO:  78% (561.3 GiB of 719.5 GiB) in 1h 29m, read: 123.4 MiB/s, write: 92.3 MiB/s
INFO:  79% (568.5 GiB of 719.5 GiB) in 1h 30m 16s, read: 97.2 MiB/s, write: 97.2 MiB/s
INFO:  80% (575.7 GiB of 719.5 GiB) in 1h 31m 35s, read: 92.6 MiB/s, write: 92.6 MiB/s
INFO:  81% (582.9 GiB of 719.5 GiB) in 1h 32m 52s, read: 95.9 MiB/s, write: 95.9 MiB/s
INFO:  82% (590.1 GiB of 719.5 GiB) in 1h 34m 30s, read: 75.4 MiB/s, write: 75.3 MiB/s
INFO:  83% (597.3 GiB of 719.5 GiB) in 1h 36m 1s, read: 81.1 MiB/s, write: 81.1 MiB/s
INFO:  84% (604.4 GiB of 719.5 GiB) in 1h 36m 57s, read: 130.4 MiB/s, write: 65.1 MiB/s
INFO:  85% (611.6 GiB of 719.5 GiB) in 1h 38m 43s, read: 69.4 MiB/s, write: 69.1 MiB/s
INFO:  86% (618.9 GiB of 719.5 GiB) in 1h 40m 19s, read: 77.7 MiB/s, write: 76.8 MiB/s
INFO:  87% (626.0 GiB of 719.5 GiB) in 1h 41m 43s, read: 86.9 MiB/s, write: 86.3 MiB/s
INFO:  88% (633.2 GiB of 719.5 GiB) in 1h 43m 20s, read: 75.8 MiB/s, write: 75.4 MiB/s
INFO:  89% (640.5 GiB of 719.5 GiB) in 1h 44m 54s, read: 79.2 MiB/s, write: 78.1 MiB/s
INFO:  90% (647.6 GiB of 719.5 GiB) in 1h 46m 18s, read: 87.2 MiB/s, write: 86.0 MiB/s
INFO:  91% (654.8 GiB of 719.5 GiB) in 1h 47m 37s, read: 93.1 MiB/s, write: 85.0 MiB/s
INFO:  92% (662.1 GiB of 719.5 GiB) in 1h 48m 41s, read: 117.0 MiB/s, write: 82.1 MiB/s
INFO:  93% (669.2 GiB of 719.5 GiB) in 1h 49m 39s, read: 124.7 MiB/s, write: 90.6 MiB/s
INFO:  94% (676.4 GiB of 719.5 GiB) in 1h 50m 57s, read: 95.4 MiB/s, write: 83.2 MiB/s
INFO:  95% (683.6 GiB of 719.5 GiB) in 1h 52m 20s, read: 88.0 MiB/s, write: 85.6 MiB/s
INFO:  96% (690.8 GiB of 719.5 GiB) in 1h 53m 39s, read: 94.1 MiB/s, write: 94.1 MiB/s
INFO:  97% (698.0 GiB of 719.5 GiB) in 1h 54m 54s, read: 98.3 MiB/s, write: 98.3 MiB/s
INFO:  98% (705.2 GiB of 719.5 GiB) in 1h 56m 9s, read: 97.7 MiB/s, write: 97.7 MiB/s
INFO:  99% (712.4 GiB of 719.5 GiB) in 1h 57m 25s, read: 96.7 MiB/s, write: 96.7 MiB/s
INFO: 100% (719.5 GiB of 719.5 GiB) in 1h 58m 12s, read: 156.3 MiB/s, write: 67.7 MiB/s
INFO: backup is sparse: 16.79 GiB (2%) total zero data
INFO: backup was done incrementally, reused 16.79 GiB (2%)
INFO: transferred 719.54 GiB in 7092 seconds (103.9 MiB/s)
INFO: Finished Backup of VM 1003 (01:58:18)
INFO: Backup finished at 2023-03-01 01:38:16
INFO: Backup job finished successfully
TASK OK

I immediately attempted to restore it:
Code:
Wiping dos signature on /dev/IBM-SAS-900G-PROD-01/vm-8401003-disk-0.
  Wiping atari signature on /dev/IBM-SAS-900G-PROD-01/vm-8401003-disk-0.
  Logical volume "vm-8401003-disk-0" created.
new volume ID is 'IBM-SAS-900G-PROD-01:vm-8401003-disk-0'
  Logical volume "vm-8401003-disk-1" created.
new volume ID is 'IBM-SAS-900G-PROD-01:vm-8401003-disk-1'
restore proxmox backup image: /usr/bin/pbs-restore --repository root@pam@10.101.0.110:PBSV1-TEMP vm/1003/2023-03-01T03:39:58Z drive-ide0.img.fidx /dev/IBM-SAS-900G-PROD-01/vm-8401003-disk-0 --verbose --format raw
connecting to repository 'root@pam@10.101.0.110:PBSV1-TEMP'
open block backend for target '/dev/IBM-SAS-900G-PROD-01/vm-8401003-disk-0'
starting to restore snapshot 'vm/1003/2023-03-01T03:39:58Z'
download and verify backup index
restore failed: reading file "/mnt/datastores/PBSV1-TEMP/.chunks/9108/9108c1e67cc52a5967809dc03c09e7dbbb461be040397725b1099f66a1ded2fd" failed: No such file or directory (os error 2)
  Logical volume "vm-8401003-disk-0" successfully removed
temporary volume 'IBM-SAS-900G-PROD-01:vm-8401003-disk-0' sucessfuly removed
  Logical volume "vm-8401003-disk-1" successfully removed
temporary volume 'IBM-SAS-900G-PROD-01:vm-8401003-disk-1' sucessfuly removed
error before or during data restore, some or all disks were not completely restored. VM 8401003 state is NOT cleaned up.
TASK ERROR: command '/usr/bin/pbs-restore --repository root@pam@10.101.0.110:PBSV1-TEMP vm/1003/2023-03-01T03:39:58Z drive-ide0.img.fidx /dev/IBM-SAS-900G-PROD-01/vm-8401003-disk-0 --verbose --format raw' failed: exit code 255
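
(For reference, a quick way to confirm on the PBS host whether that chunk is actually gone from the datastore mount, rather than just temporarily unreadable, would be something like the check below; the chunk path is copied verbatim from the error above and the mount point is the one shown there.)

Code:
# does the chunk file the restore complained about actually exist?
ls -l /mnt/datastores/PBSV1-TEMP/.chunks/9108/9108c1e67cc52a5967809dc03c09e7dbbb461be040397725b1099f66a1ded2fd

# is the NFS-backed datastore mounted at all, and does it have free space?
mount | grep PBSV1-TEMP
df -h /mnt/datastores/PBSV1-TEMP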

All my datastores have been having the same issue, with .chunks files not found or "No such file or directory" OS errors, ever since I upgraded to or installed 2.3-3. My environment had been working without issues on Backup Server 1.x.

Any suggestions as to what might be causing this issue?

Regards,

Dennis
 
what kind of setup are you using on the PBS side? is it running on bare metal or in a VM? what kind of storage? any crashes in recent times that might correspond to the corrupt backup?

https://pbs.proxmox.com/docs/storage.html#tuning might be worth a shot as well.
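
for example (a rough sketch using the datastore name from your log - double-check the docs for the exact values available in your version), the chunk sync behaviour can be tightened like this:

Code:
# ask PBS to fsync every chunk file right after writing it (strictest setting);
# other possible sync-level values are 'filesystem' and 'none'
proxmox-backup-manager datastore update PBSV1-TEMP --tuning 'sync-level=file'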
 
Fabian,

The backup servers are VMs running on physical Proxmox hosts. The datastore storage consists of NFS volumes from 3 different physical storage servers (2 QNAPs, 1 Supermicro).

As I mentioned, I have 3 recently created datastores and all of them have this issue. During the first 24 hours after creating the datastores, I was able to restore from a couple of backups.

Now that I think about it, my datastores have a GC policy that runs every day at 00:00. The issue started after GC ran.
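
(For cross-checking, something like the command below should show when garbage collection last ran on the datastore and how many chunks it removed; the datastore name is taken from the log above, so adjust as needed.)

Code:
# show the result of the last garbage collection run on this datastore
proxmox-backup-manager garbage-collection status PBSV1-TEMP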

Dennis
 
some things that could cause such behaviour:
- cache inconsistency somewhere on the storage layer (PBS thinks it has written the chunks, later something crashes/has an outage and the writes are not actually persisted). the combination of PBS as a VM and NFS could cause this if something is misconfigured somewhere or your VM crashes.
- time jumps of the PBS system causing the safeguards in GC to fail and remove a chunk that was just uploaded

if you have the resources, I would suggest trying the same load on a non-NFS-backed datastore with local disks.

also check the system logs of both your storage servers and the PBS and PVE systems for the time period starting when the first snapshot referencing a missing chunk was started. a verify should tell you which snapshot was the last "good" and the first "bad" one for a given chunk, but note that the actual first backup referencing it could have happened in-between and been pruned in the meantime, so when in doubt, please look at the logs starting from the time of the last "good" snapshot, not the first "bad" one.
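
as a rough sketch of that workflow (datastore name and dates taken from the logs above, adjust to your setup):

Code:
# verify every snapshot on the datastore to find the last "good" / first "bad" one
proxmox-backup-manager verify PBSV1-TEMP

# then check the PBS journal around that window for crashes, NFS errors or time jumps
journalctl --since "2023-02-28 00:00" --until "2023-03-01 12:00" -u proxmox-backup -u proxmox-backup-proxy

# and rule out clock problems on the PBS VM
timedatectl status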
 
Fabian,

Thank you for your help. Unfortunately, I don't have the resources to give PBS local disks of that size.

Also, it was working perfectly for over two years while PBS was on version 1.x; this just started after upgrading PBS to 2.x.

I will follow your test suggestions and will also create a new PBS with v1.x and a brand-new NFS volume. I will use that scenario only (backups from PBS v2 will be disabled) and see how it goes.

Will post findings later.

Regards,

Dennis
 
An update...

I created a new PBS server with version 1.1-1 and a brand-new datastore.

I have been backing up to this datastore for two days now, no issues at all.

I have been able to restore everything that I have backed up so far.

Garbage collection has been running daily as usual, no issues.
 
possibly some changed NFS default settings between the two underlying Debian releases? if anything, PBS 2.x should be more consistent since it allows controlling how chunks are synced.
 
I honestly don't know, but it's been four days since I created the 1.1 PBS and everything is working fine.
Definitely a 2.x issue.

Dennis
 
well, there are lots of people running 2.x without experiencing that issue, so I still think it's somehow related to your particular setup (and most likely, some interaction with NFS and caching, but that would require further experiments to find out).
 
I understand.

In more detail, my setup is a PVE cluster running inside an IBM Blade with IBM fibre channel storage.

For backups, we use QNAP NAS storage running the latest version of the QTS operating system.

Nothing out of the ordinary. Is there anyone here running PBS as a VM with QNAP as NFS storage?

Dennis
 
it seems to me like most people here are using TrueNAS (if they are using NFS/CIFS), and those mostly run into permission-related issues, not missing chunks.
 
In my case there are no permission issues. I also have TrueNAS storage, but it is not connected to PBS 2.x.
 
