Misleading error "no previous backup found" when in reality there is an issue with the last backup state

audioPhil

New Member
Jul 29, 2020
6
0
1
44
I just debugged an error message that I had received for the last couple of days.
The regular daily backup job on our Promox kept reporting
Code:
The regular daily backup job on our Promox kept reporting "Backup of VM 103 failed - VM 103 qmp command 'backup' failed - backup register image failed: command error: no previous backup found, cannot do incremental backup" for one of our VMs, while all other VMs were backed up correctly.
for one of our VMs, while all other VMs were backed up correctly.
Before this message suddenly appeared (about a week ago), backups had worked correctly for many weeks.

I logged into the pbs dashboard and looked at the log message for the respective backup. It showed another error which was more accurate but did not really help me either:
Code:
2021-02-08T12:53:09+01:00: starting new backup on datastore 'REDACTED': "vm/103/2021-02-08T11:53:10Z"
2021-02-08T12:53:09+01:00: download 'index.json.blob' from previous backup.
2021-02-08T12:53:09+01:00: register chunks in 'drive-scsi0.img.fidx' from previous backup.
2021-02-08T12:53:09+01:00: download 'drive-scsi0.img.fidx' from previous backup.
2021-02-08T12:53:09+01:00: backup ended and finish failed: backup ended but finished flag is not set.
2021-02-08T12:53:09+01:00: removing unfinished backup
2021-02-08T12:53:09+01:00: TASK ERROR: backup ended but finished flag is not set.

I then saw that the "Gabage collections" task had also generated errors which read
Code:
2021-02-08T00:00:00+01:00: starting garbage collection on store REDACTED
2021-02-08T00:00:00+01:00: task triggered by schedule 'daily'
2021-02-08T00:00:00+01:00: Start GC phase1 (mark used chunks)
2021-02-08T00:00:08+01:00: percentage done: phase1 2% (1 of 46 index files)
[...]
2021-02-08T00:00:59+01:00: TASK ERROR: can't open index /PATH/TO/DATASTORE/vm/103/2021-02-03T02:00:02Z/drive-scsi0.img.fidx - Invalid argument (os error 22)

From then on I at least knew where the problem was located. I have to admit that I neither tried to fix the backup from the third of February, nor did I try to find the root cause for this problem.
Instead I just deleted the incremental backup for this date and tried a manual backup of the VM. This time it worked like a charm.

I think the vzdump backup notifications created by PVE should probably be changed to more accurately reflect the real issue (the previous backup is not missing, but it is somehow "corrupted" or unusable).
 
The above-described issue just happened again for me. This time, vmid 104 was affected (the second vmid when ordering them numerically), while last time vmid 103 (the first one) had the issue.
However, as the sample size is still quite low, this could also be a coincidence.

Does anyone have an idea what the root cause could be? I just started a verification run for all snapshots to identify potential further issues.
 
Hi! We are receiving the same message in a few backups in one of our datastores:

Code:
2021-02-12T02:21:26-03:00: starting new backup on datastore '[removed]': "vm/723/2021-02-12T05:21:25Z"
2021-02-12T02:21:26-03:00: download 'index.json.blob' from previous backup.
2021-02-12T02:21:26-03:00: GET /previous: 400 Bad Request: Unable to open fixed index "/backup/[removed]/vm/723/2021-02-10T05:04:17Z/drive-scsi0.img.fidx" - got unknown magic number
2021-02-12T02:21:26-03:00: backup ended and finish failed: backup ended but finished flag is not set.
2021-02-12T02:21:26-03:00: removing unfinished backup
2021-02-12T02:21:26-03:00: TASK ERROR: backup ended but finished flag is not set.

We will also run verifications for snapshots, but we are not aware about the root cause.

Thank you!
 
on what kind of storage/file systems are your datastores? any special mount options?

the index files are written to a temporary file and then atomically renamed into place..
 
Hi! We are receiving the same message in a few backups in one of our datastores:

Code:
2021-02-12T02:21:26-03:00: starting new backup on datastore '[removed]': "vm/723/2021-02-12T05:21:25Z"
2021-02-12T02:21:26-03:00: download 'index.json.blob' from previous backup.
2021-02-12T02:21:26-03:00: GET /previous: 400 Bad Request: Unable to open fixed index "/backup/[removed]/vm/723/2021-02-10T05:04:17Z/drive-scsi0.img.fidx" - got unknown magic number
2021-02-12T02:21:26-03:00: backup ended and finish failed: backup ended but finished flag is not set.
2021-02-12T02:21:26-03:00: removing unfinished backup
2021-02-12T02:21:26-03:00: TASK ERROR: backup ended but finished flag is not set.

We will also run verifications for snapshots, but we are not aware about the root cause.

Thank you!
could you also post the client and server-side task logs for that snapshot with the unknown magic number?
 
on what kind of storage/file systems are your datastores? any special mount options?

the index files are written to a temporary file and then atomically renamed into place..
For testing I mounted our existing file server via a cifs mount in /etc/fstab and used this directory as datastore in the pbs GUI.
I already read on this forum that remote storages via SMB/NFS are not really recommended, but it worked quite good until just recently.
In /etc/fstab the mount options are credentials=/root/.smbcreds,iocharset=utf8,vers=3.0,noperm,uid=34,gid=34 and most of the time, this seems to work just fine.

However, I just checked the syslog and there I found, right next to the backup messages of "proxmox-backup-proxy", the following line:
Code:
Feb 11 03:00:08 pbs kernel: [223944.175412] CIFS VFS: No writable handle in writepages rc=-22
Interestingly, pbs seems to think that everything went fine and reports a successful backup right after this message.

So I guess for me the issue is having an unsupported storage system mounted via SMB/CIFS. The recommendation probably is to use local storage?

Update:
I just checked the fileserver and could identify the root cause: Apparently, a recent upgrade of our TrueNAS installation introduced a bug (https://jira.ixsystems.com/browse/NAS-108664) which, for whatever reasons, has until now only made itself visible during the update process of pbs.

So, while this is not an issue in pbs (at least for me, still need more details on @rada 's issue) my inital remark is still valid I think: The error message shown by PVE is misleading.
 
Last edited:
In our case, we had an issue with our PSU (hardware) in PBS. We solved it and had to backup again some containers, now issues are gone.
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!