Backup of VM failed - job failed with err -116 - Stale file handle

Hi.
I am running PBS 3.2-8 here, and the datastore is a SATA HDD.
I got just one VM with this error:

INFO: starting new backup job: vzdump 2221 --node proxmox01 --storage bkp --remove 0 --notification-mode auto --mode snapshot --notes-template '{{guestname}}'
INFO: Starting Backup of VM 2221 (qemu)
INFO: Backup started at 2025-06-11 10:52:35
INFO: status = running
INFO: VM Name: SDBSKAPRD
INFO: include disk 'scsi0' 'STG-VMS:2221/vm-2221-disk-0.qcow2' 100G
INFO: include disk 'scsi1' 'STG-VMS:2221/vm-2221-disk-1.raw' 130G
INFO: include disk 'scsi2' 'STG-VMS:2221/vm-2221-disk-2.raw' 60G
INFO: include disk 'scsi3' 'STG-VMS:2221/vm-2221-disk-3.raw' 100G
INFO: include disk 'scsi4' 'STG-VMS:2221/vm-2221-disk-4.raw' 40G
INFO: include disk 'scsi5' 'STG-VMS:2221/vm-2221-disk-7.raw' 20G
INFO: include disk 'scsi6' 'STG-VMS:2221/vm-2221-disk-6.raw' 20G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/2221/2025-06-11T13:52:35Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'b1f995c8-7b91-426d-adf4-1735e72dec72'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: scsi1: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: scsi2: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: scsi3: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: scsi4: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: scsi5: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: scsi6: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: 0% (772.0 MiB of 470.0 GiB) in 3s, read: 257.3 MiB/s, write: 248.0 MiB/s
INFO: 0% (2.6 GiB of 470.0 GiB) in 13s, read: 184.4 MiB/s, write: 49.2 MiB/s
ERROR: job failed with err -116 - Stale file handle
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 2221 failed - job failed with err -116 - Stale file handle
INFO: Failed at 2025-06-11 10:52:58
INFO: Backup job finished with errors
TASK ERROR: job errors

I have searched the forum, but almost all the issues are about NFS or CIFS mounted datastores.
In my case, however, it is a local SATA HDD!

Any help will be appreciated.

Thanks

Best regards.
 
I am running PBS 3.2-8 here, and the datastore is a SATA HDD.
Please upgrade to the latest version; this version is outdated.

ERROR: job failed with err -116 - Stale file handle
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 2221 failed - job failed with err -116 - Stale file handle
What about the sources? Are the VM disks located on a network share?
 
Please upgrade to the latest version; this version is outdated.


What about the sources? Are the VM disks located on a network share?
I have already upgraded and am now running 3.4.1.
The VMs are using directory-based storage, which sits on a GlusterFS volume.
There are around 30 VMs on this storage.
The backup runs smoothly on all of these VMs.
Only this one VM has this issue.
I am doing a clone to see if I can back up the cloned VM.
Then, if the backup succeeds, I will stop the original VM, clone it, and use the clone instead.
But the issue is very weird.
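Roughly what I am doing from the PVE host shell (the new VMID 9221 and the clone name are just placeholders):

# full clone of the problematic VM, then try to back up only the clone
qm clone 2221 9221 --full --name SDBSKAPRD-clone
vzdump 9221 --storage bkp --mode snapshot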

Thanks for your support.
 
Hi there.
Got another VM with the very same issue:
INFO: scsi3: dirty-bitmap status: existing bitmap was invalid and has been cleared
And then: Stale file handle.
I don't know if this is related, but both VMs have 5 or 7 disks, and one of the disks does not have discard enabled.
After marking this disk as backup=no, the job ran without any issue.
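For reference, the exclusion can also be set from the CLI; the VMID, disk and volume below are just the ones from my first log, adjust as needed:

# exclude scsi5 from backups; re-specify the volume plus any existing
# disk options (size, discard, ...) together with backup=0
qm set 2221 --scsi5 STG-VMS:2221/vm-2221-disk-7.raw,backup=0
# verify the change
qm config 2221 | grep scsi5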

Thank you for any help.

Best regards.
 
Have you already checked the systemd journal of the PVE hosts for errors regarding this issue? I'm not so familiar with GlusterFS myself, but you could check for issues on the GlusterFS layer as a starting point. E.g. there were errors related to linkfiles in a git repo, see https://bugzilla.redhat.com/show_bug.cgi?id=1569074
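A rough sketch of what I would check on the PVE host (the time range is taken from your log, and the volume name gv0 is just a placeholder for your actual GlusterFS volume):

# journal entries around the time of the failed backup
journalctl --since "2025-06-11 10:50" --until "2025-06-11 11:00" -p err
# basic GlusterFS health checks (volume name is a placeholder)
gluster volume status gv0
gluster volume heal gv0 info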

The stale file handle would indicate that some other operation invalidates the file handle while the qemu process is operating on the disk file.

You could also try to migrate the disk off to a different storage backend, delete the original disk afterwards and migrate back to the original storage. See if that might help or if you get the stale file handle errors during storage migration as well.
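Something along these lines, assuming a second storage named 'other-storage' is available (names are placeholders):

# move one disk to another storage and delete the source volume
qm move-disk 2221 scsi1 other-storage --delete 1
# later, move it back to the original storage
qm move-disk 2221 scsi1 STG-VMS --delete 1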
 
Have you already checked the systemd journal of the PVE hosts for errors regarding this issue? I'm not so familiar with GlusterFS myself, but you could check for issues on the GlusterFS layer as a starting point. E.g. there were errors related to linkfiles in a git repo, see https://bugzilla.redhat.com/show_bug.cgi?id=1569074

The stale file handle would indicate that some other operation invalidates the file handle while the qemu process is operating on the disk file.

You could also try to migrate the disk off to a different storage backend, delete the original disk afterwards and migrate back to the original storage. See if that might help or if you get the stale file handle errors during storage migration as well.
I don't think this has anything to do with GlusterFS.
I didn't find anything in the logs, and the other VMs are backed up normally.
I am waiting for a maintenance window to reboot the servers to see if this fixes the issue.
Thanks