ESTALE: Stale file handle

xk3tchuPx

Member
May 16, 2020
I've been having this issue for a couple of weeks.
Backups fail with this error.

It usually just takes a reboot of the PBS server to fix the issue, but it's very annoying.
Is there something I could do to solve this?

I'm backing up to a local PBS server running in a VM, which is then remote-synced to another PBS instance running in AWS.

R,
xk3tchuPx
 
Do you see those errors in the task log when backing up from PVE to the local PBS VM?
Please provide the full log and the output of `proxmox-backup-manager versions --verbose`
 
Code:
proxmox-backup                2.3-1        running kernel: 5.15.83-1-pve
proxmox-backup-server         2.3.1-1      running version: 2.3.1       
pve-kernel-5.15               7.3-1                                     
pve-kernel-helper             7.3-1                                     
pve-kernel-5.15.83-1-pve      5.15.83-1                                 
pve-kernel-5.15.74-1-pve      5.15.74-1                                 
pve-kernel-5.15.35-1-pve      5.15.35-3                                 
ifupdown2                     3.1.0-1+pmx3                             
libjs-extjs                   7.0.0-1                                   
proxmox-backup-docs           2.3.1-1                                   
proxmox-backup-client         2.3.1-1                                   
proxmox-mini-journalreader    1.2-1                                     
proxmox-offline-mirror-helper 0.5.0-1                                   
proxmox-widget-toolkit        3.5.3                                     
pve-xtermjs                   4.16.0-1                                 
smartmontools                 7.2-pve3                                 
zfsutils-linux                2.1.7-pve1

Code:
2022-12-31T15:08:20-05:00: starting new backup on datastore 'NFS-CATALINA': "ct/109/2022-12-31T20:08:20Z"
2022-12-31T15:08:20-05:00: protocol upgrade done
2022-12-31T15:08:20-05:00: GET /previous_backup_time
2022-12-31T15:08:20-05:00: POST /blob
2022-12-31T15:08:20-05:00: add blob "/mnt/datastore/ct/109/2022-12-31T20:08:20Z/pct.conf.blob" (217 bytes, comp: 217)
2022-12-31T15:08:20-05:00: POST /dynamic_index
2022-12-31T15:08:20-05:00: POST /dynamic_index
2022-12-31T15:08:20-05:00: POST /dynamic_index: 400 Bad Request: unable to get shared lock - ESTALE: Stale file handle
2022-12-31T15:08:20-05:00: POST /dynamic_index: 400 Bad Request: unable to get shared lock - ESTALE: Stale file handle
2022-12-31T15:08:20-05:00: backup ended and finish failed: backup ended but finished flag is not set.
2022-12-31T15:08:20-05:00: removing unfinished backup
2022-12-31T15:08:20-05:00: TASK ERROR: backup ended but finished flag is not set.


My backup storage isn't fast storage; it's a ZFS RAID10 made of 4x4TB HDDs.
I've just added a mirrored SLOG SSD in front to see if that helps with the stale file handles.

Will report back.


R,
xk3tchuPx
 
Based on the name `NFS-CATALINA`, is the datastore an NFS mount instead of a local ZFS pool?
 
That can happen when clients don't disconnect cleanly.
Try removing the exports on the NFS server and adding them again.

It seems to be a common issue with NFS; at least you can find a lot of reports when searching for `ESTALE` and `NFS`.
 
I've seen a lot of posts regarding that issue when NFS is in use, but nothing really helpful on how to fix it.
 
Re-export the mounts; this should fix it, at least for some time.
Also check the logs of your NFS server to see if there are errors on that side.
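
For reference, re-exporting could look roughly like the sketch below on a Linux NFS server that uses `exportfs`. This is an illustration under assumptions, not a confirmed fix: the helper names `run` and `reexport_all` and the `DRY_RUN` switch are made up for this example, and the systemd unit name `nfs-server` is an assumption that will differ on NAS appliances.

```shell
#!/bin/sh
# Sketch: re-export all NFS shares on a Linux NFS server and look at its log.
# DRY_RUN defaults to 1, so the commands are only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reexport_all() {
    run exportfs -ua   # unexport everything that is currently exported
    run exportfs -ra   # re-read /etc/exports and export again
    run exportfs -v    # list the active exports with their options
    # Unit name is an assumption; adjust it for your NFS server.
    run journalctl -u nfs-server --since today
}
```

Calling `reexport_all` prints the four commands. Running the script as root on the NFS server with `DRY_RUN=0` set executes them instead.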
 
I'm having the same problem, though not with PBS but with the PVE host itself, which has mounted an NFS share from a QNAP NAS.

One day my NAS seemed to hang/freeze and I needed to reboot it. Since then I have always gotten this stale file handle error when trying to access files there.
In the PVE GUI you cannot remount NFS shares, so I deleted the share and recreated it, but that throws an obscure error message similar to this one:
https://forum.proxmox.com/threads/read-only-cifs-smb-and-nfs.101332/

So I tried remounting it from the command line, including unmounting it first and then cleanly mounting it again, but that doesn't work; it always fails.
I tried it from other machines and they have no problem mounting it, so it's definitely a PVE problem.
When I remove it in the GUI and add it again, it always creates the directory, e.g. /mnt/pve/myNasShare, but it's never accessible. This directory also cannot be deleted afterwards; it says the device is busy.

I wanted to avoid rebooting the whole PVE server. Is there any other way to do it?
 
UPDATE:
We just rebooted the server. The problem still persists.
The error message in the GUI when trying to add the NFS share again, is:
Code:
"create storage failed: mkdir /mnt/pve/vlnas1-nfsvol1/images: Read-only file system at /usr/share/perl5/PVE/Storage/Plugin.pm line 1374. (500)"

Is there any solution?
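
One thing that may be worth checking here (a suggestion, not a confirmed diagnosis) is what the client itself records for that path: whether something is still mounted there, and whether it is mounted read-only. The sketch below reads the kernel's mount table. The function names `mount_info` and `is_readonly` are made up for this example, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Print the filesystem type and mount options recorded for a mount point.
# Reads /proc/self/mounts by default; another file can be passed for testing.
mount_info() {
    mp="$1"
    table="${2:-/proc/self/mounts}"
    # Fields: device mountpoint fstype options dump pass.
    # If a path was mounted more than once, the last entry is the visible one.
    awk -v mp="$mp" '
        $2 == mp { fs = $3; opts = $4 }
        END { if (fs != "") print fs, opts }
    ' "$table"
}

# Succeeds if the mount point is currently mounted with the "ro" option.
is_readonly() {
    mount_info "$1" "${2:-/proc/self/mounts}" | awk '{
        n = split($2, o, ",")
        for (i = 1; i <= n; i++) if (o[i] == "ro") found = 1
    } END { exit (found ? 0 : 1) }'
}
```

For example, `mount_info /mnt/pve/myNasShare` prints nothing if the path is not a mount point. If a leftover mount is still there, `fuser -vm /mnt/pve/myNasShare` should list the processes that keep it busy.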
 
I have the same problem since updating Proxmox VE from 7.4.1 to 7.4.3. Until yesterday the QNAP and NFS storage worked like a charm...
 
The solution from Altrove also works for me. The error is reproducible. Maybe it would not take much effort for the Proxmox developers to fix it in a future release?
 
I have a similar problem. I have been using PBS on a remote NFS share for years without problems, but I recently changed my storage setup, so the backup target is now a ZFS dataset, still mounted via NFS. Since moving to ZFS I am getting these errors during garbage collection:

Code:
TASK ERROR: update atime failed for chunk/file "/mnt/omv-backup/pbs/.chunks/0a9f/0a9f1f2b64429b61f20784be7a84d7c9d0bf86bccb496afd72892ad0c77347a9" - ESTALE: Stale file handle

Restarting the NFS server and remounting the share in PBS helps for a day or two, but then the issue appears again.

The pbs directory was rsynced to the new ZFS system with these parameters:
Code:
sudo rsync -avxPH --info=progress2 --info=name0 --sparse /source /target

The share is mounted via fstab; `mount` shows it as:
Code:
192.168.50.8:/Backups on /mnt/omv-backup type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=14,retrans=2,sec=sys,clientaddr=192.168.50.16,local_lock=none,addr=192.168.50.8,_netdev)

I also discovered that the ZFS dataset had atime disabled, so I enabled it, but that did not help. These are the properties of the dataset:
Code:
NAME                 PROPERTY               VALUE                                 SOURCE
StoragePool/Backups  type                   filesystem                            -
StoragePool/Backups  creation               Wed May  8 15:32 2024                 -
StoragePool/Backups  used                   1.81T                                 -
StoragePool/Backups  available              24.9T                                 -
StoragePool/Backups  referenced             1.81T                                 -
StoragePool/Backups  compressratio          1.22x                                 -
StoragePool/Backups  mounted                yes                                   -
StoragePool/Backups  quota                  none                                  default
StoragePool/Backups  reservation            none                                  default
StoragePool/Backups  recordsize             128K                                  default
StoragePool/Backups  mountpoint             /StoragePool/Backups                  default
StoragePool/Backups  sharenfs               off                                   inherited from StoragePool
StoragePool/Backups  checksum               on                                    default
StoragePool/Backups  compression            lz4                                   inherited from StoragePool
StoragePool/Backups  atime                  on                                    local
StoragePool/Backups  devices                on                                    default
StoragePool/Backups  exec                   on                                    default
StoragePool/Backups  setuid                 on                                    default
StoragePool/Backups  readonly               off                                   default
StoragePool/Backups  zoned                  off                                   default
StoragePool/Backups  snapdir                hidden                                default
StoragePool/Backups  aclmode                discard                               local
StoragePool/Backups  aclinherit             passthrough                           local
StoragePool/Backups  createtxg              125                                   -
StoragePool/Backups  canmount               on                                    default
StoragePool/Backups  xattr                  sa                                    local
StoragePool/Backups  copies                 1                                     default
StoragePool/Backups  version                5                                     -
StoragePool/Backups  utf8only               off                                   -
StoragePool/Backups  normalization          none                                  -
StoragePool/Backups  casesensitivity        insensitive                           -
StoragePool/Backups  vscan                  off                                   default
StoragePool/Backups  nbmand                 off                                   default
StoragePool/Backups  sharesmb               off                                   inherited from StoragePool
StoragePool/Backups  refquota               none                                  default
StoragePool/Backups  refreservation         none                                  default
StoragePool/Backups  guid                   13362638647968177696                  -
StoragePool/Backups  primarycache           all                                   inherited from StoragePool
StoragePool/Backups  secondarycache         all                                   inherited from StoragePool
StoragePool/Backups  usedbysnapshots        0B                                    -
StoragePool/Backups  usedbydataset          1.81T                                 -
StoragePool/Backups  usedbychildren         0B                                    -
StoragePool/Backups  usedbyrefreservation   0B                                    -
StoragePool/Backups  logbias                latency                               default
StoragePool/Backups  objsetid               1306                                  -
StoragePool/Backups  dedup                  off                                   default
StoragePool/Backups  mlslabel               none                                  default
StoragePool/Backups  sync                   standard                              default
StoragePool/Backups  dnodesize              legacy                                default
StoragePool/Backups  refcompressratio       1.22x                                 -
StoragePool/Backups  written                1.81T                                 -
StoragePool/Backups  logicalused            2.10T                                 -
StoragePool/Backups  logicalreferenced      2.10T                                 -
StoragePool/Backups  volmode                default                               default
StoragePool/Backups  filesystem_limit       none                                  default
StoragePool/Backups  snapshot_limit         none                                  default
StoragePool/Backups  filesystem_count       none                                  default
StoragePool/Backups  snapshot_count         none                                  default
StoragePool/Backups  snapdev                hidden                                default
StoragePool/Backups  acltype                nfsv4                                 local
StoragePool/Backups  context                none                                  default
StoragePool/Backups  fscontext              none                                  default
StoragePool/Backups  defcontext             none                                  default
StoragePool/Backups  rootcontext            none                                  default
StoragePool/Backups  relatime               on                                    default
StoragePool/Backups  redundant_metadata     all                                   default
StoragePool/Backups  overlay                on                                    default
StoragePool/Backups  encryption             aes-256-gcm                           -
StoragePool/Backups  keylocation            none                                  default
StoragePool/Backups  keyformat              hex                                   -
StoragePool/Backups  pbkdf2iters            0                                     default
StoragePool/Backups  encryptionroot         StoragePool                           -
StoragePool/Backups  keystatus              available                             -
StoragePool/Backups  special_small_blocks   0                                     default
StoragePool/Backups  prefetch               all                                   default
StoragePool/Backups  org.truenas:managedby  192.168.50.60                         local
StoragePool/Backups  omvzfsplugin:uuid      5660ef00-ef92-416a-b1a9-f4074589ddea  local

Any idea what could be wrong?
 
This is driving me nuts: my PBS storage is slowly growing and I have not had a successful garbage collection for years. I find it mildly interesting that it's always a different file that fails. Restarting the PBS server does not seem to help.

Also, I have checked the file that came back with the error using the `stat` command, and it looks like the atime was correctly updated, so I am confused about what is going on here:

Code:
root@pbs:/mnt/omv-backup/pbs/.chunks/514e# stat 514e7b6d4df005e487f76919fa2fd9ca0596509d6b7d05c6b9bdb80b94cbc048 
  File: 514e7b6d4df005e487f76919fa2fd9ca0596509d6b7d05c6b9bdb80b94cbc048
  Size: 1007105       Blocks: 2004       IO Block: 1048576 regular file
Device: 0,43    Inode: 8927205     Links: 1
Access: (0644/-rw-r--r--)  Uid: (   34/  backup)   Gid: (   34/  backup)
Access: 2024-09-30 14:55:58.822971567 +0200
Modify: 2024-02-15 21:04:46.264773536 +0100
Change: 2024-09-30 14:55:58.822971567 +0200
 
Hi,
a stale file handle would indicate that the Proxmox Backup Server opens a file on the NFS-backed datastore, but when it then tries to operate on the open file descriptor, the NFS server does not allow the operation. This might happen if the NFS server was, for example, restarted in between or otherwise invalidated the file handle. Please check the NFS side for errors and check that the zpool backing the NFS share is fine (what does `zpool status` report?).
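
The two checks asked for above can be scripted along these lines. It is only a sketch: the function names `pools_healthy` and `count_stale` are made up for this example, and the second one assumes the task log has been saved to a file or is piped in.

```shell
#!/bin/sh
# Succeeds if stdin is the "no problems" summary printed by `zpool status -x`.
pools_healthy() {
    grep -qx 'all pools are healthy'
}

# Count the lines on stdin that mention a stale file handle,
# for example in a saved PBS task log.
count_stale() {
    grep -ci 'stale file handle'
}
```

On the storage host, `zpool status -x | pools_healthy && echo ok` prints `ok` only when ZFS reports no problems on any pool. Feeding a saved task log to `count_stale` shows how often the error occurred in that run.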
 
The pool is alright and scrubbed regularly; coincidentally, the last scrub just finished today. Both the NFS server and PBS were rebooted before running the garbage collection (PBS turned off, NFS server rebooted, PBS started).

Regarding "once it tries to operate on the open file descriptor, the NFS server does not allow to perform the operation on it": what I am trying to figure out is which operation PBS is trying to perform. Judging from the error message
"update atime failed", I was thinking it's a `touch -a`, which works on the share, and I can see that all the affected files have a current access time.
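
To test that idea across the whole chunk store rather than one file at a time, something like the sketch below could be used. It assumes the failing operation is roughly equivalent to `touch -a`, which this thread does not confirm, and the function name `check_atime_updates` is made up for this example.

```shell
#!/bin/sh
# Try to update only the access time of every regular file below a directory
# and print the paths for which that fails.
check_atime_updates() {
    dir="$1"
    find "$dir" -type f | while IFS= read -r f; do
        # -a: change the access time only; -c: never create a missing file
        if ! touch -a -c -- "$f" 2>/dev/null; then
            printf 'FAILED %s\n' "$f"
        fi
    done
}
```

Running `check_atime_updates /mnt/omv-backup/pbs/.chunks` prints nothing if every update succeeds. Note that this sets the access time of every chunk to the current time, which may delay the removal of unused chunks by the next garbage collection.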

All my issues started with the move to ZFS; in my previous single-drive setup I had zero issues with stale file handles over NFS.
 
Same here. Getting my TrueNAS NFS share connected and mounted into PBS as a datastore is driving me nuts.

I mount my NFS share via fstab at /mnt/pve_backups_epyc:

192.168.71.165:/mnt/pool1/Backups/pve_backups_epyc /mnt/pve_backups_epyc nfs defaults 0 0
So far so good. The folder has mode 775 and owner backup:backup, as Proxmox Backup Server seems to need that.
When the share is mounted, I am able to write to the NFS-shared folder from a terminal as root.
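
The ownership and mode described above can be verified from the PBS side with a small helper like this one. It is a sketch that relies on GNU `stat -c`, and the function name `check_datastore_dir` is made up for this example.

```shell
#!/bin/sh
# Compare a directory's owner, group and permission bits with expected values.
check_datastore_dir() {
    dir="$1"
    want_owner="$2"   # e.g. backup:backup
    want_mode="$3"    # e.g. 775
    have_owner="$(stat -c '%U:%G' "$dir")" || return 2
    have_mode="$(stat -c '%a' "$dir")" || return 2
    if [ "$have_owner" = "$want_owner" ] && [ "$have_mode" = "$want_mode" ]; then
        echo "ok $dir $have_owner $have_mode"
    else
        echo "mismatch $dir: have $have_owner $have_mode, want $want_owner $want_mode"
        return 1
    fi
}
```

For this setup the call would be `check_datastore_dir /mnt/pve_backups_epyc backup:backup 775`. Since writing as root does not show whether the `backup` user can write, a test such as `runuser -u backup -- touch /mnt/pve_backups_epyc/.write-test` may be more telling (the file name is just a placeholder).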



When trying to add the datastore at that path, I get the ESTALE error.


From my PVE host, NFS works just fine, for example with the following config:

Code:
nfs: pve_backups_epyc
    export /mnt/pool1/Backups/pve_backups_epyc
    path /mnt/pve/pve_backups_epyc
    server 192.168.71.165
    content iso,snippets,backup,rootdir,vztmpl,images
    nodes pve
    options vers=4.2
    prune-backups keep-all=1


I do not understand where a problem with ACLs or similar might arise. If PVE can use the NFS share, write to it, and back up into it, shouldn't PBS be able to create a datastore on it too?

Any suggestions would be nice.

 
I might have found the solution for the ESTALE error.

Cause: This problem occurs when an application opens or creates a file, deletes and closes it, and then attempts to access or delete the same file again.

My solution: unmount the file system and then remount it. This may require using the -f flag in the umount command.

Example:

Code:
# Force-unmount by mount point (one target is enough), then mount again
umount -f /mnt/yourmountpoint
mount 10.x.x.x:/nfs-export-path /mnt/yourmountpoint

After unmounting with -f and remounting the share, PBS was able to create the datastore's chunk files on the NFS share!
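
The unmount and remount steps can be wrapped in a small helper, sketched below. The names `run` and `remount_nfs`, the `DRY_RUN` switch, and the fallback to a lazy unmount (`umount -l`) are additions for this example and not part of the solution described above.

```shell
#!/bin/sh
# Sketch: force-unmount a stale NFS mount and mount it again.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

remount_nfs() {
    src="$1"   # server:/export
    mp="$2"    # local mount point
    # Try a forced unmount first; fall back to a lazy unmount if that fails.
    run umount -f "$mp" || run umount -l "$mp" || return 1
    run mount -t nfs "$src" "$mp"
}
```

With the default dry-run setting, `remount_nfs 10.x.x.x:/nfs-export-path /mnt/yourmountpoint` only prints the `umount` and `mount` commands. With `DRY_RUN=0` set and root privileges it runs them.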
 
Maybe the short version would also resolve such lock cases: "mount -o remount /mnt/yourmountpoint"