"unable to get exclusive lock - EIO: I/O error"

Dunuin

Hi,

I finally installed a PBS VM on my TrueNAS server. The PBS ISO installer wasn't working, so I installed Debian Buster and added the "proxmox-backup" package.
My PBS datastore is an fstab-mounted NFS share. The NFS share points to a dataset on an encrypted ZFS pool on the same TrueNAS server. Then I added the new PBS to PVE with encryption enabled and started a backup of 2 VMs and 2 LXCs. That worked fine so far. But then I realized that it is pointless to enable encryption in PVE for that PBS backup storage, because the datastore is already stored on a GELI-encrypted ZFS pool. So I aborted the last running backup job, which gave me this error:
Code:
2021-06-23T02:50:21+02:00: created new fixed index 1 ("vm/102/2021-06-23T00:49:57Z/drive-scsi0.img.fidx")
2021-06-23T02:50:21+02:00: created new fixed index 2 ("vm/102/2021-06-23T00:49:57Z/drive-scsi1.img.fidx")
2021-06-23T02:50:21+02:00: add blob "/mnt/pbs/vm/102/2021-06-23T00:49:57Z/qemu-server.conf.blob" (559 bytes, comp: 559)
2021-06-23T02:50:21+02:00: add blob "/mnt/pbs/vm/102/2021-06-23T00:49:57Z/fw.conf.blob" (148 bytes, comp: 148)
2021-06-23T02:51:52+02:00: backup ended and finish failed: backup ended but finished flag is not set.
2021-06-23T02:51:52+02:00: removing unfinished backup
2021-06-23T02:51:52+02:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection: broken pipe
2021-06-23T02:51:52+02:00: TASK ERROR: removing backup snapshot "/mnt/pbs/vm/102/2021-06-23T00:49:57Z" failed - Directory not empty (os error 39)
Then I used the PBS WebUI to delete the 4 finished backups I had made. Then I switched to the PVE WebUI and removed the encryption from that PBS backup storage. This complained along the lines of "bad idea...you won't be able to restore old backups", but since I had already deleted all backups, that shouldn't be a problem.
Then I switched to the PBS WebUI and started a Garbage Collection. It finished, but it looks like nothing was deleted. The datastore is still using 9.2 GB. I looked into the datastore directory: the "vm" and "ct" folders are empty, but there are still 9.2 GB of chunks stored.
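Something like this shows where the space actually sits (paths from my setup; the chunk store is the hidden ".chunks" directory in the datastore root):
Code:
# rough check of where the space is used inside the datastore (example paths)
du -sh /mnt/pbs/.chunks /mnt/pbs/vm /mnt/pbs/ct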
Now I tried to start the Garbage Collection a second time, but I only get this error:
Code:
2021-06-23T03:48:49+02:00: starting garbage collection on store PBS_DS1
2021-06-23T03:48:49+02:00: TASK ERROR: unable to get exclusive lock - EIO: I/O error
The NFS share is still mounted and I can create and delete files on it.
The PBS WebUI Dashboard isn't showing any running tasks.
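Running tasks can also be checked from the CLI on the PBS host, e.g. with something like this:
Code:
# list recent and running tasks from the PBS command line
proxmox-backup-manager task list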

This is what the datastore is showing:
Code:
Usage 0.48% (9.19 GiB of 1.85 TiB)
Backup Count
 CT 0 Groups, 0 Snapshots
 Host 0 Groups, 0 Snapshots
 VM 0 Groups, 0 Snapshots
Stats from last Garbage Collection
 Deduplication Factor 1.00

Shouldn't the Garbage Collection delete all chunks if I have deleted all backups?

Edit:
This is the log of the first, successful garbage collection:
Code:
2021-06-23T02:55:26+02:00: starting garbage collection on store PBS_DS1
2021-06-23T02:55:26+02:00: Start GC phase1 (mark used chunks)
2021-06-23T02:55:26+02:00: Start GC phase2 (sweep unused chunks)
2021-06-23T02:55:26+02:00: processed 1% (59 chunks)
2021-06-23T02:55:27+02:00: processed 2% (118 chunks)
2021-06-23T02:55:27+02:00: processed 3% (173 chunks)
2021-06-23T02:55:28+02:00: processed 4% (241 chunks)
2021-06-23T02:55:28+02:00: processed 5% (313 chunks)
2021-06-23T02:55:29+02:00: processed 6% (382 chunks)
2021-06-23T02:55:29+02:00: processed 7% (435 chunks)
2021-06-23T02:55:30+02:00: processed 8% (493 chunks)
2021-06-23T02:55:30+02:00: processed 9% (546 chunks)
2021-06-23T02:55:31+02:00: processed 10% (618 chunks)
2021-06-23T02:55:31+02:00: processed 11% (687 chunks)
2021-06-23T02:55:31+02:00: processed 12% (753 chunks)
2021-06-23T02:55:32+02:00: processed 13% (819 chunks)
2021-06-23T02:55:32+02:00: processed 14% (871 chunks)
2021-06-23T02:55:32+02:00: processed 15% (943 chunks)
2021-06-23T02:55:33+02:00: processed 16% (1008 chunks)
2021-06-23T02:55:33+02:00: processed 17% (1059 chunks)
2021-06-23T02:55:33+02:00: processed 18% (1128 chunks)
2021-06-23T02:55:34+02:00: processed 19% (1195 chunks)
2021-06-23T02:55:34+02:00: processed 20% (1246 chunks)
2021-06-23T02:55:35+02:00: processed 21% (1307 chunks)
2021-06-23T02:55:35+02:00: processed 22% (1365 chunks)
2021-06-23T02:55:35+02:00: processed 23% (1427 chunks)
2021-06-23T02:55:36+02:00: processed 24% (1486 chunks)
2021-06-23T02:55:36+02:00: processed 25% (1543 chunks)
2021-06-23T02:55:36+02:00: processed 26% (1599 chunks)
2021-06-23T02:55:37+02:00: processed 27% (1670 chunks)
2021-06-23T02:55:37+02:00: processed 28% (1742 chunks)
2021-06-23T02:55:38+02:00: processed 29% (1796 chunks)
2021-06-23T02:55:38+02:00: processed 30% (1842 chunks)
2021-06-23T02:55:38+02:00: processed 31% (1902 chunks)
2021-06-23T02:55:39+02:00: processed 32% (1968 chunks)
2021-06-23T02:55:39+02:00: processed 33% (2029 chunks)
2021-06-23T02:55:39+02:00: processed 34% (2096 chunks)
2021-06-23T02:55:40+02:00: processed 35% (2149 chunks)
2021-06-23T02:55:40+02:00: processed 36% (2214 chunks)
2021-06-23T02:55:41+02:00: processed 37% (2273 chunks)
2021-06-23T02:55:41+02:00: processed 38% (2341 chunks)
2021-06-23T02:55:41+02:00: processed 39% (2395 chunks)
2021-06-23T02:55:42+02:00: processed 40% (2462 chunks)
2021-06-23T02:55:42+02:00: processed 41% (2523 chunks)
2021-06-23T02:55:42+02:00: processed 42% (2564 chunks)
2021-06-23T02:55:43+02:00: processed 43% (2627 chunks)
2021-06-23T02:55:43+02:00: processed 44% (2693 chunks)
2021-06-23T02:55:44+02:00: processed 45% (2760 chunks)
2021-06-23T02:55:44+02:00: processed 46% (2824 chunks)
2021-06-23T02:55:45+02:00: processed 47% (2883 chunks)
2021-06-23T02:55:45+02:00: processed 48% (2934 chunks)
2021-06-23T02:55:45+02:00: processed 49% (3007 chunks)
2021-06-23T02:55:46+02:00: processed 50% (3059 chunks)
2021-06-23T02:55:46+02:00: processed 51% (3108 chunks)
2021-06-23T02:55:47+02:00: processed 52% (3166 chunks)
2021-06-23T02:55:47+02:00: processed 53% (3226 chunks)
2021-06-23T02:55:47+02:00: processed 54% (3288 chunks)
2021-06-23T02:55:48+02:00: processed 55% (3355 chunks)
2021-06-23T02:55:48+02:00: processed 56% (3411 chunks)
2021-06-23T02:55:48+02:00: processed 57% (3484 chunks)
2021-06-23T02:55:49+02:00: processed 58% (3552 chunks)
2021-06-23T02:55:49+02:00: processed 59% (3605 chunks)
2021-06-23T02:55:50+02:00: processed 60% (3672 chunks)
2021-06-23T02:55:50+02:00: processed 61% (3734 chunks)
2021-06-23T02:55:50+02:00: processed 62% (3796 chunks)
2021-06-23T02:55:51+02:00: processed 63% (3849 chunks)
2021-06-23T02:55:51+02:00: processed 64% (3912 chunks)
2021-06-23T02:55:51+02:00: processed 65% (3981 chunks)
2021-06-23T02:55:52+02:00: processed 66% (4036 chunks)
2021-06-23T02:55:52+02:00: processed 67% (4094 chunks)
2021-06-23T02:55:52+02:00: processed 68% (4149 chunks)
2021-06-23T02:55:53+02:00: processed 69% (4208 chunks)
2021-06-23T02:55:53+02:00: processed 70% (4258 chunks)
2021-06-23T02:55:54+02:00: processed 71% (4312 chunks)
2021-06-23T02:55:54+02:00: processed 72% (4374 chunks)
2021-06-23T02:55:54+02:00: processed 73% (4461 chunks)
2021-06-23T02:55:55+02:00: processed 74% (4514 chunks)
2021-06-23T02:55:55+02:00: processed 75% (4571 chunks)
2021-06-23T02:55:56+02:00: processed 76% (4625 chunks)
2021-06-23T02:55:56+02:00: processed 77% (4675 chunks)
2021-06-23T02:55:56+02:00: processed 78% (4731 chunks)
2021-06-23T02:55:57+02:00: processed 79% (4813 chunks)
2021-06-23T02:55:57+02:00: processed 80% (4871 chunks)
2021-06-23T02:55:58+02:00: processed 81% (4932 chunks)
2021-06-23T02:55:58+02:00: processed 82% (4993 chunks)
2021-06-23T02:55:58+02:00: processed 83% (5040 chunks)
2021-06-23T02:55:59+02:00: processed 84% (5106 chunks)
2021-06-23T02:55:59+02:00: processed 85% (5171 chunks)
2021-06-23T02:55:59+02:00: processed 86% (5233 chunks)
2021-06-23T02:56:00+02:00: processed 87% (5302 chunks)
2021-06-23T02:56:00+02:00: processed 88% (5369 chunks)
2021-06-23T02:56:00+02:00: processed 89% (5421 chunks)
2021-06-23T02:56:01+02:00: processed 90% (5496 chunks)
2021-06-23T02:56:01+02:00: processed 91% (5556 chunks)
2021-06-23T02:56:02+02:00: processed 92% (5612 chunks)
2021-06-23T02:56:02+02:00: processed 93% (5659 chunks)
2021-06-23T02:56:02+02:00: processed 94% (5718 chunks)
2021-06-23T02:56:03+02:00: processed 95% (5758 chunks)
2021-06-23T02:56:03+02:00: processed 96% (5808 chunks)
2021-06-23T02:56:03+02:00: processed 97% (5870 chunks)
2021-06-23T02:56:03+02:00: processed 98% (5935 chunks)
2021-06-23T02:56:04+02:00: processed 99% (5987 chunks)
2021-06-23T02:56:04+02:00: Removed garbage: 0 B
2021-06-23T02:56:04+02:00: Removed chunks: 0
2021-06-23T02:56:04+02:00: Pending removals: 8.88 GiB (in 6051 chunks)
2021-06-23T02:56:04+02:00: Original data usage: 0 B
2021-06-23T02:56:04+02:00: On-Disk chunks: 0
2021-06-23T02:56:04+02:00: Deduplication factor: 1.00
2021-06-23T02:56:04+02:00: TASK OK

Does the "Pending removals: 8.88 GiB (in 6051 chunks)" mean that the GC is still somehow running in the background and because of that I can't start a new one? But I don't see any space freed up the last hour.
 
Real data removal is deferred by 24 hours (for technical reasons). So if you run GC again tomorrow, all chunks will get removed.
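The pending chunks are simply the ones whose access time is still within that grace window; if you want to check, something like this shows the atimes (the datastore path and chunk subdirectory are just examples):
Code:
# access times of a few chunk files in the datastore's chunk store (example path)
stat --format='%x %n' /mnt/pbs/.chunks/0000/* | head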

But you should be able to start GC again, so I have no real idea why you get that "I/O error". Is the storage still mounted and accessible?
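For example, something like this run inside the PBS VM should show whether the mount is still there and writable (path taken from the log above; the test file name is arbitrary):
Code:
# quick sanity check of the NFS mount from the PBS VM
findmnt /mnt/pbs
touch /mnt/pbs/write-test && rm /mnt/pbs/write-test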
 
Yes, the NFS share is still mounted and I can browse, create and delete files inside it. But I need to reboot the PBS VM in order to run the GC again. Then it runs but I can't run it a second time without rebooting again.

Maybe the locking isn't working right?
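One way to test the locking independently of PBS is to try to take an exclusive flock on a file on the NFS mount, e.g. something like this (the test file name is arbitrary):
Code:
# try to grab an exclusive, non-blocking lock on a file on the NFS mount
touch /mnt/pbs/locktest
flock -x -n /mnt/pbs/locktest -c 'echo "got exclusive lock"'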

It looks like the locking service is running on the server:
Code:
root@MainNAS[~]# rpcinfo -p
   program vers proto   port  service
    100000    4   tcp    111  rpcbind
    100000    3   tcp    111  rpcbind
    100000    2   tcp    111  rpcbind
    100000    4   udp    111  rpcbind
    100000    3   udp    111  rpcbind
    100000    2   udp    111  rpcbind
    100000    4 local    111  rpcbind
    100000    3 local    111  rpcbind
    100000    2 local    111  rpcbind
    100005    1   udp    876  mountd
    100005    3   udp    876  mountd
    100005    1   tcp    876  mountd
    100005    3   tcp    876  mountd
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100024    1   udp    648  status
    100024    1   tcp    648  status
    100021    0   udp    634  nlockmgr
    100021    0   tcp    925  nlockmgr
    100021    1   udp    634  nlockmgr
    100021    1   tcp    925  nlockmgr
    100021    3   udp    634  nlockmgr
    100021    3   tcp    925  nlockmgr
    100021    4   udp    634  nlockmgr
    100021    4   tcp    925  nlockmgr

Code:
root@MainNAS[~]# ps aux|grep rpc
root       69649   0.0  0.0   11512   2668  -  Is   15:59       0:00.01 /usr/sbin/rpcbind -h 192.168.45.4 -h 192.168.49.4
root       69694   0.0  0.0  285600   6324  -  Ss   15:59       0:00.00 /usr/sbin/rpc.statd -d -h 192.168.45.4 -h 192.168.49.4
root       69712   0.0  0.0   23480   6304  -  Ss   15:59       0:00.00 /usr/sbin/rpc.lockd -d 10
root       75168   0.0  0.0   11512   2972  6  S+   16:45       0:00.00 grep rpc
My NFS share is using the 192.168.49.4 interface, but if I read the rpc.lockd documentation right, a missing "-h" argument should bind the service to all interfaces.
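To double-check from the client side, the PBS VM should also be able to see the server's lock manager registered, e.g. with something like:
Code:
# from the PBS VM: check that the server's NLM lock manager is registered (server IP from my fstab)
rpcinfo -p 192.168.49.4 | grep nlockmgr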

This is my PBS fstab entry for the NFS share:
Code:
#NFS PBS
192.168.49.4:/mnt/HDDpool/LongDatasetNameToPreventTooLongPaths/NFS/PBS /mnt/pbs  nfs      defaults,proto=tcp,nfsvers=4    0       0
 
PBS only needs local locks, so it may help if you mount with the 'nolock' option?
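For illustration, the mount line from above with that option added would look something like this (just a sketch of the suggestion, not tested here):
Code:
192.168.49.4:/mnt/HDDpool/LongDatasetNameToPreventTooLongPaths/NFS/PBS /mnt/pbs  nfs      defaults,proto=tcp,nfsvers=4,nolock    0       0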
No, "nolock" isn't helping with NFSv4.

It just looks like TrueNAS or PBS doesn't like locking with NFSv4. After switching the mount in fstab from "nfsvers=4" to "nfsvers=3" I can run GC multiple times.
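For reference, the fstab entry from above with just the version option changed:
Code:
#NFS PBS
192.168.49.4:/mnt/HDDpool/LongDatasetNameToPreventTooLongPaths/NFS/PBS /mnt/pbs  nfs      defaults,proto=tcp,nfsvers=3    0       0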
And now the TrueNAS server is also showing logs like this:
Code:
Jun 23 17:15:55 MainNAS 1 2021-06-23T17:15:55.911771+02:00 MainNAS.local mountd 78646 - - mount request succeeded from 192.168.49.8 for /mnt/HDDpool/LongDatasetNameToPreventTooLongPaths/NFS/PBS
Jun 23 17:16:23 MainNAS kernel: NLM: new host PBS (sysid 1)
Jun 23 17:16:23 MainNAS kernel: nlm_do_lock(): caller_name = PBS (sysid = 1)
Jun 23 17:16:23 MainNAS kernel: NLM: host PBS (sysid 1) has NSM state 3
Jun 23 17:16:23 MainNAS kernel: NLM: monitoring PBS (sysid 1)
Jun 23 17:16:25 MainNAS 1 2021-06-23T17:16:25.184940+02:00 MainNAS.local rpc.statd 78671 - - Invalid hostname to sm_mon: PBS
Jun 23 17:16:25 MainNAS kernel: Local NSM refuses to monitor PBS
Jun 23 17:16:47 MainNAS kernel[3445]: Last message 'Local NSM refuses to' repeated 1 times, suppressed by syslog-ng on MainNAS.local
Jun 23 17:16:47 MainNAS kernel: nlm_do_unlock(): caller_name = PBS (sysid = 1)
Jun 23 17:17:24 MainNAS kernel: nlm_do_lock(): caller_name = PBS (sysid = 1)
Jun 23 17:17:32 MainNAS kernel: nlm_do_unlock(): caller_name = PBS (sysid = 1)
Jun 23 17:17:36 MainNAS kernel: nlm_do_lock(): caller_name = PBS (sysid = 1)
Jun 23 17:17:44 MainNAS kernel: nlm_do_unlock(): caller_name = PBS (sysid = 1)
When the share was mounted as NFSv4 there were no such logs at all.
Maybe someone knows why NFSv4 locking isn't working like it should?
I will ask in the TrueNAS forums too, but usually nobody answers there.
 
