Verify errors on multiple deployments after garbage collection

Linyu

Well-Known Member
Jun 14, 2019
43
2
48
26
We are running a few Proxmox Backup Server instances deployed on top of TrueNAS, which is based on ZFS, so I am quite sure there is no hardware disk failure.
But recently, when I tried to restore a backup from PBS, I found that some backups are broken: the error log shows that some .chunk files are missing (no such file or directory). After that, I started verify jobs on each deployment, and some backups were reported as failed on every node.
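
For reference, this is how I re-ran the verification from the PBS shell (the datastore name store1 is just a placeholder for our real datastore names):
Code:
# list the configured datastores
proxmox-backup-manager datastore list
# verify all backup groups of the datastore "store1" (placeholder name)
proxmox-backup-manager verify store1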

Here is how our backup jobs work:
Within one region, a few PVE nodes (4-5) are connected to a single Proxmox Backup Server (there is no cluster between the nodes). Each VM on the different nodes is backed up between 0:00 and 6:00, and we delete the previous backup before creating a new one. Garbage collection runs every day, so maybe garbage collection runs at the same time as the backups. Could it have accidentally removed some chunk files?
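
To check whether GC really overlapped with the backup window, I looked at the last GC run and the configured schedule on the PBS host (again, store1 is a placeholder):
Code:
# show the result and timing of the last garbage collection run
proxmox-backup-manager garbage-collection status store1
# the gc-schedule (if configured) is listed in the datastore config
cat /etc/proxmox-backup/datastore.cfg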

I am using:
Code:
Proxmox Backup Server 3.0-1
 
Make sure atime isn't disabled in TrueNAS for the datasets. PBS needs atime so that it doesn't delete the wrong chunks during a GC. It shouldn't be disabled by default, but many people recommend disabling atime for better performance or less SSD wear, and that won't work when using PBS.
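
As far as I know, GC only removes chunks whose atime is older than roughly 24 hours and 5 minutes, so a GC overlapping the backup window shouldn't be a problem by itself, as long as atime updates actually work. You can check and enable this on the ZFS side (the dataset name tank/pbs is just an example; on TrueNAS you would normally change it through the dataset options in the UI):
Code:
# check whether atime updates are enabled on the dataset backing the PBS datastore
zfs get atime,relatime tank/pbs
# enable atime updates; relatime should be enough for the PBS GC grace period
zfs set atime=on tank/pbs
zfs set relatime=on tank/pbs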
 
Thanks for the help! I found that TrueNAS had disabled atime by default on our dataset, so I will give it a few days and see whether the same problem still occurs.
 
I enabled atime on the dataset, but I still ran into verification errors. In fact, my PBS instance is a virtual machine running on TrueNAS SCALE, and I created a zvol in TrueNAS SCALE and attached it to the PBS virtual machine as a disk. I learned that atime is only a property of datasets, not of zvols. Do you have any other suggestions?
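
Since the zvol is only a block device from the VM's point of view, I assume the atime behaviour is actually decided by the filesystem the PBS VM created on top of it, so this is what I am checking inside the VM next (the mountpoint /mnt/datastore is only an example):
Code:
# inside the PBS VM: show the mount options of the datastore filesystem
findmnt /mnt/datastore
# if it is mounted with noatime, switch to relatime on the running system
mount -o remount,relatime /mnt/datastore
# and replace "noatime" with "relatime" in /etc/fstab to make it permanent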