Hello,
This backup setup worked great for many moons. We were on PVE 7.4-16 and PBS 2.4 when this started (or when I noticed), and I've been troubleshooting for two weeks now.
The Proxmox Backup Server has an NFS share from a QNAP mounted on it, which is where the majority of the storage is.
The NFS fstab mount has not changed.
Memtest came back clean, and the server has ECC RAM as well.
We have been able to use file restore in the past without issues. This all started around July 10th.
The QNAP RAID 6 has undergone tests; no bad sectors were found.
Settings on the QNAP have not changed at all in recent memory.
We started having some chunk issues. It turned out someone had fat-fingered a storage path and the local drive filled up. We had to delete a chunk folder on that storage, and that storage was then removed from PBS. Other backups also began to fail and act weird. The "File Restore" option for one particular file server VM shows files and can restore them, but they are corrupt when downloaded. I assumed the chunk deletion mentioned above was the cause. That is all moot now, but it's part of the story. The real issue is corrupt files coming from one particular VM's backups in the "File Restore" window in PVE.
After a lot of troubleshooting I decided to blow away the install and put on version 3 of Proxmox Backup Server, since we are planning on moving PVE to 8 around October anyway. I ran fresh backups last night and everything verifies with a green checkmark. Garbage collection ran fine this morning after the backups. I've pored over the syslog: no errors, no 400s or anything. This gave me high hopes this morning.
"File Restore" in PVE from a Windows file server is still restoring corrupt files despite the good verification. Usually the file size is close, but about 1 KB less than it should be. Everything acts completely normal, as if the file isn't corrupt, until you go to open it. No chunk errors, no transfer errors, and the drive we back up to has 10 TB free. No NFS hangs or issues noted in the logs. I even tried another browser, just in case it was the downloader lol. We also backed up to local storage and found the same corruption. That leads me to think it's something to do with the VM, yet the files on the VM itself are fine; no issues found.
The VM is a Windows Server 2016 file server, hosted on PVE ZFS on Seagate Nytro enterprise SSDs. The only thing I can think of is that Windows deduplication might be causing issues on the NTFS drives, but like I said, this was never a problem in the past, and file restore has worked for us on this VM for over a year. The VM has the VirtIO drivers and the QEMU guest agent installed.
I can't download the backup archive and check whether the files are actually dead outside of "File Restore" in PBS, because the backup is stored as zstd chunks and I don't have terabytes free on my client to test with at the moment. I may go grab a USB toaster though, because I'm kind of at that point. I have not tried restoring an entire VM to see if the files are still dead there, because the main culprit is a 2016 file server that is fairly large.
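Before committing to a full VM restore, one cheap check is to copy a known-bad file straight off the VM's share and the same file from "File Restore" onto one machine, then compare them byte by byte. Whether the restored copy is a clean truncation (a prefix of the original, consistent with it being about 1 KB short) or has damage starting somewhere in the middle might help narrow down which layer is at fault. The sketch below is a minimal Python example under that assumption; the file paths are placeholders and nothing in it is Proxmox-specific.

```python
import hashlib
import sys
from pathlib import Path
from typing import Optional

CHUNK = 1 << 20  # read 1 MiB at a time so large files don't need to fit in RAM


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def first_mismatch(a: Path, b: Path) -> Optional[int]:
    """Return the byte offset where the files first differ, or None if identical.

    If one file is a prefix of the other (a clean truncation), the offset
    returned is the length of the shorter file.
    """
    offset = 0
    with a.open("rb") as fa, b.open("rb") as fb:
        while True:
            ba = fa.read(CHUNK)
            bb = fb.read(CHUNK)
            if ba != bb:
                n = min(len(ba), len(bb))
                for i in range(n):
                    if ba[i] != bb[i]:
                        return offset + i
                return offset + n  # one file ended early
            if not ba:  # both files exhausted with no difference
                return None
            offset += len(ba)


def compare(original: Path, restored: Path) -> dict:
    """Summarize how a restored copy differs from the original file."""
    size_o = original.stat().st_size
    size_r = restored.stat().st_size
    return {
        "original_size": size_o,
        "restored_size": size_r,
        "size_delta": size_r - size_o,
        "same_hash": sha256_of(original) == sha256_of(restored),
        "first_mismatch": first_mismatch(original, restored),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    for key, value in compare(Path(sys.argv[1]), Path(sys.argv[2])).items():
        print(f"{key}: {value}")
```

Usage would be something like `python compare_restore.py original.docx restored.docx` (the script name is just a placeholder). If `first_mismatch` equals the restored file's size, the restore simply came up short; a mismatch near the start of the file would point to a different kind of problem.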
I'm happy to share literally any log. Beyond me cancelling the odd verification task because I opened too many, there's nothing I can see after days of looking. I've read every thread here and on Reddit with the word "corrupt" in it. I dunno, I'm stumped. Both PVE and PBS were installed on their servers from the official Proxmox ISOs, so nothing weird about those.
I'm just looking for another thread to pull on if anyone has an idea. Even better if I don't have to wait for any more very long tasks to complete with this particular VM lol.