Files Corrupt on File Restore in PVE

bradgillap

May 30, 2022
Hello,

This backup setup was working great for many moons. PVE is 7.4-16, and we were on Proxmox Backup Server (PBS) 2.4 when this started (or when I noticed it). I've been troubleshooting for two weeks now.

Proxmox Backup Server has an NFS share mounted from a QNAP, where the majority of the storage is.
The NFS fstab mount has not changed.
Memtest came back clean, and the server has ECC RAM as well.
We have been able to use the file restore in the past without issues. This all started around July 10th.
The QNAP RAID 6 has been tested; no bad sectors were found.
Settings on the qnap have not changed at all in recent memory.

We started having some chunk issues. It turned out that someone had fat-fingered a storage path and the local drive filled up. I had to delete a chunk folder on that storage, which was then removed as a datastore from PBS. Other backups also began to fail and act weird. The "File Restore" option for one particular file server VM shows files and can restore them, but they are corrupt upon download. I assumed the chunk deletion mentioned earlier was the issue. That is all moot anyway, but it's part of the story: the real issue is corrupt files coming from one particular VM's backups in the "File Restore" window in PVE.

After a lot of troubleshooting, I decided to blow away the install and put on Proxmox Backup Server 3, since we're planning to move PVE to 8 around October anyway. I ran fresh backups last night and everything verifies with a green checkmark. Garbage collection ran fine this morning after the backups, etc. I've pored over the syslog: no errors, no 400 errors, nothing. This gave me high hopes this morning.
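Why a green verification doesn't rule out bad data: below is a simplified model, assuming VM images are split into fixed-size 4 MiB chunks that are stored under their SHA-256 digest (my understanding of how PBS handles VM images; this is not PBS code, and the function names are made up for illustration). Verification only re-hashes each stored chunk and compares it with the digest it was saved under, so it proves the chunks haven't changed since they were written. It says nothing about whether the guest data was what you expected in the first place.

```python
import hashlib

# Assumption: fixed 4 MiB chunks addressed by SHA-256, as a simplified stand-in for PBS.
CHUNK_SIZE = 4 * 1024 * 1024


def chunk_image(image: bytes, chunk_size: int = CHUNK_SIZE) -> tuple[list[str], dict[str, bytes]]:
    """Split a disk image into fixed-size chunks and store each one under its SHA-256 digest."""
    index: list[str] = []
    store: dict[str, bytes] = {}
    for offset in range(0, len(image), chunk_size):
        chunk = image[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # identical chunks are only stored once
        index.append(digest)
    return index, store


def verify(index: list[str], store: dict[str, bytes]) -> bool:
    """Re-hash every referenced chunk and check it still matches the digest it is stored under."""
    return all(
        digest in store and hashlib.sha256(store[digest]).hexdigest() == digest
        for digest in index
    )


if __name__ == "__main__":
    # Guest data that is already "wrong" still verifies: only chunk integrity is checked.
    image = b"\x00" * (CHUNK_SIZE * 2) + b"guest data, correct or not"
    index, store = chunk_image(image)
    print(verify(index, store))
    # Damage to a stored chunk (bit rot, a bad disk) is what verification actually catches.
    store[index[-1]] = b"bit rot"
    print(verify(index, store))
```

In this model the first `verify` call is true and the second is false, which is consistent with fresh backups verifying cleanly while the restored files are still bad.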

"File Restore" in PVE for a Windows file server is still restoring corrupt files despite good verification. Usually the file size is close, but something like 1 KB less than it should be. Everything acts normally, as if the file isn't corrupt, until you go to open it. No chunk errors, no transfer errors, and the drive we back up to has 10 TB free. No NFS hangs or issues noted in the logs. I even tried another browser just in case it was the downloader, lol. We also backed up to local storage and found the same corruption, which leads me to think it's something to do with the VM. Yet the files on the VM itself are fine; no issues found.
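To put numbers on "close but about 1 KB short", a quick comparison between a known-good copy pulled straight off the VM and the same file downloaded through File Restore can help. This is a minimal sketch; the script and function names are hypothetical. It reports the size difference, whether the SHA-256 hashes match, and the first byte offset where the files diverge. If the restored file is a clean prefix of the original, the first difference lands exactly at the end of the shorter file.

```python
import hashlib
import sys
from pathlib import Path
from typing import Optional


def sha256_file(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB blocks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def compare(original: Path, restored: Path) -> dict:
    """Return the size difference (original minus restored) and whether the hashes match."""
    return {
        "size_diff": original.stat().st_size - restored.stat().st_size,
        "same_hash": sha256_file(original) == sha256_file(restored),
    }


def first_difference(a: Path, b: Path, block_size: int = 1 << 20) -> Optional[int]:
    """Return the byte offset where the two files first differ, or None if they are identical."""
    offset = 0
    with a.open("rb") as fa, b.open("rb") as fb:
        while True:
            ba, bb = fa.read(block_size), fb.read(block_size)
            if ba != bb:
                for i, (x, y) in enumerate(zip(ba, bb)):
                    if x != y:
                        return offset + i
                # No differing byte in the overlap: one file is a prefix of the other.
                return offset + min(len(ba), len(bb))
            if not ba:
                return None  # both files ended at the same point with identical content
            offset += len(ba)


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python compare_restore.py <copy from the VM> <file from File Restore>
    original, restored = Path(sys.argv[1]), Path(sys.argv[2])
    print(compare(original, restored))
    print("first difference at byte:", first_difference(original, restored))
```

Whether the damage starts at offset 0 or near the end of the file says a lot about whether the restore is reading the wrong data entirely or just truncating it.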

The VM is a Windows Server 2016 file server, hosted on PVE ZFS on Seagate Nytro enterprise SSDs. The only thing I can think of is that Windows dedupe might be causing issues on the NTFS drives, but like I said, this was never a problem in the past: file restore has worked on this VM for over a year. The VM does have the VirtIO drivers and the QEMU guest agent installed.

I can't download the backup archive and check whether the files are actually dead outside of "File Restore" in PBS, because PBS stores the backup as zstd-compressed chunks and I don't have terabytes free locally on my client to test that with at the moment. I may go grab a USB toaster, though, because I'm kind of at that point. I haven't tried restoring the entire VM to see if the files are also dead in a full restore, because the main culprit is a 2016 file server that is fairly large.

I'm happy to share literally any log. Beyond me cancelling the odd verification task because I opened too many, there's nothing I can see after days of looking. I've looked at every thread here and on Reddit with the word "corrupt" in it. I dunno, I'm stumped. Both PVE and PBS were installed on their servers from the official Proxmox ISOs; nothing weird about those.

I'm just looking for another thread to pull on if anyone has an idea. Even better if I don't have to wait for any more very long tasks to complete with this particular VM, lol.
Just a follow-up: the more I think about it, the more the Windows Server dedupe, which runs on an ongoing basis and before backups, makes sense to me as the source of this. I can't validate that yet, as I have to test it further, but it would explain what I'm seeing.

If Windows reports that a file is 6 KB but has 2 KB of it stored as a deduplication reference, then of course the VM is going to report "this file is 6 KB", and verification is going to say "yes, this file is 6 KB" in the checksum. PBS chunking dedupes during the backup anyway, and our VMs are actually stored on ZFS. So really, this whole exercise of using Windows dedupe is just extra CPU cycles and RAM I probably don't need to spend.
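To sanity-check the theory, here's a toy model. My understanding is that Windows Data Deduplication moves an optimized file's data into a chunk store (under System Volume Information) and leaves a reparse point behind, while the file still reports its full logical size. A reader that understands the format rehydrates the data; a reader that doesn't just gets the stub. Everything here (`ToyVolume`, the 4-byte chunks, the comma-separated stub) is hypothetical and is not how NTFS or Windows dedup actually lays data out on disk.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK = 4  # toy chunk size in bytes; real dedup chunks are far larger


@dataclass
class ToyVolume:
    """Hypothetical model of a volume with post-process deduplication (not the real NTFS format)."""
    files: dict[str, bytes] = field(default_factory=dict)        # bytes physically "in" each file
    logical_size: dict[str, int] = field(default_factory=dict)   # size the file system reports
    chunk_store: dict[str, bytes] = field(default_factory=dict)  # shared chunk store
    optimized: set[str] = field(default_factory=set)

    def write(self, name: str, data: bytes) -> None:
        self.files[name] = data
        self.logical_size[name] = len(data)

    def optimize(self, name: str) -> None:
        """Move the file's data into the chunk store and leave a stub of chunk IDs behind."""
        data = self.files[name]
        ids = []
        for i in range(0, len(data), CHUNK):
            piece = data[i:i + CHUNK]
            cid = hashlib.sha256(piece).hexdigest()[:8]
            self.chunk_store[cid] = piece
            ids.append(cid)
        self.files[name] = ",".join(ids).encode()  # stub only; logical_size is left unchanged
        self.optimized.add(name)

    def read_with_dedup_driver(self, name: str) -> bytes:
        """Rehydrate from the chunk store, the way the guest OS with dedup enabled would."""
        if name not in self.optimized:
            return self.files[name]
        stub = self.files[name]
        ids = stub.decode().split(",") if stub else []
        return b"".join(self.chunk_store[cid] for cid in ids)

    def read_raw(self, name: str) -> bytes:
        """Read only what sits in the file itself, like a reader unaware of dedup."""
        return self.files[name]


if __name__ == "__main__":
    vol = ToyVolume()
    vol.write("report.docx", b"quarterly numbers 2023")
    vol.optimize("report.docx")
    print(vol.logical_size["report.docx"])            # still the full size
    print(vol.read_with_dedup_driver("report.docx"))  # original bytes
    print(vol.read_raw("report.docx"))                # stub, not the document
```

In the toy, the file keeps reporting its full 22 bytes and the backup of the raw blocks would verify perfectly, yet anything reading the file without the dedup layer gets garbage. If that matches reality, undoing dedup on a test volume and re-running File Restore should make the problem go away.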

I'm working through undoing the deduplication on some drives on this file server so I can check this against a few non-working files from the current backup, but it's a big thing to back up, so it's going to take me a day or two.

It may also be that this was a problem in the past and I just didn't notice it, perhaps because dedupe on the Windows side wasn't as mature at optimizing drives and files back then as it is today. Maybe I just have less unique data now or something.