[PSA] ZFS Silent Data Corruption

Just to double check, pve-manager/8.1.3/b46aac3b42da5d15 (running kernel: 6.5.11-6-pve) with zfs-2.2.0-pve4 and zfs-kmod-2.2.0-pve4 already has the silent data corruption bug fixed, right?

Also, while I am here, I do not have zfs-dkms installed currently - is that a problem or is that fine/normal?
 
I see the news here about a fix included in the 6.5 kernels. Will there be anything for 5.5/zfs 2.1.11? Thank you!
If you mean the 5.15 kernel for PVE 7.4: pve-kernel-5.15.131-2-pve contains the fix (it upgrades ZFS to 2.1.14) and is currently available on the pvetest repository (bullseye).

I hope this helps!
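As a quick sanity check, you can confirm which ZFS module a host is actually running before and after the upgrade. A minimal sketch (assuming Python 3 on the node and that the zfs module is loaded; the sysfs path is the one the kernel module exposes):

```python
# Print the version of the ZFS kernel module currently loaded on this host,
# so it can be compared against the fixed releases mentioned above.
from pathlib import Path

version_file = Path("/sys/module/zfs/version")
if version_file.exists():
    print("loaded zfs module:", version_file.read_text().strip())
else:
    print("the zfs kernel module does not appear to be loaded")
```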
 
Just to double check, pve-manager/8.1.3/b46aac3b42da5d15 (running kernel: 6.5.11-6-pve) with zfs-2.2.0-pve4 and zfs-kmod-2.2.0-pve4 already has the silent data corruption bug fixed, right?
Yes.

Also, while I am here, I do not have zfs-dkms installed currently - is that a problem or is that fine/normal?
This is expected, as the proxmox-kernel already ships with the ZFS modules.
 
Here's a good writeup from RobN on this. The key takeaway is that no filesystem (including ZFS) is 100% bug-free. So no matter what people claim, always assume that data corruption due to future or pre-existing bugs could take place (even if the chances are infinitesimally small):

2) Given that it has happened once, there's no reason to suppose that such a bug won't ever happen again; so, for those that don't implement things like long-term off-line/tape backups, MD5 files for mostly-static data trees, etc, it's better to think about how to implement them now rather than wait for the next bug to put your data on notice again.
"Take backups" and "test your backups" is and always has been recommended, and this is why

So the real question is how we make backups and ensure the integrity of the files in the system. On a NAS or media server this is easy, as you could use an external checksum tool like this to verify the checksum of each file and ensure that bitrot or silent data corruption does not set in (see the sketch below).

Within the context of PVE or a live system, I am unsure how reliable this would be, since the ZVOL changes all the time. I guess we have to assume that the ZVOL is mostly valid (99.9999% of the time, at least) and make frequent backups and snapshots to PBS.
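For the mostly-static data case mentioned above, a minimal sketch of such an external checksum pass (the script name, manifest location, and paths are made up for illustration; it assumes Python 3 and that the tree is not being modified while it runs):

```python
# Walk a directory tree, record SHA-256 checksums, and verify them on a later run.
# Only useful for mostly-static data; files that change legitimately will show up as mismatches.
import hashlib
import json
import sys
from pathlib import Path

MANIFEST = Path("checksums.json")  # hypothetical manifest location


def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def build(root: Path) -> None:
    manifest = {str(p): sha256(p) for p in sorted(root.rglob("*")) if p.is_file()}
    MANIFEST.write_text(json.dumps(manifest, indent=2))


def verify() -> None:
    manifest = json.loads(MANIFEST.read_text())
    for name, digest in manifest.items():
        p = Path(name)
        if not p.is_file():
            print("missing:", name)
        elif sha256(p) != digest:
            print("MISMATCH:", name)


if __name__ == "__main__":
    # e.g. `python3 checksum_check.py build /tank/media`, later `python3 checksum_check.py verify`
    if sys.argv[1] == "build":
        build(Path(sys.argv[2]))
    else:
        verify()
```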
 
Hello!

Could we get an official opinion on this?

From the perspective of RobN, this bug should not affect ZVOLs that much.

Just a question to understand the problem: Would use cases like file-backed VMs fall under this? They can be reasonably parallel (but using distinct threads, not processes, as far as the host OS is concerned) and all work on the same (image) file. Or are the access patterns "wrong", such that they don't trigger this bug?
I am pretty sure not. For zvols, I don't see anything calling the affected function. For file-backed VMs, I don't think the access pattern works. First, there's almost certainly going to be a block cache and a filesystem in the guest OS, so the transition from hole->data actually means the guest filesystem allocated some part of the block device (or the file masquerading as one) and wrote to it. If so, it already knows whether there's data there or not - it doesn't need to seek! And any callers above the OS seeking/reading at the same moment as the write are going to be served from cache, not from "storage". And all of this assumes that the VM even has a reason to call lseek(), which seems strange, as it is not actually asking "is there something here?" but "find the next thing after this position", and that could require a lot of disk access to discover.

It would be good to hear from official Proxmox sources whether this ties in with what @robn has mentioned above.
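For context, the access pattern being discussed is an lseek(SEEK_DATA)/lseek(SEEK_HOLE) call racing a write that turns a hole into data. This is just an illustration of what those calls look like from userspace (not a reproducer; the file name is made up, and it assumes Linux and Python 3):

```python
# Illustrate the SEEK_DATA / SEEK_HOLE calls at the centre of the bug: a reader asks
# "where is the next data (or hole) after this offset?", and the corruption happened
# when that answer was briefly wrong for a region that had just been written.
import os

path = "sparse_demo.bin"  # hypothetical scratch file
with open(path, "wb") as f:
    f.truncate(1024 * 1024)     # create a 1 MiB hole (no data blocks allocated yet)
    f.seek(512 * 1024)
    f.write(b"payload")         # turn part of that hole into data

fd = os.open(path, os.O_RDONLY)
try:
    data_off = os.lseek(fd, 0, os.SEEK_DATA)  # next offset that contains data
    hole_off = os.lseek(fd, 0, os.SEEK_HOLE)  # next offset that is (or begins) a hole
    print(f"first data at {data_off}, first hole at {hole_off}")
finally:
    os.close(fd)

os.unlink(path)
```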
 
From the perspective of RobN, this bug should not affect ZVOLs that much.
It would be good to hear from official Proxmox sources whether this ties in with what @robn has mentioned above.

Based on the comments, this bug should only affect a small number of servers, as it requires a large number of heavy parallel disk workloads for it to appear. I can only imagine something like a heavily used database server on ZFS getting hit by this bug. I understand that it can affect anyone, but the chance of it happening is just very small.

A thought did come to mind about PBS with ZFS during backups, though. I have 14 nodes all talking to one PBS during backups, but its datastore is on ext4 since it sits on hardware RAID6, so I couldn't use ZFS there. I have no idea whether this would have any impact on a ZFS-backed PBS datastore during backups, since that involves several write streams at the same time. It may not be an issue with a small number of nodes backing up at once.
 
It also requires reading while writing the file, which shouldn't happen with PBS chunks (they are written once into a tmpfile, then renamed into their final path).
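For illustration, that write-once-then-rename pattern looks roughly like the following sketch (PBS itself is written in Rust and its actual implementation differs; names and paths here are made up):

```python
# Sketch of the "write to a tmpfile, then rename into place" pattern described above.
# A chunk is never read while it is still being written, and rename() is atomic on the
# same filesystem, so readers only ever see complete chunk files.
import os
import tempfile


def store_chunk(datastore_dir: str, chunk_name: str, data: bytes) -> str:
    final_path = os.path.join(datastore_dir, chunk_name)
    fd, tmp_path = tempfile.mkstemp(dir=datastore_dir)  # tmpfile on the same filesystem
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())         # make sure the data is on disk before publishing it
        os.rename(tmp_path, final_path)  # atomic publish under the final name
    except BaseException:
        os.unlink(tmp_path)
        raise
    return final_path
```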
 
