Veeam Silent Data Corruption

Hi NatO, thanks for your report.

Our testing suggests that a Veeam backup of a live VM is not internally consistent, even with a single disk. I recommend assessing the consistency of multi-disk configurations only once the basics are operating as expected.

As soon as Veeam announces a fix, we'll resume our testing and proceed to validate the correctness of point-in-time snapshots across multiple disks.

@Pavel Tide, are there any updates on the issue or an ETA for the resolution?


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
There's been an update on the Veeam forum post (after a nudge). Hopefully they will have it sorted soon.
 
Thanks for your reply @bbgeek17

I've done some very simple testing and realised that the disks within a VM are not snapshotted at the same time. So when you restore the Veeam backup, one disk will have more recent files than the other. Clearly it is snapshotting the disks sequentially as it backs them up, rather than snapshotting them all at once and then backing up (as it does with VMware/Hyper-V).

My test was simple: create a VM with 2 disks.

Create the required folders for the test files and then run a PowerShell script something like this. It writes one file to each folder every second.

Code:
# Two test folders on each disk (create these before running).
$paths = @(
    "d:\z_snaptest",
    "c:\z_snaptest",
    "d:\a_snaptest",
    "c:\a_snaptest"
)

# 360 iterations (100..459) with a one-second pause, so roughly six minutes.
for ($i = 100; $i -le 459; $i++) {
    $currentTime = Get-Date -Format "yyyy-MM-dd HH:mm:ss"

    # Write the same timestamp to time_<i>.txt in every folder, in the order listed.
    foreach ($path in $paths) {
        $currentTime | Out-File -FilePath "$path\time_$i.txt"
    }

    Start-Sleep -Seconds 1
}
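
Once the backup has been restored, the comparison between disks can be automated. The following is a minimal Python sketch and not part of the original test: it assumes the folder names and the time_N.txt naming from the script above, and the function names and the tolerance parameter are made up for illustration. A difference of one index is tolerated by default because the script writes the four files of an iteration one after another, so even a true point-in-time snapshot could land in the middle of an iteration.

```python
from pathlib import Path


def highest_index(folder):
    """Return the largest N among time_N.txt files in folder, or None if there are none."""
    indices = []
    for p in Path(folder).glob("time_*.txt"):
        suffix = p.stem.split("_", 1)[1]
        if suffix.isdigit():
            indices.append(int(suffix))
    return max(indices, default=None)


def compare_folders(folders, tolerance=1):
    """Map each folder to its highest file index and report whether they agree.

    The folders count as consistent when every one of them contains at least
    one test file and the highest indices differ by at most `tolerance`.
    """
    latest = {str(f): highest_index(f) for f in folders}
    found = [v for v in latest.values() if v is not None]
    consistent = (
        bool(found)
        and len(found) == len(latest)
        and max(found) - min(found) <= tolerance
    )
    return latest, consistent
```

On the restored VM this could be called as compare_folders([r"c:\a_snaptest", r"c:\z_snaptest", r"d:\a_snaptest", r"d:\z_snaptest"]). A large gap between the c: and d: folders would point to the disks having been captured at different times.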

I know Veeam doesn't support application-aware processing, so if you were running anything like Exchange or SQL you'd want to be using the Veeam agent.

I'm just surprised by this. I'm sure there's a reason for it, but I've checked the release notes and help guides again and can't find any mention of it. Just not something I expected coming from VMware.

This happens whether the VM is on LVM (storage over FC) or on lvm-thin on the local host. lvm-thin supports snapshots, but from what I saw Veeam doesn't use them.

I have run this again after the patch and I'm getting the same files in the output folders on each disk. That's a promising sign :)
 
Good news! Veeam backups with the new version are functional! Restored VMs passed our snapshot consistency tests. So, we can say that a backup of a VM with a single disk is "point in time" (i.e., crash consistent) and has integrity when restored.

We also confirmed that previously taken backups were non-recoverably corrupt. Taking full backups after updating to the version with the fix makes sense.

We have a few more tests to run, but we wanted to keep everyone in the loop. So far, so good!


 
Yay! Thanks for letting us know.

>We also confirmed that previously taken backups were non-recoverably corrupt.

How come? Was VBR unable to restore objects from the backup, or did you find corrupted data inside the files in the backups? Please tell me more about your testing methods.

>Taking full backups after updating to the version with the fix makes sense.

Well, not really: after you install the patch, a full scan is done anyway.
That means we perform a CRC check to compare the blocks in the most recent restore point against the current state of the VM disk.
That is, whatever was missing or wrong gets fixed automatically in the most recent incremental.
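
To illustrate the general idea of a block-level CRC comparison, here is a minimal Python sketch. It is not Veeam's implementation: the function names, the in-memory byte strings standing in for a restore point and a live disk, and the block size are all assumptions made for the example. It uses the standard library's zlib.crc32 and returns the indices of blocks that would need to be re-read into the next incremental.

```python
import zlib


def block_crcs(data, block_size):
    """Return the CRC32 of each fixed-size block of data (the last block may be short)."""
    return [
        zlib.crc32(data[offset:offset + block_size])
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(restore_point, current_disk, block_size=1024 * 1024):
    """Return indices of blocks in current_disk whose CRC differs from restore_point.

    Blocks that exist only in current_disk (the disk grew) are reported as changed.
    """
    old = block_crcs(restore_point, block_size)
    new = block_crcs(current_disk, block_size)
    return [
        index
        for index, crc in enumerate(new)
        if index >= len(old) or old[index] != crc
    ]
```

For example, changed_blocks(b"AAAABBBBCCCC", b"AAAAXXXXCCCC", block_size=4) flags only the middle block. A CRC match does not strictly prove two blocks are identical, so a real tool may combine it with other checks.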

Thanks!
 
Are you saying that you’re modifying the existing vib files in the chain and potentially the vbk file as well when that next backup is run after the patch?
 
I am following up and closing this out. Based on our testing, the corruption issues with the backup data have been resolved by the Veeam software update.
We ran the following test cases with full backups on LVM, ZFS, and Blockbridge to prove the fix:
  • I/O sequence analysis internal to a single disk
  • I/O sequence analysis distributed across eight disks
  • dm-integrity checking of a single disk
In each case, the restored contents were valid, and the data contents were correct. This should be sufficient to support Veeam in our customer environments.
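
As a rough illustration of what an I/O sequence check can look like, here is a minimal Python sketch. It is not the tooling used for the tests above. It assumes a hypothetical workload that issues numbered writes one at a time, waits for each to be acknowledged as durable before issuing the next, and spreads them across one or more disks. Under that assumption, a crash-consistent restore must contain an unbroken prefix of the sequence: if write N is present, every earlier write must be present too. All names are made up for the example.

```python
def first_gap(present, start=0):
    """Return the first missing sequence number below the highest one seen, or None."""
    seen = set(present)
    if not seen:
        return None
    for number in range(start, max(seen)):
        if number not in seen:
            return number
    return None


def is_prefix_consistent(per_disk, start=0):
    """Check that the writes found across all disks form an unbroken prefix.

    per_disk maps a disk name to the sequence numbers recovered from that disk.
    """
    merged = set()
    for numbers in per_disk.values():
        merged.update(numbers)
    return first_gap(merged, start) is None
```

With writes alternating between two disks, {"disk0": [0, 2, 4], "disk1": [1, 3]} passes, while {"disk0": [0, 2, 4], "disk1": [1]} fails because write 3 is missing even though the later write 4 was captured.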

@Pavel Feel free to reach out if you need testing on future updates.

