Notes on hash collisions

Aug 20, 2025
Hey there,

I'm wondering how Proxmox Backup Server deals with hash collisions?

Yes, I know there's the technical documentation, but as I read it, the answer there is "it doesn't", followed by some number crunching about winning the lottery several times.
However, if a 4 MB chunk is mapped to a 32-byte hash and you assume the hashes are uniformly distributed, then many different versions of that chunk must map to the same hash value. Otherwise the chunk could only carry 32 bytes of information.
What's even worse: unless you compare the actual data, verifying the backup won't reveal the problem, because the colliding data maps to the same hash value and everything looks A-OK. So you'll only find out whether you won the data-loss lottery when you need to do a restore, and even then only once you run into the corrupted file.
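To make the worry concrete, here is a toy sketch of the failure mode, not PBS's actual code. It assumes a content-addressed store that skips writing a chunk when its digest already exists, and it truncates SHA-256 to 2 bytes purely so that a collision can be found in a fraction of a second. The names `short_digest`, `find_collision` and `ToyChunkStore` are made up for illustration.

```python
import hashlib
from itertools import count


def short_digest(data: bytes, nbytes: int = 2) -> bytes:
    """SHA-256 truncated to nbytes (toy only, so collisions are easy to find)."""
    return hashlib.sha256(data).digest()[:nbytes]


def find_collision(nbytes: int = 2):
    """Brute-force two different inputs that share a truncated digest."""
    seen = {}
    for i in count():
        chunk = f"chunk-{i}".encode()
        digest = short_digest(chunk, nbytes)
        if digest in seen:
            return seen[digest], chunk
        seen[digest] = chunk


class ToyChunkStore:
    """Hypothetical deduplicating store: at most one chunk per digest."""

    def __init__(self):
        self.chunks = {}

    def insert(self, data: bytes) -> bytes:
        digest = short_digest(data)
        # Assumption: a known digest means "already stored", so keep the old data.
        self.chunks.setdefault(digest, data)
        return digest

    def verify(self) -> bool:
        # Hash-only verification: re-hash each stored chunk and compare to its key.
        return all(short_digest(d) == k for k, d in self.chunks.items())

    def read(self, digest: bytes) -> bytes:
        return self.chunks[digest]


first, second = find_collision()
store = ToyChunkStore()
ref_first = store.insert(first)
ref_second = store.insert(second)   # silently deduplicated against `first`
restored = store.read(ref_second)

print(store.verify())        # verification passes
print(restored == second)    # but the restore returns the wrong data
```

In this toy, verification succeeds because the stored chunk really does hash to its key. The mismatch only shows up when the second chunk is restored and compared against what was originally backed up.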

Yes, it's true, collisions aren't likely, but there is a lot of data to back up, and a lot of chunks being generated. And just like in real life, every once in a while somebody wins the lottery - except this time it's not the grand prize you're getting.
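For a rough sense of scale, the usual birthday-bound approximation can be evaluated directly. This is only a sketch under the assumption already made above, namely that digests behave like uniformly random 256-bit values. The function name `collision_probability` and the chunk counts in the loop are illustrative choices, not figures from the documentation.

```python
import math


def collision_probability(num_chunks: int, hash_bits: int = 256) -> float:
    """Birthday approximation: p ~ 1 - exp(-n(n-1) / 2^(hash_bits+1))."""
    pairs = num_chunks * (num_chunks - 1) // 2
    expected_colliding_pairs = pairs / 2**hash_bits
    # expm1 keeps precision when the exponent is extremely close to zero.
    return -math.expm1(-expected_colliding_pairs)


for exponent in (30, 40, 50):
    n = 2**exponent
    stored_bytes = n * 4 * 2**20  # if every chunk is a unique 4 MiB chunk
    print(f"2^{exponent} chunks ({stored_bytes} bytes): "
          f"p ~ {collision_probability(n):.2e}")
```

With 2^40 unique chunks (4 EiB of data at 4 MiB per chunk) the estimate comes out on the order of 10^-54. That is tiny, but as the post says, not zero.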
 
Collisions are very unlikely, so there is no protection against them.
 
Well, my worry isn't that it's insecure. It's more that we are backing up a lot of chunks, and collisions must obviously be possible, because we're mapping 33554432 bits to 256 bits, and it's difficult to detect that such a collision has taken place. Not because of a malicious attack, but by chance.
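The bit counts can be checked with a few lines of arithmetic. This sketch assumes "4 MB" means 4 MiB, which is what the 33554432 figure implies. The variable names are illustrative.

```python
CHUNK_BYTES = 4 * 1024 * 1024   # 4 MiB chunk
HASH_BYTES = 32                 # 32-byte digest

chunk_bits = CHUNK_BYTES * 8
hash_bits = HASH_BYTES * 8

# There are 2**chunk_bits possible chunks but only 2**hash_bits digests.
# With a uniform spread, each digest is shared by
# 2**(chunk_bits - hash_bits) distinct chunks (pigeonhole argument).
log2_chunks_per_digest = chunk_bits - hash_bits

print(chunk_bits)               # 33554432
print(hash_bits)                # 256
print(log2_chunks_per_digest)   # 33554176
```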

I agree it's unlikely, but it's not impossible. I also agree there are mitigating factors - many files have a structure, so parts of the chunks aren't fully variable (unless the content is compressed or encrypted), and if you never need the backup, you're not affected either. So I feel mostly comfortable, but not completely.