Using qcow2 for 20TB disks on iSCSI SAN (NVMe, 25G, multipath) — any risk of corruption or performance issues?

Nov 19, 2024
Hi everyone,

I’m running Proxmox VE 9 connected to a SAN Pro storage (NVMe backend, 25 Gbps network, multipath iSCSI).

To enable snapshots for my VMs on this SAN, I need to use qcow2 disk format instead of raw.

I currently have multiple virtual machines, and each one has a 20 TB disk.

My question is:
Will using qcow2 files of 20 TB each cause any issues related to corruption, performance degradation, backups, or snapshots in the long term?

Has anyone used large qcow2 disks over iSCSI with similar specs? I’d love to hear about your experience or best practices (tuning, caching, etc.).

Thanks!
 
qcow2 keeps its own metadata (cluster mapping tables and reference counts) inside the image. A 20 TB image means a correspondingly large metadata structure. If a VM crashes or the network drops while that metadata is being updated, it can be left inconsistent, and the larger the image, the more metadata there is to be affected and to check afterwards. Even with 25 Gbps and multiple paths, iSCSI is still network-dependent. Any prolonged or severe network interruption during heavy I/O can leave the image structure corrupted, which is harder to recover from than a raw block device.
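To put a rough number on how much metadata a 20 TB image carries, here is a back-of-the-envelope sketch. It assumes the qcow2 defaults as I understand them (64 KiB clusters, 8-byte L2 entries, 16-bit refcounts), treats "20 TB" as 20 TiB, and ignores the L1 table, the refcount table and any snapshot overhead. The function name is just illustrative, not part of any tool.

```python
# Rough qcow2 metadata sizing. Assumptions: 64 KiB default cluster size,
# 8-byte L2 entries, 16-bit refcounts; L1/refcount tables and snapshots ignored.
KIB = 1024
MIB = 1024 ** 2
TIB = 1024 ** 4

L2_ENTRY_BYTES = 8  # one L2 entry maps one guest cluster


def qcow2_metadata_estimate(virtual_size, cluster_size=64 * KIB, refcount_bits=16):
    """Estimate L2 table and refcount block sizes for a fully allocated image."""
    clusters = -(-virtual_size // cluster_size)  # ceiling division
    l2_bytes = clusters * L2_ENTRY_BYTES
    refcount_bytes = clusters * refcount_bits // 8
    return {
        "clusters": clusters,
        "l2_bytes": l2_bytes,
        "refcount_bytes": refcount_bytes,
    }


est = qcow2_metadata_estimate(20 * TIB)
print(f"clusters:        {est['clusters']:,}")
print(f"L2 tables:       {est['l2_bytes'] / MIB:.0f} MiB")
print(f"refcount blocks: {est['refcount_bytes'] / MIB:.0f} MiB")

# Same disk with 2 MiB clusters, for comparison
big = qcow2_metadata_estimate(20 * TIB, cluster_size=2 * MIB)
print(f"L2 tables @ 2 MiB clusters: {big['l2_bytes'] / MIB:.0f} MiB")
```

Under those assumptions the L2 tables for a fully allocated 20 TiB image come to about 2.5 GiB with the default cluster size. That figure is also roughly the L2 cache QEMU would need to keep the whole mapping in memory, so it matters for performance as well as for consistency checks. A larger cluster size shrinks the metadata considerably, at the cost of coarser allocation.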
Thanks a lot for your detailed explanation — that’s very helpful and makes sense regarding qcow2 metadata overhead and potential risks.

Just to confirm my understanding:
If I switch back to using RAW disks over iSCSI (without qcow2) and let the SAN handle snapshots directly at the storage level instead of through Proxmox, would that eliminate those metadata-related corruption risks?

In other words, using RAW LUNs with SAN-side snapshots — do you see any technical drawbacks or limitations with that approach in Proxmox 9?

Appreciate your input!
 
I currently have 9 virtual machines, and each one has a 20 TB disk.

My question is:
Will using qcow2 files of 20 TB each cause any issues related to corruption, performance degradation, backups, or snapshots in the long term?
It would be worthwhile to consider this question in light of the use case. Normally, when you have large datasets, it's better to leave the data on native storage. Why are the virtual disks so large?
 