[HELP] Proxmox Backup Server stuck at 0% when backing up VM with 30TB disk

flathill

New Member
May 24, 2024
Hello Proxmox community,

I'm encountering an issue with Proxmox Backup Server when trying to back up a virtual machine that contains a very large disk (30TB). I've been attempting different approaches but haven't been able to make progress. I'd greatly appreciate any guidance or best practices for handling backups of VMs with extremely large disks.

Environment Details:

  • Proxmox Version: 8.4.1
  • VM ID: 317
  • VM Disks:
    • scsi0: 200GB
    • scsi1: 30TB
    • efidisk0: 128KB
  • Backup Storage: PBS (Proxmox Backup Server)
    • Storage is active with plenty of available space (~10TB)
  • VM State: Currently powered off during backup attempts

Issue Description:

When attempting to back up this VM, the process starts but seems to get stuck at 0% (3.5 GiB of 29.5 TiB). The backup job shows activity (read speed: 1.2 GiB/s, write: 156.0 MiB/s) but doesn't make visible progress beyond this point, even after waiting for extended periods.
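For a sense of scale, here is a rough sketch in Python of how much data a single percent represents on this job. It uses only the figures quoted above and assumes they are in binary units (GiB/TiB) and that the progress display rounds down to whole percent; neither assumption is confirmed by the job log.

```python
# Figures quoted in the post; binary units (GiB/TiB) are an assumption.
GIB = 1024 ** 3
TIB = 1024 ** 4

total_bytes = 29.5 * TIB
done_bytes = 3.5 * GIB
read_rate = 1.2 * GIB  # bytes per second, as reported by the job

percent_done = 100 * done_bytes / total_bytes
one_percent_gib = total_bytes / 100 / GIB
seconds_per_percent = (total_bytes / 100) / read_rate

print(f"progress so far: {percent_done:.3f}%")
print(f"1% of the job:   {one_percent_gib:.1f} GiB")
print(f"time per 1% at the reported read rate: {seconds_per_percent / 60:.1f} min")
```

Under these assumptions, one percent is roughly 302 GiB, so a display limited to whole percent could sit at 0% for several minutes while data is still moving.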

What I've Tried So Far:

  1. Standard backup command:

    vzdump 317 --storage pbs_backup --mode stop --compress zstd
  2. Enabling discard option for all VM disks through the web UI
  3. Using backup fleecing:

    vzdump 317 --storage pbs_backup --mode stop --fleecing enabled=1,storage=local-lvm --compress zstd
    However, this resulted in an error as the fleecing storage needs were too large:

    WARNING: Sum of all thin volume sizes (25.49 TiB) exceeds the size of thin pool pve/data and the size of whole volume group (446.07 GiB).
  4. Process analysis: Running strace -p [PID] against the vzdump process shows that it connects to the QEMU monitor socket and issues commands, but the "total" value in the QMP responses appears to stay at about 32GB (the string is cut off in the strace output, so the full number may be longer):

    read(17, "{\"return\": {\"total\": 32427003215"..., 8192) = 332
    The process repeats this pattern with minor variations but doesn't appear to make significant progress.
  5. Checking process status: The vzdump process (PID 3477135) appears to be in "Ss" state (interruptible sleep), not in "D" state.
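A note on the strace line in step 4: strace shortens printed strings to 32 characters by default (the `-s` option raises the limit), and the trailing `...` marks such a cut. The sketch below checks the quoted string against that limit and shows what the value would be if exactly three digits were lost. The three-digit figure is a guess for illustration, not something the output confirms.

```python
# The quoted string exactly as strace printed it (before the trailing "...").
shown = '{"return": {"total": 32427003215'
print(len(shown))  # 32, which matches strace's default string limit

TIB = 1024 ** 4
prefix = int(shown.rsplit(" ", 1)[1])  # the digits that survived the cut

# Hypothetical: assume three more digits were truncated after the prefix.
approx_total = prefix * 1000
approx_tib = approx_total / TIB
print(f"possible total: about {approx_tib:.2f} TiB")
```

If that guess holds, the "total" field would be close to the 29.5 TiB that the backup job itself reports, rather than 32GB. Re-running with `strace -s 256 -p [PID]` should show the full value.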

Questions:

  1. Is a 30TB disk too large for standard Proxmox Backup Server functionality? What are the practical size limits?
  2. Are there recommended approaches for backing up VMs with extremely large disks?
  3. Should I consider alternative approaches like:
    • Splitting the backup job to handle each disk separately?
    • Using a different backup strategy for the large disk?
    • Implementing file-level backup within the VM instead?
  4. Would adjusting parameters like compression, buffer sizes, or timeouts help with a disk of this size?
I've searched the forums but haven't found specific guidance for handling backups of disks in this size range. Any advice, best practices, or alternative strategies would be greatly appreciated.

Thank you for your time and assistance.

Best regards,
 
Update on my previous post about backing up a VM with a 30TB disk:

I've seen some progress with the backup! After waiting for a while, the process moved from 0% to 1% (from 3.6 GiB to 302.0 GiB) in about 13 minutes 39 seconds. The transfer rates are now stable at around 374.5 MiB/s read and 88.3 MiB/s write.

Based on these rates, I'm estimating it might take 24-48 hours to complete the entire backup. I plan to let it continue and monitor the progress periodically.
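As a cross-check on that estimate, here is a minimal Python sketch that extrapolates from the progress figures above. It assumes binary units, a constant rate for the rest of the run, and that the whole 29.5 TiB must be read, none of which is guaranteed.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4
MIB = 1024 ** 2

total_bytes = 29.5 * TIB
start_bytes = 3.6 * GIB      # progress at the first observation
now_bytes = 302.0 * GIB      # progress at the second observation
elapsed_s = 13 * 60 + 39     # 13 minutes 39 seconds between the two

# Average rate over the observed window, in bytes per second.
rate = (now_bytes - start_bytes) / elapsed_s
rate_mib = rate / MIB

# Time left if that rate holds for the remaining data.
remaining_hours = (total_bytes - now_bytes) / rate / 3600

print(f"observed rate: {rate_mib:.1f} MiB/s")
print(f"estimated time remaining: {remaining_hours:.1f} hours")
```

The observed rate comes out close to the reported 374.5 MiB/s read speed, and the remaining time lands a little under a day, near the lower end of the 24-48 hour range. Any slowdown on the source or the PBS side would push it higher.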

While waiting, I'd appreciate any feedback from the community:

1. Is attempting to back up a 30TB disk directly via PBS a reasonable approach? Or is this pushing the practical limits of Proxmox's backup capabilities?

2. For future planning, would you recommend alternative approaches for managing backups of very large VMs in an FC-connected LVM storage environment (where storage snapshots aren't available)?

3. Are there any specific tuning parameters or best practices I should apply for these large-scale backups to ensure reliability and optimal performance?

Thank you for your insights and assistance!