[HELP] Proxmox Backups Failing Over Cloudflare WARP Connector – Works on PVE2, Fails on PVE1

OptiNation

Context:
I'm running two Proxmox VE nodes, PVE2 at my house and PVE1 at the data center, both backing up to a remote Proxmox Backup Server (PBS) on a VPS. Both nodes use Cloudflare WARP Connector for connectivity.


✅ PVE2 backups complete successfully.
❌ PVE1 backups consistently fail partway through.





Network Setup


  • PVE2 (Home):
    • UDM-Pro in front of pfSense
    • WARP MTU: 1280
    • No issues backing up
  • PVE1 (DC):
    • Only pfSense at edge
    • Same MTU 1280
    • Backups fail mid-transfer with a "pipelined request failed: timed out" error



Troubleshooting Done


  • Verified permissions (identical between nodes)
  • MTU on the WARP interface (CloudflareWARP) is 1280
  • Disabled TSO/GSO/GRO offloads with ethtool (backups still fail)
  • Logs from PBS show:

    2025-07-21T22:06:45-06:00: starting new backup on datastore 'PTU-Backup1' from ::ffff:192.168.1.5: "vm/100/2025-07-22T04:06:45Z"
    2025-07-21T22:06:46-06:00: GET /previous: 400 Bad Request: no valid previous backup
    2025-07-21T22:06:46-06:00: created new fixed index 1 ("vm/100/2025-07-22T04:06:45Z/drive-sata0.img.fidx")
    2025-07-21T22:06:46-06:00: add blob "/Backup/PVE1/vm/100/2025-07-22T04:06:45Z/qemu-server.conf.blob" (500 bytes, comp: 500)
    2025-07-21T22:08:03-06:00: backup failed: connection error: bytes remaining on stream
    2025-07-21T22:08:03-06:00: removing failed backup
    2025-07-21T22:08:03-06:00: POST /fixed_chunk: 400 Bad Request: error reading a body from connection: bytes remaining on stream
    2025-07-21T22:08:03-06:00: TASK ERROR: connection error: bytes remaining on stream


  • Logs from PVE1 show:

    INFO: starting new backup job: vzdump 100 --node pve --notes-template '{{guestname}}' --mode snapshot --notification-mode auto --remove 0 --storage PTU-Backup-Server
    INFO: Starting Backup of VM 100 (qemu)
    INFO: Backup started at 2025-07-21 22:06:45
    INFO: status = running
    INFO: VM Name: Optimus-WOO-Prod-Store
    INFO: include disk 'sata0' 'local-lvm:vm-100-disk-0' 120G
    INFO: backup mode: snapshot
    INFO: ionice priority: 7
    INFO: creating Proxmox Backup Server archive 'vm/100/2025-07-22T04:06:45Z'
    INFO: started backup task 'b78a7df9-60a8-4036-a41b-dc50047bd548'
    INFO: resuming VM again
    INFO: sata0: dirty-bitmap status: existing bitmap was invalid and has been cleared
    INFO: 0% (160.0 MiB of 120.0 GiB) in 3s, read: 53.3 MiB/s, write: 33.3 MiB/s
    INFO: 0% (160.0 MiB of 120.0 GiB) in 1m 17s, read: 0 B/s, write: 0 B/s
    ERROR: backup write data failed: command error: protocol canceled
    INFO: aborting backup job
    INFO: resuming VM again
    ERROR: Backup of VM 100 failed - backup write data failed: command error: protocol canceled
    INFO: Failed at 2025-07-21 22:08:03
    INFO: Backup job finished with errors
    INFO: notified via target `mail-to-root`
    TASK ERROR: job errors

  • Adjusted pfSense timeouts — no difference
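
To put a number on the stall visible in the PVE1 log, here is a minimal Python sketch that parses vzdump progress lines and reports the longest stretch with no new bytes transferred. The regular expression and the helper names (`parse_progress`, `longest_stall`) are my own assumptions based only on the two progress lines in the log above, not an official log format, so other vzdump output may need adjustments.

```python
import re

# Matches vzdump progress lines such as:
#   INFO: 0% (160.0 MiB of 120.0 GiB) in 1m 17s, read: 0 B/s, write: 0 B/s
# (pattern inferred from the log above; an assumption, not a documented format)
PROGRESS_RE = re.compile(
    r"INFO:\s+(?P<pct>\d+)% \((?P<done>[\d.]+) (?P<unit>[KMGT]iB|B) of .*?\) "
    r"in (?P<elapsed>(?:\d+[hms]\s*)+),"
)

UNIT_BYTES = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}
SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_elapsed(text):
    """Convert durations like '1m 17s' to seconds."""
    return sum(int(n) * SECONDS[u] for n, u in re.findall(r"(\d+)([hms])", text))


def parse_progress(line):
    """Return (elapsed_seconds, bytes_done), or None for non-progress lines."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    done = float(m.group("done")) * UNIT_BYTES[m.group("unit")]
    return parse_elapsed(m.group("elapsed")), int(done)


def longest_stall(lines):
    """Longest gap in seconds between consecutive progress lines with no new bytes."""
    samples = [p for p in map(parse_progress, lines) if p is not None]
    stall = 0
    for (t0, b0), (t1, b1) in zip(samples, samples[1:]):
        if b1 == b0:
            stall = max(stall, t1 - t0)
    return stall


log = [
    "INFO: 0% (160.0 MiB of 120.0 GiB) in 3s, read: 53.3 MiB/s, write: 33.3 MiB/s",
    "INFO: 0% (160.0 MiB of 120.0 GiB) in 1m 17s, read: 0 B/s, write: 0 B/s",
]
print(longest_stall(log))  # 74
```

For the two progress lines above this reports a 74-second stall, which roughly lines up with the 77 seconds between the last successful PBS log entry (22:06:46) and the failure (22:08:03).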



Hypothesis


It looks like packet fragmentation or MTU issues over Cloudflare WARP are causing bulk transfers to stall silently, even though control-plane communication works. What's odd is that PVE2 works fine while PVE1 fails every time, despite the same MTU and config.




❓ My Questions:


  1. Has anyone else seen WARP-related fragmentation issues specifically with Proxmox backups?
  2. Would clamping MSS on the WARP interface help?
  3. Is this a known limitation of using WARP for large TCP streams (like vzdump image uploads)?
  4. Are there tunable buffer or TCP keepalive parameters that help with large backup uploads over WARP?
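
On question 2, the MSS value a clamp would need follows from the tunnel MTU by simple header arithmetic. The sketch below assumes standard header sizes with no IP or TCP options (20 bytes for IPv4, 40 for IPv6, 20 for TCP); `max_mss` is a hypothetical helper for illustration, not part of any tool.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
IPV6_HEADER = 40  # bytes, fixed header only
TCP_HEADER = 20   # bytes, assuming no TCP options


def max_mss(mtu, ipv6=False):
    """Largest TCP MSS that fits in one packet of the given MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


print(max_mss(1280))             # 1240
print(max_mss(1280, ipv6=True))  # 1220
```

With the 1280 MTU this gives 1240 for IPv4 and 1220 for IPv6. The PBS log shows the client as ::ffff:192.168.1.5, an IPv4-mapped address, so the IPv4 figure is probably the relevant one here.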



Bonus Info


  • PBS server is reachable over WARP on both ends
  • ip link confirms WARP MTU is 1280
  • I can ping with the DF bit set, but only up to a 1272-byte payload
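
As a sanity check on that last number, here is the ping payload arithmetic as a small Python sketch. It assumes standard header sizes without options (20-byte IPv4 or 40-byte IPv6 header, 8-byte ICMP echo header); the function names are hypothetical.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
IPV6_HEADER = 40  # bytes, fixed header only
ICMP_HEADER = 8   # bytes, ICMP/ICMPv6 echo header


def max_ping_payload(mtu, ipv6=False):
    """Largest echo payload that fits in one unfragmented packet."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - ICMP_HEADER


def packet_size(payload, ipv6=False):
    """Total IP packet size produced by an echo request with this payload."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return payload + ICMP_HEADER + ip_header


print(max_ping_payload(1280))             # 1252
print(max_ping_payload(1280, ipv6=True))  # 1232
print(packet_size(1272))                  # 1300
```

Under these assumptions, a 1280-byte MTU would cap an IPv4 ping payload at 1252 bytes, and a 1272-byte payload would make a 1300-byte packet. So it may be worth double-checking which interface the ping test actually traversed, or whether the tool counts the ICMP header as part of the size.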



Would really appreciate community input or a workaround. Would rather not ditch WARP if I can help it — it’s secure, simple, and works well except for this.


Thanks in advance!