Backup of VM fails: broken pipe

dejhost

Member
Dec 13, 2020
I mounted a disk from another PVE host within my LAN, outside my cluster, using sshfs. This disk has about 8.3 TB of free space:
Code:
Usage 10.09% (931.79 GB of 9.24 TB)

I want to back up one of my VMs (about 2.5 TB) onto this disk, but the process fails:

Task viewer: VM/CT 110 - Backup Output
Code:
INFO: starting new backup job: vzdump 110 --remove 0 --node proxmox03 --compress zstd --notes-template '{{cluster}}, {{guestname}}, {{node}}, {{vmid}}' --mode stop --storage migrate
INFO: Starting Backup of VM 110 (qemu)
INFO: Backup started at 2022-12-30 09:33:38
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: NC-host02
INFO: include disk 'scsi0' 'local-lvm:vm-110-disk-0' 32G
INFO: include disk 'virtio1' 'Raid1:vm-110-disk-0' 2500G
INFO: snapshots found (not included into backup)
INFO: creating vzdump archive '/mnt/migrate/dump/vzdump-qemu-110-2022_12_30-09_33_38.vma.zst'
INFO: starting kvm to execute backup task
INFO: started backup task '93df4bd7-c37f-48d5-b0f2-cffe078fe2d3'
INFO: 0% (235.6 MiB of 2.5 TiB) in 3s, read: 78.5 MiB/s, write: 67.7 MiB/s
INFO: 1% (25.3 GiB of 2.5 TiB) in 6m 6s, read: 70.9 MiB/s, write: 69.5 MiB/s
INFO: 2% (50.6 GiB of 2.5 TiB) in 11m 52s, read: 74.9 MiB/s, write: 73.6 MiB/s
...
INFO: 48% (1.2 TiB of 2.5 TiB) in 3h 58m, read: 102.1 MiB/s, write: 93.6 MiB/s
INFO: 49% (1.2 TiB of 2.5 TiB) in 4h 2m 14s, read: 102.0 MiB/s, write: 94.9 MiB/s
INFO: 50% (1.2 TiB of 2.5 TiB) in 4h 6m 34s, read: 100.4 MiB/s, write: 94.0 MiB/s
zstd: error 25 : Write error : Input/output error (cannot write compressed block)
INFO: 50% (1.2 TiB of 2.5 TiB) in 4h 7m 13s, read: 98.5 MiB/s, write: 95.0 MiB/s
ERROR: vma_queue_write: write error - Broken pipe
INFO: aborting backup job
INFO: stopping kvm after backup task
trying to acquire lock... OK
ERROR: Backup of VM 110 failed - vma_queue_write: write error - Broken pipe
INFO: Failed at 2022-12-30 13:41:00
INFO: Backup job finished with error
TASK ERROR: job errors


I repeated the backup task with exactly the same outcome, failing at the same 50% mark. Last night I started a third attempt, switching the compression method to gzip; it has not reached the critical 50% yet.

Kernel Version: Linux 5.15.35-1-pve #1 SMP PVE 5.15.35-3



The most recent /var/log/syslog shows:
Code:
Dec 31 10:42:54 proxmox03 ceph-mon[1692505]: 2022-12-31T10:42:54.834+0100 7fb86e7a5700 -1 log_channel(cluster) log [ERR] : pg 2.7c not scrubbed since 2022-11-27T08:47:40.416372+0100
Dec 31 10:42:54 proxmox03 ceph-mon[1692505]: 2022-12-31T10:42:54.834+0100 7fb86e7a5700 -1 log_channel(cluster) log [ERR] : [WRN] POOL_BACKFILLFULL: 4 pool(s) backfillfull
Dec 31 10:42:54 proxmox03 ceph-mon[1692505]: 2022-12-31T10:42:54.834+0100 7fb86e7a5700 -1 log_channel(cluster) log [ERR] : pool 'Raid1' is backfillfull
Dec 31 10:42:54 proxmox03 ceph-mon[1692505]: 2022-12-31T10:42:54.834+0100 7fb86e7a5700 -1 log_channel(cluster) log [ERR] : pool 'device_health_metrics' is backfillfull
Dec 31 10:42:54 proxmox03 ceph-mon[1692505]: 2022-12-31T10:42:54.834+0100 7fb86e7a5700 -1 log_channel(cluster) log [ERR] : pool 'cephfs_data' is backfillfull
Dec 31 10:42:54 proxmox03 ceph-mon[1692505]: 2022-12-31T10:42:54.834+0100 7fb86e7a5700 -1 log_channel(cluster) log [ERR] : pool 'cephfs_metadata' is backfillfull
Dec 31 10:42:54 proxmox03 ceph-mon[1692505]: 2022-12-31T10:42:54.834+0100 7fb86e7a5700 -1 log_channel(cluster) log [ERR] : [WRN] POOL_TOO_FEW_PGS: 1 pools have too few placement groups
Dec 31 10:42:54 proxmox03 ceph-mon[1692505]: 2022-12-31T10:42:54.834+0100 7fb86e7a5700 -1 log_channel(cluster) log [ERR] : Pool device_health_metrics has 8 placement groups, should have 32
Dec 31 10:48:22 proxmox03 pvedaemon[3164583]: <root@pam> starting task UPID:proxmox03:003B4E6A:128D3757:63B00566:vncshell::root@pam:
Dec 31 10:48:22 proxmox03 pvedaemon[3886698]: starting termproxy UPID:proxmox03:003B4E6A:128D3757:63B00566:vncshell::root@pam:
Dec 31 10:48:23 proxmox03 pvedaemon[3861219]: <root@pam> successful auth for user 'root@pam'
Dec 31 10:48:23 proxmox03 systemd[1]: Started Session 2619 of user root.
Dec 31 10:50:00 proxmox03 ceph-mon[1692505]: 2022-12-31T10:49:59.996+0100 7fb8727ad700 -1 log_channel(cluster) log [ERR] : overall HEALTH_ERR mons are allowing insecure global_id reclaim; Module 'devicehealth' has failed: ; mon proxmox03 is low on available space; 1 backfillfull osd(s); 3 nearfull osd(s); Low space hindering backfill (add storage if this doesn't resolve itself): 48 pgs backfill_toofull; 48 pgs not deep-scrubbed in time; 48 pgs not scrubbed in time; 4 pool(s) backfillfull; 1 pools have too few placement groups
Dec 31 10:50:31 proxmox03 systemd[1]: Starting Cleanup of Temporary Directories...
Dec 31 10:50:31 proxmox03 systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Dec 31 10:50:31 proxmox03 systemd[1]: Finished Cleanup of Temporary Directories.
Dec 31 10:51:30 proxmox03 pmxcfs[1680221]: [dcdb] notice: data verification successful
Dec 31 10:58:10 proxmox03 corosync[1680017]: [KNET ] link: host: 2 link: 0 is down
Dec 31 10:58:10 proxmox03 corosync[1680017]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Dec 31 10:58:10 proxmox03 corosync[1680017]: [KNET ] host: host: 2 has no active links
Dec 31 10:58:11 proxmox03 corosync[1680017]: [KNET ] rx: host: 2 link: 0 is up
Dec 31 10:58:11 proxmox03 corosync[1680017]: [KNET ] link: Resetting MTU for link 0 because host 2 joined
Dec 31 10:58:11 proxmox03 corosync[1680017]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Dec 31 10:58:11 proxmox03 corosync[1680017]: [KNET ] pmtud: Global data MTU changed to: 1397
Dec 31 11:00:00 proxmox03 ceph-mon[1692505]: 2022-12-31T10:59:59.999+0100 7fb8727ad700 -1 log_channel(cluster) log [ERR] : overall HEALTH_ERR mons are allowing insecure global_id reclaim; Module 'devicehealth' has failed: ; mon proxmox03 is low on available space; 1 backfillfull osd(s); 3 nearfull osd(s); Low space hindering backfill (add storage if this doesn't resolve itself): 48 pgs backfill_toofull; 48 pgs not deep-scrubbed in time; 48 pgs not scrubbed in time; 4 pool(s) backfillfull; 1 pools have too few placement groups
Dec 31 11:10:00 proxmox03 ceph-mon[1692505]: 2022-12-31T11:09:59.995+0100 7fb8727ad700 -1 log_channel(cluster) log [ERR] : overall HEALTH_ERR mons are allowing insecure global_id reclaim; Module 'devicehealth' has failed: ; mon proxmox03 is low on available space; 1 backfillfull osd(s); 3 nearfull osd(s); Low space hindering backfill (add storage if this doesn't resolve itself): 48 pgs backfill_toofull; 48 pgs not deep-scrubbed in time; 48 pgs not scrubbed in time; 4 pool(s) backfillfull; 1 pools have too few placement groups
Dec 31 11:17:01 proxmox03 CRON[3904114]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Dec 31 11:20:00 proxmox03 ceph-mon[1692505]: 2022-12-31T11:19:59.994+0100 7fb8727ad700 -1 log_channel(cluster) log [ERR] : overall HEALTH_ERR mons are allowing insecure global_id reclaim; Module 'devicehealth' has failed: ; mon proxmox03 is low on available space; 1 backfillfull osd(s); 3 nearfull osd(s); Low space hindering backfill (add storage if this doesn't resolve itself): 48 pgs backfill_toofull; 48 pgs not deep-scrubbed in time; 48 pgs not scrubbed in time; 4 pool(s) backfillfull; 1 pools have too few placement groups
Dec 31 11:30:00 proxmox03 ceph-mon[1692505]: 2022-12-31T11:29:59.993+0100 7fb8727ad700 -1 log_channel(cluster) log [ERR] : overall HEALTH_ERR mons are allowing insecure global_id reclaim; Module 'devicehealth' has failed: ; mon proxmox03 is low on available space; 1 backfillfull osd(s); 3 nearfull osd(s); Low space hindering backfill (add storage if this doesn't resolve itself): 48 pgs backfill_toofull; 48 pgs not deep-scrubbed in time; 48 pgs not scrubbed in time; 4 pool(s) backfillfull; 1 pools have too few placement groups
Dec 31 11:40:00 proxmox03 ceph-mon[1692505]: 2022-12-31T11:39:59.993+0100 7fb8727ad700 -1 log_channel(cluster) log [ERR] : overall HEALTH_ERR mons are allowing insecure global_id reclaim; Module 'devicehealth' has failed: ; mon proxmox03 is low on available space; 1 backfillfull osd(s); 3 nearfull osd(s); Low space hindering backfill (add storage if this doesn't resolve itself): 48 pgs backfill_toofull; 48 pgs not deep-scrubbed in time; 48 pgs not scrubbed in time; 4 pool(s) backfillfull; 1 pools have too few placement groups
Dec 31 11:46:23 proxmox03 systemd[1]: session-2619.scope: Succeeded.
Dec 31 11:46:23 proxmox03 pvedaemon[3164583]: <root@pam> end task UPID:proxmox03:003B4E6A:128D3757:63B00566:vncshell::root@pam: OK
Dec 31 11:46:25 proxmox03 pvedaemon[3164583]: <root@pam> successful auth for user 'root@pam'
Dec 31 11:46:26 proxmox03 pvedaemon[3921926]: starting termproxy UPID:proxmox03:003BD806:12928828:63B01302:vncshell::root@pam:
Dec 31 11:46:26 proxmox03 pvedaemon[3164583]: <root@pam> starting task UPID:proxmox03:003BD806:12928828:63B01302:vncshell::root@pam:
Dec 31 11:46:26 proxmox03 pvedaemon[3069312]: <root@pam> successful auth for user 'root@pam'
Dec 31 11:46:26 proxmox03 systemd[1]: Started Session 2622 of user root.

This thread suggests that the local drive needs enough space to temporarily store the entire VM. If that is the case, I would need a workaround, since I cannot store 2.5 TB on the local drive...
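One way to narrow this down before another multi-hour run is to probe the target directly. A rough sketch (paths and sizes are examples, not from the log; `TARGET` defaults to /tmp here so it runs anywhere, but on the host it would be the sshfs mount /mnt/migrate/dump):

```shell
# Probe the backup target for basic I/O problems and file-size limits
# before rerunning vzdump. Adjust TARGET to the real mount point.
TARGET="${TARGET:-/tmp}"
probe="$TARGET/vzdump-probe.bin"

# Small real write: catches permission or transport problems early.
dd if=/dev/zero of="$probe" bs=1M count=16 conv=fsync 2>/dev/null
stat -c 'wrote %s bytes' "$probe"

# Sparse 2 TiB probe: a filesystem with a ~1 TiB per-file cap
# should refuse to extend the file to this size.
truncate -s 2T "$probe" && echo "2 TiB file size accepted" \
                        || echo "2 TiB file size refused"
rm -f "$probe"
```

Since the backup dies near 1.25 TiB written, a refused 2 TiB probe would point at a per-file size cap rather than at free space.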

Could you please help me to troubleshoot this?
 
A friend of mine solved this: he realized that my target drive is CephFS, whose maximum file size was limited to 1 TiB (the default). The command
Code:
ceph fs set <fs name> max_file_size <size in bytes>
increased the maximum file size of the target file system, and the backup went through fine afterwards.
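Note that the size argument is plain bytes. A quick sanity check for picking a cap comfortably above the expected archive (the 3 TiB figure is an example, not from the original post):

```shell
# `ceph fs set <fs name> max_file_size <bytes>` takes plain bytes.
# Compute a 3 TiB cap, comfortably above the 2.5 TiB guest image.
TIB=3
BYTES=$((TIB * 1024 * 1024 * 1024 * 1024))
echo "max_file_size for $TIB TiB: $BYTES"

# The CephFS default cap, for comparison: 1 TiB.
echo "default cap: $((1024 * 1024 * 1024 * 1024))"
```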

Hope this helps somebody else.
 
