Backup (Vzdump) fails and hangs forever

RudyBzh

Jul 9, 2020
Hi,
I have an issue with a VM backup task.
It often hangs, and the only way to recover is to reboot Proxmox entirely.
I found several similar issues on the forum, but none of the suggested fixes solved mine.

root@pve:~# pveversion
pve-manager/6.2-11/22fb4983 (running kernel: 5.4.60-1-pve)

Task information:
Code:
INFO: starting new backup job: vzdump 102 --remove 0 --mode snapshot --storage i7-2700k --compress zstd --node pve
INFO: Starting Backup of VM 102 (qemu)
INFO: Backup started at 2020-09-04 11:23:29
INFO: status = running
INFO: VM Name: OLD-Ubuntu
INFO: include disk 'scsi0' 'local-lvm:vm-102-disk-1' 50G
INFO: include disk 'efidisk0' 'local-lvm:vm-102-disk-0' 4M
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/i7-2700k/dump/vzdump-qemu-102-2020_09_04-11_23_29.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'b1061af6-74c5-41ac-bbfc-953a64de07a5'
INFO: resuming VM again
INFO:   1% (1007.8 MiB of 50.0 GiB) in  3s, read: 335.9 MiB/s, write: 327.6 MiB/s
INFO:   3% (1.9 GiB of 50.0 GiB) in  6s, read: 306.9 MiB/s, write: 303.7 MiB/s
INFO:   5% (2.8 GiB of 50.0 GiB) in  9s, read: 298.5 MiB/s, write: 296.2 MiB/s
INFO:   7% (3.6 GiB of 50.0 GiB) in 12s, read: 281.5 MiB/s, write: 272.5 MiB/s
INFO:   9% (4.5 GiB of 50.0 GiB) in 15s, read: 321.4 MiB/s, write: 317.9 MiB/s
INFO:  10% (5.4 GiB of 50.0 GiB) in 18s, read: 311.8 MiB/s, write: 311.5 MiB/s
INFO:  12% (6.3 GiB of 50.0 GiB) in 21s, read: 286.4 MiB/s, write: 278.2 MiB/s
INFO:  14% (7.2 GiB of 50.0 GiB) in 24s, read: 300.9 MiB/s, write: 299.9 MiB/s
INFO:  15% (7.9 GiB of 50.0 GiB) in 27s, read: 264.6 MiB/s, write: 263.8 MiB/s
INFO:  17% (8.7 GiB of 50.0 GiB) in 30s, read: 276.7 MiB/s, write: 274.4 MiB/s
INFO:  19% (9.5 GiB of 50.0 GiB) in 33s, read: 274.4 MiB/s, write: 273.7 MiB/s
INFO:  20% (10.4 GiB of 50.0 GiB) in 36s, read: 289.9 MiB/s, write: 283.1 MiB/s
INFO:  22% (11.2 GiB of 50.0 GiB) in 39s, read: 261.0 MiB/s, write: 261.0 MiB/s
INFO:  23% (12.0 GiB of 50.0 GiB) in 42s, read: 271.9 MiB/s, write: 271.7 MiB/s
INFO:  25% (12.8 GiB of 50.0 GiB) in 45s, read: 280.3 MiB/s, write: 271.1 MiB/s
INFO:  26% (13.4 GiB of 50.0 GiB) in 48s, read: 210.2 MiB/s, write: 209.3 MiB/s
INFO:  28% (14.1 GiB of 50.0 GiB) in 51s, read: 233.2 MiB/s, write: 232.5 MiB/s
INFO:  29% (14.6 GiB of 50.0 GiB) in 54s, read: 177.6 MiB/s, write: 168.7 MiB/s
INFO:  30% (15.2 GiB of 50.0 GiB) in 57s, read: 197.0 MiB/s, write: 194.6 MiB/s
INFO:  31% (15.8 GiB of 50.0 GiB) in  1m  0s, read: 200.6 MiB/s, write: 199.1 MiB/s
INFO:  32% (16.3 GiB of 50.0 GiB) in  1m  3s, read: 183.7 MiB/s, write: 175.7 MiB/s
INFO:  33% (16.7 GiB of 50.0 GiB) in  1m  6s, read: 152.2 MiB/s, write: 151.4 MiB/s
INFO:  34% (17.1 GiB of 50.0 GiB) in  1m  9s, read: 114.8 MiB/s, write: 114.6 MiB/s
INFO:  35% (17.6 GiB of 50.0 GiB) in  1m 14s, read: 105.1 MiB/s, write: 105.1 MiB/s
INFO:  36% (18.1 GiB of 50.0 GiB) in  1m 18s, read: 119.6 MiB/s, write: 118.9 MiB/s
INFO:  37% (18.6 GiB of 50.0 GiB) in  1m 21s, read: 198.2 MiB/s, write: 175.9 MiB/s
INFO:  38% (19.1 GiB of 50.0 GiB) in  1m 24s, read: 149.8 MiB/s, write: 130.5 MiB/s
INFO:  39% (19.6 GiB of 50.0 GiB) in  1m 28s, read: 132.3 MiB/s, write: 128.1 MiB/s
INFO:  40% (20.0 GiB of 50.0 GiB) in  1m 33s, read: 84.8 MiB/s, write: 82.0 MiB/s
INFO:  41% (20.5 GiB of 50.0 GiB) in  1m 42s, read: 55.2 MiB/s, write: 50.4 MiB/s
INFO:  42% (21.0 GiB of 50.0 GiB) in  1m 53s, read: 47.1 MiB/s, write: 30.4 MiB/s
INFO:  43% (21.5 GiB of 50.0 GiB) in  2m  1s, read: 64.3 MiB/s, write: 28.0 MiB/s
INFO:  44% (22.0 GiB of 50.0 GiB) in  2m  9s, read: 64.8 MiB/s, write: 31.1 MiB/s
INFO:  45% (22.5 GiB of 50.0 GiB) in  2m 16s, read: 74.6 MiB/s, write: 70.5 MiB/s
INFO:  46% (23.1 GiB of 50.0 GiB) in  2m 20s, read: 154.1 MiB/s, write: 153.3 MiB/s
INFO:  47% (23.6 GiB of 50.0 GiB) in  2m 23s, read: 157.1 MiB/s, write: 156.3 MiB/s
INFO:  48% (24.3 GiB of 50.0 GiB) in  2m 26s, read: 244.1 MiB/s, write: 231.4 MiB/s
INFO:  49% (24.9 GiB of 50.0 GiB) in  2m 29s, read: 197.1 MiB/s, write: 185.1 MiB/s
INFO:  51% (25.6 GiB of 50.0 GiB) in  2m 32s, read: 239.1 MiB/s, write: 233.7 MiB/s
INFO:  52% (26.2 GiB of 50.0 GiB) in  2m 35s, read: 224.8 MiB/s, write: 215.4 MiB/s
INFO:  53% (26.7 GiB of 50.0 GiB) in  2m 38s, read: 143.9 MiB/s, write: 128.0 MiB/s
INFO:  54% (27.2 GiB of 50.0 GiB) in  2m 41s, read: 171.5 MiB/s, write: 163.1 MiB/s
INFO:  55% (27.7 GiB of 50.0 GiB) in  2m 44s, read: 169.2 MiB/s, write: 159.1 MiB/s
INFO:  56% (28.0 GiB of 50.0 GiB) in  2m 48s, read: 89.7 MiB/s, write: 87.3 MiB/s
INFO:  57% (28.5 GiB of 50.0 GiB) in  3m  8s, read: 25.6 MiB/s, write: 24.7 MiB/s
INFO:  58% (29.0 GiB of 50.0 GiB) in  3m 28s, read: 26.1 MiB/s, write: 21.9 MiB/s
INFO:  59% (29.6 GiB of 50.0 GiB) in  3m 31s, read: 186.1 MiB/s, write: 151.0 MiB/s
INFO:  60% (30.5 GiB of 50.0 GiB) in  3m 34s, read: 317.0 MiB/s, write: 275.3 MiB/s
INFO:  62% (31.3 GiB of 50.0 GiB) in  3m 37s, read: 264.2 MiB/s, write: 195.5 MiB/s
INFO:  64% (32.4 GiB of 50.0 GiB) in  3m 40s, read: 377.0 MiB/s, write: 270.1 MiB/s
INFO:  65% (32.7 GiB of 50.0 GiB) in  3m 43s, read: 110.0 MiB/s, write: 102.6 MiB/s
INFO:  66% (33.5 GiB of 50.0 GiB) in  3m 46s, read: 274.3 MiB/s, write: 160.7 MiB/s
INFO:  68% (34.5 GiB of 50.0 GiB) in  3m 49s, read: 326.2 MiB/s, write: 219.6 MiB/s
INFO:  70% (35.2 GiB of 50.0 GiB) in  3m 52s, read: 240.5 MiB/s, write: 157.1 MiB/s
INFO:  72% (36.1 GiB of 50.0 GiB) in  3m 55s, read: 320.2 MiB/s, write: 124.2 MiB/s
INFO:  73% (36.5 GiB of 50.0 GiB) in  4m  2s, read: 65.2 MiB/s, write: 51.6 MiB/s
INFO:  74% (37.0 GiB of 50.0 GiB) in  4m 19s, read: 28.1 MiB/s, write: 19.1 MiB/s
INFO:  75% (37.5 GiB of 50.0 GiB) in  4m 36s, read: 32.0 MiB/s, write: 19.0 MiB/s
INFO:  76% (38.2 GiB of 50.0 GiB) in  4m 39s, read: 217.5 MiB/s, write: 100.0 MiB/s
INFO:  77% (38.6 GiB of 50.0 GiB) in  4m 42s, read: 152.2 MiB/s, write: 121.2 MiB/s
INFO:  79% (39.9 GiB of 50.0 GiB) in  4m 45s, read: 435.1 MiB/s, write: 214.5 MiB/s
INFO:  82% (41.2 GiB of 50.0 GiB) in  4m 48s, read: 457.9 MiB/s, write: 231.1 MiB/s
INFO:  85% (42.6 GiB of 50.0 GiB) in  4m 51s, read: 474.9 MiB/s, write: 236.9 MiB/s
INFO:  88% (44.1 GiB of 50.0 GiB) in  4m 54s, read: 488.2 MiB/s, write: 236.7 MiB/s
INFO:  91% (46.0 GiB of 50.0 GiB) in  4m 57s, read: 652.0 MiB/s, write: 249.0 MiB/s
INFO:  99% (49.5 GiB of 50.0 GiB) in  5m  0s, read: 1.2 GiB/s, write: 72.0 MiB/s
INFO: 100% (50.0 GiB of 50.0 GiB) in  5m  1s, read: 483.1 MiB/s, write: 8.0 KiB/s
INFO: backup is sparse: 12.38 GiB (24%) total zero data
INFO: transferred 50.00 GiB in 301 seconds (170.1 MiB/s)
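To see where throughput collapsed during the run, the progress lines can be parsed. Below is a minimal Python sketch. The regular expression is written against the line format visible in this log (an assumption, not a documented vzdump interface), and the 50 MiB/s threshold and all function names are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

# Matches progress lines as they appear in this log, e.g.
# "INFO:  41% (20.5 GiB of 50.0 GiB) in  1m 42s, read: 55.2 MiB/s, write: 50.4 MiB/s"
# (format taken from the log above, not from a documented vzdump interface)
PROGRESS_RE = re.compile(
    r"INFO:\s+(?P<pct>\d+)% \([^)]*\) in\s+(?:(?P<min>\d+)m\s+)?(?P<sec>\d+)s, "
    r"read: (?P<read>[\d.]+) (?P<runit>[KMG]iB)/s, "
    r"write: (?P<write>[\d.]+) (?P<wunit>[KMG]iB)/s"
)
UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


class Progress(NamedTuple):
    percent: int
    elapsed_s: int
    read_mib_s: float
    write_mib_s: float


def parse_progress(line: str) -> Optional[Progress]:
    """Return a Progress sample, or None if the line is not a progress line."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    elapsed = int(m["min"] or 0) * 60 + int(m["sec"])
    return Progress(
        percent=int(m["pct"]),
        elapsed_s=elapsed,
        read_mib_s=float(m["read"]) * UNIT_TO_MIB[m["runit"]],
        write_mib_s=float(m["write"]) * UNIT_TO_MIB[m["wunit"]],
    )


def slow_samples(lines: List[str], threshold_mib_s: float = 50.0) -> List[Progress]:
    """Progress samples whose write rate fell below threshold_mib_s (hypothetical cutoff)."""
    samples = (parse_progress(line) for line in lines)
    return [s for s in samples if s is not None and s.write_mib_s < threshold_mib_s]


def average_rate_mib_s(total_gib: float, seconds: int) -> float:
    """Average throughput, as reported in the 'transferred ... (x MiB/s)' summary."""
    return total_gib * 1024 / seconds
```

Run over the log above, `slow_samples` with the default threshold flags the 42-44%, 57-58% and 74-75% stretches plus the final 100% sample (write: 8.0 KiB/s), and `average_rate_mib_s(50.0, 301)` gives about 170.1 MiB/s, matching the summary line. Repeated write stalls before the final hang may point at the storage side, but this is only a heuristic.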

It then stays at this point for hours, holding a lock on the VM. The related processes cannot be killed.

The backup storage is a CIFS share:
Code:
root@pve:~# more /etc/pve/storage.cfg
dir: local
	path /var/lib/vz
	content iso,vztmpl
	maxfiles 2
	shared 0

lvmthin: local-lvm
	thinpool data
	vgname pve
	content images,rootdir

cifs: i7-2700k
	path /mnt/pve/i7-2700k
	server 192.168.1.2
	share SauvegardesPVE
	content backup
	maxfiles 5
	username XXXX

root@pve:~# mount -t cifs
//192.168.1.2/SauvegardesPVE on /mnt/pve/i7-2700k type cifs (rw,relatime,vers=3.0,cache=strict,username=XXXX,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.1.2,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1)
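A quick consistency check between the storage definition and the live mount can also be scripted. This is a hypothetical Python helper, assuming the usual storage.cfg layout (an unindented `type: id` header followed by indented `key value` properties; indentation may be lost in the forum rendering) and the `source on mountpoint type fstype (options)` format printed by `mount`.

```python
import re
from typing import Dict, Tuple, Union


def parse_storage_cfg(text: str) -> Dict[str, Dict[str, str]]:
    """Parse storage.cfg-style text into {storage_id: {"type": ..., key: value}}.

    Assumption: a section starts with an unindented "type: id" header and its
    properties are indented "key value" lines.
    """
    storages: Dict[str, Dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if not raw[0].isspace():
            stype, sid = (part.strip() for part in raw.split(":", 1))
            current = storages.setdefault(sid, {"type": stype})
        elif current is not None:
            key, _, value = raw.strip().partition(" ")
            current[key] = value.strip()
    return storages


def parse_mount_line(line: str) -> Tuple[str, str, str, Dict[str, Union[str, bool]]]:
    """Split one line of `mount` output into (source, mountpoint, fstype, options)."""
    m = re.match(r"(\S+) on (\S+) type (\S+) \((.*)\)$", line.strip())
    if m is None:
        raise ValueError(f"unrecognised mount line: {line!r}")
    options: Dict[str, Union[str, bool]] = {}
    for item in m[4].split(","):
        key, sep, value = item.partition("=")
        options[key] = value if sep else True  # bare flags such as "soft" become True
    return m[1], m[2], m[3], options


def mount_matches(cfg: Dict[str, str], source: str, mountpoint: str) -> bool:
    """Check that the mounted share is the one the storage definition points to."""
    return source == f"//{cfg['server']}/{cfg['share']}" and mountpoint == cfg["path"]
```

For the output above, `//192.168.1.2/SauvegardesPVE` matches the `i7-2700k` definition, and the options show a `soft` mount with `vers=3.0`.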

I already tried editing /etc/vzdump.conf and uncommenting tmpdir so that it points to the local filesystem:
Code:
root@pve:~# more /etc/vzdump.conf
# vzdump default settings

tmpdir: /var/lib/vz/vztmp
#dumpdir: DIR
#storage: STORAGE_ID
#mode: snapshot|suspend|stop
#bwlimit: KBPS
#ionice: PRI
#lockwait: MINUTES
#stopwait: MINUTES
#size: MB
#stdexcludes: BOOLEAN
#mailto: ADDRESSLIST
#maxfiles: N
#script: FILENAME
#exclude-path: PATHLIST
#pigz: N
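Only uncommented `key: value` lines in vzdump.conf take effect, so in the file above `tmpdir` is the only override. A small hypothetical Python helper to list the active settings, assuming that simple line format:

```python
from typing import Dict


def active_settings(conf_text: str) -> Dict[str, str]:
    """Return the uncommented "key: value" settings from vzdump.conf-style text."""
    settings: Dict[str, str] = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and commented-out defaults are ignored
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    return settings
```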

Related syslog attached.

Note that I can successfully back up other, smaller VMs/LXCs to the same storage, so I don't think the CIFS share is unreachable.

Any help, please?

Thanks.
Rudy.
 


Same here.
Vzdump backups to a CIFS share on Windows 10 always hang the whole PVE server, and I have to reboot PVE.
The last line is always
INFO: transferred xx.xx GiB in xx seconds (xx.x MiB/s)
so the backup itself has finished, but the "TASK OK" never appears.
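The symptom described here (the log stops at the `transferred` summary and never reaches TASK OK) can be turned into a simple monitoring heuristic. Hypothetical Python sketch: how `idle_seconds` is obtained (for example from the log file's modification time) and the 600-second default are assumptions, not Proxmox behaviour.

```python
def stalled_after_transfer(task_log: str, idle_seconds: float,
                           min_idle_seconds: float = 600.0) -> bool:
    """Heuristic for the symptom in this thread: the task log ends with the
    "INFO: transferred ..." summary and nothing has been appended for a while.

    idle_seconds: how long ago the log last changed (caller-supplied).
    min_idle_seconds: hypothetical threshold before calling it a stall.
    """
    lines = [line.strip() for line in task_log.splitlines() if line.strip()]
    if not lines:
        return False
    return lines[-1].startswith("INFO: transferred ") and idle_seconds >= min_idle_seconds
```

The idle threshold matters because the summary line is also briefly the last line of a healthy task, just before it finishes.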
 
Some more information from the console:

[60460.969050] CIFS VFS: \\192.168.179.136 has not responded in 180 seconds. Reconnecting...
[62229.695203] INFO: task zstd:36670 blocked for more than 120 seconds.
[62229.695798] Tainted: P O 5.4.78-2-pve #1
[62229.696084] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[62229.696572] INFO: task kworker/13:32:38180 blocked for more than 120 seconds.
[62229.696844] Tainted: P O 5.4.78-2-pve #1
[62229.697108] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
... this repeats endlessly until I reboot PVE.
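The hung-task warnings name the blocked processes: the compressor (`zstd`) and a kernel worker, logged after the CIFS client reported that the server had not responded in 180 seconds. That is consistent with writes stuck on the CIFS mount. Note that `echo 0 > /proc/sys/kernel/hung_task_timeout_secs` only disables the warning; it does not unblock the tasks. A hypothetical Python helper to pull the blocked tasks out of `dmesg` output:

```python
import re
from typing import List, Tuple

# Kernel hung-task warning, e.g.
# "[62229.695203] INFO: task zstd:36670 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(dmesg_text: str) -> List[Tuple[str, int, int]]:
    """Return (command, pid, seconds) for each hung-task warning in the text.

    The command name may itself contain ':' (e.g. "kworker/13:32"), so the
    greedy group backtracks to the last ':' that is followed by the PID.
    """
    result = []
    for line in dmesg_text.splitlines():
        m = HUNG_TASK_RE.search(line)
        if m:
            result.append((m["comm"], int(m["pid"]), int(m["secs"])))
    return result
```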

I can connect to the CIFS server from other computers and it responds, so the CIFS server itself is not the problem.
 
With LZO compression I get the same problem, but different messages:

[ 2331.460588] CIFS VFS: \\192.168.179.136 Cancelling wait for mid 2943494 cmd: 5
[ 2331.460643] CIFS VFS: \\192.168.179.136 Cancelling wait for mid 2943495 cmd: 16
[ 2331.460666] CIFS VFS: \\192.168.179.136 Cancelling wait for mid 2943496 cmd: 6
[ 2331.562936] CIFS VFS: Close unmatched open
[ 2418.240281] INFO: task lzop:8669 blocked for more than 120 seconds.
[ 2418.240338] Tainted: P O 5.4.78-2-pve #1
[ 2418.240379] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2418.240563] INFO: task kworker/27:5:10560 blocked for more than 120 seconds.
[ 2418.240586] Tainted: P O 5.4.78-2-pve #1
[ 2418.240603] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
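The `cmd:` values in the cancelled-request messages can be decoded against the SMB2 command table from the MS-SMB2 specification, assuming the share is mounted with SMB2 or later (the first post's mount uses `vers=3.0`; this poster's mount options are not shown). Hypothetical Python sketch:

```python
import re
from typing import List, Tuple

# SMB2 command codes (MS-SMB2, SMB2 packet header "Command" field).
SMB2_COMMANDS = {
    0x00: "NEGOTIATE", 0x01: "SESSION_SETUP", 0x02: "LOGOFF", 0x03: "TREE_CONNECT",
    0x04: "TREE_DISCONNECT", 0x05: "CREATE", 0x06: "CLOSE", 0x07: "FLUSH",
    0x08: "READ", 0x09: "WRITE", 0x0A: "LOCK", 0x0B: "IOCTL", 0x0C: "CANCEL",
    0x0D: "ECHO", 0x0E: "QUERY_DIRECTORY", 0x0F: "CHANGE_NOTIFY",
    0x10: "QUERY_INFO", 0x11: "SET_INFO", 0x12: "OPLOCK_BREAK",
}

# e.g. "CIFS VFS: \\192.168.179.136 Cancelling wait for mid 2943494 cmd: 5"
CANCEL_RE = re.compile(r"Cancelling wait for mid (?P<mid>\d+) cmd: (?P<cmd>\d+)")


def decode_cancelled(dmesg_text: str) -> List[Tuple[int, str]]:
    """Return (mid, command name) for each cancelled-wait message (cmd is decimal)."""
    result = []
    for line in dmesg_text.splitlines():
        m = CANCEL_RE.search(line)
        if m:
            cmd = int(m["cmd"])
            result.append((int(m["mid"]), SMB2_COMMANDS.get(cmd, f"UNKNOWN({cmd})")))
    return result
```

Under that assumption, the three lines above are CREATE, QUERY_INFO and CLOSE requests, i.e. open, query-info and close operations rather than writes.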
 
Hi all,
same here, with Virtual Environment 6.3-3.

I back up with ZSTD to a local folder mounted via ftpfs.
 
Hi,

I am seeing the exact same issue with PVE 6.4.8

It always happens after one of the VMs in the job finishes (not always the same VM):
INFO: transferred xxx GiB in xxx seconds (xxx MiB/s)

Has anyone found a solution for this issue?

cheers
Michael
 
Unfortunately, this still happens, which renders automated backups useless. Disabling compression alleviates the issue, but is not what I'd prefer.
 