Proxmox-Backup-Server 2.3.1-1 stuck after backup

Kadrim

Hi there,

I have been using PBS for quite some time now, but after the recent switch to 2.3.x a problem occurred:

On some VMs the backup is stuck after reaching 100% with the message "waiting for server to finish backup validation".

I *don't* have the option "Verify new backups immediately after completion" enabled, so there should be nothing to wait for. But on version 2.3.1-1 the backup is stuck there for around (and maybe exactly?) 24 hours.

The syslog doesn't show anything related to this issue, and the machines (PVE and PBS) are pretty much idle with no iowait whatsoever.

So I started digging, created new PBS datastores, and deleted the existing VM backups altogether, but always with the same behaviour.

The relevant source is probably this file: https://github.com/proxmox/qemu-server/blob/master/PVE/VZDump/QemuServer.pm, or at least related to this issue. Perhaps the call "mon_cmd($vmid, 'query-backup');" is stuck somehow.

I reverted to PBS version 2.2.8-1 and the issue is gone.

Best regards
 
any clue in the task log on the PBS side maybe?
 
Nope, checked that as well. Nothing out of the ordinary.

If you can point me somewhere to look, or maybe tell me where to implement additional logging (the above-mentioned .pm file?), then I can run some tests if you like.
 
yes, if you can easily reproduce it you could add the following code at this location:

Code:
            if (!$last_finishing && $status->{finishing}) {
                $self->loginfo("Waiting for server to finish backup validation...");
            }
            $last_finishing = $status->{finishing};
        }
        HERE
        sleep(1);

Code:
        if ($last_finishing) {
            use Data::Dumper;
            $self->loginfo(Dumper($status));
        }

and reload pveproxy and pvedaemon: systemctl reload-or-restart pveproxy pvedaemon

if something goes wrong with the edit, apt install --reinstall qemu-server should fix it again by reverting to the stock version.

please post the full output (you can abort the task if it doesn't make any more progress..)
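
for reference, put together the polling loop then looks roughly like this (just a sketch -- everything outside the two snippets above is paraphrased, not the exact code from QemuServer.pm):

Code:
    my $last_finishing = 0;
    while (1) {
        my $status = mon_cmd($vmid, 'query-backup');

        if ($status->{status} eq 'active') {
            # ... existing progress logging based on $status ...

            if (!$last_finishing && $status->{finishing}) {
                $self->loginfo("Waiting for server to finish backup validation...");
            }
            $last_finishing = $status->{finishing};
        } else {
            last;    # job finished or failed -- stop polling
        }

        # added debug output: dump the full query-backup result while the
        # 'finishing' flag is set (once per polling interval)
        if ($last_finishing) {
            use Data::Dumper;
            $self->loginfo(Dumper($status));
        }

        sleep(1);
    }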
 
Code:
INFO: Starting Backup of VM 102 (qemu)
INFO: Backup started at 2022-12-22 12:02:32
INFO: status = running
INFO: VM Name: vWin10
INFO: include disk 'scsi0' 'ssdpool:vm-102-disk-1' 160G
INFO: include disk 'efidisk0' 'ssdpool:vm-102-disk-0' 1M
INFO: include disk 'tpmstate0' 'ssdpool:vm-102-disk-2' 4M
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: HOOK: backup-start snapshot 102
INFO: HOOK-ENV: vmtype=qemu;dumpdir=;storeid=pbs;hostname=vWin10;tarfile=;logfile=
INFO: HOOK: pre-stop snapshot 102
INFO: HOOK-ENV: vmtype=qemu;dumpdir=;storeid=pbs;hostname=vWin10;tarfile=;logfile=
INFO: HOOK: pre-restart snapshot 102
INFO: HOOK-ENV: vmtype=qemu;dumpdir=;storeid=pbs;hostname=vWin10;tarfile=;logfile=
INFO: HOOK: post-restart snapshot 102
INFO: HOOK-ENV: vmtype=qemu;dumpdir=;storeid=pbs;hostname=vWin10;tarfile=;logfile=
INFO: creating Proxmox Backup Server archive 'vm/102/2022-12-22T11:02:32Z'
INFO: attaching TPM drive to QEMU for backup
INFO: issuing guest-agent 'fs-freeze' command
INFO: enabling encryption
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'd757fc40-515d-4724-b4f2-ac09bbdfc170'
INFO: resuming VM again
INFO: efidisk0: dirty-bitmap status: created new
INFO: scsi0: dirty-bitmap status: created new
INFO: tpmstate0-backup: dirty-bitmap status: created new
INFO:   0% (1.1 GiB of 160.0 GiB) in 3s, read: 368.0 MiB/s, write: 180.0 MiB/s
INFO:   1% (1.7 GiB of 160.0 GiB) in 7s, read: 158.0 MiB/s, write: 139.0 MiB/s
INFO:   2% (3.3 GiB of 160.0 GiB) in 19s, read: 138.7 MiB/s, write: 105.3 MiB/s
INFO:   3% (4.9 GiB of 160.0 GiB) in 29s, read: 164.0 MiB/s, write: 96.0 MiB/s
INFO:   4% (6.5 GiB of 160.0 GiB) in 55s, read: 62.2 MiB/s, write: 33.2 MiB/s
INFO:   5% (8.2 GiB of 160.0 GiB) in 1m 4s, read: 194.7 MiB/s, write: 104.9 MiB/s
INFO:   6% (9.7 GiB of 160.0 GiB) in 1m 13s, read: 165.3 MiB/s, write: 115.1 MiB/s
INFO:   7% (11.2 GiB of 160.0 GiB) in 1m 20s, read: 226.9 MiB/s, write: 118.3 MiB/s
INFO:   8% (13.0 GiB of 160.0 GiB) in 1m 45s, read: 74.7 MiB/s, write: 49.1 MiB/s
INFO:   9% (14.6 GiB of 160.0 GiB) in 1m 51s, read: 268.0 MiB/s, write: 126.7 MiB/s
INFO:  10% (16.1 GiB of 160.0 GiB) in 1m 59s, read: 196.0 MiB/s, write: 120.0 MiB/s
INFO:  11% (17.8 GiB of 160.0 GiB) in 2m 21s, read: 78.0 MiB/s, write: 33.8 MiB/s
INFO:  12% (19.2 GiB of 160.0 GiB) in 2m 26s, read: 288.0 MiB/s, write: 152.0 MiB/s
INFO:  13% (21.0 GiB of 160.0 GiB) in 2m 32s, read: 298.7 MiB/s, write: 94.7 MiB/s
INFO:  14% (22.5 GiB of 160.0 GiB) in 2m 36s, read: 388.0 MiB/s, write: 128.0 MiB/s
INFO:  15% (24.1 GiB of 160.0 GiB) in 2m 54s, read: 93.1 MiB/s, write: 34.4 MiB/s
INFO:  16% (25.8 GiB of 160.0 GiB) in 2m 59s, read: 335.2 MiB/s, write: 116.0 MiB/s
INFO:  17% (27.4 GiB of 160.0 GiB) in 3m 4s, read: 344.8 MiB/s, write: 140.8 MiB/s
INFO:  18% (28.8 GiB of 160.0 GiB) in 3m 8s, read: 352.0 MiB/s, write: 154.0 MiB/s
INFO:  19% (30.5 GiB of 160.0 GiB) in 3m 28s, read: 85.0 MiB/s, write: 53.2 MiB/s
INFO:  20% (32.2 GiB of 160.0 GiB) in 3m 36s, read: 226.0 MiB/s, write: 109.0 MiB/s
INFO:  21% (33.8 GiB of 160.0 GiB) in 3m 41s, read: 316.8 MiB/s, write: 152.0 MiB/s
INFO:  22% (35.3 GiB of 160.0 GiB) in 3m 46s, read: 312.8 MiB/s, write: 152.8 MiB/s
INFO:  23% (36.8 GiB of 160.0 GiB) in 4m 5s, read: 81.1 MiB/s, write: 35.8 MiB/s
INFO:  24% (38.5 GiB of 160.0 GiB) in 4m 9s, read: 423.0 MiB/s, write: 144.0 MiB/s
INFO:  25% (40.0 GiB of 160.0 GiB) in 4m 14s, read: 321.6 MiB/s, write: 96.8 MiB/s
INFO:  26% (42.0 GiB of 160.0 GiB) in 4m 18s, read: 494.0 MiB/s, write: 92.0 MiB/s
INFO:  27% (43.4 GiB of 160.0 GiB) in 4m 22s, read: 359.0 MiB/s, write: 155.0 MiB/s
INFO:  28% (45.0 GiB of 160.0 GiB) in 4m 39s, read: 99.3 MiB/s, write: 31.3 MiB/s
INFO:  29% (46.5 GiB of 160.0 GiB) in 4m 43s, read: 373.0 MiB/s, write: 93.0 MiB/s
INFO:  30% (48.4 GiB of 160.0 GiB) in 4m 47s, read: 496.0 MiB/s, write: 114.0 MiB/s
INFO:  31% (50.3 GiB of 160.0 GiB) in 4m 50s, read: 630.7 MiB/s, write: 62.7 MiB/s
INFO:  32% (51.5 GiB of 160.0 GiB) in 4m 53s, read: 428.0 MiB/s, write: 80.0 MiB/s
INFO:  33% (53.0 GiB of 160.0 GiB) in 4m 56s, read: 517.3 MiB/s, write: 94.7 MiB/s
INFO:  34% (55.0 GiB of 160.0 GiB) in 4m 59s, read: 674.7 MiB/s, write: 86.7 MiB/s
INFO:  35% (56.4 GiB of 160.0 GiB) in 5m 2s, read: 481.3 MiB/s, write: 104.0 MiB/s
INFO:  36% (58.2 GiB of 160.0 GiB) in 5m 5s, read: 601.3 MiB/s, write: 57.3 MiB/s
INFO:  37% (60.2 GiB of 160.0 GiB) in 5m 8s, read: 684.0 MiB/s, write: 37.3 MiB/s
INFO:  38% (61.6 GiB of 160.0 GiB) in 5m 11s, read: 470.7 MiB/s, write: 116.0 MiB/s
INFO:  39% (63.1 GiB of 160.0 GiB) in 5m 14s, read: 521.3 MiB/s, write: 52.0 MiB/s
INFO:  40% (64.7 GiB of 160.0 GiB) in 5m 17s, read: 549.3 MiB/s, write: 77.3 MiB/s
INFO:  41% (66.9 GiB of 160.0 GiB) in 5m 20s, read: 740.0 MiB/s, write: 61.3 MiB/s
INFO:  42% (67.3 GiB of 160.0 GiB) in 5m 23s, read: 160.0 MiB/s, write: 20.0 MiB/s
INFO:  43% (69.0 GiB of 160.0 GiB) in 5m 37s, read: 123.4 MiB/s, write: 18.0 MiB/s
INFO:  44% (70.6 GiB of 160.0 GiB) in 5m 40s, read: 541.3 MiB/s, write: 96.0 MiB/s
INFO:  45% (72.2 GiB of 160.0 GiB) in 5m 43s, read: 537.3 MiB/s, write: 53.3 MiB/s
INFO:  46% (73.9 GiB of 160.0 GiB) in 5m 46s, read: 596.0 MiB/s, write: 24.0 MiB/s
INFO:  47% (75.9 GiB of 160.0 GiB) in 5m 49s, read: 665.3 MiB/s, write: 49.3 MiB/s
INFO:  48% (77.4 GiB of 160.0 GiB) in 5m 52s, read: 526.7 MiB/s, write: 24.0 MiB/s
INFO:  49% (79.1 GiB of 160.0 GiB) in 5m 55s, read: 553.3 MiB/s, write: 42.7 MiB/s
INFO:  50% (80.8 GiB of 160.0 GiB) in 5m 58s, read: 585.3 MiB/s, write: 57.3 MiB/s
INFO:  51% (83.0 GiB of 160.0 GiB) in 6m 1s, read: 748.0 MiB/s, write: 26.7 MiB/s
INFO:  52% (84.6 GiB of 160.0 GiB) in 6m 4s, read: 554.7 MiB/s, write: 64.0 MiB/s
INFO:  53% (86.1 GiB of 160.0 GiB) in 6m 7s, read: 534.7 MiB/s, write: 100.0 MiB/s
INFO:  54% (87.9 GiB of 160.0 GiB) in 6m 10s, read: 612.0 MiB/s, write: 62.7 MiB/s
INFO:  55% (89.5 GiB of 160.0 GiB) in 6m 13s, read: 525.3 MiB/s, write: 81.3 MiB/s
INFO:  57% (91.4 GiB of 160.0 GiB) in 6m 16s, read: 638.7 MiB/s, write: 49.3 MiB/s
INFO:  58% (93.1 GiB of 160.0 GiB) in 6m 19s, read: 604.0 MiB/s, write: 44.0 MiB/s
INFO:  59% (94.8 GiB of 160.0 GiB) in 6m 22s, read: 577.3 MiB/s, write: 9.3 MiB/s
INFO:  60% (96.6 GiB of 160.0 GiB) in 6m 25s, read: 618.7 MiB/s, write: 17.3 MiB/s
INFO:  61% (98.6 GiB of 160.0 GiB) in 6m 28s, read: 657.3 MiB/s, write: 64.0 MiB/s
INFO:  62% (100.3 GiB of 160.0 GiB) in 6m 31s, read: 584.0 MiB/s, write: 42.7 MiB/s
INFO:  63% (102.1 GiB of 160.0 GiB) in 6m 34s, read: 633.3 MiB/s, write: 13.3 MiB/s
INFO:  65% (104.3 GiB of 160.0 GiB) in 6m 37s, read: 749.3 MiB/s, write: 41.3 MiB/s
INFO:  66% (106.4 GiB of 160.0 GiB) in 6m 40s, read: 698.7 MiB/s, write: 30.7 MiB/s
INFO:  68% (109.1 GiB of 160.0 GiB) in 6m 43s, read: 920.0 MiB/s, write: 28.0 MiB/s
INFO:  69% (111.4 GiB of 160.0 GiB) in 6m 46s, read: 793.3 MiB/s, write: 9.3 MiB/s
INFO:  71% (113.6 GiB of 160.0 GiB) in 6m 50s, read: 579.0 MiB/s, write: 27.0 MiB/s
INFO:  73% (116.8 GiB of 160.0 GiB) in 6m 53s, read: 1.1 GiB/s, write: 20.0 MiB/s
INFO:  74% (118.7 GiB of 160.0 GiB) in 7m 12s, read: 100.4 MiB/s, write: 31.8 MiB/s
INFO:  75% (121.4 GiB of 160.0 GiB) in 7m 15s, read: 946.7 MiB/s, write: 44.0 MiB/s
INFO:  77% (124.6 GiB of 160.0 GiB) in 7m 18s, read: 1.0 GiB/s, write: 24.0 MiB/s
INFO:  79% (126.8 GiB of 160.0 GiB) in 7m 21s, read: 756.0 MiB/s, write: 25.3 MiB/s
INFO:  80% (129.4 GiB of 160.0 GiB) in 7m 24s, read: 902.7 MiB/s, write: 12.0 MiB/s
INFO:  81% (131.0 GiB of 160.0 GiB) in 7m 27s, read: 549.3 MiB/s, write: 82.7 MiB/s
INFO:  82% (132.4 GiB of 160.0 GiB) in 7m 30s, read: 449.3 MiB/s, write: 93.3 MiB/s
INFO:  84% (135.2 GiB of 160.0 GiB) in 7m 34s, read: 729.0 MiB/s, write: 15.0 MiB/s
INFO:  85% (137.5 GiB of 160.0 GiB) in 7m 37s, read: 781.3 MiB/s, write: 0 B/s
INFO:  88% (141.5 GiB of 160.0 GiB) in 7m 40s, read: 1.3 GiB/s, write: 6.7 MiB/s
INFO:  90% (144.2 GiB of 160.0 GiB) in 7m 43s, read: 932.0 MiB/s, write: 4.0 MiB/s
INFO:  91% (146.2 GiB of 160.0 GiB) in 7m 46s, read: 669.3 MiB/s, write: 0 B/s
INFO:  92% (148.0 GiB of 160.0 GiB) in 7m 49s, read: 630.7 MiB/s, write: 0 B/s
INFO:  93% (150.0 GiB of 160.0 GiB) in 7m 52s, read: 669.3 MiB/s, write: 0 B/s
INFO:  94% (151.8 GiB of 160.0 GiB) in 7m 55s, read: 605.3 MiB/s, write: 84.0 MiB/s
INFO:  95% (152.3 GiB of 160.0 GiB) in 7m 58s, read: 173.3 MiB/s, write: 173.3 MiB/s
INFO:  96% (155.0 GiB of 160.0 GiB) in 8m 1s, read: 932.0 MiB/s, write: 22.7 MiB/s
INFO:  98% (157.5 GiB of 160.0 GiB) in 8m 4s, read: 846.7 MiB/s, write: 12.0 MiB/s
INFO:  99% (159.4 GiB of 160.0 GiB) in 8m 16s, read: 166.0 MiB/s, write: 1.0 MiB/s
INFO: 100% (160.0 GiB of 160.0 GiB) in 8m 19s, read: 190.7 MiB/s, write: 2.7 MiB/s
INFO: Waiting for server to finish backup validation...
INFO: $VAR1 = {
INFO:           'backup-file' => 'root@pam@127.0.0.1:pbs',
INFO:           'start-time' => 1671706954,
INFO:           'reused' => '139204755456',
INFO:           'dirty' => '171803017216',
INFO:           'uuid' => 'd757fc40-515d-4724-b4f2-ac09bbdfc170',
INFO:           'total' => '171803017216',
INFO:           'transferred' => '171803017216',
INFO:           'status' => 'active',
INFO:           'zero-bytes' => '39552286720',
INFO:           'finishing' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' )
INFO:         };
INFO: $VAR1 = {
INFO:           'zero-bytes' => '39552286720',
INFO:           'status' => 'active',
INFO:           'transferred' => '171803017216',
INFO:           'total' => '171803017216',
INFO:           'uuid' => 'd757fc40-515d-4724-b4f2-ac09bbdfc170',
INFO:           'reused' => '139204755456',
INFO:           'dirty' => '171803017216',
INFO:           'start-time' => 1671706954,
INFO:           'backup-file' => 'root@pam@127.0.0.1:pbs',
INFO:           'finishing' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' )
INFO:         };
INFO: $VAR1 = {
INFO:           'finishing' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),
INFO:           'status' => 'active',
INFO:           'zero-bytes' => '39552286720',
INFO:           'uuid' => 'd757fc40-515d-4724-b4f2-ac09bbdfc170',
INFO:           'transferred' => '171803017216',
INFO:           'total' => '171803017216',
INFO:           'start-time' => 1671706954,
INFO:           'reused' => '139204755456',
INFO:           'dirty' => '171803017216',
INFO:           'backup-file' => 'root@pam@127.0.0.1:pbs'
INFO:         };
INFO: $VAR1 = {
INFO:           'finishing' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),
INFO:           'dirty' => '171803017216',
INFO:           'reused' => '139204755456',
INFO:           'start-time' => 1671706954,
INFO:           'backup-file' => 'root@pam@127.0.0.1:pbs',
INFO:           'zero-bytes' => '39552286720',
INFO:           'status' => 'active',
INFO:           'total' => '171803017216',
INFO:           'transferred' => '171803017216',
INFO:           'uuid' => 'd757fc40-515d-4724-b4f2-ac09bbdfc170'
INFO:         };



....


INFO: $VAR1 = {
INFO:           'total' => '171803017216',
INFO:           'transferred' => '171803017216',
INFO:           'uuid' => 'd757fc40-515d-4724-b4f2-ac09bbdfc170',
INFO:           'zero-bytes' => '39552286720',
INFO:           'status' => 'active',
INFO:           'backup-file' => 'root@pam@127.0.0.1:pbs',
INFO:           'dirty' => '171803017216',
INFO:           'reused' => '139204755456',
INFO:           'start-time' => 1671706954,
INFO:           'finishing' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' )
INFO:         };
INFO: $VAR1 = {
INFO:           'finishing' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),
INFO:           'total' => '171803017216',
INFO:           'transferred' => '171803017216',
INFO:           'uuid' => 'd757fc40-515d-4724-b4f2-ac09bbdfc170',
INFO:           'zero-bytes' => '39552286720',
INFO:           'status' => 'active',
INFO:           'backup-file' => 'root@pam@127.0.0.1:pbs',
INFO:           'reused' => '139204755456',
INFO:           'dirty' => '171803017216',
INFO:           'start-time' => 1671706954
INFO:         };
INFO: $VAR1 = {
INFO:           'finishing' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),
INFO:           'zero-bytes' => '39552286720',
INFO:           'status' => 'active',
INFO:           'transferred' => '171803017216',
INFO:           'total' => '171803017216',
INFO:           'uuid' => 'd757fc40-515d-4724-b4f2-ac09bbdfc170',
INFO:           'reused' => '139204755456',
INFO:           'dirty' => '171803017216',
INFO:           'start-time' => 1671706954,
INFO:           'backup-file' => 'root@pam@127.0.0.1:pbs'
INFO:         };
INFO: $VAR1 = {
INFO:           'finishing' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),
INFO:           'total' => '171803017216',
INFO:           'transferred' => '171803017216',
INFO:           'uuid' => 'd757fc40-515d-4724-b4f2-ac09bbdfc170',
INFO:           'zero-bytes' => '39552286720',
INFO:           'status' => 'active',
INFO:           'backup-file' => 'root@pam@127.0.0.1:pbs',
INFO:           'dirty' => '171803017216',
INFO:           'reused' => '139204755456',
INFO:           'start-time' => 1671706954
INFO:         };
ERROR: interrupted by signal
INFO: aborting backup job
 
thanks. so it's not query-backup hanging, but really the backup job never finishing..

can you provide the following information?
- pveversion -v
- VM config of the VM in question
- hookscript contents (or at least description of what it does, and whether the issue triggers without the hookscript as well)
- any guess what the affected VMs might have in common (as opposed to those that aren't affected)
- how reproducible is the issue? does it trigger on the same VMs for each backup attempt? or just sometimes?
- the last few lines of the PBS side task log for the backup would be interesting (before you cancel the task on the PVE side!)

thanks!
 
hi there!
I had to redo all the steps because they interfered with my scheduled verification and backup.

here are the answers to your questions:

- pveversion -v

Code:
proxmox-ve: 7.3-1 (running kernel: 5.15.83-1-pve)
pve-manager: 7.3-4 (running version: 7.3-4/d69b70d4)
pve-kernel-5.15: 7.3-1
pve-kernel-helper: 7.3-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-1
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.3.1-1
proxmox-backup-file-restore: 2.3.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-1
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-2
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.7-pve1

- VM config of the VM in question

Code:
agent: 1
audio0: device=ich9-intel-hda,driver=spice
balloon: 512
bios: ovmf
boot: dcn
bootdisk: scsi0
cores: 4
cpu: host
efidisk0: ssdpool:vm-102-disk-0,size=1M
ide2: none,media=cdrom
localtime: 1
lock: backup
memory: 4096
name: vWin10
net0: virtio=00:0c:29:ba:e0:9f,bridge=vmbr0
net1: virtio=CE:6F:2A:83:34:A3,bridge=vmbr5
numa: 0
onboot: 1
ostype: win10
scsi0: ssdpool:vm-102-disk-1,cache=writeback,discard=on,size=160G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=564d3ca1-78c1-5b17-59d8-ae8244bae09f
sockets: 1
spice_enhancements: videostreaming=all
startup: up=180
tpmstate0: ssdpool:vm-102-disk-2,size=4M,version=v2.0
usb0: spice,usb3=1
usb1: spice
vga: qxl

- hookscript contents

Code:
#!/usr/bin/perl -w

# modified hook script, enable l2arc_mfuonly during backup to not taint the l2arc

use strict;

print "HOOK: " . join (' ', @ARGV) . "\n";

my $phase = shift;

if ($phase eq 'job-start' ||
    $phase eq 'job-end'  ||
    $phase eq 'job-abort') {

    # dumpdir is not available with all backup-methods
    my $dumpdir = $ENV{DUMPDIR};
    # set the variable to empty string if it is undefined
    $dumpdir //= "";

    my $storeid = $ENV{STOREID};

    print "HOOK-ENV: dumpdir=$dumpdir;storeid=$storeid\n";

    if ($phase eq 'job-start') {
        print "HOOK-ENV: ZFS disabling L2ARC ...\n";
        system ("/bin/echo 1 > /sys/module/zfs/parameters/l2arc_mfuonly") == 0 ||
                die "zfs L2ARC disable failed!";
    }

    if ($phase eq 'job-end') {
        print "HOOK-ENV: ZFS enabling L2ARC ...\n";
        system ("/bin/echo 0 > /sys/module/zfs/parameters/l2arc_mfuonly") == 0 ||
                die "zfs L2ARC enable failed!";
    }

    # do what you want

} elsif ($phase eq 'backup-start' ||
         $phase eq 'backup-end' ||
         $phase eq 'backup-abort' ||
         $phase eq 'log-end' ||
         $phase eq 'pre-stop' ||
         $phase eq 'pre-restart' ||
         $phase eq 'post-restart') {

    my $mode = shift; # stop/suspend/snapshot

    my $vmid = shift;

    my $vmtype = $ENV{VMTYPE}; # openvz/qemu

    # dumpdir is not available with all backup-methods
    my $dumpdir = $ENV{DUMPDIR};
    # set the variable to empty string if it is undefined
    $dumpdir //= "";

    my $storeid = $ENV{STOREID};

    my $hostname = $ENV{HOSTNAME};

    # tarfile is only available in phase 'backup-end'
    my $tarfile = $ENV{TARFILE};
    # set the variable to empty string if it is undefined
    $tarfile //= "";

    # logfile is only available in phase 'log-end'
    my $logfile = $ENV{LOGFILE};
    # set the variable to empty string if it is undefined
    $logfile //= "";

    print "HOOK-ENV: vmtype=$vmtype;dumpdir=$dumpdir;storeid=$storeid;hostname=$hostname;tarfile=$tarfile;logfile=$logfile\n";
   
} else {

    print "vzdump-hook-script.pl: got unknown phase '$phase'";

}

exit (0);
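
For reference, the hook script is wired up via the script option, either globally in /etc/vzdump.conf or per backup job with vzdump --script; the path below is just an example:

Code:
script: /usr/local/bin/vzdump-hook.pl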

Running the backup without the hook script does not affect the issue; it still occurs.

- any guess what the affected VMs might have in common (as opposed to those that aren't affected)

I have no real idea concerning that. Some are Linux VMs, others are Windows VMs; some are always running, some always stopped. I did not find anything they have in common, but I have not yet seen this issue with CTs.


- how reproducible is the issue? does it trigger on the same VMs for each backup attempt? or just sometimes?

I would say at least 95%. Sometimes another VM is affected (rarely), but for most of my VMs this is easy to reproduce. I even deleted all existing backups for the affected VMs, but this did not change the behaviour.


- the last few lines of the PBS side task log for the backup would be interesting (before you cancel the task on the PVE side!)


Here is a snippet of the PVE backup log (like the one above):
Code:
INFO: $VAR1 = {
INFO:           'transferred' => '171803017216',
INFO:           'start-time' => 1671883766,
INFO:           'backup-file' => 'root@pam@127.0.0.1:pbs',
INFO:           'status' => 'active',
INFO:           'total' => '171803017216',
INFO:           'reused' => '149346582528',
INFO:           'finishing' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),
INFO:           'uuid' => '26723bda-59e8-4477-8403-2641c24869fd',
INFO:           'zero-bytes' => '39153827840',
INFO:           'dirty' => '171803017216'
INFO:         };
INFO: $VAR1 = {
INFO:           'start-time' => 1671883766,
INFO:           'backup-file' => 'root@pam@127.0.0.1:pbs',
INFO:           'transferred' => '171803017216',
INFO:           'dirty' => '171803017216',
INFO:           'zero-bytes' => '39153827840',
INFO:           'finishing' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),
INFO:           'uuid' => '26723bda-59e8-4477-8403-2641c24869fd',
INFO:           'reused' => '149346582528',
INFO:           'status' => 'active',
INFO:           'total' => '171803017216'
INFO:         };
INFO: $VAR1 = {
INFO:           'start-time' => 1671883766,
INFO:           'backup-file' => 'root@pam@127.0.0.1:pbs',
INFO:           'transferred' => '171803017216',
INFO:           'zero-bytes' => '39153827840',
INFO:           'dirty' => '171803017216',
INFO:           'status' => 'active',
INFO:           'total' => '171803017216',
INFO:           'reused' => '149346582528',
INFO:           'finishing' => bless( do{\(my $o = 1)}, 'JSON::PP::Boolean' ),
INFO:           'uuid' => '26723bda-59e8-4477-8403-2641c24869fd'
INFO:         };
ERROR: interrupted by signal
INFO: aborting backup job

and this part is taken from the PBS task log:

Code:
2022-12-24T13:09:26+01:00: starting new backup on datastore 'pbs': "vm/102/2022-12-24T12:09:24Z"
2022-12-24T13:09:26+01:00: download 'index.json.blob' from previous backup.
2022-12-24T13:09:26+01:00: register chunks in 'drive-efidisk0.img.fidx' from previous backup.
2022-12-24T13:09:26+01:00: download 'drive-efidisk0.img.fidx' from previous backup.
2022-12-24T13:09:26+01:00: created new fixed index 1 ("vm/102/2022-12-24T12:09:24Z/drive-efidisk0.img.fidx")
2022-12-24T13:09:26+01:00: register chunks in 'drive-scsi0.img.fidx' from previous backup.
2022-12-24T13:09:26+01:00: download 'drive-scsi0.img.fidx' from previous backup.
2022-12-24T13:09:26+01:00: created new fixed index 2 ("vm/102/2022-12-24T12:09:24Z/drive-scsi0.img.fidx")
2022-12-24T13:09:26+01:00: register chunks in 'drive-tpmstate0-backup.img.fidx' from previous backup.
2022-12-24T13:09:26+01:00: download 'drive-tpmstate0-backup.img.fidx' from previous backup.
2022-12-24T13:09:26+01:00: created new fixed index 3 ("vm/102/2022-12-24T12:09:24Z/drive-tpmstate0-backup.img.fidx")
2022-12-24T13:09:26+01:00: add blob "/mnt/unsecured/backup/pbs/vm/102/2022-12-24T12:09:24Z/qemu-server.conf.blob" (523 bytes, comp: 523)
2022-12-24T13:18:26+01:00: Upload statistics for 'drive-tpmstate0-backup.img.fidx'
2022-12-24T13:18:26+01:00: UUID: 3cdc0f75798b492baf6e57fd179dcb6a
2022-12-24T13:18:26+01:00: Checksum: 3cc5c55e4eea863a8b7178b722c077bcde3a8ea801f54c759157b2a33f92868e
2022-12-24T13:18:26+01:00: Size: 4194304
2022-12-24T13:18:26+01:00: Chunk count: 1
2022-12-24T13:18:26+01:00: Upload size: 8388608 (200%)
2022-12-24T13:18:26+01:00: Duplicates: 0+1 (100%)
2022-12-24T13:18:26+01:00: Compression: 0%
2022-12-24T13:18:26+01:00: successfully closed fixed index 3
2022-12-24T13:18:26+01:00: Upload statistics for 'drive-scsi0.img.fidx'
2022-12-24T13:18:26+01:00: UUID: 0aa1bfdaa008446aa1a1220c4d085ad2
2022-12-24T13:18:26+01:00: Checksum: 06c3986783f981471bad10406d066aec2a5276be50fbfdde64ecd8d12ff11f7e
2022-12-24T13:18:26+01:00: Size: 171798691840
2022-12-24T13:18:26+01:00: Chunk count: 40960
2022-12-24T13:18:26+01:00: Upload size: 22456303616 (13%)
2022-12-24T13:18:26+01:00: Duplicates: 35606+64 (87%)
2022-12-24T13:18:26+01:00: Compression: 48%
2022-12-24T13:18:26+01:00: successfully closed fixed index 2
2022-12-24T13:18:26+01:00: Upload statistics for 'drive-efidisk0.img.fidx'
2022-12-24T13:18:26+01:00: UUID: 5fae39b0cfa14fd1b9a859418510c640
2022-12-24T13:18:26+01:00: Checksum: 85df4ef233acc700168de16e2738e66bbabb38a99002211133c0db638e3f8bca
2022-12-24T13:18:26+01:00: Size: 131072
2022-12-24T13:18:26+01:00: Chunk count: 1
2022-12-24T13:18:26+01:00: Upload size: 4325376 (3300%)
2022-12-24T13:18:26+01:00: Duplicates: 0+1 (100%)
2022-12-24T13:18:26+01:00: Compression: 0%
2022-12-24T13:18:26+01:00: successfully closed fixed index 1
2022-12-24T13:18:26+01:00: add blob "/mnt/unsecured/backup/pbs/vm/102/2022-12-24T12:09:24Z/index.json.blob" (555 bytes, comp: 555)



Best regards and happy holidays!
 
thanks for the information - especially the backup log from the PBS side looks interesting! one last question - could you post your datastore.cfg (feel free to censor schedules/names/IP addresses)? is your PBS datastore local to PBS or some mounted network file system like NFS/CIFS?
 
My PBS is a bare-metal install alongside PVE; the datastore resides on a locally mounted ZFS filesystem.

Here is the datastore.cfg:


Code:
datastore: pbs
        comment
        gc-schedule tue *-1..7 23:30
        notify gc=error,sync=error,verify=error
        notify-user root@pam
        path /mnt/unsecured/backup/pbs
 
okay, this will likely require a patched PBS build and QEMU library to debug further, since the issue arises in a part of the code where neither prints enough information to determine where it's hanging.. would you be willing to install one (just with added log statements) and repeat the test?
 
Sure, glad to help. Just point me to the debug binaries/instructions
 
sorry, I had other stuff get in the way. I'll post a link later today!
 
http://download.proxmox.com/temp/proxmox-backup-test-forum-119854/

Code:
3660214df2252b6bef92460af26798ac152e6828955b117e835bb19c81f70b354f37ca8411cdd2194efb972d8832f32832a0ac38145e6e02d44df97aa4cbc2ed  libproxmox-backup-qemu0_1.3.2-2~test1_amd64.deb
cb999fd95c9cf5aea03ba193846924d1559b176b28c6d1ce987584be935c4c9afdf29997d7a6b7ff9e5231e8c6e36994f4f348dcdcd82fdf7a30721e5e51be68  proxmox-backup-server_2.3.2-2~test1_amd64.deb

to load the libproxmox-backup-qemu0 changes (both for testing, and in case you want to revert to the stock version) you need to freshly boot the VM; you can check that it worked with qm status XXX --verbose | grep library-version:

Code:
qm status 104 --verbose | grep library-version
        pbs-library-version: 1.3.2 (c36706bf661c2a37f92b999dacaf63f4babd075f)

the patched packages will add the following log lines:

PVE - the added lines are logged directly by the VM process (they show up with the QEMU[...] prefix below) - these are not added to the vzdump task log, but are visible in the journal:
Jan 16 14:24:58 yuna vzdump[91535]: <root@pam> starting task UPID:yuna:000165B4:0650365F:63C5502A:vzdump:104:root@pam:
Jan 16 14:24:58 yuna vzdump[91572]: INFO: starting new backup job: vzdump 104 --storage pbs
Jan 16 14:24:58 yuna vzdump[91572]: INFO: Starting Backup of VM 104 (qemu)
Jan 16 14:24:58 yuna QEMU[29615]: uploading manifest
Jan 16 14:24:58 yuna QEMU[29615]: finishing backup
Jan 16 14:25:01 yuna QEMU[29615]: finished backup

Jan 16 14:25:01 yuna vzdump[91572]: INFO: Finished Backup of VM 104 (00:00:03)
Jan 16 14:25:01 yuna vzdump[91572]: INFO: Backup job finished successfully
Jan 16 14:25:01 yuna vzdump[91535]: <root@pam> end task UPID:yuna:000165B4:0650365F:63C5502A:vzdump:104:root@pam: OK

PBS - the added lines are contained in the task log as well as in the journal:
Jan 16 14:24:58 HOST proxmox-backup-proxy[90105]: starting new backup on datastore 'tank': "vm/104/2023-01-16T13:24:58Z"
Jan 16 14:24:58 HOST proxmox-backup-proxy[90105]: download 'index.json.blob' from previous backup.
Jan 16 14:24:58 HOST proxmox-backup-proxy[90105]: register chunks in 'drive-scsi0.img.fidx' from previous backup.
Jan 16 14:24:58 HOST proxmox-backup-proxy[90105]: download 'drive-scsi0.img.fidx' from previous backup.
Jan 16 14:24:58 HOST proxmox-backup-proxy[90105]: created new fixed index 1 ("vm/104/2023-01-16T13:24:58Z/drive-scsi0.img.fidx")
Jan 16 14:24:58 HOST proxmox-backup-proxy[90105]: add blob "/tank/pbs/vm/104/2023-01-16T13:24:58Z/qemu-server.conf.blob" (356 bytes, comp: 356)
Jan 16 14:24:58 HOST proxmox-backup-proxy[90105]: Upload statistics for 'drive-scsi0.img.fidx'
Jan 16 14:24:58 HOST proxmox-backup-proxy[90105]: UUID: 3fb9ba12ef8542f892f306cccab550d8
Jan 16 14:24:58 HOST proxmox-backup-proxy[90105]: Checksum: ab9ffdbaa90c2e5ec311d6cbd770fec068ff8b50a0c74d99cb55c85a4987f2a7
Jan 16 14:24:58 HOST proxmox-backup-proxy[90105]: Size: 0
Jan 16 14:24:58 HOST proxmox-backup-proxy[90105]: Chunk count: 0
Jan 16 14:24:58 HOST proxmox-backup-proxy[90105]: successfully closed fixed index 1
Jan 16 14:24:58 HOST proxmox-backup-proxy[90105]: add blob "/tank/pbs/vm/104/2023-01-16T13:24:58Z/index.json.blob" (325 bytes, comp: 325)
Jan 16 14:24:58 HOST proxmox-backup-proxy[90105]: finish backup called
Jan 16 14:24:58 HOST proxmox-backup-proxy[90105]: updating manifest with stats
Jan 16 14:24:58 HOST proxmox-backup-proxy[90105]: ensuring sync level

Jan 16 14:24:58 HOST proxmox-backup-proxy[90105]: syncing filesystem
Jan 16 14:25:01 HOST proxmox-backup-proxy[90105]: finished
Jan 16 14:25:01 HOST proxmox-backup-proxy[90105]: successfully finished backup
Jan 16 14:25:01 HOST proxmox-backup-proxy[90105]: backup finished successfully
Jan 16 14:25:01 HOST proxmox-backup-proxy[90105]: verify_after_complete called
Jan 16 14:25:01 HOST proxmox-backup-proxy[90105]: TASK OK
Jan 16 14:25:01 HOST proxmox-backup-proxy[90105]: Upload backup log to datastore 'tank', root namespace vm/104/2023-01-16T13:24:58Z/client.log.blob

to revert to the stock package versions, use apt install --reinstall proxmox-backup-server=2.3.2-1 libproxmox-backup-qemu0=1.3.1-1 (adapt the versions if a newer version is available in the repositories in the meantime!).
 
Sorry for the late reply - I was really busy and had no time to conduct the tests.

Since my PBS is installed alongside PVE (bare-metal), I have a combined log:

Code:
Jan 28 09:12:05 pve pvedaemon[17861]: <root@pam> starting task UPID:pve:00042639:0002E52B:63D4D8D5:vzdump:102:root@pam:
Jan 28 09:12:05 pve pvedaemon[271929]: INFO: starting new backup job: vzdump 102 --storage pbs --remove 0 --notes-template '{{guestname}}' --node pve --mode snapshot
Jan 28 09:12:05 pve pvedaemon[271929]: INFO: Starting Backup of VM 102 (qemu)
Jan 28 09:12:07 pve proxmox-backup-proxy[236444]: starting new backup on datastore 'pbs': "vm/102/2023-01-28T08:12:05Z"
Jan 28 09:12:07 pve proxmox-backup-proxy[236444]: download 'index.json.blob' from previous backup.
Jan 28 09:12:07 pve proxmox-backup-proxy[236444]: register chunks in 'drive-efidisk0.img.fidx' from previous backup.
Jan 28 09:12:07 pve proxmox-backup-proxy[236444]: download 'drive-efidisk0.img.fidx' from previous backup.
Jan 28 09:12:07 pve proxmox-backup-proxy[236444]: created new fixed index 1 ("vm/102/2023-01-28T08:12:05Z/drive-efidisk0.img.fidx")
Jan 28 09:12:07 pve proxmox-backup-proxy[236444]: register chunks in 'drive-scsi0.img.fidx' from previous backup.
Jan 28 09:12:07 pve proxmox-backup-proxy[236444]: download 'drive-scsi0.img.fidx' from previous backup.
Jan 28 09:12:07 pve proxmox-backup-proxy[236444]: created new fixed index 2 ("vm/102/2023-01-28T08:12:05Z/drive-scsi0.img.fidx")
Jan 28 09:12:07 pve proxmox-backup-proxy[236444]: register chunks in 'drive-tpmstate0-backup.img.fidx' from previous backup.
Jan 28 09:12:07 pve proxmox-backup-proxy[236444]: download 'drive-tpmstate0-backup.img.fidx' from previous backup.
Jan 28 09:12:07 pve proxmox-backup-proxy[236444]: created new fixed index 3 ("vm/102/2023-01-28T08:12:05Z/drive-tpmstate0-backup.img.fidx")
Jan 28 09:12:07 pve proxmox-backup-proxy[236444]: add blob "/mnt/unsecured/backup/pbs/vm/102/2023-01-28T08:12:05Z/qemu-server.conf.blob" (523 bytes, comp: 523)
Jan 28 09:16:21 pve pvedaemon[17859]: <root@pam> successful auth for user 'root@pam'
Jan 28 09:17:18 pve pveproxy[17869]: worker exit
Jan 28 09:17:18 pve pveproxy[17867]: worker 17869 finished
Jan 28 09:17:18 pve pveproxy[17867]: starting 1 worker(s)
Jan 28 09:17:18 pve pveproxy[17867]: worker 421250 started
Jan 28 09:17:25 pve proxmox-backup-proxy[236444]: GET /favicon.ico: 400 Bad Request: [client [::ffff:192.168.1.252]:39568] File access problems: No such file or directory (os error 2)
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Upload statistics for 'drive-tpmstate0-backup.img.fidx'
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: UUID: bed72f28ea084c6c96eb8ed07fa37b0d
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Checksum: 7cc6dc2c9b3cb3fca04cd9f0ea3848b90683c41a5de0d9e2c86d1cef7574bfaf
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Size: 4194304
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Chunk count: 1
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Upload size: 8388608 (200%)
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Duplicates: 0+1 (100%)
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Compression: 0%
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: successfully closed fixed index 3
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Upload statistics for 'drive-scsi0.img.fidx'
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: UUID: 74ba4d4fc8cb48e4bc19f3872a0511e0
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Checksum: d9698cf46c194fbfaeee605afdffa922f86e6289c83caaf8acd4e3d4c9e6e6dc
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Size: 171798691840
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Chunk count: 40960
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Upload size: 19700645888 (11%)
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Duplicates: 36263+7 (88%)
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Compression: 49%
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: successfully closed fixed index 2
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Upload statistics for 'drive-efidisk0.img.fidx'
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: UUID: 2a77ad8ce4324c768c256db4b1abaf02
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Checksum: 04da6c1647c7eb5d2a42ca6bf1077ec4938a686925c9086cffd50aec929b585c
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Size: 131072
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Chunk count: 1
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Upload size: 4325376 (3300%)
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Duplicates: 0+1 (100%)
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: Compression: 0%
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: successfully closed fixed index 1
Jan 28 09:17:39 pve QEMU[244716]: uploading manifest
Jan 28 09:17:39 pve QEMU[244716]: finishing backup
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: syncing filesystem
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: add blob "/mnt/unsecured/backup/pbs/vm/102/2023-01-28T08:12:05Z/index.json.blob" (556 bytes, comp: 556)
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: finish backup called
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: updating manifest with stats
Jan 28 09:17:39 pve proxmox-backup-proxy[236444]: ensuring sync level

I hope this helps; I will gladly assist with upcoming tests :)
 
okay, so the last line shows that it "hangs" when syncing the datastore. you mentioned earlier that it is located on local ZFS storage. your version information from the first post lines up with the following change, which switched the default from no syncing to syncing the datastore's filesystem:

https://git.proxmox.com/?p=proxmox-backup.git;a=commit;h=4694dede0e0600ed6b111b95877b12d4326149b7

could you check if the problem goes away if you switch back to "none" as sync level?
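
for your setup that would mean adding a tuning line to the datastore entry in /etc/proxmox-backup/datastore.cfg, roughly like this (your config from above with only the last line added):

Code:
datastore: pbs
        comment
        gc-schedule tue *-1..7 23:30
        notify gc=error,sync=error,verify=error
        notify-user root@pam
        path /mnt/unsecured/backup/pbs
        tuning sync-level=none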
 
Adding `tuning sync-level=none` to the corresponding datastore in `/etc/proxmox-backup/datastore.cfg` did indeed work. The backups are now finishing.

I am wondering why this problem occurs on my local ZFS storage, which indeed does not have a separate ZIL (as this is an archive HDD). During my tests I never saw any IO wait on the corresponding devices, so maybe it is an issue with OpenZFS, or even premature starving, as @RolandK pointed out.

Nevertheless, maybe the default option 'filesystem' should not be used on ZFS devices. I presume only a few people use SSDs, or HDDs with an SSD ZIL, for archiving ;-)
 
