[SOLVED] Problem with backup

pixelpeter · Dec 9, 2020

Hello,

Over the last few days I updated to the current version from the enterprise repository and noticed the following:
If you abort a running backup job, the VM stays locked and, after a while, also stops responding.
Backup mode: snapshot.
The daily backups run through cleanly.
Can anyone reproduce this?

Peter

Alwin · Dec 9, 2020

Please post the VM config and storage.cfg. And is it a manually started job or an automatic one?

pakuzaz · Dec 9, 2020

@Alwin we are seeing the same thing, see this forum thread:

pakuzaz · Dec 9, 2020

https://forum.proxmox.com/threads/backup-problem.80362/#post-355455

pixelpeter · Dec 9, 2020

Hello Alwin,

The backup was started manually and aborted after about 10% because I was saving to the wrong volume.

storage.cfg

dir: local
    disable
    path /var/lib/vz
    content vztmpl,iso
    shared 0

lvmthin: local-lvm
    disable
    thinpool data
    vgname pve
    content images
    nodes sv-c-vdz4,sv-c-vdz3

nfs: nfs_fast_linux
    export /vol_dzfs_fast/linux
    path /mnt/pve/nfs_fast_linux
    server 10.3.9.100
    content images
    nodes sv-c-vdz3,sv-c-vdz4
    options vers=3

nfs: nfs_slow_linux
    export /vol_dzfs_slow/linux
    path /mnt/pve/nfs_slow_linux
    server 10.3.9.100
    content images
    nodes sv-c-vdz3,sv-c-vdz4
    options vers=3

nfs: nfs_qnap_linux
    export /DZ_Root/linux
    path /mnt/pve/nfs_qnap_linux
    server 10.3.9.111
    content backup
    nodes sv-c-vdz3,sv-c-vdz4
    options rsize=65536,wsize=65536,vers=3
    prune-backups keep-last=5

nfs: nfs_qnap_archiv
    disable
    export /DZ_Root/archiv
    path /mnt/pve/nfs_qnap_archiv
    server 10.3.9.111
    content backup
    prune-backups keep-last=1

pbs: pbs_linux
    datastore ds_linux
    server 10.6.9.250
    content backup
    fingerprint 17:a8:0b:8e:52:9e:a2:d2:85:d6:9a:83:a2:82:8b:b6:20:9e:2d:c1:57:c8:04:2f:9f:85:03:2b:e2:64:d5:43
    prune-backups keep-all=1
    username xxxxxx@pbs

VM config
agent: 1
balloon: 0
boot: dcn
bootdisk: scsi0
cores: 2
cpu: kvm64,flags=+pcid;+spec-ctrl
memory: 4096
name: dz-v-puppet
net0: virtio=5E:37:2D:5E:3E:91,bridge=vmbr0,tag=305
net1: virtio=16:6E:51:17:8C:6C,bridge=vmbr0,tag=367
numa: 1
ostype: l26
scsi0: nfs_fast_linux:107/vm-107-disk-0.raw,size=10G
scsi1: nfs_fast_linux:107/vm-107-disk-1.raw,size=2G
scsihw: virtio-scsi-pci
smbios1: uuid=4e25d58b-0c7a-412d-9d54-acb13380f5f6
sockets: 1
tablet: 0
vmgenid: 63a9dac0-f18a-4432-973f-b99cbe777835

Backup Log

INFO: starting new backup job: vzdump 107 --remove 0 --compress zstd --node sv-c-vdz3 --storage nfs_qnap_linux --mode snapshot
INFO: Starting Backup of VM 107 (qemu)
INFO: Backup started at 2020-12-09 09:53:06
INFO: status = running
INFO: VM Name: dz-v-puppet
INFO: include disk 'scsi0' 'nfs_fast_linux:107/vm-107-disk-0.raw' 10G
INFO: include disk 'scsi1' 'nfs_fast_linux:107/vm-107-disk-1.raw' 2G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/nfs_qnap_linux/dump/vzdump-qemu-107-2020_12_09-09_53_06.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'd5afc3b9-2173-4ab7-b8b6-061778e92287'
INFO: resuming VM again
INFO: 1% (158.8 MiB of 12.0 GiB) in 3s, read: 52.9 MiB/s, write: 32.3 MiB/s
INFO: 2% (308.1 MiB of 12.0 GiB) in 6s, read: 49.8 MiB/s, write: 46.4 MiB/s
INFO: 3% (445.9 MiB of 12.0 GiB) in 9s, read: 45.9 MiB/s, write: 43.3 MiB/s
INFO: 4% (591.4 MiB of 12.0 GiB) in 12s, read: 48.5 MiB/s, write: 47.3 MiB/s
INFO: 5% (730.1 MiB of 12.0 GiB) in 15s, read: 46.2 MiB/s, write: 44.0 MiB/s
INFO: 7% (869.7 MiB of 12.0 GiB) in 18s, read: 46.5 MiB/s, write: 43.2 MiB/s
INFO: 8% (999.8 MiB of 12.0 GiB) in 21s, read: 43.4 MiB/s, write: 41.7 MiB/s
INFO: 9% (1.1 GiB of 12.0 GiB) in 24s, read: 45.0 MiB/s, write: 42.3 MiB/s
INFO: 10% (1.2 GiB of 12.0 GiB) in 27s, read: 47.8 MiB/s, write: 44.2 MiB/s
INFO: 11% (1.3 GiB of 12.0 GiB) in 30s, read: 30.7 MiB/s, write: 30.1 MiB/s
INFO: 12% (1.5 GiB of 12.0 GiB) in 33s, read: 45.9 MiB/s, write: 43.4 MiB/s
ERROR: interrupted by signal
INFO: aborting backup job

Peter

pixelpeter · Dec 10, 2020

Hello,

On our test cluster with the "nosubscription" repository the problem does not occur.
So it appears to be fixed there.

Peter

Alwin · Dec 10, 2020

pixelpeter said:
On our test cluster with the "nosubscription" repository the problem does not occur.

Running pveversion -v shows the installed packages. That makes it easy to compare the two systems.
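
A rough sketch of such a comparison, in case it helps: it assumes the output of pveversion -v from each node has been saved as text and that each line has the form "package: version", optionally followed by a note in parentheses. The function names and all package versions in the sample data are made up for illustration and are not taken from this thread.

```python
from __future__ import annotations


def parse_pveversion(text: str) -> dict[str, str]:
    """Map package name -> version from `pveversion -v`-style lines."""
    packages = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # skip lines that are not "name: version"
        # Keep only the version token; drop trailing notes in parentheses.
        packages[name.strip()] = rest.split()[0]
    return packages


def diff_packages(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return packages whose version differs or that exist on one side only."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Illustrative sample data only; these versions are invented.
PRODUCTION = """\
pve-manager: 1.0-1 (running version: 1.0-1/abcdef01)
pve-container: 1.0-1
libproxmox-acme-perl: 1.0.1
"""
TEST_CLUSTER = """\
pve-manager: 1.0-1 (running version: 1.0-1/abcdef01)
pve-container: 1.0-2
libproxmox-acme-perl: 1.0.2
"""

differences = diff_packages(
    parse_pveversion(PRODUCTION), parse_pveversion(TEST_CLUSTER)
)
for name, (left, right) in differences.items():
    print(f"{name}: {left} -> {right}")
```

Packages that are installed on only one of the two systems show up with None on the other side.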

pixelpeter · Dec 10, 2020

Hello Alwin,

Our test cluster uses the "test" repository, not nosubscription.
These two packages differ:
pve-container
libproxmox-acme-perl

Peter

Alwin · Dec 10, 2020

There is a report in our Bugzilla about an aborted zstd backup keeping the lock. But, just as on your test cluster, I couldn't reproduce the issue.
https://bugzilla.proxmox.com/show_bug.cgi?id=2723

pixelpeter · Dec 10, 2020

Hello Alwin,

But there is no solution there either.
I also noticed this:
It only happens with zstd.
With lzo the log looks like this:

Code:

INFO: starting new backup job: vzdump 142 --mode snapshot --remove 0 --node sv-c-vdz4 --compress lzo --storage nfs_qnap_linux
INFO: Starting Backup of VM 142 (qemu)
INFO: Backup started at 2020-12-10 15:29:33
INFO: status = running
INFO: VM Name: dz-v-igp
INFO: include disk 'scsi0' 'nfs_fast_linux:142/vm-142-disk-0.raw' 6G
INFO: include disk 'scsi1' 'nfs_fast_linux:142/vm-142-disk-1.raw' 2G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/nfs_qnap_linux/dump/vzdump-qemu-142-2020_12_10-15_29_33.vma.lzo'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '49b332fa-88c2-44e9-b0b8-4be12bb40165'
INFO: resuming VM again
INFO:   2% (192.4 MiB of 8.0 GiB) in  3s, read: 64.1 MiB/s, write: 4.4 MiB/s
ERROR: interrupted by signal
INFO: aborting backup job
ERROR: Backup of VM 142 failed - interrupted by signal
INFO: Failed at 2020-12-10 15:29:39
ERROR: Backup job failed - interrupted by signal
TASK ERROR: interrupted by signal

With zstd it looks like this:

Code:

INFO: starting new backup job: vzdump 142 --mode snapshot --node sv-c-vdz4 --remove 0 --compress zstd --storage nfs_qnap_linux
INFO: Starting Backup of VM 142 (qemu)
INFO: Backup started at 2020-12-10 15:31:57
INFO: status = running
INFO: VM Name: dz-v-igp
INFO: include disk 'scsi0' 'nfs_fast_linux:142/vm-142-disk-0.raw' 6G
INFO: include disk 'scsi1' 'nfs_fast_linux:142/vm-142-disk-1.raw' 2G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/nfs_qnap_linux/dump/vzdump-qemu-142-2020_12_10-15_31_57.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '2e564331-1997-4ef9-b400-9ab2bb6e254c'
INFO: resuming VM again
INFO:   2% (178.3 MiB of 8.0 GiB) in  3s, read: 59.4 MiB/s, write: 4.4 MiB/s
INFO:   3% (317.2 MiB of 8.0 GiB) in  6s, read: 46.3 MiB/s, write: 0 B/s
ERROR: interrupted by signal
INFO: aborting backup job

Entries are missing from the log file. Apparently something aborts too early there.
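
To make the difference between the two logs easy to check, here is a minimal sketch. It assumes that a cleanly aborted task always ends with a line starting with "TASK " after "INFO: aborting backup job", which is what the lzo log above shows; that is an inference from these two logs, not documented vzdump behaviour. The function name is made up, and the sample text is the tail of the two logs above.

```python
def is_stuck_abort(log_text: str) -> bool:
    """True if the log reports an abort but never reaches a final TASK line."""
    lines = [line.strip() for line in log_text.splitlines()]
    try:
        abort_at = lines.index("INFO: aborting backup job")
    except ValueError:
        return False  # this log contains no abort at all
    # A clean abort is followed by a closing status line such as "TASK ERROR: ...".
    return not any(line.startswith("TASK ") for line in lines[abort_at + 1:])


# Tail of the lzo log above (abort completes).
LZO_TAIL = """\
ERROR: interrupted by signal
INFO: aborting backup job
ERROR: Backup of VM 142 failed - interrupted by signal
INFO: Failed at 2020-12-10 15:29:39
ERROR: Backup job failed - interrupted by signal
TASK ERROR: interrupted by signal
"""

# Tail of the zstd log above (log stops right after the abort message).
ZSTD_TAIL = """\
ERROR: interrupted by signal
INFO: aborting backup job
"""

print("lzo stuck: ", is_stuck_abort(LZO_TAIL))
print("zstd stuck:", is_stuck_abort(ZSTD_TAIL))
```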

Peter

pixelpeter · Dec 10, 2020

What else could I provide or test for you?

Peter

pixelpeter · Dec 10, 2020

There is one more new finding.
I also have a single emergency server with a subscription license and the same software versions as our clusters. The problem does not occur there either. So it seems to depend on the hardware as well.
The emergency server has 8 cores, the cluster node has 52.
I have already tested zstd with a single thread; it makes no difference.

Peter

Alwin · Dec 11, 2020

Does the problem also occur when vzdump is run directly by hand? The commands should be listed in /etc/pve/vzdump.cron.

pixelpeter · Dec 29, 2020

Hello Alwin,

The problem still occurs.
If you abort a backup, the VM is dead.
It is especially problematic when, for example, a timeout occurs during the daily backups, because the result is essentially the same.
For us, however, the problem only occurs in the Linux cluster. The VMs running there are Debian 10.
In the Windows cluster with the same software versions it does not occur.
On the test cluster these problems do not occur with Debian either. There, however, the backup client is at version 1.0.6-1 instead of 1.0.5-1.

I will wait until 1.0.6 shows up in the enterprise repo as well.

Peter

pakuzaz · Dec 29, 2020

I'm waiting for that one too.

pixelpeter · Dec 30, 2020

Hello,

Version 1.0.6 arrived in the repo today.
Unfortunately, no success.
For us, however, the problem is limited to one cluster.
Here again is the log from a system where it does not work:

Code:

INFO:   0% (468.3 MiB of 86.0 GiB) in  3s, read: 156.1 MiB/s, write: 109.2 MiB/s
ERROR: interrupted by signal
INFO: aborting backup job

And here from one where it aborts cleanly:

Code:

INFO:  17% (916.0 MiB of 5.0 GiB) in  3s, read: 305.3 MiB/s, write: 1.3 MiB/s
ERROR: interrupted by signal
INFO: aborting backup job
ERROR: Backup of VM 101 failed - interrupted by signal
INFO: Failed at 2020-12-30 07:34:38
ERROR: Backup job failed - interrupted by signal
TASK ERROR: interrupted by signal

The software versions on the different clusters are completely identical. The hardware is identical as well.
The problem also has nothing to do with the new Backup Server. A backup to a regular storage cannot be aborted either.

Is there anything else I can test here?

Otherwise, I wish the team a happy New Year 2021.

Peter

fabian · Dec 30, 2020

@pixelpeter could you file an entry at https://bugzilla.proxmox.com with your findings so far? It sounds like there is still some bug lurking in the error handling code path. Since it seems reproducible, if that's an option for you, we could try to get you a test build with more output to see where it gets stuck.

pixelpeter · Dec 30, 2020

Hello Fabian,

Yes, will do.

Peter

Stefan_R · Jan 4, 2021

For the record, the Bugzilla entry is here: https://bugzilla.proxmox.com/show_bug.cgi?id=3225

pixelpeter · Jan 4, 2021

Hello,

As described in Bugzilla, the error only occurs when there is more than one backup target.
On the affected cluster there was indeed an old one via NFS in addition to the new PBS.
A patch will surely land in the repo within the next few days.
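
For anyone who wants to check whether their own cluster matches this condition, here is a minimal sketch. It assumes "backup target" means a storage in storage.cfg whose content list includes backup and that carries no disable line; that reading is an assumption, and the Bugzilla entry has the precise condition. The parser and its names are hypothetical and only handle the simple "type: id" plus "key value" layout shown earlier in this thread.

```python
from __future__ import annotations

import re

# Section headers look like "nfs: nfs_qnap_linux"; property lines have no colon after the key.
HEADER = re.compile(r"^(\w+):\s+(\S+)$")


def backup_targets(cfg_text: str) -> list[str]:
    """Return IDs of enabled storages whose content includes 'backup'."""
    storages = []
    current = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            current = {"id": match.group(2), "props": {}}
            storages.append(current)
        elif current is not None:
            key, _, value = line.partition(" ")
            current["props"][key] = value.strip()
    return [
        s["id"]
        for s in storages
        if "backup" in s["props"].get("content", "").split(",")
        and "disable" not in s["props"]
    ]


# Excerpt of the storage.cfg posted earlier in this thread.
SAMPLE_CFG = """\
nfs: nfs_fast_linux
    content images

nfs: nfs_qnap_linux
    content backup

nfs: nfs_qnap_archiv
    disable
    content backup

pbs: pbs_linux
    datastore ds_linux
    content backup
"""

targets = backup_targets(SAMPLE_CFG)
print(f"{len(targets)} enabled backup target(s): {', '.join(targets)}")
```

With the configuration from this thread, that gives the NFS storage nfs_qnap_linux and the PBS storage pbs_linux, which matches the "old NFS plus new PBS" situation described above.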

Peter
