[SOLVED] Backup stuck always on same VM

adminkc

Member
Sep 28, 2020
Hi,

Our backups always fail with the same error; see the attached picture.
Has anybody had the same issue, or does anyone have ideas about what we can do?

What I do know is that when I delete the failed backup and run it manually, it finishes in 10 seconds without an error.

Thank you,
KC IT-Team
 

Attachments

  • Backup Fail.jpg (64.6 KB)
I have seen things like this when there is a timeout. I'm not sure how long it waits to get a "lock".
 
The strange thing is that it is always the same backup job that hits this error.
Are you running a recent PVE version? IIRC, we had a similar problem years ago that was solved by an update and a VM stop/start to get the new updates. Maybe give that a shot?
 
Yes, we run the latest PVE version. I stopped and started the VM again; now I will wait to see if this helps.
I don't think this will resolve our problem, though.
 
Can you look at the qemu-guest-agent logfile? Maybe also enable debugging for it to see what is happening; there might be something written there that could explain a timeout.
 
Hi,

There are no errors in the logfile.
 

Attachments

  • 1685087923901.png (12.7 KB)
Hi,
If I interpret the log correctly, the freeze took a little over 2 minutes, which IS a long time. What scripts are executed there? The logfile is only at info level, not debug; please change that and try again.
It doesn't mean that the freeze operation itself took this long. During the backup setup step, the filesystem needs to stay frozen. The order is: 1. freeze, 2. backup setup, 3. thaw. The timeout for the backup setup is 2 minutes, and the screenshot in the first post shows that this timeout is hit. So most likely that is why the thaw only happens 2 minutes after the freeze.
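
To make the ordering above concrete, here is a minimal Python sketch of the timeline. It is only a simplified model for illustration, not the actual Proxmox VE code: the names `backup_timeline`, `Timeline` and `SETUP_TIMEOUT_S` are made up here, and the only figure taken from this thread is the 2-minute setup timeout.

```python
from dataclasses import dataclass

# The 2-minute backup setup timeout mentioned in this thread.
SETUP_TIMEOUT_S = 120.0


@dataclass
class Timeline:
    freeze_at: float        # when the guest filesystem freeze is requested
    thaw_at: float          # when the thaw is issued
    setup_timed_out: bool   # whether the setup step hit the timeout


def backup_timeline(freeze_at, freeze_duration, setup_duration,
                    timeout=SETUP_TIMEOUT_S):
    """Simplified model of: 1. freeze, 2. backup setup, 3. thaw.

    The filesystem stays frozen during the setup step, so the thaw
    only happens once setup finishes or its timeout expires.
    """
    setup_start = freeze_at + freeze_duration
    timed_out = setup_duration >= timeout
    thaw_at = setup_start + min(setup_duration, timeout)
    return Timeline(freeze_at, thaw_at, timed_out)


if __name__ == "__main__":
    # A fast freeze followed by a setup step that never completes in time.
    t = backup_timeline(freeze_at=0.0, freeze_duration=1.0, setup_duration=500.0)
    print(t)
```

In this model a freeze that itself takes only a second still produces a thaw about two minutes later whenever the setup step runs into its timeout, which matches the gap seen in the guest agent log.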

@adminkc Please share the output of pveversion -v, qm config 151 and qm config 245. Is there any noticeable difference to other VM configurations? What kind of storage is your backup target?
 
The VM configs are the same; there is no difference between them.
We are using PBS for storage.

Here is the pveversion output:

pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.74-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.3-3
pve-kernel-5.13: 7.1-9
pve-kernel-5.4: 6.4-6
pve-kernel-5.15.102-1-pve: 5.15.102-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.39-4-pve: 5.15.39-4
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.4.140-1-pve: 5.4.140-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 16.2.11-pve1
ceph-fuse: 16.2.11-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.6.4
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-1
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.11-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
 
We are using PBS for storage.
What does the IO/CPU/network load look like, on both Proxmox VE and PBS, during the backup? Do you have backups from multiple nodes going to the same PBS instance at the same time?

The VM configs are the same; there is no difference between them.
Can you still share one of the configurations?
 
What does the IO/CPU/network load look like, on both Proxmox VE and PBS, during the backup? Do you have backups from multiple nodes going to the same PBS instance at the same time?
Yes, all our backups from all nodes go to one PBS. The VMs are not on a single hypervisor; they are distributed.
Can you still share one of the configurations?
I attached two configs. VM 295 had the error during tonight's backup and VM 212 did not, so I am including these two configs. Let me know if you need more information.
 

Attachments

  • 1685601515817.png (17.2 KB)
  • 1685601567097.png (22.1 KB)
  • 1685601579855.png (21.7 KB)
Yes, all our backups from all nodes go to one PBS. The VMs are not on a single hypervisor; they are distributed.
Can you try and create different backup jobs, so not all trigger at the same time?
 
Oh, maybe I understand what you mean. Not all of our VM backups run at the same time; see the timetable for when the backups start.
 

Attachments

  • 1685603136823.png (23.8 KB)
Oh, maybe I understand what you mean. Not all of our VM backups run at the same time; see the timetable for when the backups start.
Okay, so you do have multiple jobs. But do some of the problematic jobs include VMs on all/many nodes? Because then all those nodes will try to talk to the backup server at the same time. If you do have such a job, I'd try to split it up and shift the start time a bit. Related feature request: https://bugzilla.proxmox.com/show_bug.cgi?id=3086
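
As a rough planning aid for the "split it up and shift the start time" idea, here is a small Python sketch. It does not talk to Proxmox VE at all: it only groups the VMs of one job by node and proposes shifted start times that you would then enter as separate backup jobs yourself. The function name `split_job_by_node`, the node names and the 15-minute offset are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def split_job_by_node(vm_to_node, first_start, offset_minutes=15):
    """Propose one backup job per node, each starting a bit later.

    vm_to_node: mapping of VM ID -> node name (hypothetical input).
    first_start: start time of the first job as "HH:MM".
    Returns a list of (node, start time "HH:MM", sorted VM IDs).
    """
    by_node = defaultdict(list)
    for vmid, node in vm_to_node.items():
        by_node[node].append(vmid)

    start = datetime.strptime(first_start, "%H:%M")
    plan = []
    for i, node in enumerate(sorted(by_node)):
        when = start + timedelta(minutes=i * offset_minutes)
        plan.append((node, when.strftime("%H:%M"), sorted(by_node[node])))
    return plan


if __name__ == "__main__":
    # VM IDs from this thread; the node names are made up.
    vms = {151: "node-a", 245: "node-b", 295: "node-a"}
    for node, when, vmids in split_job_by_node(vms, "22:00"):
        print(node, when, vmids)
```

Whether 15 minutes is a sensible offset depends on how long each node's backups take, so treat the value as a placeholder.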
 
The logfile is only at info level, not debug; please change that and try again.
After changing to debug level, the log gets flooded with:
Code:
1685699121.67057: debug: received EOF
1685699121.167240: debug: received EOF
1685699121.267419: debug: received EOF
1685699121.367583: debug: received EOF
1685699121.467762: debug: received EOF
1685699121.567933: debug: received EOF
1685699121.668102: debug: received EOF
1685699121.768242: debug: received EOF
1685699121.868408: debug: received EOF
1685699121.968592: debug: received EOF
1685699122.68761: debug: received EOF
1685699122.168938: debug: received EOF
1685699122.269115: debug: received EOF
1685699122.369309: debug: received EOF
1685699122.469485: debug: received EOF
1685699122.569653: debug: received EOF
1685699122.669845: debug: received EOF
1685699122.770020: debug: received EOF
1685699122.870193: debug: received EOF
1685699122.970374: debug: received EOF
That is 10 messages per second. Is there a way to switch off those messages?

All our machines run Ubuntu, though in different versions.
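
Until there is an answer on switching the messages off, one workaround is to filter them out when reading the log. Below is a minimal Python sketch that parses lines in the format shown above, estimates the message rate, and drops the "received EOF" entries. One assumption to flag: judging from the sample, the part after the dot looks like a microsecond count without zero padding (".67057" is followed by ".167240" about 100 ms later), so the sketch treats it that way. The function names are made up for this example.

```python
import re

# Format seen in the log above: "<seconds>.<microseconds>: <level>: <message>"
LINE_RE = re.compile(r"^(\d+)\.(\d+): (\w+): (.*)$")


def parse_line(line):
    """Return (timestamp in microseconds, level, message), or None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    seconds, micros, level, message = m.groups()
    # Assumption: the fractional part is an unpadded microsecond count.
    return int(seconds) * 1_000_000 + int(micros), level, message


def messages_per_second(lines):
    """Average message rate between the first and last parseable line."""
    stamps = [p[0] for p in map(parse_line, lines) if p is not None]
    if len(stamps) < 2 or stamps[-1] == stamps[0]:
        return 0.0
    return (len(stamps) - 1) * 1_000_000 / (stamps[-1] - stamps[0])


def drop_noise(lines, noise="received EOF"):
    """Keep every line whose message is not the noisy one."""
    kept = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None or parsed[2] != noise:
            kept.append(line)
    return kept


if __name__ == "__main__":
    sample = [
        "1685699121.67057: debug: received EOF",
        "1685699121.167240: debug: received EOF",
        "1685699121.267419: debug: received EOF",
    ]
    print(round(messages_per_second(sample)), "messages per second")
    print(drop_noise(sample))
```

Running the log through `drop_noise` should leave only the entries that are actually interesting for the freeze/thaw timing.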
 
