vm cpu stalled after backup

Sean510

Member
Apr 18, 2020
Hello everybody,
I tried searching but couldn't find a problem similar to mine (or maybe I used the wrong search terms).
It keeps happening, 3 or 4 times a month: when the PBS backup runs, the CPUs of the VMs stall at 55% and the VMs stop responding. This affects almost all the VMs in the cluster. I can't figure out what the problem is. Restarting all the VMs brings everything back to normal.

Has anyone encountered a similar problem?
Regards
 
We've had a few reports of VMs hanging on backup start, but not on finish AFAIK. The situation should be improved with pve-qemu-kvm 5.1.0-8, and even more so in the upcoming 5.2.0-1.

To debug your specific issue we'd need a bit more information:
  • Logs from when the hang starts (both PVE and PBS, journalctl, etc...)
  • "Restarting the VM" implies which operations exactly (i.e. "Restart", "Stop -> Start", "Shutdown -> Start")? What logs/output do they produce?
  • VM config (qm config <vmid>)
  • pveversion -v
  • /etc/proxmox-backup/datastore.cfg from your PBS server if you have anything special configured
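If it helps, the requested information could be gathered in one go with something like the following. This is only a rough sketch, not an official tool: the VMID and the one-hour journal window are placeholder assumptions to adjust for your setup.

```shell
#!/bin/sh
# Rough diagnostics-collection sketch (not an official Proxmox tool).
# VMID and the journal time window are placeholders -- adjust them.
VMID=100

for cmd in "qm config $VMID" "pveversion -v"; do
    # Only run each command if it actually exists on this host.
    if command -v "${cmd%% *}" >/dev/null 2>&1; then
        echo "== $cmd =="
        $cmd
    else
        echo "== $cmd == (command not available on this host)"
    fi
done

# Journal entries around the time the hang started (here: the last hour).
if command -v journalctl >/dev/null 2>&1; then
    journalctl --since "1 hour ago" --no-pager | tail -n 100
fi

# On the PBS server, the datastore configuration lives here:
[ -f /etc/proxmox-backup/datastore.cfg ] && cat /etc/proxmox-backup/datastore.cfg

echo "collection done"
```

Run it on the PVE node (and the last part on the PBS server) and attach the output to your reply.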
 
For the VM I execute "stop" and then "start"; the stop is very slow to complete.
It could really be a QEMU version problem, I've fallen behind on updates.

Code:
root@hv1:~# qm config 112
agent: 1
bootdisk: scsi0
cores: 1
cpulimit: 2
memory: 8192
name: vps.meteoregionelazio.it
net0: virtio=CE:32:9D:FC:5A:DB,bridge=vmbr0
numa: 0
ostype: l26
scsi0: storage:vm-112-disk-0,discard=on,iothread=1,size=20G,ssd=1
scsi1: storage:vm-112-disk-1,discard=on,iothread=1,size=30G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=140cc6f3-e2b4-4f25-90cf-d49b9dab31a7
sockets: 2
vcpus: 2
vmgenid: c12a8d22-c2f8-4b26-bc3c-8a7b12ffc637

root@hv1:~# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.55-1-pve)
pve-manager: 6.2-11 (running version: 6.2-11/22fb4983)
pve-kernel-5.4: 6.2-5
pve-kernel-helper: 6.2-5
pve-kernel-5.4.55-1-pve: 5.4.55-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 14.2.10-pve1
ceph-fuse: 14.2.10-pve1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-5
libpve-guest-common-perl: 3.1-2
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-6
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-10
pve-cluster: 6.1-8
pve-container: 3.1-12
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-2
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-12
pve-xtermjs: 4.7.0-1
qemu-server: 6.2-11
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve1

datastore: bk-prox
	gc-schedule sat 18:15
	keep-daily 7
	path /bk-prox
	prune-schedule 0/2:00

datastore: bk-site
	path /bk-site
 
Hm, pve-qemu-kvm: 5.0.0-12 is pretty old indeed - with respect to PBS support anyway. I'd recommend updating to the newest version and trying again, there have been a heap of PBS-related fixes in the meantime.
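For completeness, the update itself is just the standard apt flow on the PVE host, sketched below. Run it as root, and note that a running VM keeps using the old QEMU binary until it is fully stopped and started again (or live-migrated to an updated node).

```shell
# Sketch of the suggested update (standard Debian/Proxmox apt flow).
if command -v apt-get >/dev/null 2>&1; then
    apt-get update
    apt-get -s full-upgrade      # -s: simulate first, review what would change
    # apt-get full-upgrade       # then apply for real
fi
# Verify the new QEMU package afterwards:
command -v pveversion >/dev/null 2>&1 && pveversion -v | grep pve-qemu-kvm
echo "update sketch finished"
```

After the upgrade, do a full stop and start of the affected VMs so they pick up the new QEMU binary.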

If that doesn't work, logs from the time of the crash would be very helpful.