Errors on shutting down VM for backup

cosmos

Well-Known Member
Apr 1, 2013
104
2
58
Hello,

I've got a ready-to-use CentOS 7-based virtual appliance originally made for vmware and wanted to use it on PVE. It was quite easy to do so, just unpack the OVA, get the vmware disk images and dd them to my LVM partition, prepare a PVE guest for CentOS and then connect the LVM-based disk.

The VM runs just fine so far, with one notable exception. I'm trying to get weekly backups in stop mode (to have the best possible consistency). I've scheduled the backup to take place along with 3-4 other VMs, starting weekly on Sundays at 1:15am. The VM under discussion has id 105. From the host daemon.log:
Code:
Jun 24 01:15:02 pve-1 vzdump[11559]: INFO: starting new backup job: vzdump 100 101 104 106 105 --storage backups --quiet 1 --mailto XXXXXXXXXXXXXXXXXXX --mailnotification failure --mode stop --compress gzip
Jun 24 01:15:02 pve-1 vzdump[11559]: INFO: Starting Backup of VM 100 (qemu)
Jun 24 01:15:04 pve-1 qm[11578]: shutdown VM 100: UPID:pve-XXXX:qmshutdown:100:root@pam:
Jun 24 01:15:50 pve-1 systemd[1]: Started 100.scope.
Jun 24 01:15:50 pve-1 systemd-udevd[11673]: Could not generate persistent MAC address for tap100i0: No such file or directory
Jun 24 01:16:00 pve-1 systemd[1]: Starting Proxmox VE replication runner...
Jun 24 01:16:00 pve-1 systemd[1]: Started Proxmox VE replication runner.
[...]

Jun 24 03:21:45 pve-1 vzdump[11559]: INFO: Starting Backup of VM 105 (qemu)
Jun 24 03:21:47 pve-1 qm[28459]: shutdown VM 105: UPID:pve-1:XXXXXX:qmshutdown:105:root@pam:
[...]
Jun 24 03:31:48 pve-1 qm[28459]: VM quit/powerdown failed - got timeout
Jun 24 03:31:48 pve-1 vzdump[11559]: ERROR: Backup of VM 105 failed - command 'qm shutdown 105 --skiplock --keepActive --timeout 600' failed: exit code 255
Jun 24 03:31:48 pve-1 vzdump[11559]: INFO: Starting Backup of VM 106 (qemu)
Do note that the host does not seem to be doing anything else to the guest. But it does, later on.

Examining the guest log, shows that something strange happened around 1:15am:
Code:
Jun 24 01:01:01 eraav systemd: Started Session 135 of user root.
Jun 24 01:01:01 eraav systemd: Starting Session 135 of user root.
Jun 24 01:17:51 eraav kernel: NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [RDSensor:937]
Jun 24 01:17:52 eraav kernel: Modules linked in: binfmt_misc xt_REDIRECT nf_nat_redirect iptable_nat nf_nat_ipv4 ip6table_nat nf_nat_ipv6 nf_nat nf_conntrac
k_ipv4 nf_conntrack_ipv6 nf_defrag_ipv6 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip6table_filter ip6_tables bochs_drm ttm drm_kms_helper sysco
pyarea sysfillrect sysimgblt fb_sys_fops drm i2c_piix4 ppdev i2c_core pcspkr shpchp virtio_balloon sg parport_pc parport ip_tables xfs libcrc32c sd_mod crc_t
10dif crct10dif_generic sr_mod cdrom crct10dif_common ata_generic pata_acpi virtio_scsi virtio_net ata_piix virtio_pci virtio_ring libata serio_raw virtio fl
oppy dm_mirror dm_region_hash dm_log dm_mod
Jun 24 01:17:59 eraav kernel: CPU: 2 PID: 937 Comm: RDSensor Not tainted 3.10.0-514.2.2.el7.x86_64 #1
Jun 24 01:17:59 eraav kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
Jun 24 01:17:59 eraav kernel: task: ffff8800b7de8fb0 ti: ffff8800b30d4000 task.ti: ffff8800b30d4000
[System Reboot]

So the first issue is why this happened. This is reproducible at all times. One strange thing is that even though the host has tried and failed to shutdown the guest, at one point it has finished the backup of all the machines prior to 105. I would not expect the host to try and shutdown the VM, but it does at 3:21am:
Code:
Jun 24 03:01:03 esetav systemd: Started Session 137 of user root.
Jun 24 03:01:03 esetav systemd: Starting Session 137 of user root.
Jun 24 03:21:52 esetav systemd-logind: Power key pressed.
Jun 24 03:21:52 esetav systemd-logind: Powering Off...
Jun 24 03:21:52 esetav systemd-logind: System is powering down.
Jun 24 03:21:52 esetav systemd: Stopping LVM2 PV scan on device 8:2...
Jun 24 03:21:52 esetav systemd: Stopping Session 137 of user root.
Jun 24 03:21:52 esetav systemd: Deactivating swap /dev/mapper/centos_ba--eraappl--v-swap...
Jun 24 03:21:52 esetav systemd: Removed slice system-selinux\x2dpolicy\x2dmigrate\x2dlocal\x2dchanges.slice.
Jun 24 03:21:52 esetav systemd: Stopping system-selinux\x2dpolicy\x2dmigrate\x2dlocal\x2dchanges.slice.
Jun 24 03:21:53 esetav systemd: Stopped Dump dmesg to /var/log/dmesg.
Jun 24 03:21:53 esetav systemd: Stopping Dump dmesg to /var/log/dmesg...
Jun 24 03:21:53 esetav systemd: Stopped target Timers.
Jun 24 03:21:53 esetav systemd: Stopping Timers.
Jun 24 03:21:53 esetav systemd: Stopped Daily Cleanup of Temporary Directories.
Jun 24 03:21:53 esetav systemd: Stopping Daily Cleanup of Temporary Directories.
Jun 24 03:21:53 esetav systemd: Stopping ESET Remote Administrator Xvfb service...
Jun 24 03:21:53 esetav systemd: Stopping Authorization Manager...
Jun 24 03:21:55 esetav rsyslogd: [origin software="rsyslogd" swVersion="7.4.7" x-pid="639" x-info="http://www.rsyslog.com"] exiting on signal 15.

Although the system should be backed up and then the VM brought back online, nothing happens in both cases.

There's something fishy with the backup job in stop backup mode. I'd appreciate any help in troubleshooting this. Feel free to ask for additional info




FYI, the VM in hand is ESET Remote Administrator Virtual Appliance (ERA VA) available here: https://www.eset.com/int/business/remote-management/remote-administrator/download/#virtual
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!