Issue with backup and HA

da-alb

Hi,

This morning I found an LXC container stopped, and the HA manager had not started it again.

The container was backed up during the night, and the HA manager apparently failed to start it.

Here are the logs:

Code:
INFO: Starting Backup of VM 5048 (lxc)
INFO: Backup started at 2021-02-12 00:55:48
INFO: status = running
INFO: CT Name: <customer-name-hidden>
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
/dev/rbd74
INFO: creating vzdump archive '/mnt/pve/qnap-m1/dump/vzdump-lxc-5048-2021_02_12-00_55_48.tar.zst'
INFO: Total bytes written: 32153415680 (30GiB, 165MiB/s)
INFO: archive file size: 26.94GB
INFO: removing backup 'qnap-m1:backup/vzdump-lxc-5048-2021_02_09-01_09_29.tar.zst'
INFO: removing backup 'qnap-m1:backup/vzdump-lxc-5048-2021_02_09-02_45_53.tar.zst'
INFO: cleanup temporary 'vzdump' snapshot
Removing snap: 100% complete...done.
INFO: Finished Backup of VM 5048 (00:07:06)
INFO: Backup finished at 2021-02-12 01:02:54

Attachments:
Screenshot_2021-02-12 pm-80 - Proxmox Virtual Environment (0).png
Screenshot_2021-02-12 pm-80 - Proxmox Virtual Environment(1).png
Screenshot_2021-02-12 pm-80 - Proxmox Virtual Environment(2).png
Screenshot_2021-02-12 pm-80 - Proxmox Virtual Environment(3).png
Screenshot_2021-02-12 pm-80 - Proxmox Virtual Environment(4).png


It seems that during backups some LXC containers occasionally end up in the stopped state and stay there, even though I use snapshot mode.

Am I doing something wrong?

Thanks
 
Is it possible that the load gets so high that HA cannot tell that the container is running and attempts to start it? A snapshot backup does not actually stop the guest at all. Can you give us the system logs from the start to the end of that backup task?
 
Can I share it through Pastebin?
 
Right before the container stops/crashes, an NFS server stops responding. Do you have logs from within the container from around 1am? In general your host seems to be overloaded: pvestatd status updates regularly take over 5s.
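To line up container or syslog events with the backup, the two timestamps in the vzdump task log above are enough to get the backup window. A minimal Python sketch, assuming the log text is available as a string; the helper names and the 01:00:00 event time are only illustrative and not taken from any real log:

```python
import re
from datetime import datetime

# Matches the "Backup started at ..." / "Backup finished at ..." lines of a vzdump task log.
STAMP = re.compile(
    r"INFO: Backup (started|finished) at (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})"
)

def backup_window(log_text):
    """Return (start, end) datetimes parsed from a vzdump task log."""
    found = {}
    for kind, stamp in STAMP.findall(log_text):
        found[kind] = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    return found["started"], found["finished"]

def in_window(event, window):
    """True if the event timestamp falls inside the backup window (inclusive)."""
    start, end = window
    return start <= event <= end

# The two relevant lines from the log posted above.
LOG = """INFO: Backup started at 2021-02-12 00:55:48
INFO: Backup finished at 2021-02-12 01:02:54"""

window = backup_window(LOG)
duration = window[1] - window[0]

# Hypothetical event time, e.g. a reboot seen in the container's own logs.
event = datetime(2021, 2, 12, 1, 0, 0)
print(duration, in_window(event, window))
```

Any reboot or shutdown entry in the container's logs that falls inside this window is a candidate for having collided with the backup.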
 
Maybe there is some script within the container that did an automatic reboot, and that coincided with the backup lock preventing the restart?
 
Hi Fabian, I checked, and there is a cron job on that container that reboots it at a random time within a set timeframe.

Could this be the issue?
 
Hi Fabian,

I think that is the issue. I rebooted another container a few seconds after launching the backup and got the same result.

Screenshot_2021-02-12 pm-80 - Proxmox Virtual Environment.png
 
Hi,

I noticed that when you reboot a container (without HA) while a snapshot is being made, the server doesn't like it too much and gives some errors, but after that the container starts and works without issues.

Screenshot_2021-02-12 pm-80 - Proxmox Virtual Environment(1).png

Strangely, PVE tried to migrate it, but no one told it to do so...
 
Yeah, rebooting from within the guest is outside of our control. I'll see whether there is some way to catch this so that HA does not get confused.
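Until something like that exists, one possible workaround is to move the scheduled reboot out of the guest and into a host-side job that skips the reboot while the container is locked. A minimal Python sketch of the check, assuming the container config text (for example the output of `pct config <vmid>`) carries a `lock:` line while a backup is running; the function name and the sample configs are hypothetical:

```python
def current_lock(config_text):
    """Return the lock value from the main section of a container config, or None."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # Snapshot sections follow the main section; stop before them.
            break
        if line.startswith("lock:"):
            return line.split(":", 1)[1].strip()
    return None

# Hypothetical config snippets for illustration only.
SAMPLE_LOCKED = """arch: amd64
hostname: example-ct
lock: backup
memory: 2048
"""

SAMPLE_UNLOCKED = """arch: amd64
hostname: example-ct
memory: 2048
"""

for name, text in (("locked", SAMPLE_LOCKED), ("unlocked", SAMPLE_UNLOCKED)):
    lock = current_lock(text)
    if lock is None:
        print(name, "-> safe to reboot")
    else:
        print(name, "-> skip reboot, lock held:", lock)
```

This only covers reboots triggered from the host. A reboot started inside the guest cannot see the lock, which is the case described above.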