Backups started failing

Canadian-trekky

New Member
Mar 8, 2024
Hoping to get some help.

Backups had been working without issue, but last week they started failing, and the VMs become unresponsive when they do. Rebooting or shutting down an affected VM doesn't work; I get an error saying the VM is locked due to backup. The only way to recover is to power cycle the host.

Backups are sent to a local server via SMB. Once a backup starts, Proxmox loses its connection to the SMB storage and I receive the errors below. I have tried removing and re-adding the SMB storage, and even set it up as NFS instead, but I get the same result. I have also set up new backup jobs, but those fail as well.

Mar 09 18:55:22 pve02 pvedaemon[34648]: INFO: starting new backup job: vzdump 201 --compress zstd --remove 0 --storage backup-omv --notification-mode auto --mode snapshot --notes-template '{{guestname}}' --node pve02
Mar 09 18:55:22 pve02 pvedaemon[34648]: INFO: Starting Backup of VM 201 (qemu)
Mar 09 18:55:36 pve02 pvestatd[976]: got timeout
Mar 09 18:55:42 pve02 pvestatd[976]: unable to activate storage 'backup-omv' - directory '/mnt/pve/backup-omv' does not exist or is unreachable
Mar 09 18:55:42 pve02 pvestatd[976]: status update time (9.630 seconds)
Mar 09 18:55:45 pve02 pvestatd[976]: got timeout
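
For anyone comparing several failed runs, a small script can pull the relevant events out of a journal excerpt like the one above and show how long after the job start the storage became unreachable. This is only a sketch: it assumes the short journal line format shown here, and the function names and the default year (taken from the post date) are made up for illustration.

```python
import re
from datetime import datetime

# Lines copied verbatim from the excerpt above.
SAMPLE = """\
Mar 09 18:55:22 pve02 pvedaemon[34648]: INFO: starting new backup job: vzdump 201 --compress zstd --remove 0 --storage backup-omv --notification-mode auto --mode snapshot --notes-template '{{guestname}}' --node pve02
Mar 09 18:55:22 pve02 pvedaemon[34648]: INFO: Starting Backup of VM 201 (qemu)
Mar 09 18:55:36 pve02 pvestatd[976]: got timeout
Mar 09 18:55:42 pve02 pvestatd[976]: unable to activate storage 'backup-omv' - directory '/mnt/pve/backup-omv' does not exist or is unreachable
Mar 09 18:55:42 pve02 pvestatd[976]: status update time (9.630 seconds)
Mar 09 18:55:45 pve02 pvestatd[976]: got timeout
"""

# "Mar 09 18:55:42 pve02 pvestatd[976]: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<unit>[\w.-]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


def storage_events(text, storage="backup-omv", year=2024):
    """Return (timestamp, unit, kind) tuples for the events of interest.

    The short journal format carries no year, so `year` is an assumption.
    """
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        msg = m["msg"]
        if "starting new backup job" in msg:
            kind = "backup-start"
        elif msg == "got timeout":
            kind = "timeout"
        elif f"unable to activate storage '{storage}'" in msg:
            kind = "storage-unreachable"
        else:
            continue  # ignore lines that are not part of the failure pattern
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        events.append((ts, m["unit"], kind))
    return events


def seconds_to_first_failure(events):
    """Seconds between the first job start and the first storage failure, or None."""
    start = next((t for t, _, k in events if k == "backup-start"), None)
    fail = next((t for t, _, k in events if k == "storage-unreachable"), None)
    if start is None or fail is None:
        return None
    return (fail - start).total_seconds()


if __name__ == "__main__":
    evts = storage_events(SAMPLE)
    for ts, unit, kind in evts:
        print(ts.time(), unit, kind)
    print("seconds to first failure:", seconds_to_first_failure(evts))
```

If the delay is similar on every run, that may point at load on the link rather than a random drop.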

I've tested my backup server from another system, and everything works fine. It continues to work even while Proxmox reports the connection as lost.

The VMs being backed up all run Linux and all use VirtIO SCSI single controllers with IO thread enabled and VirtIO (paravirtualized) network devices, with disks ranging from 40 to 200 GB.

Everything is currently up to date.

proxmox-ve: 8.1.0 (running kernel: 6.5.13-1-pve)
pve-manager: 8.1.4 (running version: 8.1.4/ec5affc9e41f1d79)
proxmox-kernel-helper: 8.1.0
pve-kernel-6.2: 8.0.5
proxmox-kernel-6.5.13-1-pve-signed: 6.5.13-1
proxmox-kernel-6.5: 6.5.13-1
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
proxmox-kernel-6.2.16-20-pve: 6.2.16-20
proxmox-kernel-6.2: 6.2.16-20
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.1
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.0
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.0.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.4-1
proxmox-backup-file-restore: 3.1.4-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.1.3
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.3
pve-edk2-firmware: 4.2023.08-4
pve-firewall: 5.0.3
pve-firmware: 3.9-2
pve-ha-manager: 4.0.3
pve-i18n: 3.2.0
pve-qemu-kvm: 8.1.5-3
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve2

Any thoughts?
 
Hi,
are there any hints as to why you lose the connection to the host providing the SMB share? Is the storage mounted correctly before you start the backup? Please share the output of mount | grep backup-omv and pvesm status before you run the backup.
 
Hi Chris,

No idea, it just stopped working. The SMB share continues to work without issue from other systems.

Here is the output from the requested commands.

root@pve01:~# mount | grep backup-omv
//192.168.0.16/backupserver on /mnt/pve/backup-omv type cifs (rw,relatime,vers=3.1.1,cache=strict,username=backupuser,uid=0,noforceuid,gid=0,noforcegid,addr=192.168.0.16,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=60,actimeo=1,closetimeo=1)
root@pve01:~#
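
To make the long option string easier to compare between nodes, the mount line can be split into its parts. The sketch below assumes the usual `SOURCE on TARGET type FSTYPE (options)` layout of `mount` output; the function name is hypothetical. The `soft` flag in that output is worth noting: as far as the mount.cifs documentation describes it, a soft mount returns errors to the application when the server stops answering instead of blocking indefinitely.

```python
# The mount line copied from the output above.
MOUNT_LINE = (
    "//192.168.0.16/backupserver on /mnt/pve/backup-omv type cifs "
    "(rw,relatime,vers=3.1.1,cache=strict,username=backupuser,uid=0,noforceuid,"
    "gid=0,noforcegid,addr=192.168.0.16,file_mode=0755,dir_mode=0755,soft,nounix,"
    "serverino,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,"
    "echo_interval=60,actimeo=1,closetimeo=1)"
)


def parse_mount_line(line):
    """Split one line of `mount` output into source, target, fstype and options."""
    head, _, opts = line.strip().rpartition(" (")
    source, rest = head.split(" on ", 1)
    target, fstype = rest.rsplit(" type ", 1)
    options = {}
    for item in opts.rstrip(")").split(","):
        key, sep, value = item.partition("=")
        # Flags without a value (rw, soft, ...) are stored as True.
        options[key] = value if sep else True
    return {"source": source, "target": target, "fstype": fstype, "options": options}


if __name__ == "__main__":
    info = parse_mount_line(MOUNT_LINE)
    print(info["fstype"], info["options"].get("vers"), "soft" in info["options"])
```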


root@pve01:~# pvesm status
Name        Type     Status  Total      Used       Available  %
backup-omv  cifs     active  958802032  52187584   906614448  5.44%
local       dir      active  98497780   15835528   77612704   16.08%
local-lvm   lvmthin  active  832888832  248700605  584188226  29.86%
root@pve01:~#
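
The "is it mounted before the backup starts" check can also be scripted by parsing the `pvesm status` output. This is a rough sketch assuming the seven-column layout shown above; the function names are made up, and the size columns are assumed to be KiB.

```python
# Output copied from the post above, including the shell prompts.
PVESM_OUTPUT = """\
root@pve01:~# pvesm status
Name Type Status Total Used Available %
backup-omv cifs active 958802032 52187584 906614448 5.44%
local dir active 98497780 15835528 77612704 16.08%
local-lvm lvmthin active 832888832 248700605 584188226 29.86%
root@pve01:~#
"""


def parse_pvesm_status(text):
    """Return {storage_name: row_dict} from `pvesm status` output."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip prompts, blank lines and the header row.
        if len(parts) != 7 or parts[0] == "Name":
            continue
        name, stype, status, total, used, avail, pct = parts
        rows[name] = {
            "type": stype,
            "status": status,
            "total_kib": int(total),      # unit assumed to be KiB
            "used_kib": int(used),
            "available_kib": int(avail),
            "percent": float(pct.rstrip("%")),
        }
    return rows


def storage_ready(rows, name):
    """True only if the named storage is listed and reported as active."""
    return rows.get(name, {}).get("status") == "active"


if __name__ == "__main__":
    rows = parse_pvesm_status(PVESM_OUTPUT)
    print("backup-omv ready:", storage_ready(rows, "backup-omv"))
```

Note that an "active" status before the job only shows the share was reachable at that moment; in this thread it dropped out once the backup was under way.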

Thanks for your help.
 
Backups have been working without issue, however last week the backups started failing, and the VM's become unresponsive. Trying to reboot or shutdown the VM doesn't work, and I get an error saying the VM is locked due to backup.
You can manually remove such a lock with qm unlock <VMID>, but first make sure that the backup task is no longer running. The lock is there to ensure no changes are made to the VM while it is being backed up. I suppose only one VM will hold the lock at a time, and rebooting the whole host just to clear a lock seems like overkill.
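
The advice above could be sketched as a small guard: only propose `qm unlock` when the VM config actually carries a lock and no backup task is running. This assumes the lock appears as a `lock:` line in the `qm config <VMID>` output; the helper names and the sample config text are hypothetical, and whether a backup is still running is left for the caller to determine (for example from the task list).

```python
# Hypothetical sample of `qm config 201` output, for illustration only.
SAMPLE_CONFIG = """\
cores: 2
lock: backup
memory: 4096
name: example-vm
"""


def vm_lock(config_text):
    """Return the value of the `lock:` line in the config text, or None."""
    for line in config_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "lock":
            return value.strip()
    return None


def unlock_command(vmid, config_text, backup_running):
    """Return the argv for `qm unlock`, or None when unlocking is not advisable."""
    if vm_lock(config_text) is None:
        return None  # nothing to unlock
    if backup_running:
        return None  # the lock is still protecting an active backup
    return ["qm", "unlock", str(vmid)]


if __name__ == "__main__":
    cmd = unlock_command(201, SAMPLE_CONFIG, backup_running=False)
    print(" ".join(cmd) if cmd else "leave the lock in place")
```

The function only builds the command rather than running it, so the result can be reviewed before anything is changed on the host.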

You should check logs in the SMB server too, as backups won't work properly unless the share is fully stable.
 
I just had a look at the SMB server logs; there's not much there, and nothing stands out. Unfortunately the backup is still in progress and I can't stop it. There were some new updates for Proxmox, so I installed those as well and rebooted the hosts and the SMB server, but backups still stop working. For example, last night I tried a manual backup of a small 32GB Linux VM, and it made it to 71% and then stopped. The Proxmox logs showed the same error: "unable to activate storage 'backup-omv' - directory '/mnt/pve/backup-omv' does not exist or is unreachable". Once the backup freezes, I check the connection to the SMB server from another computer, and things are fine: I can still transfer files.

Any other thoughts?
 
I just tried a backup again while pinging my backup server from the Proxmox host. The backup hung at 71%, yet the host was still able to ping the backup server without any interruption.
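
Small ping packets can get through a link that still struggles under bulk transfers, so a clean ping does not rule out the network path. One additional check, sketched below, is to snapshot the kernel's per-interface error counters before and after a backup attempt. It assumes the standard Linux files under /sys/class/net/<iface>/statistics/; the interface name `eno1` and the function names are placeholders, and a faulty link will not necessarily show up in these counters.

```python
from pathlib import Path

# Counter files that commonly exist under /sys/class/net/<iface>/statistics/.
COUNTERS = ("rx_errors", "tx_errors", "rx_crc_errors", "rx_dropped", "tx_dropped")


def read_counters(stats_dir):
    """Read the counters that exist in stats_dir into a dict of ints."""
    stats = {}
    for name in COUNTERS:
        path = Path(stats_dir) / name
        if path.is_file():
            stats[name] = int(path.read_text().strip())
    return stats


def counter_deltas(before, after):
    """Return only the counters whose value changed between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }


if __name__ == "__main__":
    # "eno1" is a placeholder; substitute the NIC that carries the backup traffic.
    stats_dir = "/sys/class/net/eno1/statistics"
    before = read_counters(stats_dir)
    input("Run the backup, then press Enter to take the second snapshot...")
    print(counter_deltas(before, read_counters(stats_dir)) or "no counter changes")
```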

The logs still show "Mar 13 14:28:38 pve02 pvedaemon[1001]: unable to activate storage 'backup-omv' - directory '/mnt/pve/backup-omv' does not exist or is unreachable"

 
The good news is that it's now working. I found a second issue: transfer speeds for one of the VMs were poor in one direction only. I swapped out the Ethernet cable, which resolved the speed issue, and the backups on the main node are now working.
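
Since the fault only showed up under sustained transfers, a timed write to the backup mount may expose this kind of problem sooner than a ping does. The following is a minimal sketch, not a benchmark: the function names, file name and sizes are arbitrary choices, and it exercises only the write direction (host to server).

```python
import os
import time


def timed_write(directory, total_mib=64, chunk_mib=4, filename="throughput-test.tmp"):
    """Write total_mib of random data in chunks; return seconds taken per chunk.

    Each chunk is fsync'ed so the timing reflects data leaving the page cache.
    The test file is removed afterwards.
    """
    path = os.path.join(directory, filename)
    chunk = os.urandom(chunk_mib * 1024 * 1024)
    timings = []
    try:
        with open(path, "wb") as fh:
            for _ in range(total_mib // chunk_mib):
                started = time.monotonic()
                fh.write(chunk)
                fh.flush()
                os.fsync(fh.fileno())
                timings.append(time.monotonic() - started)
    finally:
        if os.path.exists(path):
            os.remove(path)
    return timings


def summarize(timings, chunk_mib=4):
    """Average throughput and the slowest single chunk, or None if nothing was written."""
    if not timings:
        return None
    total = sum(timings)
    rate = len(timings) * chunk_mib / total if total > 0 else float("inf")
    return {"mib_per_s": rate, "slowest_chunk_s": max(timings)}


if __name__ == "__main__":
    # Mount point taken from the thread; adjust for other storages.
    print(summarize(timed_write("/mnt/pve/backup-omv")))
```

A slowest chunk that is far above the average hints at stalls on the path, which would be consistent with a marginal cable.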

Thanks for everyone's help.
 
