Backup fails as part of scheduled backup job but succeeds when manually requested

Jul 18, 2024
I have one VM that fails during the nightly backup job, saying it is unable to rename the file (log below).

https://forum.proxmox.com/attachmen....94209/?hash=3d24d8c572666b985b22bd7b7de5d3b3

Everything I have found on this points to either permissions or storage space as the issue. Permissions make no sense, since all VMs back up to the same location without getting this error, and when I manually request a backup to that same location it does not get the error either (log below). So it only fails as part of the scheduled backup job.

https://forum.proxmox.com/attachmen....94208/?hash=3d24d8c572666b985b22bd7b7de5d3b3

Storage is not the issue either; looking at it, it's not even 50% used. I don't see anything in the cluster logs and I'm not sure what else to look at.
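
For reference, here is roughly how I checked space and write access from the node itself (storage name and mount path are from my setup), including a rename test in the dump directory, since the rename is the step that fails:

pvesm status                      # storage usage as Proxmox sees it
df -h /mnt/pve/backup-write       # free space on the backup share

# manual rename test in the dump directory
touch /mnt/pve/backup-write/dump/renametest.dat
mv /mnt/pve/backup-write/dump/renametest.dat /mnt/pve/backup-write/dump/renametest.zst
rm /mnt/pve/backup-write/dump/renametest.zst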
 


what kind of storage are you using? anything visible in the system logs when it fails?
 
The storage is an NFS share from a TrueNAS server on the same local network as the nodes. Nothing in the logs stuck out to me, but here are the logs from a little before and after the issue.



Dec 19 00:23:00 prox pmxcfs[2239136]: [dcdb] notice: data verification successful
Dec 19 01:00:29 prox systemd[1]: Starting systemd-tmpfiles-clean.service - Cleanup of Temporary Directories...
Dec 19 01:00:29 prox systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
Dec 19 01:00:29 prox systemd[1]: Finished systemd-tmpfiles-clean.service - Cleanup of Temporary Directories.
Dec 19 01:00:29 prox systemd[1]: run-credentials-systemd\x2dtmpfiles\x2dclean.service.mount: Deactivated successfully.
Dec 19 01:02:27 prox pvescheduler[129847]: ERROR: Backup of VM 110 failed - unable to rename '/mnt/pve/backup-write/dump/vzdump-qemu-110-2025_12_19-00_00_04.vma.dat' to '/mnt/pve/backup-write/dump/vzdump-qemu-110-2025_12_19-00_00_04.vma.zst'
Dec 19 01:02:27 prox pvescheduler[129847]: INFO: Backup job finished with errors
Dec 19 01:02:27 prox perl[129847]: skipping disabled target 'Office365'
Dec 19 01:02:27 prox perl[129847]: notified via target `mail-to-root`
Dec 19 01:02:27 prox pvescheduler[129847]: job errors
Dec 19 01:02:27 prox postfix/pickup[149476]: 8866020FD6: uid=0 from=<root>
Dec 19 01:02:27 prox postfix/cleanup[156316]: 8866020FD6: message-id=<20251219070227.8866020FD6@prox.networkvm.host>
Dec 19 01:02:27 prox postfix/qmgr[2196]: 8866020FD6: from=<root@prox.networkvm.host>, size=28110, nrcpt=2 (queue active)
Dec 19 01:02:27 prox postfix/smtp[156318]: 8866020FD6: to=<nocadmin@simplefiber.com>, relay=none, delay=0.05, delays=0.03/0.02/0/0, dsn=5.4.4, status=bounced (Host or domain name not found. Name service error for name=simplefiber.com type=AAAA: Host found but no data record of requested type)
Dec 19 01:02:27 prox postfix/smtp[156318]: 8866020FD6: to=<zack.swogger@simplefiber.com>, relay=none, delay=0.05, delays=0.03/0.02/0/0, dsn=5.4.4, status=bounced (Host or domain name not found. Name service error for name=simplefiber.com type=AAAA: Host found but no data record of requested type)
Dec 19 01:02:27 prox postfix/cleanup[156316]: 9424C20FD7: message-id=<20251219070227.9424C20FD7@prox.networkvm.host>
Dec 19 01:02:27 prox postfix/bounce[156319]: 8866020FD6: sender non-delivery notification: 9424C20FD7
Dec 19 01:02:27 prox postfix/qmgr[2196]: 9424C20FD7: from=<>, size=30777, nrcpt=1 (queue active)
Dec 19 01:02:27 prox postfix/qmgr[2196]: 8866020FD6: removed
Dec 19 01:02:27 prox postfix/pickup[149476]: A036620FD6: uid=65534 from=<root>
Dec 19 01:02:27 prox postfix/cleanup[156316]: A036620FD6: message-id=<20251219070227.9424C20FD7@prox.networkvm.host>
Dec 19 01:02:27 prox proxmox-mail-forward[156321]: notified via target `mail-to-root`
Dec 19 01:02:27 prox postfix/qmgr[2196]: A036620FD6: from=<root@prox.networkvm.host>, size=30974, nrcpt=2 (queue active)
Dec 19 01:02:27 prox postfix/local[156320]: 9424C20FD7: to=<root@prox.networkvm.host>, relay=local, delay=0.05, delays=0/0.01/0/0.04, dsn=2.0.0, status=sent (delivered to command: /usr/bin/proxmox-mail-forward)
Dec 19 01:02:27 prox postfix/qmgr[2196]: 9424C20FD7: removed
Dec 19 01:02:27 prox postfix/smtp[156318]: A036620FD6: to=<nocadmin@simplefiber.com>, relay=none, delay=0.01, delays=0/0/0/0, dsn=5.4.4, status=bounced (Host or domain name not found. Name service error for name=simplefiber.com type=AAAA: Host found but no data record of requested type)
Dec 19 01:02:27 prox postfix/smtp[156318]: A036620FD6: to=<zack.swogger@simplefiber.com>, relay=none, delay=0.01, delays=0/0/0/0, dsn=5.4.4, status=bounced (Host or domain name not found. Name service error for name=simplefiber.com type=AAAA: Host found but no data record of requested type)
Dec 19 01:02:27 prox postfix/qmgr[2196]: A036620FD6: removed
Dec 19 01:02:27 prox postfix/cleanup[156316]: A1C7320FD7: message-id=<20251219070227.A1C7320FD7@prox.networkvm.host>
Dec 19 01:17:01 prox CRON[162474]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Dec 19 01:17:01 prox CRON[162475]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Dec 19 01:17:01 prox CRON[162474]: pam_unix(cron:session): session closed for user root
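
In case the NFS mount options matter here, they can be checked directly on the node (path is from my setup):

findmnt /mnt/pve/backup-write     # shows source, fstype, and mount options for the share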



To add a little more to this: I updated the nodes on the 19th, after posting this, and since then VM 110 has been backing up successfully, but VM 115, which is part of the same backup job, is now failing with the same error.
 
it does sound like the error is coming from the storage, maybe it doesn't like renaming the file while it's still being flushed? hard to say from a distance.. if you have the space, you might try switching the backup target to a local disk and see if the problem disappears then..
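
for a quick test, a one-off manual run against local storage should be enough, something along these lines (assuming 'local' is a directory storage with backup content enabled):

# one-off backup of the affected guest (110/115 in this thread) to local storage
vzdump 110 --storage local --mode snapshot --compress zstd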
 
What is weird is that it works fine if I just run the backup manually to the same location. Also, it has been working the last few days, but the error has now moved to a different VM, and I have 10 others that all go to the same location without issue.
 
well, the scheduled backup job runs on all nodes in your cluster, so likely you have more load on the NAS side in that case, increasing the chances of triggering the issue..
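
if load is the trigger, throttling the backups can also help, e.g. with a node-wide bandwidth limit in /etc/vzdump.conf (value is in KiB/s, the number below is just an example):

# /etc/vzdump.conf - node-wide vzdump defaults
# cap backup I/O at ~100 MiB/s per node
bwlimit: 102400

splitting the job up and staggering the schedules would have a similar effect..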