Problem backup VMs

Hello,
For a while now, I have been having problems with my nightly VM backups.

My configuration
Proxmox 7.2-11
Cluster of 5 nodes
Ceph
Backup on a 40TB NAS
A 10Gb/s network between the nodes.

About 120 VMs are backed up every night, and it is always the same ones that fail.

Important to know: if I back up one of these VMs by hand, the backup goes well.

Below are the error lines:
104: 2022-10-28 02:43:09 INFO: 94% (188.3 GiB of 200.0 GiB) in 1h 12m 56s, read: 756.4 MiB/s, write: 132.1 MiB/s
104: 2022-10-28 02:43:12 INFO: 95% (190.7 GiB of 200.0 GiB) in 1h 12m 59s, read: 826.5 MiB/s, write: 107.3 MiB/s
104: 2022-10-28 02:43:24 INFO: 96% (193.7 GiB of 200.0 GiB) in 1h 13m 11s, read: 254.1 MiB/s, write: 47.3 MiB/s
104: 2022-10-28 02:43:27 INFO: 99% (199.6 GiB of 200.0 GiB) in 1h 13m 14s, read: 2.0 GiB/s, write: 35.4 MiB/s
104: 2022-10-28 02:43:40 INFO: 100% (200.0 GiB of 200.0 GiB) in 1h 13m 27s, read: 34.7 MiB/s, write: 30.3 MiB/s
104: 2022-10-28 02:43:40 INFO: backup is sparse: 24.73 GiB (12%) total zero data
104: 2022-10-28 02:43:40 INFO: transferred 200.00 GiB in 4407 seconds (46.5 MiB/s)
104: 2022-10-28 02:43:50 ERROR: Backup of VM 104 failed - zstd --rsyncable --threads=1 failed - wrong exit status 1
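
To make the trend in these lines easier to see, here is a minimal Python sketch that pulls the read and write rates out of the progress lines and recomputes the average from the "transferred" line. The regular expression and the helper names (`parse_progress`, `average_mib_per_s`) are assumptions made for illustration, not part of vzdump. It only reformats what the log above already says: the write rate drops from 132.1 to 30.3 MiB/s over the last few percent.

```python
import re

# Matches vzdump progress lines such as:
# "... INFO: 94% (...) in 1h 12m 56s, read: 756.4 MiB/s, write: 132.1 MiB/s"
PROGRESS = re.compile(
    r"INFO:\s+(?P<pct>\d+)% .*?read: (?P<read>[\d.]+) (?P<read_unit>[KMG]iB)/s, "
    r"write: (?P<write>[\d.]+) (?P<write_unit>[KMG]iB)/s"
)

# Conversion factors to MiB so that "2.0 GiB/s" is comparable to the other rows.
TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_progress(lines):
    """Return (percent, read MiB/s, write MiB/s) for each progress line."""
    rows = []
    for line in lines:
        m = PROGRESS.search(line)
        if m:
            rows.append((
                int(m["pct"]),
                float(m["read"]) * TO_MIB[m["read_unit"]],
                float(m["write"]) * TO_MIB[m["write_unit"]],
            ))
    return rows


def average_mib_per_s(total_gib, seconds):
    """Average throughput in MiB/s for a transfer of total_gib GiB."""
    return total_gib * 1024 / seconds


LOG = [
    "104: 2022-10-28 02:43:09 INFO: 94% (188.3 GiB of 200.0 GiB) in 1h 12m 56s, read: 756.4 MiB/s, write: 132.1 MiB/s",
    "104: 2022-10-28 02:43:12 INFO: 95% (190.7 GiB of 200.0 GiB) in 1h 12m 59s, read: 826.5 MiB/s, write: 107.3 MiB/s",
    "104: 2022-10-28 02:43:24 INFO: 96% (193.7 GiB of 200.0 GiB) in 1h 13m 11s, read: 254.1 MiB/s, write: 47.3 MiB/s",
    "104: 2022-10-28 02:43:27 INFO: 99% (199.6 GiB of 200.0 GiB) in 1h 13m 14s, read: 2.0 GiB/s, write: 35.4 MiB/s",
    "104: 2022-10-28 02:43:40 INFO: 100% (200.0 GiB of 200.0 GiB) in 1h 13m 27s, read: 34.7 MiB/s, write: 30.3 MiB/s",
]

rows = parse_progress(LOG)
for pct, read, write in rows:
    print(f"{pct:3d}%  read {read:8.1f} MiB/s  write {write:6.1f} MiB/s")

# 200 GiB in 4407 seconds, as reported by the "transferred" line.
print(f"average: {average_mib_per_s(200.0, 4407):.1f} MiB/s")
```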

Do you have any idea where the problem might come from?

Thanks
 
hi
could you provide the syslog and backup logs?
 
The backup finished at 02:43:40.
I found this in the syslog:

Oct 28 02:43:40 pxe1-infra pvescheduler[4025415]: Warning: unable to close filehandle GEN136 properly: Input/output error at /usr/share/perl5/PVE/VZDump/QemuServer.pm line 777.
Oct 28 02:43:50 pxe1-infra pvescheduler[4025415]: ERROR: Backup of VM 104 failed - zstd --rsyncable --threads=1 failed - wrong exit status 1

But where in /var/log do I find the backup logs?

Thanks
 
it looks to me like flushing on closing the file fails - how is the backup storage mounted?
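
As a general illustration of why an error can show up only at close time: on network filesystems such as NFS, writes may be accepted into the local cache first, and a server-side failure can then surface only when the data is flushed or the file is closed. The Python sketch below is not PVE code (vzdump is written in Perl); `write_checked` is a hypothetical helper that simply reports which stage raised the error.

```python
import os


def write_checked(path, data):
    """Write data to path and report which stage failed, if any.

    Returns "ok" on success, otherwise "<stage> failed: <reason>".
    """
    stage = "open"
    fd = None
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        stage = "write"
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            view = view[os.write(fd, view):]
        stage = "fsync"
        os.fsync(fd)  # forces cached data out; deferred errors can appear here
        stage = "close"
        closing, fd = fd, None  # never close the same descriptor twice
        os.close(closing)       # ... or here, as in the syslog warning
        return "ok"
    except OSError as exc:
        return f"{stage} failed: {exc.strerror}"
    finally:
        if fd is not None:
            try:
                os.close(fd)
            except OSError:
                pass
```

Running something like this against a file on the NFS mount while the nightly jobs are active might show whether the failure appears at the fsync or close stage, though that is an assumption about this setup rather than a confirmed diagnosis.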
 
My backups go to a Synology NAS mounted over NFS (see attached).
 

Attachments

  • Capture d’écran 2022-10-31 à 09.58.42.png
yeah, then that NAS likely returns an error when it is under load (during the scheduled backup) and PVE writes a lot of data. Does it work if you back up to local storage (if you have the space)?
 
yeah, but then no other node is backing up at the same time, which is likely the trigger..
 
[Originally posted in French:]
Ok, I see what you mean.
The problem is that it is always the same 5 or 6 VMs that fail, and the others never do.
 
I can't tell you why it's just those - maybe they trigger a specific write pattern towards the end, or they have much more data than the others?

(p.s. - my French is rather limited, and this is an English forum ;))
 
Ok I see what you mean.
the problem is that I always have the same 5 or 6 VMs that have a bug and the others never

sorry, I answered too fast and I was doing something else at the same time lol
 
you could try staggering the backup jobs like that, it would reduce the load on the NAS and Ceph side.
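
A minimal sketch of what staggering could look like, assuming one backup job per node and a fixed gap between start times. The node names, the 22:00 first start, and the 90-minute gap are all made-up values for illustration; the right gap depends on how long each node's job actually runs.

```python
from datetime import datetime, timedelta


def staggered_starts(nodes, first_start="22:00", gap_minutes=90):
    """Assign each node a start time, offset by gap_minutes from the previous one."""
    start = datetime.strptime(first_start, "%H:%M")
    return {
        node: (start + timedelta(minutes=i * gap_minutes)).strftime("%H:%M")
        for i, node in enumerate(nodes)
    }


# Hypothetical node names for a 5-node cluster.
nodes = [f"node{i}" for i in range(1, 6)]
schedule = staggered_starts(nodes)
for node, when in schedule.items():
    print(node, when)
```

Each resulting time would then be used as the start time of that node's own backup job, so that fewer nodes write to the NAS at the same moment.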
 
Hello,
I changed the backup plan for my Proxmox infrastructure: I now have 5 backup jobs, one per node.
With this setup, the VMs that were failing have been backed up successfully, but one VM froze. A capture of its console window is attached.
From the console it was impossible to run "systemctl status", so I stopped the VM from the Proxmox side.
 
