Problem backup VMs

anthony.sibiodon · Oct 28, 2022

Hello,
for a while now, I have been having problems with my nightly VM backups.

My configuration
Proxmox 7.2-11
Cluster of 5 nodes
Ceph
Backup on a 40TB NAS
A 10Gb/s network between the nodes.

About 120 VMs are backed up every night, and it is always the same ones that fail.

Important to know: if I back them up by hand, the backup goes well.

Below are the error lines:
104: 2022-10-28 02:43:09 INFO: 94% (188.3 GiB of 200.0 GiB) in 1h 12m 56s, read: 756.4 MiB/s, write: 132.1 MiB/s
104: 2022-10-28 02:43:12 INFO: 95% (190.7 GiB of 200.0 GiB) in 1h 12m 59s, read: 826.5 MiB/s, write: 107.3 MiB/s
104: 2022-10-28 02:43:24 INFO: 96% (193.7 GiB of 200.0 GiB) in 1h 13m 11s, read: 254.1 MiB/s, write: 47.3 MiB/s
104: 2022-10-28 02:43:27 INFO: 99% (199.6 GiB of 200.0 GiB) in 1h 13m 14s, read: 2.0 GiB/s, write: 35.4 MiB/s
104: 2022-10-28 02:43:40 INFO: 100% (200.0 GiB of 200.0 GiB) in 1h 13m 27s, read: 34.7 MiB/s, write: 30.3 MiB/s
104: 2022-10-28 02:43:40 INFO: backup is sparse: 24.73 GiB (12%) total zero data
104: 2022-10-28 02:43:40 INFO: transferred 200.00 GiB in 4407 seconds (46.5 MiB/s)
104: 2022-10-28 02:43:50 ERROR: Backup of VM 104 failed - zstd --rsyncable --threads=1 failed - wrong exit status 1

Do you have any idea where the problem could come from?

Thanks

noel. · Oct 28, 2022

hi
could you provide the syslog and backup logs?

anthony.sibiodon · Oct 28, 2022

the backup finished at 02:43:40
I found this in the syslog

Oct 28 02:43:40 pxe1-infra pvescheduler[4025415]: Warning: unable to close filehandle GEN136 properly: Input/output error at /usr/share/perl5/PVE/VZDump/QemuServer.pm line 777.
Oct 28 02:43:50 pxe1-infra pvescheduler[4025415]: ERROR: Backup of VM 104 failed - zstd --rsyncable --threads=1 failed - wrong exit status 1

But where do I find the backup logs in /var/log?

Thanks

anthony.sibiodon · Oct 28, 2022

I found this in /var/log/vzdump
attached is the file qemu-104.log
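
With about 120 VMs per night, it can help to pull the failing VM IDs and reasons out of the logs automatically. The following is a minimal sketch, not a Proxmox tool: the function name `failed_backups` is hypothetical, and the pattern is based only on the ERROR line quoted in this thread, so other failure messages may need a different expression.

```python
import re

# Pattern taken from the ERROR line quoted in this thread. It is searched
# anywhere in the line, so a leading "104: " task-log prefix does not matter.
FAILED = re.compile(r"ERROR: Backup of VM (\d+) failed - (.*)")


def failed_backups(lines):
    """Return {vmid: reason} for every 'Backup of VM ... failed' line."""
    failures = {}
    for line in lines:
        match = FAILED.search(line)
        if match:
            failures[int(match.group(1))] = match.group(2).strip()
    return failures


# Two lines copied from the log excerpt in the first post.
sample = [
    "104: 2022-10-28 02:43:40 INFO: transferred 200.00 GiB in 4407 seconds (46.5 MiB/s)",
    "104: 2022-10-28 02:43:50 ERROR: Backup of VM 104 failed - "
    "zstd --rsyncable --threads=1 failed - wrong exit status 1",
]
print(failed_backups(sample))
```

To run it against a real file, the lines could be read with `open(path).read().splitlines()`, where `path` points at one of the logs under /var/log/vzdump.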

anthony.sibiodon · Oct 31, 2022

up please

fabian · Oct 31, 2022

it looks to me like flushing on closing the file fails - how is the backup storage mounted?

anthony.sibiodon · Oct 31, 2022

My backups are written to a Synology NAS over NFS (see attached).

fabian · Oct 31, 2022

yeah, then that NAS likely returns an error when it is under load (during the scheduled backup) and PVE writes a lot of data. Does it work if you back up to a local storage (if you have the space)?
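
One way to check whether the storage itself returns an error on flush or close, independent of vzdump, is a small write probe. This is only a sketch under assumptions: `probe_write` is a hypothetical helper, the default sizes are arbitrary, and a single probe on an idle NAS may well succeed, so it would be most telling when run while the other nodes are backing up.

```python
import os
import tempfile


def probe_write(directory, size_mib=64, chunk_mib=4):
    """Write about size_mib of data into directory, then flush, fsync and close.

    Returns (True, None) on success, or (False, error_text) if writing,
    syncing or closing raises OSError (for example "Input/output error").
    """
    chunk = os.urandom(chunk_mib * 1024 * 1024)
    fd, path = tempfile.mkstemp(prefix="probe-", dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            for _ in range(max(1, size_mib // chunk_mib)):
                handle.write(chunk)
            handle.flush()
            os.fsync(handle.fileno())
        # Leaving the with-block closes the file; a failed close raises OSError too.
    except OSError as exc:
        return False, str(exc)
    finally:
        try:
            os.unlink(path)  # remove the probe file in every case
        except OSError:
            pass
    return True, None


# Demonstration against the local temp directory. For a real check, pass the
# directory where the NFS backup storage is mounted instead.
print(probe_write(tempfile.gettempdir(), size_mib=8))
```

A result of `(False, ...)` with an input/output error would point at the NFS mount or the NAS rather than at zstd.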

anthony.sibiodon · Oct 31, 2022

I have not set up a backup job to local storage.
On the other hand, when I back up the machine on its own, everything goes well.

fabian · Oct 31, 2022

yeah, but then no other node is backing up at the same time, which is likely the trigger..

anthony.sibiodon · Oct 31, 2022

[translated from French]
OK, I see what you mean.
The problem is that it is always the same 5 or 6 VMs that have a bug, and the others never do.

fabian · Oct 31, 2022

I can't tell you why it's just those - maybe they trigger a specific write pattern towards the end, or they have much more data than the others?

(p.s. - my French is rather limited, and this is an English forum)

anthony.sibiodon · Oct 31, 2022

Ok I see what you mean.
the problem is that it is always the same 5 or 6 VMs that have a bug, and the others never do

sorry, I answered too fast and I was doing something else at the same time lol

anthony.sibiodon · Oct 31, 2022

it means that I would have to create 5 backup plans since I have 5 nodes

fabian · Oct 31, 2022

you could try staggering the backup jobs like that, it would reduce the load on the NAS and Ceph side.
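
A staggered schedule can be worked out with a few lines of Python. This is a sketch only: `staggered_starts` is a hypothetical helper, the first start time and the 90-minute gap are arbitrary assumptions, and all node names except pxe1-infra (seen in the syslog above) are made up. A fixed gap does not guarantee that jobs never overlap if one node's backup runs longer than the gap.

```python
from datetime import datetime, timedelta


def staggered_starts(nodes, first_start="21:00", gap_minutes=90):
    """Return {node: "HH:MM"} with each node starting gap_minutes after the previous one."""
    base = datetime.strptime(first_start, "%H:%M")
    return {
        node: (base + timedelta(minutes=index * gap_minutes)).strftime("%H:%M")
        for index, node in enumerate(nodes)
    }


# Hypothetical node names, except pxe1-infra which appears in the syslog.
nodes = ["pxe1-infra", "pxe2-infra", "pxe3-infra", "pxe4-infra", "pxe5-infra"]
for node, start in staggered_starts(nodes).items():
    print(f"{node}: start backup job at {start}")
```

Each resulting time would then be used as the start time of that node's own backup job.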

anthony.sibiodon · Nov 1, 2022

I will create 5 backup tasks, one for each node.
We'll see the result and I'll let you know.

Neobin · Nov 1, 2022

anthony.sibiodon said:
it means that I would have to create 5 backup plans since I have 5 nodes

For reference:
https://bugzilla.proxmox.com/show_bug.cgi?id=3086

anthony.sibiodon · Nov 4, 2022

Hello,
I changed the backup plan for my Proxmox infrastructure: I now have 5 backup jobs, one per node.
The VMs that used to fail have now been backed up, but one VM froze; attached is a capture of its console window.
From the console it was impossible to run "systemctl status", so I stopped the VM at the Proxmox level.

anthony.sibiodon · Nov 4, 2022

attached
