VM gets stuck on replication

Dec 6, 2023
Hi guys,

I run a Proxmox cluster; all nodes are on 8.2.7 or 8.3.0, on ZFS. I've recently live-migrated tens of VMs with no problems.

There is one VM (Alma 8, cPanel/WHM, kernel 4.18.0-553.27.1.el8_10.x86_64, guest agent on) that gets stuck on every disk migration or copy operation: backup, live migration, and even replication. It looks like the freeze command is what makes it block; I have to forcefully stop the VM to get it back online.

I have similar VMs (same OS and kernel) that do not have this issue. I cannot see what is particular to this one.

Does anybody have similar problems? How would you suggest debugging this? Any advice?

Thanks!
 
I don't see any replication logs, or maybe I don't know where to look for them. Please tell me where to find them.

But here is the log of a previous backup that also hung this VM:

INFO: starting new backup job: vzdump 331 --node px08m --notification-mode auto --storage pbs01 --notes-template '{{guestname}}' --remove 0 --mode snapshot
INFO: Starting Backup of VM 331 (qemu)
INFO: Backup started at 2024-07-16 09:19:16
INFO: status = running
INFO: VM Name: vm331
INFO: include disk 'scsi0' 'data:vm-331-disk-0' 501G
INFO: exclude disk 'scsi1' 'data:vm-331-disk-1' (backup=no)
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: pending configuration changes found (not included into backup)
INFO: creating Proxmox Backup Server archive 'vm/331/2024-07-16T06:19:16Z'
INFO: issuing guest-agent 'fs-freeze' command
closing with read buffer at /usr/share/perl5/IO/Multiplex.pm line 927.
ERROR: interrupted by signal
INFO: issuing guest-agent 'fs-thaw' command

As far as I remember, the "ERROR: interrupted by signal" line was generated when I tried to stop the backup process, because the VM was already blocked at that point.

Now I see this weird line: "closing with read buffer at /usr/share/perl5/IO/Multiplex.pm line 927.", after fs-freeze. I guess this is the problem.
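
One way to check whether the freeze itself is the culprit, independent of vzdump, might be to drive the guest agent by hand from the node. This is only a minimal sketch, assuming VM ID 331 as in the log above. The `agent_cmd` helper, the `VMID` and `QM` variables, and the 30-second limit are made up for illustration; `qm guest cmd` and its `fsfreeze-*` subcommands are the regular Proxmox CLI.

```shell
#!/bin/sh
# Hypothetical helper for poking the QEMU guest agent by hand.
# VMID defaults to the VM from the backup log; QM can be overridden
# (e.g. QM="echo qm") to print the command instead of running it.
VMID="${VMID:-331}"
QM="${QM:-qm}"

agent_cmd() {
    # $1 is a guest-agent subcommand such as fsfreeze-status,
    # fsfreeze-freeze or fsfreeze-thaw.
    # The 30 s timeout is an arbitrary guard so a hung agent
    # does not block the shell forever.
    timeout 30 $QM guest cmd "$VMID" "$1"
}
```

With this loaded, `agent_cmd fsfreeze-status` should show whether the guest currently considers itself frozen or thawed. If `agent_cmd fsfreeze-freeze` never returns on this VM but does on the similar ones, that would point at the guest side rather than at backup or replication, and `agent_cmd fsfreeze-thaw` from a second shell may be worth trying before a hard stop.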