Snapshot with VM State and RAM fails.

rborg

New Member
Jun 26, 2024
Hi,

Any pointers would help.
We have a particular VM for which snapshots that include RAM fail. On the Ceph cluster we can see the RBD state image being written to; once it reaches around 99%, the process fails with the following error:
Code:
TASK ERROR: unable to save VM state and RAM - qemu_savevm_state_complete_precopy error -5
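
(Error -5 is -EIO, i.e. QEMU hit an I/O error while writing out the vmstate.) Since the state image lives on Ceph, it may be worth checking the cluster side when this happens; a rough sketch, with the pool name redacted as in the config below:
Code:
ceph health detail
ceph df
# look for a leftover partial state image for this VM
rbd -p <pool> ls | grep vm-5021020-state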

Subsequent snapshot attempts, with or without RAM, then fail with the following error until the VM is stopped or migrated to another node:
Code:
TASK ERROR: VM 5021020 qmp command 'savevm-start' failed - VM snapshot already started
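
(Assuming the stuck state is just the earlier savevm-start never having completed, the matching savevm-end QMP command from the same Proxmox patch set should clear it without stopping the VM; an untested sketch against the standard QMP socket, if socat is installed:)
Code:
echo '{"execute":"qmp_capabilities"} {"execute":"savevm-end"}' | socat - UNIX-CONNECT:/var/run/qemu-server/5021020.qmp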

Snapshots without RAM work flawlessly for this VM once the 'snapshot already started' error is cleared.
Snapshots with RAM for other VMs on the same nodes and storage work without issue.

A few seconds before the snapshot fails, we also observe the following logs:
Code:
Oct 27 15:30:42 pvenode07 pvedaemon[2193130]: VM 5021020 qmp command failed - VM 5021020 qmp command 'query-proxmox-support' failed - unable to connect to VM 5021020 qmp socket - timeout after 51 retries
Oct 27 15:30:42 pvenode07 pvedaemon[2185978]: VM 5021020 qmp command failed - VM 5021020 qmp command 'query-proxmox-support' failed - unable to connect to VM 5021020 qmp socket - timeout after 51 retries
Oct 27 15:30:43 pvenode07 pvestatd[1753]: VM 5021020 qmp command failed - VM 5021020 qmp command 'query-proxmox-support' failed - unable to connect to VM 5021020 qmp socket - timeout after 51 retries
Oct 27 15:30:43 pvenode07 pvedaemon[2176923]: VM 5021020 qmp command failed - VM 5021020 qmp command 'query-proxmox-support' failed - unable to connect to VM 5021020 qmp socket - timeout after 51 retries
Oct 27 15:30:47 pvenode07 pvedaemon[2185978]: <user> end task UPID:pvenode07:00216AA8:0CF41FED:68FF8169:qmsnapshot:5021020:<user>: unable to save VM state and RAM - qemu_savevm_state_complete_precopy error -5
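
(These QMP timeouts suggest the monitor is blocked while the vmstate is streamed out, which again points at slow or failing I/O on the state volume. One way to watch for Ceph slow ops during a snapshot attempt, as a sketch:)
Code:
# run against the Ceph cluster while the RAM snapshot is in progress
watch -n 2 "ceph health detail | grep -iE 'slow|laggy'"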

Multi-node cluster:
Code:
Proxmox Version: 8.4.14
Kernel: Linux 6.8.12-15-pve
Storage: External Ceph 19.2.2 squid (stable)
VM Config:
Code:
agent: enabled=1
bios: seabios
boot: order=scsi0
cores: 4
cpu: Haswell-noTSX
machine: pc-i440fx-9.2
memory: 8192
meta: creation-qemu=9.2.0,ctime=1752059571
name: <redacted>
net0: virtio=BC:24:11:13:FA:7B,bridge=vmbr1,tag=221
numa: 0
ostype: l26
scsi0: <redacted>:vm-5021020-disk-0,size=85G
scsihw: virtio-scsi-pci
smbios1: uuid=ac1fc4f2-4318-44f1-a0fd-d8f004c48245
sockets: 1
tags: <redacted>
vmgenid: 89dca5cb-194c-4325-8050-47d36b2f993c
 
Could you share the VM config of another VM on the same node and storage where snapshots work without error?
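(qm config on the node prints it, e.g.:)
Code:
qm config <vmid>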
 
The problem is that it affects random VMs. When there is a failure, the job restarts automatically and succeeds. If VM A fails today, it will be fine tomorrow.