Snapshot with VM State and RAM fails.

rborg

Hi,

Any pointers would help.
We have one particular VM for which snapshots that include RAM fail. On the Ceph cluster we can see the RBD state image being written to, but once it reaches around 99% the process fails with the following error:
Code:
TASK ERROR: unable to save VM state and RAM - qemu_savevm_state_complete_precopy error -5

Subsequent snapshot attempts, with or without RAM, then fail with the following error until the VM is stopped or migrated to another node:
Code:
TASK ERROR: VM 5021020 qmp command 'savevm-start' failed - VM snapshot already started

Snapshots without RAM work flawlessly for this VM once the "already started" error has been cleared.
Snapshots with RAM work without issues for other VMs on the same nodes and storage.
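
As a side note, the -5 in the first error is presumably a negated Linux errno value, which would make it EIO (an I/O error) rather than, say, an out-of-space condition. That reading is an assumption on my part, not something the task log states. A minimal sketch to translate such a code, where `describe_errno` is just a hypothetical helper name:

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Translate a (possibly negated) errno value into its symbolic name and message."""
    number = abs(code)
    # errno.errorcode maps numeric values to names such as "EIO".
    name = errno.errorcode.get(number, "UNKNOWN")
    return f"{name}: {os.strerror(number)}"


if __name__ == "__main__":
    # The value reported by qemu_savevm_state_complete_precopy in the task log.
    print(describe_errno(-5))
```

If that interpretation holds, the failure points at the write path to the state image rather than at the guest itself.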

A few seconds before the snapshot fails, we also observe log entries like the following:
Code:
Oct 27 15:30:42 pvenode07 pvedaemon[2193130]: VM 5021020 qmp command failed - VM 5021020 qmp command 'query-proxmox-support' failed - unable to connect to VM 5021020 qmp socket - timeout after 51 retries
Oct 27 15:30:42 pvenode07 pvedaemon[2185978]: VM 5021020 qmp command failed - VM 5021020 qmp command 'query-proxmox-support' failed - unable to connect to VM 5021020 qmp socket - timeout after 51 retries
Oct 27 15:30:43 pvenode07 pvestatd[1753]: VM 5021020 qmp command failed - VM 5021020 qmp command 'query-proxmox-support' failed - unable to connect to VM 5021020 qmp socket - timeout after 51 retries
Oct 27 15:30:43 pvenode07 pvedaemon[2176923]: VM 5021020 qmp command failed - VM 5021020 qmp command 'query-proxmox-support' failed - unable to connect to VM 5021020 qmp socket - timeout after 51 retries
Oct 27 15:30:47 pvenode07 pvedaemon[2185978]: <user> end task UPID:pvenode07:00216AA8:0CF41FED:68FF8169:qmsnapshot:5021020:<user>: unable to save VM state and RAM - qemu_savevm_state_complete_precopy error -5
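
To see how long the QMP socket is unresponsive before the task fails, the timeout lines can be counted per second. This is only a rough sketch: the regular expression is written against the lines shown above and is not an official log format, and `count_qmp_failures` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern derived from the syslog lines above; an assumption, not a documented format.
QMP_FAIL = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"(?P<daemon>[\w-]+)\[(?P<pid>\d+)\]: VM (?P<vmid>\d+) qmp command failed - "
    r".*qmp command '(?P<cmd>[^']+)' failed - (?P<reason>.+)$"
)


def count_qmp_failures(lines, vmid):
    """Count failed QMP commands for one VM, grouped by timestamp."""
    per_second = Counter()
    for line in lines:
        match = QMP_FAIL.match(line.strip())
        if match and match.group("vmid") == str(vmid):
            per_second[match.group("ts")] += 1
    return per_second


if __name__ == "__main__":
    sample = [
        "Oct 27 15:30:42 pvenode07 pvedaemon[2193130]: VM 5021020 qmp command failed - "
        "VM 5021020 qmp command 'query-proxmox-support' failed - unable to connect to "
        "VM 5021020 qmp socket - timeout after 51 retries",
        "Oct 27 15:30:43 pvenode07 pvestatd[1753]: VM 5021020 qmp command failed - "
        "VM 5021020 qmp command 'query-proxmox-support' failed - unable to connect to "
        "VM 5021020 qmp socket - timeout after 51 retries",
    ]
    for timestamp, count in sorted(count_qmp_failures(sample, 5021020).items()):
        print(timestamp, count)
```

Feeding it the full journal for the snapshot window should show when the timeouts start relative to the savevm task.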

Environment (multi-node cluster):
Code:
Proxmox Version: 8.4.14
Kernel: Linux 6.8.12-15-pve
Storage: External Ceph 19.2.2 squid (stable)
VM Config:
Code:
agent: enabled=1
bios: seabios
boot: order=scsi0
cores: 4
cpu: Haswell-noTSX
machine: pc-i440fx-9.2
memory: 8192
meta: creation-qemu=9.2.0,ctime=1752059571
name: <redacted>
net0: virtio=BC:24:11:13:FA:7B,bridge=vmbr1,tag=221
numa: 0
ostype: l26
scsi0: <redacted>:vm-5021020-disk-0,size=85G
scsihw: virtio-scsi-pci
smbios1: uuid=ac1fc4f2-4318-44f1-a0fd-d8f004c48245
sockets: 1
tags: <redacted>
vmgenid: 89dca5cb-194c-4325-8050-47d36b2f993c