Snapshot causes VM to become unresponsive.

Don't you need ZFS for snapshots?
No, qcow2 supports snapshots too. Your deleted post shows that the main thread in QEMU is still busy doing IO, so it might very well be that the snapshot just wasn't finished. ZFS can be much faster for snapshots than huge qcow2 files, though.
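
For reference, both the qcow2-internal snapshots and the snapshot tree as Proxmox sees it can be inspected from the CLI; the image path and VM ID below are just examples:

Code:
# List the internal snapshots stored inside a qcow2 image (example path)
qemu-img snapshot -l /var/lib/vz/images/100/vm-100-disk-0.qcow2

# List the snapshot tree known to Proxmox (example VM ID)
qm listsnapshot 100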
 
Hi... I have the same problem. It occurred with PVE version 9.0.5; I updated to 9.0.6 and the problem persists.

I took a snapshot including RAM, and the host became inaccessible (sometimes it only responded to ping). When I took a snapshot without RAM, everything worked fine.
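
For reference, the same two cases can be reproduced from the CLI; the snapshot names below are just examples:

Code:
# Snapshot including RAM state (this is the variant that hangs)
qm snapshot 113 with-ram --vmstate 1

# Snapshot without RAM state (this one works fine)
qm snapshot 113 no-ram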

Code:
qm config 113
boot: order=scsi0
cores: 8
cpu: host
memory: 8192
meta: creation-qemu=9.2.0,ctime=1751910974
name: srvpaffrw001
net0: virtio=BC:24:11:65:D3:71,bridge=vmbr4,firewall=1,link_down=1
net1: virtio=BC:24:11:6F:08:3F,bridge=vmbr6,firewall=1,link_down=1
net10: virtio=BC:24:11:7B:91:9F,bridge=vmbr131,firewall=1,link_down=1
net2: virtio=BC:24:11:8A:5F:61,bridge=vmbr0,firewall=1
net3: virtio=BC:24:11:ED:0F:00,bridge=vmbr100,firewall=1,link_down=1
net4: virtio=BC:24:11:A9:3B:40,bridge=vmbr110,firewall=1,link_down=1
net5: virtio=BC:24:11:A9:9A:0F,bridge=vmbr120,firewall=1,link_down=1
net6: virtio=BC:24:11:F1:26:23,bridge=vmbr130,firewall=1,link_down=1
net7: virtio=BC:24:11:8F:7E:E9,bridge=vmbr101,firewall=1,link_down=1
net8: virtio=BC:24:11:5E:36:18,bridge=vmbr111,firewall=1,link_down=1
net9: virtio=BC:24:11:0C:65:49,bridge=vmbr121,firewall=1,link_down=1
numa: 0
ostype: l26
parent: BACKUP01
scsi0: VG01_LV01_PVE005:vm-113-disk-0,iothread=1,size=80G
scsihw: virtio-scsi-single
smbios1: uuid=15f83ce4-cb33-45d5-b671-7a145bb74991
sockets: 1
startup: order=1
vmgenid: 2a49f77a-f31b-4bef-98d1-7a113ebcc526


Code:
qm status 113 --verbose
cpus: 8
disk: 0
diskread: 0
diskwrite: 0
lock: snapshot
maxdisk: 85899345920
maxmem: 8589934592
mem: 8592143360
memhost: 8592143360
name: srvpaffrw001
netin: 30754541
netout: 2848030
nics:
        tap113i0:
                netin: 21382609
                netout: 0
        tap113i1:
                netin: 731553
                netout: 0
        tap113i10:
                netin: 210
                netout: 0
        tap113i2:
                netin: 2413048
                netout: 2848030
        tap113i3:
                netin: 1291711
                netout: 0
        tap113i4:
                netin: 73217
                netout: 0
        tap113i5:
                netin: 3713153
                netout: 0
        tap113i6:
                netin: 1148480
                netout: 0
        tap113i7:
                netin: 210
                netout: 0
        tap113i8:
                netin: 70
                netout: 0
        tap113i9:
                netin: 280
                netout: 0
pid: 2236560
pressurecpufull: 0
pressurecpusome: 0
pressureiofull: 0
pressureiosome: 0
pressurememoryfull: 0
pressurememorysome: 0
proxmox-support:
qmpstatus: running
status: running
uptime: 2898
vmid: 113

The VM stayed in locked status for a long time. After that, it came back online, but was still inaccessible.
One thing I'm not sure is normal: when the VM freezes and I capture this verbose status, mem (8592143360) is greater than maxmem (8589934592); that ratio matches the 100.03% memory usage I see in the PVE GUI (8592143360 / 8589934592 ≈ 1.0003). Is this some kind of memory overflow?
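
Side note, in case it helps others reading this: if the snapshot task is definitely dead and the VM is left behind with lock: snapshot, the lock can be removed manually. Only do this once you are sure no snapshot operation is still running:

Code:
qm unlock 113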


Code:
pveversion -v
proxmox-ve: 9.0.0 (running kernel: 6.14.8-2-pve)
pve-manager: 9.0.6 (running version: 9.0.6/49c767b70aeb6648)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.14.11-1-pve-signed: 6.14.11-1
proxmox-kernel-6.14: 6.14.11-1
proxmox-kernel-6.14.8-2-pve-signed: 6.14.8-2
proxmox-kernel-6.8.12-13-pve-signed: 6.8.12-13
proxmox-kernel-6.8: 6.8.12-13
proxmox-kernel-6.8.12-11-pve-signed: 6.8.12-11
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
ceph-fuse: 19.2.3-pve1
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx10
intel-microcode: 3.20250512.1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.3
libpve-apiclient-perl: 3.4.0
libpve-cluster-api-perl: 9.0.6
libpve-cluster-perl: 9.0.6
libpve-common-perl: 9.0.9
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.4
libpve-network-perl: 1.1.6
libpve-rs-perl: 0.10.10
libpve-storage-perl: 9.0.13
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-1
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.14-1
proxmox-backup-file-restore: 4.0.14-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.1.2
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.1
proxmox-widget-toolkit: 5.0.5
pve-cluster: 9.0.6
pve-container: 6.0.10
pve-docs: 9.0.8
pve-edk2-firmware: 4.2025.02-4
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.3
pve-firmware: 3.16-4
pve-ha-manager: 5.0.4
pve-i18n: 3.5.2
pve-qemu-kvm: 10.0.2-4
pve-xtermjs: 5.5.0-2
qemu-server: 9.0.19
smartmontools: 7.4-pve1
spiceterm: 3.4.0
swtpm: 0.8.0+pve2
vncterm: 1.9.0
zfsutils-linux: 2.3.4-pve1

Is there a patch for this bug?
 
Hi, same problem here.
We have several servers with the same configuration, ZFS storage, and automatic VM snapshots (autosnapshot).

The VM freezing problem only occurs on this newest PVE 9 server. The VMs that have frozen so far are two Windows VMs (Windows 10 and Windows Server 2025), both during a disk-only snapshot (no RAM).

Compared to other servers, these two VMs had:

  • Machine: pc-q35-10.0
  • Controller: virtio-scsi-single
  • SCSI disk: aio=native; ssd=1
Other VMs that have never had problems mostly run a different machine version. We recently converted several disks to use aio=native and ssd=1, but so far we have seen no freezes on the other servers (one way to test this is sketched below).
One more thing to note: in most of the VMs that don't have problems, the machine model is i440fx.
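
To test whether aio=native is involved (an assumption on our side, not a confirmed fix), the affected disks can be switched back to the io_uring default; the VM ID, storage and volume names below are hypothetical:

Code:
# Re-apply the disk with aio=io_uring; takes effect after a full VM stop/start
qm set 100 --scsi0 local-zfs:vm-100-disk-0,aio=io_uring,discard=on,iothread=1,ssd=1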
 
Hi,
@noahk sorry for the late response! Your output suggests that the QEMU main thread is blocked. Should the issue occur again, to diagnose it, make sure you have the debugger and debug symbols installed (apt install pve-qemu-kvm-dbgsym gdb) and then obtain a trace with gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/113.pid), replacing 113 with the correct ID if it's a different VM.
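
For convenience, the same steps can be wrapped in a small script; the PID file path is the standard qemu-server location, the output file name is arbitrary:

Code:
#!/bin/bash
# Capture a full-thread backtrace of the (possibly stuck) QEMU process of a VM.
VMID="${1:?usage: $0 <vmid>}"
PIDFILE="/var/run/qemu-server/${VMID}.pid"
gdb --batch --ex 't a a bt' -p "$(cat "${PIDFILE}")" > "/tmp/qemu-${VMID}-backtrace.txt" 2>&1
echo "backtrace written to /tmp/qemu-${VMID}-backtrace.txt"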

@nikybiasion please share the output of pveversion -v and the VM configurations (qm config ID, with the correct IDs). When the VM is stuck, what does qm status ID --verbose say? If it's similar to @noahk's (e.g. a proxmox-support: line without any value), see the first part of my response.

For both of you, please also check the host's and the guest's system logs from around the time the issue occurred.
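
On the host, something like the following narrows the journal down to the relevant window (the timestamps are placeholders, adjust them to the time of the incident):

Code:
journalctl --since "2025-10-28 08:00" --until "2025-10-28 08:15"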
 
Hi @fiona
This is the output of pveversion -v
Code:
proxmox-ve: 9.0.0 (running kernel: 6.14.11-2-pve)
pve-manager: 9.0.10 (running version: 9.0.10/deb1ca707ec72a89)
proxmox-kernel-helper: 9.0.4
proxmox-kernel-6.14.11-4-pve-signed: 6.14.11-4
proxmox-kernel-6.14: 6.14.11-4
proxmox-kernel-6.14.11-2-pve-signed: 6.14.11-2
proxmox-kernel-6.14.8-2-pve-signed: 6.14.8-2
ceph-fuse: 19.2.3-pve2
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx10
intel-microcode: 3.20250812.1~deb13u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.3
libpve-apiclient-perl: 3.4.0
libpve-cluster-api-perl: 9.0.6
libpve-cluster-perl: 9.0.6
libpve-common-perl: 9.0.11
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.5
libpve-network-perl: 1.1.8
libpve-rs-perl: 0.10.10
libpve-storage-perl: 9.0.13
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2+pmx1
lxc-pve: 6.0.5-1
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.16-1
proxmox-backup-file-restore: 4.0.16-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.2.0
proxmox-kernel-helper: 9.0.4
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.2
proxmox-widget-toolkit: 5.0.5
pve-cluster: 9.0.6
pve-container: 6.0.12
pve-docs: 9.0.8
pve-edk2-firmware: 4.2025.02-4
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.3
pve-firmware: 3.17-2
pve-ha-manager: 5.0.4
pve-i18n: 3.6.0
pve-qemu-kvm: 10.0.2-4
pve-xtermjs: 5.5.0-2
qemu-server: 9.0.22
smartmontools: 7.4-pve1
spiceterm: 3.4.1
swtpm: 0.8.0+pve2
vncterm: 1.9.1
zfsutils-linux: 2.3.4-pve1

and the output of qm config VMID:

Code:
agent: 1
balloon: 2048
boot: order=scsi0;ide2;net0
cores: 2
cpu: x86-64-v2-AES
ide2: none,media=cdrom
machine: pc-q35-10.0
memory: 4096
meta: creation-qemu=10.0.2,ctime=1756798448
name: SERVERDATI
net0: virtio=BC:24:11:D6:D8:A0,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: win11
parent: auto-orario-251028100412
protection: 1
scsi0: zfspool:vm-102-disk-0,aio=native,discard=on,iothread=1,size=60G,ssd=1
scsi1: zfspool:vm-102-disk-1,backup=0,cache=unsafe,discard=on,iothread=1,size=10G,ssd=1
scsi2: zfspool:vm-102-disk-2,aio=native,discard=on,iothread=1,size=600G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=00c45879-9778-4911-afbd-d18314684cbb
sockets: 1
startup: order=1,up=60
vmgenid: 598235c2-1912-4f0b-9e2f-b1150ce3254e

The QEMU Guest Agent inside the VM is version 1.271.

Output of syslog:
Code:
Oct 28 08:04:10 pvedma1 pvedaemon[2342071]: <root@pam> starting task UPID:pvedma1:002F94DC:11FE5713:69006AEA:qmsnapshot:102:root@pam:
Oct 28 08:04:10 pvedma1 pvedaemon[3118300]: <root@pam> snapshot VM 102: auto-orario-251028080410
Oct 28 08:04:23 pvedma1 pvestatd[2465]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - got timeout
Oct 28 08:04:23 pvedma1 pvestatd[2465]: status update time (9.192 seconds)
Oct 28 08:04:33 pvedma1 pvestatd[2465]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 51 retries
Oct 28 08:04:34 pvedma1 pvestatd[2465]: status update time (9.148 seconds)
Oct 28 08:04:42 pvedma1 pvestatd[2465]: VM 102 qmp command failed - VM 102 qmp command 'query-proxmox-support' failed - unable to connect to VM 102 qmp socket - timeout after 51 retries
 