[SOLVED] Windows VMs freeze during backup / snapshot

kegloadam

New Member
Aug 31, 2022
Dear Community,

We are running several Proxmox servers and have random issues with Windows VMs (Windows 10 & Windows Server 2019, 2022) where the VMs die during backups and/or while taking snapshots. There are no issues with any Linux VMs or LXCs. It happens on multiple servers, but mostly on Intel Xeon E5-26* servers. Snapshots are taken with or without RAM; it does not matter.
For backups we got the following error message: "VM [ID] qmp command 'cont' failed - unable to connect to VM [ID] qmp socket - timeout after [n*100] retries"

We have tried different CPU types: default, host, and SandyBridge. It is "better" with the SandyBridge CPU type, as the VMs fail only a few times per month, but that is still too often, since each failure requires a VM reboot, which is not really acceptable.

We are using the Virtio-win-guest-tools (0.1.229)

Memory ballooning is disabled.

We are running the following versions:
Kernel Version
Linux 5.15.74-1-pve #1 SMP PVE 5.15.74-1 (Mon, 14 Nov 2022 20:17:15 +0100)

PVE Manager Version
pve-manager/7.3-3/c3928077


Thank you in advance for all ideas and recommendations, and please let me know if any further details would help.
 
Hi,
please share the output of pveversion -v and qm config <ID> with the ID of some affected VMs. If you are using SATA/IDE, there is a related fix in pve-qemu-kvm>=7.2.0-7.
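(For anyone following along: both commands are run in a root shell on the PVE host; 500 below is simply the VM ID that appears later in this thread.)

Code:
# versions of all PVE-related packages
pveversion -v
# configuration of the affected VM (here: VM 500)
qm config 500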
 
@fiona here are the details you have requested:

As for the related fix, I will apply the patch and update this thread.

proxmox-ve: 7.3-1 (running kernel: 5.15.74-1-pve)
pve-manager: 7.3-3 (running version: 7.3-3/c3928077)
pve-kernel-5.15: 7.2-14
pve-kernel-helper: 7.2-14
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-8
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.2-12
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-1
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.5-6
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-1
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1

agent: 1
balloon: 0
bios: ovmf
boot: order=ide0;ide2
cores: 6
cpu: SandyBridge
efidisk0: nvme-mirror:vm-500-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
ide0: nvme-mirror:vm-500-disk-1,discard=on,size=500G,ssd=1
machine: pc-q35-7.1
memory: 32768
meta: creation-qemu=7.1.0,ctime=1675763501
name: REMOVED
net0: e1000=REMOVED_MAC_ID,bridge=vmbr2,firewall=1
numa: 0
onboot: 1
ostype: win11 (it is actually Windows Server 2022)
parent: snapshot_28_0900
scsihw: virtio-scsi-single
smbios1: uuid=REMOVED
sockets: 1
tpmstate0: nvme-mirror:vm-500-disk-2,size=4M,version=v2.0
vmgenid: REMOVED
 
[SOLVED]
Dear @fiona, it looks solved with the update: none of the Windows VMs have frozen in the 2 weeks since the patch, so I am marking this thread as solved. Thank you for your help.
 
Would anyone be willing to help a beginner who isn't familiar with git through this patch? Or, if a simple update of the software will do the trick, I'll just update. Thanks!
 
Hi,
Would anyone be willing to help a beginner who isn't familiar with git through this patch? Or, if a simple update of the software will do the trick, I'll just update. Thanks!
Yes, you can just update the system. The package with the fix is pve-qemu-kvm=7.2.0-7, which is currently available on the no-subscription repository. If you don't have that repository enabled, you can:
  1. temporarily enable it (can be done in the UI under [your node] > Updates > Repositories > Add)
  2. run apt update
  3. run apt install pve-qemu-kvm
  4. disable the repository again
  5. run apt update again.
You can check the installed version with e.g. dpkg-query --list pve-qemu-kvm. If it's at least 7.2.0-7, then the fix is included. Note that a VM needs to be started again (reboot from inside the guest is not enough) or migrated to a node with an upgraded version to pick up the change.
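A minimal shell version of those steps (the repository toggle itself is easiest to do in the UI as described above; the package name and version are taken from this thread):

Code:
# with the pve-no-subscription repository enabled:
apt update
apt install pve-qemu-kvm
# after disabling the repository again:
apt update
# verify the fix is included (needs >= 7.2.0-7)
dpkg-query --list pve-qemu-kvm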

If you are still on Proxmox VE 7, the fix has not landed there yet. You can upgrade to Proxmox VE 8 following the official guide: https://pve.proxmox.com/wiki/Upgrade_from_7_to_8
 
Here's what I'm seeing:

Code:
root@pve:~# dpkg-query --list pve-qemu-kvm
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-===================================
ii  pve-qemu-kvm   8.0.2-7      amd64        Full virtualization on x86 hardware

I am not sure how to read that, but it looks like it's not installed...

Out of curiosity, why do I want to disable the repository? I already had the no-subscription repository in my list.

Thanks!
 
Code:
ii  pve-qemu-kvm   8.0.2-7      amd64        Full virtualization on x86 hardware

I am not sure how to read that, but it looks like it's not installed...
The ii indicates that it is installed. You are probably facing a different issue, then. Please post the output of pveversion -v and qm config <ID> --current, replacing <ID> with the actual ID of your VM. What exact error do you get, and when doing what?
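(Side note: the first two characters of that dpkg line are state flags, desired state then current state. A compact way to print just the version, assuming only the package name, is:)

Code:
# "ii" = desired state Install, current state Installed
dpkg-query --show --showformat '${Package} ${Version}\n' pve-qemu-kvm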

Out of curiosity, why do I want to disable the repository? I already had the no-subscription repository in my list.
No, in that case you don't need to disable it. Actually, I was confusing 7.2.0-7 with 8.0.2-7; I didn't realize you had replied to such an old thread. The former version has been available on all repositories for a long time ;) The latter is not on the enterprise repository yet; if you were using that one, you would have needed those extra steps.
 
Thanks for explaining the situation. Yes, that was an old thread...I hadn't noticed that! Here's the output:

Code:
root@pve1:~# pveversion -v
proxmox-ve: 8.0.2 (running kernel: 6.2.16-14-pve)
pve-manager: 8.0.4 (running version: 8.0.4/d258a813cfa6b390)
pve-kernel-6.2: 8.0.5
proxmox-kernel-helper: 8.0.3
proxmox-kernel-6.2.16-19-pve: 6.2.16-19
proxmox-kernel-6.2: 6.2.16-19
proxmox-kernel-6.2.16-14-pve: 6.2.16-14
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx5
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.5
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.9
libpve-guest-common-perl: 5.0.5
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.4-1
proxmox-backup-file-restore: 3.0.4-1
proxmox-kernel-helper: 8.0.3
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.9
pve-cluster: 8.0.4
pve-container: 5.0.5
pve-docs: 8.0.5
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.8-3
pve-ha-manager: 4.0.2
pve-i18n: 3.0.7
pve-qemu-kvm: 8.0.2-7
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.7
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.13-pve1

Code:
root@pve1:~# qm config 107 --current
agent: 1
args: -acpitable file=/etc/pve/qemu-server/slic_table_node_pve1
bios: ovmf
boot: order=sata0
cores: 4
cpu: kvm64
description: Blue Iris
efidisk0: ZFS-Data:vm-107-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
lock: backup
machine: pc-i440fx-8.0
memory: 8192
meta: creation-qemu=8.0.2,ctime=1693089416
name: windows-11
net0: e1000=86:FF:74:31:2D:AC,bridge=vmbr0
numa: 0
onboot: 1
ostype: win11
sata0: ZFS-Data:vm-107-disk-1,cache=writethrough,discard=on,size=121433M,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=1be96853-78b2-4dac-89ff-88a691bcda52
sockets: 1
vmgenid: a05b48c1-763a-47d0-a3cd-219a6cd2b980

I'm sure there are a few issues there :)

Generally I'm facing issues with this Windows VM only: I frequently find it frozen, sometimes with CPU at 100% and a blue screen, and other times stuck in the middle of a backup. I run Blue Iris on it and I have not been able to keep it going for more than a day or two.

Today, I'm finding that it's been backing up for almost two hours, much slower than usual. I can access it with RDP but it's mostly frozen and unusable, so slow that RDP sometimes disconnects and then tries to reconnect. Here's the current backup log just to show the variations in speed. I'm not going to stop it this time since it seems to be making progress:

Code:
INFO: Starting Backup of VM 107 (qemu)
INFO: Backup started at 2023-11-06 04:46:46
INFO: status = running
INFO: VM Name: windows-11
INFO: include disk 'sata0' 'ZFS-Data:vm-107-disk-1' 121433M
INFO: include disk 'efidisk0' 'ZFS-Data:vm-107-disk-0' 1M
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/mnt/pve/proxmox-backups/dump/vzdump-qemu-107-2023_11_06-04_46_46.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '6ca45e01-f694-4e30-9a4e-0d235db706c1'
INFO: resuming VM again
INFO:   0% (300.9 MiB of 118.6 GiB) in 3s, read: 100.3 MiB/s, write: 74.3 MiB/s
INFO:   1% (1.2 GiB of 118.6 GiB) in 44s, read: 23.0 MiB/s, write: 21.5 MiB/s
INFO:   2% (2.4 GiB of 118.6 GiB) in 54s, read: 122.5 MiB/s, write: 121.0 MiB/s
INFO:   3% (3.6 GiB of 118.6 GiB) in 1m 39s, read: 27.2 MiB/s, write: 26.1 MiB/s
INFO:   4% (4.8 GiB of 118.6 GiB) in 23m 13s, read: 935.0 KiB/s, write: 916.2 KiB/s
INFO:   5% (6.0 GiB of 118.6 GiB) in 23m 24s, read: 118.3 MiB/s, write: 84.4 MiB/s
INFO:   6% (7.1 GiB of 118.6 GiB) in 23m 27s, read: 371.0 MiB/s, write: 367.6 MiB/s
INFO:   7% (8.4 GiB of 118.6 GiB) in 23m 36s, read: 142.6 MiB/s, write: 138.0 MiB/s
INFO:   8% (9.5 GiB of 118.6 GiB) in 24m 41s, read: 17.7 MiB/s, write: 16.7 MiB/s
INFO:   9% (10.8 GiB of 118.6 GiB) in 24m 50s, read: 146.2 MiB/s, write: 142.3 MiB/s
INFO:  10% (11.9 GiB of 118.6 GiB) in 25m 22s, read: 35.1 MiB/s, write: 32.8 MiB/s
INFO:  11% (13.1 GiB of 118.6 GiB) in 26m 18s, read: 21.6 MiB/s, write: 20.4 MiB/s
INFO:  12% (14.3 GiB of 118.6 GiB) in 26m 57s, read: 33.5 MiB/s, write: 32.0 MiB/s
INFO:  13% (15.5 GiB of 118.6 GiB) in 27m 4s, read: 168.3 MiB/s, write: 158.6 MiB/s
INFO:  14% (16.7 GiB of 118.6 GiB) in 27m 53s, read: 25.2 MiB/s, write: 24.0 MiB/s
INFO:  15% (17.9 GiB of 118.6 GiB) in 28m 52s, read: 20.5 MiB/s, write: 19.8 MiB/s
INFO:  16% (19.0 GiB of 118.6 GiB) in 28m 59s, read: 168.8 MiB/s, write: 163.5 MiB/s
INFO:  17% (20.2 GiB of 118.6 GiB) in 30m, read: 20.3 MiB/s, write: 19.5 MiB/s
INFO:  18% (21.5 GiB of 118.6 GiB) in 31m 6s, read: 19.0 MiB/s, write: 18.7 MiB/s
INFO:  19% (22.6 GiB of 118.6 GiB) in 33m 48s, read: 7.0 MiB/s, write: 6.9 MiB/s
INFO:  20% (23.8 GiB of 118.6 GiB) in 33m 55s, read: 179.2 MiB/s, write: 170.9 MiB/s
INFO:  21% (24.9 GiB of 118.6 GiB) in 34m 58s, read: 18.3 MiB/s, write: 17.7 MiB/s
INFO:  22% (26.2 GiB of 118.6 GiB) in 36m 40s, read: 13.1 MiB/s, write: 12.9 MiB/s
INFO:  23% (27.3 GiB of 118.6 GiB) in 39m 39s, read: 6.2 MiB/s, write: 6.1 MiB/s
INFO:  24% (28.9 GiB of 118.6 GiB) in 39m 46s, read: 226.2 MiB/s, write: 137.6 MiB/s
INFO:  26% (31.8 GiB of 118.6 GiB) in 39m 49s, read: 991.2 MiB/s, write: 92.9 MiB/s
INFO:  27% (32.4 GiB of 118.6 GiB) in 39m 52s, read: 209.0 MiB/s, write: 202.5 MiB/s
INFO:  28% (33.3 GiB of 118.6 GiB) in 40m 9s, read: 58.9 MiB/s, write: 57.9 MiB/s
INFO:  29% (34.5 GiB of 118.6 GiB) in 42m 3s, read: 10.3 MiB/s, write: 10.2 MiB/s
INFO:  30% (35.6 GiB of 118.6 GiB) in 42m 20s, read: 65.9 MiB/s, write: 63.6 MiB/s
INFO:  31% (36.8 GiB of 118.6 GiB) in 45m 11s, read: 7.4 MiB/s, write: 7.2 MiB/s
INFO:  32% (37.9 GiB of 118.6 GiB) in 46m 18s, read: 17.4 MiB/s, write: 16.7 MiB/s
INFO:  33% (39.2 GiB of 118.6 GiB) in 48m 5s, read: 11.7 MiB/s, write: 11.4 MiB/s
INFO:  34% (40.5 GiB of 118.6 GiB) in 51m 5s, read: 7.3 MiB/s, write: 7.2 MiB/s
INFO:  35% (41.6 GiB of 118.6 GiB) in 51m 16s, read: 107.1 MiB/s, write: 105.1 MiB/s
INFO:  36% (42.7 GiB of 118.6 GiB) in 54m 30s, read: 5.8 MiB/s, write: 5.6 MiB/s
INFO:  37% (43.9 GiB of 118.6 GiB) in 54m 40s, read: 122.5 MiB/s, write: 118.3 MiB/s
INFO:  38% (45.2 GiB of 118.6 GiB) in 56m 46s, read: 10.5 MiB/s, write: 10.1 MiB/s
INFO:  39% (46.3 GiB of 118.6 GiB) in 56m 55s, read: 123.7 MiB/s, write: 119.0 MiB/s
INFO:  40% (47.4 GiB of 118.6 GiB) in 59m 43s, read: 7.2 MiB/s, write: 6.7 MiB/s
INFO:  41% (48.7 GiB of 118.6 GiB) in 59m 48s, read: 263.2 MiB/s, write: 113.2 MiB/s
INFO:  42% (49.8 GiB of 118.6 GiB) in 1h 2m 53s, read: 6.1 MiB/s, write: 6.0 MiB/s
INFO:  43% (51.0 GiB of 118.6 GiB) in 1h 3m 9s, read: 74.5 MiB/s, write: 73.4 MiB/s
INFO:  44% (52.4 GiB of 118.6 GiB) in 1h 6m 11s, read: 7.7 MiB/s, write: 7.4 MiB/s
INFO:  45% (53.4 GiB of 118.6 GiB) in 1h 6m 35s, read: 42.6 MiB/s, write: 33.3 MiB/s
INFO:  46% (54.7 GiB of 118.6 GiB) in 1h 9m 23s, read: 7.9 MiB/s, write: 7.7 MiB/s
INFO:  47% (55.9 GiB of 118.6 GiB) in 1h 11m 47s, read: 8.4 MiB/s, write: 7.9 MiB/s
INFO:  48% (57.0 GiB of 118.6 GiB) in 1h 12m 17s, read: 38.3 MiB/s, write: 36.2 MiB/s
INFO:  49% (58.2 GiB of 118.6 GiB) in 1h 13m 51s, read: 13.0 MiB/s, write: 11.5 MiB/s
INFO:  50% (59.4 GiB of 118.6 GiB) in 1h 16m 38s, read: 7.7 MiB/s, write: 7.3 MiB/s
INFO:  51% (60.6 GiB of 118.6 GiB) in 1h 16m 47s, read: 137.8 MiB/s, write: 119.6 MiB/s
INFO:  52% (61.7 GiB of 118.6 GiB) in 1h 19m 37s, read: 6.7 MiB/s, write: 5.5 MiB/s
INFO:  53% (62.9 GiB of 118.6 GiB) in 1h 19m 45s, read: 150.4 MiB/s, write: 115.2 MiB/s
INFO:  54% (64.1 GiB of 118.6 GiB) in 1h 22m 26s, read: 7.3 MiB/s, write: 5.5 MiB/s
INFO:  55% (65.3 GiB of 118.6 GiB) in 1h 23m 38s, read: 17.1 MiB/s, write: 12.6 MiB/s
INFO:  56% (66.4 GiB of 118.6 GiB) in 1h 25m 19s, read: 12.0 MiB/s, write: 9.0 MiB/s
INFO:  57% (67.6 GiB of 118.6 GiB) in 1h 28m 11s, read: 7.1 MiB/s, write: 5.5 MiB/s
INFO:  58% (68.9 GiB of 118.6 GiB) in 1h 28m 18s, read: 183.8 MiB/s, write: 129.3 MiB/s
INFO:  59% (70.1 GiB of 118.6 GiB) in 1h 30m 41s, read: 8.3 MiB/s, write: 5.8 MiB/s
INFO:  60% (71.4 GiB of 118.6 GiB) in 1h 31m 9s, read: 48.4 MiB/s, write: 8.2 MiB/s
INFO:  61% (72.5 GiB of 118.6 GiB) in 1h 31m 15s, read: 181.9 MiB/s, write: 119.8 MiB/s
INFO:  62% (73.7 GiB of 118.6 GiB) in 1h 31m 34s, read: 68.5 MiB/s, write: 50.1 MiB/s
INFO:  63% (74.8 GiB of 118.6 GiB) in 1h 33m 11s, read: 11.5 MiB/s, write: 6.7 MiB/s
INFO:  64% (75.9 GiB of 118.6 GiB) in 1h 33m 18s, read: 165.2 MiB/s, write: 123.7 MiB/s
INFO:  65% (77.2 GiB of 118.6 GiB) in 1h 36m 7s, read: 7.9 MiB/s, write: 6.5 MiB/s
INFO:  66% (78.6 GiB of 118.6 GiB) in 1h 36m 13s, read: 229.0 MiB/s, write: 122.4 MiB/s
 
See these interesting graphs: there's no activity on the machine right now other than the backup.

[attached: three resource-usage graphs]
 
If you have a single HDD in your ZFS pool, slow performance is normal.
Moreover, cache=writethrough disables the read cache.

edit: check the Disk IO graph of the VM, or the IO on your host; it will be high.
 
Sorry, I mean the IO Delay from the "Host" view, in the same graph as the CPU.
Keep Cache=Default on your vDisk sata0 in the Hardware options of your VM.
What are your ZFS specs? What HDDs?
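(For reference, a cache change like that can also be made from the CLI with qm set, reusing the sata0 line from the qm config output above; omitting the cache=… option falls back to the default mode:)

Code:
# rewrite sata0 without cache=writethrough, so the default cache mode is used
qm set 107 --sata0 ZFS-Data:vm-107-disk-1,discard=on,size=121433M,ssd=1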
 
[attached: node CPU usage / IO delay graph]

This is from the node; the drop in the blue line coincides with the end of a long backup.

How do I get the ZFS specs? The storage is two Samsung M.2 2TB SSDs set up as a stripe (for a total of 4TB).
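(A quick way to answer both questions from the shell: zpool shows the pool layout, and smartctl reports the exact drive model. The device path below is hypothetical and must match your hardware:)

Code:
# pool layout, member devices and health
zpool status
# capacity and usage per vdev
zpool list -v
# exact SSD model (adjust the device path to yours)
smartctl -i /dev/nvme0n1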
 
I was wrong about the IO delay; I thought it would be higher.
By ZFS specs I mean the ZFS configuration, as you wrote; the exact SSD model is still missing, sorry for my language.
I would bet on a consumer model, which can be OK for the ZFS PVE OS itself but should be forbidden for ZFS VM storage, IMO.
 
It's considered a consumer drive for ZFS.
ZFS requires enterprise SSDs with PLP (power-loss protection), i.e. with capacitors, which allow fast sync writes; for NVMe that mainly means datacenter SSDs in the 22110 format.
The forum is full of threads about this.
 
Do you recommend rebuilding this node without ZFS? I can migrate machines to another node and rebuild using btrfs or...what do you recommend? Thanks :)
 
I like the ZFS install for its mirrored boot OS, so in the installer I choose to use only 64 GB (or 128 GB if I need many ISOs in the "local" space).
The free space is left unpartitioned, so run "cfdisk" on the first SSD to create a partition, then create a regular, non-mirrored LVM-thin datastore (a command sketch follows below).
Performance and wear level will be OK.
You lose, of course, ZFS file-system integrity in case of a crash or power loss; if you need that, you need the proper hardware.

I often install Proxmox Backup Server alongside PVE (not recommended; a VM for PBS is the recommended setup...).
Then run "cfdisk" on the second SSD to create a partition and format it as ext4 for the PBS datastore.

(sorry for my wording)
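(A rough shell sketch of that layout; the device names /dev/nvme0n1p4 and /dev/nvme1n1p1, the VG/pool/storage names, and the size are all hypothetical and must match what cfdisk actually created:)

Code:
# first SSD: turn the new partition into a non-mirrored LVM-thin datastore
pvcreate /dev/nvme0n1p4
vgcreate vmdata /dev/nvme0n1p4
lvcreate -L 1800G --thinpool vmstore vmdata
pvesm add lvmthin vm-thin --vgname vmdata --thinpool vmstore --content images,rootdir

# second SSD: ext4 partition for a PBS datastore
mkfs.ext4 /dev/nvme1n1p1
mkdir -p /mnt/pbs-datastore
echo '/dev/nvme1n1p1 /mnt/pbs-datastore ext4 defaults 0 2' >> /etc/fstab
mount /mnt/pbs-datastore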
 