Hello everyone. For some time now I have been evaluating Proxmox VE as a replacement for my all-in-one ESXi + FreeNAS setup. During IO benchmarks I discovered serious stability problems with Windows 10 (1803, clean install with Windows Update) virtual machines. As a result of these problems the VM host experiences massive slowdowns, extreme memory and swap usage, OOM kills, kernel panics or hard resets.
The easiest way to trigger the issue is to run CrystalDiskMark (v6.0.1 from the MS Store) with a relatively low(!) file size setting (50MB or 100MB) and wait for the write test to start. Almost immediately after the read test finishes I see 90-100% memory usage and high IO delay, and the host starts swapping pages. Sometimes I also see OOM kills and Proxmox resets (with nothing in kern.log / syslog).
To eliminate hardware as a variable, I ran the test on a few different hosts – same behavior.
The problem occurs only when the Windows VM is on a ZFS volume, the disk uses VirtIO, cache is set to none (default) and ZFS sync is set to standard (also default). So these are the default values, and that is the key to reproducing this scenario. With a Linux VM on the same setup I am not able to reproduce the problem.
My hosts have 32GB of RAM. Proxmox VE runs from a separate SSD with ext4/LVM (to avoid swap-on-ZFS problems). The VM is on a ZFS volume; I tried different configurations (raidz2, mirror, stripe, with/without a separate SSD cache, with/without a separate SSD SLOG).
ARC size does not seem to be relevant - it was usually limited via module parameters to a minimum of 4GB and a maximum of 8GB. The rest of the ZFS settings have default values.
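For reference, this is roughly how the ARC limits were set (a minimal sketch; values are in bytes, and the file name is just a convention):

# /etc/modprobe.d/zfs.conf - cap the ARC between 4GiB and 8GiB
options zfs zfs_arc_min=4294967296
options zfs zfs_arc_max=8589934592
# rebuild the initramfs afterwards so the limits apply at boot:
#   update-initramfs -u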
When the CrystalDiskMark write test starts, all free memory disappears in 3-4 seconds. In the SLAB statistics, zio_cache objects are at the top. In zfs iostat I can see that the system is trying to flush all buffers to the intent log (sync writes?).
I don't know why the VM host allocates all available memory for write buffers. The cached write speed for the first 2-3 seconds is around 5-6GB/s. After that the IO throttle begins (and sometimes, but not always, OOM kills, a host reset, etc.).
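The allocation is easy to watch while the benchmark runs, e.g. with standard tools (pool name "vault" taken from my config below):

# watch the ZIO slab caches grow to the top of the slab list
watch -n1 'grep zio /proc/slabinfo'
# per-vdev IO statistics, including the SLOG device, every second
zpool iostat -v vault 1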
I have a few workarounds for the described problems (example commands after this list):
1. limit the write speed in the VM config
2. set sync to disabled for the ZVOL
3. set cache to something other than "none"
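A rough sketch of each workaround, assuming VMID 100 and the storage/disk names from the config below (adjust the dataset path to your pool layout):

# 1. cap the write rate of the virtual disk, e.g. to 200 MB/s
qm set 100 --virtio0 vault:vm-100-disk-1,cache=none,size=64G,mbps_wr=200
# 2. disable sync writes on the zvol (risks data loss on power failure!)
zfs set sync=disabled vault/vm-100-disk-1
# 3. use a different cache mode, e.g. writeback
qm set 100 --virtio0 vault:vm-100-disk-1,cache=writeback,size=64G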
===
What makes me wonder is: why are the default VM settings (cache=none, zfs sync=standard) so unstable? Why is my Win10 VM eating all resources? (Note: the ARC is limited to 4-8GB and is not expanding, and the VM has 2-6GB of RAM.)
Maybe there is some setting that triggers the write IO throttle sooner? That should prevent memory from filling up with buffers. Is there any setting to limit buffer usage during massive writes?
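One knob that looks relevant here (untested on my side, so treat this as an assumption) is the ZFS dirty data limit, which caps how much unwritten data ZFS buffers before it starts throttling writers:

# show the current limit in bytes (the default is derived from RAM size)
cat /sys/module/zfs/parameters/zfs_dirty_data_max
# lower it at runtime, e.g. to 1GiB, to force earlier write throttling
echo 1073741824 > /sys/module/zfs/parameters/zfs_dirty_data_max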
PS.
The Win10 VM on LVM/ext4 runs without any problems. Before running the CrystalDiskMark benchmark there is more than 20GB of free memory. No other VMs are running.
VM setup:
VirtIO drivers 0.1.141 (stable)
bootdisk: virtio0
cores: 4
memory: 6144
name: Win10
net0: virtio=AE:85:A6:95:66:F4,bridge=vmbr0
numa: 0
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=005764ce-eb09-4d0b-94e4-d5122e58d002
sockets: 1
virtio0: vault:vm-100-disk-1,cache=none,size=64G
===
proxmox-ve: 5.2-2 (running kernel: 4.15.18-8-pve)
pve-manager: 5.2-10 (running version: 5.2-10/6f892b40)
pve-kernel-4.15: 5.2-11
pve-kernel-4.15.18-8-pve: 4.15.18-28
pve-kernel-4.15.17-1-pve: 4.15.17-9
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-41
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-30
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-3
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-29
pve-docs: 5.2-9
pve-firewall: 3.0-14
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-38
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve2~bpo1