Upgrade 6.2 to 6.4 - high disk utilization in VMs

czechsys

Renowned Member
Nov 18, 2015
Hi,

we upgraded our PVE cluster (very old HP G7 servers and a 3-year-old Dell R940) from 6.2 to 6.4, and disk utilization inside the VMs rose sharply above its previous baseline.

[Screenshot 1633338463637.png: Zabbix disk utilization graph showing the increase after the upgrade]

The problem is the same for VMs on:
- NFS SSD storage (raw files), Default (no cache)
- local SSD disks (LVM thick), Default (no cache)

How much it changes depends on the VM workload (higher on database VMs, lower on system disks, etc.).
From the hardware side there is no utilization change on either disk backend (even the PVE system disks are unchanged), so the change is visible only inside the VMs.


proxmox-ve: 6.4-1 (running kernel: 5.4.140-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-6
pve-kernel-helper: 6.4-6
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.140-1-pve: 5.4.140-1
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-4.15: 5.4-12
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-4.13: 5.2-2
pve-kernel-4.15.18-24-pve: 4.15.18-52
pve-kernel-4.15.18-12-pve: 4.15.18-36
pve-kernel-4.15.17-1-pve: 4.15.17-9
pve-kernel-4.13.16-4-pve: 4.13.16-51
pve-kernel-4.13.16-2-pve: 4.13.16-48
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.3-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-1
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.5-pve1~bpo10+1
 
How is the 'disk utilization' measured? If the storage does not show a change in behaviour from the outside, maybe just some metrics exposed to the VM changed?
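
(For reference: iostat-style tools, and typically the Zabbix disk template and netdata as well, derive per-device utilization from the busy-time counter in the guest's /proc/diskstats. Below is a minimal sketch of that calculation, assuming the guest disk is named sda (adjust as needed); it also prints the raw read/write rates so you can see whether the actual I/O changed or only the busy-time field.)

#!/usr/bin/env python3
"""Sample /proc/diskstats twice inside the guest and compute the same
busy-time percentage that iostat-style tools report, plus raw I/O rates.
DEVICE is an assumption - adjust to the guest's disk (sda, vda, ...)."""
import time

DEVICE = "sda"      # assumed guest disk name
INTERVAL = 5.0      # seconds between samples

def read_stats(dev):
    with open("/proc/diskstats") as f:
        for line in f:
            parts = line.split()
            if parts[2] == dev:
                # fields after the name: reads, reads merged, sectors read,
                # ms reading, writes, writes merged, sectors written,
                # ms writing, in-flight, io_ticks (ms busy), weighted ms
                vals = list(map(int, parts[3:14]))
                return {"reads": vals[0], "writes": vals[4], "busy_ms": vals[9]}
    raise SystemExit(f"device {dev} not found in /proc/diskstats")

a = read_stats(DEVICE)
t0 = time.monotonic()
time.sleep(INTERVAL)
b = read_stats(DEVICE)
elapsed_ms = (time.monotonic() - t0) * 1000.0

util = 100.0 * (b["busy_ms"] - a["busy_ms"]) / elapsed_ms
secs = elapsed_ms / 1000.0
print(f"{DEVICE}: util={util:.1f}%  "
      f"reads/s={(b['reads'] - a['reads']) / secs:.1f}  "
      f"writes/s={(b['writes'] - a['writes']) / secs:.1f}")

If the utilization percentage jumps while reads/s and writes/s stay roughly where they were before the upgrade, that would point to a change in the busy-time accounting rather than in the actual I/O load.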
 
The previous image is from the standard Zabbix disk template.
Even netdata shows something crazy for the VM system disk:
[Screenshot 1633352867216.png: netdata disk utilization graph for the VM system disk]

And for the VM DB data disk:
[Screenshot 1633352938383.png: netdata disk utilization graph for the VM DB data disk]

Both disks are on this PVE host, on a dedicated RAID set for VM images:
[Screenshot 1633353098115.png: host-side disk stats for the dedicated VM-image RAID]

PVE OS RAID disk set:
[Screenshot 1633353194261.png: host-side disk stats for the PVE OS RAID disk set]

All VMs are Debian 10. The upgrade was done only at the PVE host level.
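
(To cross-check the host-side view, the same busy-time calculation can be run on the PVE host against the block device backing the VM images, via /sys/block/<dev>/stat, which exposes the same counters as /proc/diskstats. A minimal sketch; the device name sdb is only a placeholder for the actual RAID device:)

#!/usr/bin/env python3
"""Host-side counterpart: compute %util for a block device on the PVE host
from /sys/block/<dev>/stat. DEVICE is a placeholder - point it at the
device that actually backs the VM images."""
import time

DEVICE = "sdb"      # placeholder for the host device holding VM images
INTERVAL = 5.0      # seconds between samples

def busy_ms(dev):
    # field 10 (index 9) of /sys/block/<dev>/stat is milliseconds spent doing I/O
    with open(f"/sys/block/{dev}/stat") as f:
        return int(f.read().split()[9])

before = busy_ms(DEVICE)
t0 = time.monotonic()
time.sleep(INTERVAL)
after = busy_ms(DEVICE)
elapsed_ms = (time.monotonic() - t0) * 1000.0

print(f"{DEVICE}: util={100.0 * (after - before) / elapsed_ms:.1f}%")

Comparing this host-side number with the guest-side sketch above for the same workload should show whether the discrepancy really exists only inside the VMs.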