VM soft lockups under heavy load on PVE 8.0 + kernel 6.x

pandada8

I recently ran into VM soft lockups when running heavy combined compute and I/O loads.
It began on PVE 7.4 with the 6.1 kernel. When PVE 8.0 was released, I upgraded all nodes to kernel 6.2, but nothing seems to have changed.
The hosts are dual-socket AMD EPYC 7742 and 7702 systems. I tried upgrading the CPU microcode from the unstable repo, to `patch_level=0x08301072`. That seems to have made things a little better, but there are still a lot of soft lockups.
 
Switching the disks to aio=threads seems to fix the soft lockup issue. Still, I hope io_uring can be enabled one day, since it seems to have better I/O performance.
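
For anyone who wants to try the same workaround, here is a rough sketch of how the aio option on a disk line could be rewritten. `set_aio` is just a helper name I made up, not part of any Proxmox tool, and the drive string is the scsi0 line from the VM config further down in this thread. The script only prints the `qm set` command so it can be reviewed before running it; I'd expect the change to need a full stop/start of the VM to take effect.

```shell
#!/bin/sh
# Rewrite (or append) the aio= option in a Proxmox drive option string.
# set_aio is a hypothetical helper name, not an existing Proxmox command.
set_aio() {
    drive="$1"
    mode="$2"
    case ",$drive," in
        *,aio=*)
            # An aio= option is already present: replace its value.
            printf '%s\n' "$drive" | sed -E "s/(^|,)aio=[^,]*/\1aio=$mode/"
            ;;
        *)
            # No aio= option yet: append one.
            printf '%s,aio=%s\n' "$drive" "$mode"
            ;;
    esac
}

# Build the command for review instead of executing it directly.
new_opts=$(set_aio 'hp6hdd:155/vm-155-disk-0.qcow2,aio=native,discard=on,iothread=1,size=10G' threads)
echo "qm set 155 --scsi0 $new_opts"
```

The same function can be reused for the other disks (scsi1 in my case) by passing their option strings.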
 
So...
I switched to the new 6.2.16-5-pve kernel and the soft lockups came back!

All of the locked-up VMs are running with aio=native and virtio-scsi-single.

e.g.

Code:
acpi: 1
agent: enabled=1
bios: seabios
boot: order=scsi0
cicustom: vendor=cephfs:snippets/ci-k8s.yaml
cores: 64
cpu: host
ide2: hp6hdd:155/vm-155-cloudinit.raw,media=cdrom,size=4M
ipconfig0: ip=10.2.12.241/24,gw=10.2.12.1
machine: q35
memory: 262144
meta: creation-qemu=6.2.0,ctime=1650715383
name: ci-k8s-1
net0: virtio=3A:03:AD:06:8D:60,bridge=vmbr0,tag=1012
net1: virtio=9A:AF:79:97:6B:7D,bridge=bachang
numa: 0
ostype: l26
scsi0: hp6hdd:155/vm-155-disk-0.qcow2,aio=native,discard=on,iothread=1,size=10G
scsi1: hp9hdd:155/vm-155-disk-0.qcow2,aio=native,backup=0,discard=on,iothread=1,size=200G
scsihw: virtio-scsi-single
serial0: socket
smbios1: uuid=8d78b470-3dfa-4c23-b55e-9b4d04233123
sockets: 1
vmgenid: 97a33c7a-9a88-4af5-a3e7-a5ea1145c906
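
To check which aio mode each disk of a VM uses, something like the sketch below might help. `list_aio` is a hypothetical helper of mine that reads config text on stdin; it assumes the `key: value` layout shown above and reports "default" when a disk has no explicit aio= option.

```shell
#!/bin/sh
# List each disk line of a Proxmox VM config together with its aio mode.
# list_aio is a hypothetical helper; it reads the config text from stdin.
list_aio() {
    awk -F': ' '
        /^(scsi|virtio|sata|ide)[0-9]+: / {
            if ($2 ~ /media=cdrom/) next      # skip cloud-init and ISO drives
            mode = "default"                  # no explicit aio= option found
            n = split($2, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^aio=/) mode = substr(opts[i], 5)
            print $1, mode
        }'
}

# Assumed usage on a PVE node:
#   qm config 155 | list_aio
```

Fed the config above, this should report `native` for both scsi0 and scsi1 and skip the ide2 cloud-init drive.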

The host is a dual-socket AMD EPYC 7742 system.

Code:
pveversion --verbose
proxmox-ve: 8.0.1 (running kernel: 6.2.16-5-pve)
pve-manager: 8.0.3 (running version: 8.0.3/bbf3993334bfa916)
pve-kernel-6.2: 8.0.4
pve-kernel-5.15: 7.4-4
pve-kernel-6.1: 7.3-6
pve-kernel-6.2.16-5-pve: 6.2.16-6
pve-kernel-6.2.16-3-pve: 6.2.16-3
pve-kernel-6.1.15-1-pve: 6.1.15-1
pve-kernel-5.15.108-1-pve: 5.15.108-1
ceph: 17.2.6-pve1+3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-3
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.1
libpve-access-control: 8.0.4
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.7
libpve-guest-common-perl: 5.0.4
libpve-http-server-perl: 5.0.4
libpve-network-perl: 0.8.1
libpve-rs-perl: 0.8.5
libpve-storage-perl: 8.0.2
libqb0: 1.0.5-1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
openvswitch-switch: 3.1.0-2
proxmox-backup-client: 3.0.1-1
proxmox-backup-file-restore: 3.0.1-1
proxmox-kernel-helper: 8.0.2
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.2
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.3
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.3
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-3
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1