LAMP VMs - PVE7 - Kernel 5.11

TwiX

Hi,

I recently upgraded a 6-node PVE cluster to PVE 7.

Code:
proxmox-ve: 7.0-2 (running kernel: 5.11.22-3-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-kernel-5.11: 7.0-6
pve-kernel-helper: 7.0-6
pve-kernel-5.4: 6.4-5
pve-kernel-5.0: 6.0-11
pve-kernel-5.11.22-3-pve: 5.11.22-6
pve-kernel-5.4.128-1-pve: 5.4.128-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 16.2.5-pve1
ceph-fuse: 16.2.5-pve1
corosync: 3.1.2-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.21-pve1
libproxmox-acme-perl: 1.2.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-5
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-10
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.8-1
proxmox-backup-file-restore: 2.0.8-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-9
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-2
pve-firmware: 3.2-4
pve-ha-manager: 3.3-1
pve-i18n: 2.4-1
pve-qemu-kvm: 6.0.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-13
smartmontools: 7.2-pve2
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1

Code:
cat /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
performance
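
For completeness, the same check can be run across all cores with a plain sysfs query (nothing PVE-specific):

Code:
# prints "<path>:<governor>" for every core
grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor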

Users are reporting issues (lag, browsers frozen for about a minute) with several LAMP VMs...
Everything seems to be OK inside the VMs, and the PVE nodes' resources look fine as well...

I'm wondering if the problem could be the kernel; 5.11 is the only PVE 7 kernel available.
What might happen if I try to boot the latest 5.4.128 (PVE 6) kernel?
 
I can confirm the issue with all LAMP servers (Debian-based) since the PVE 7 upgrade.

Don't know if it's related to the kernel or QEMU.
I will move a VM to PVE 6.4 in order to confirm.
 
Hi,

Can you post the configuration of such a VM, so we can see which specific options are set? qm config VMID

Also, what exactly is your test for determining the regression, and which regression is it?
 
Hi,

Of course:

Code:
qm config 20089

agent: 1
bootdisk: scsi0
cores: 4
cpu: host
description: Dolibarr
ide2: none,media=cdrom
memory: 8192
name: dc-doli-merc02
net0: virtio=36:7C:34:B8:B1:9E,bridge=vmbr0,tag=20
numa: 1
onboot: 1
ostype: l26
scsi0: kvm_pool:vm-20089-disk-0,cache=writeback,discard=on,size=75G
scsihw: virtio-scsi-pci
smbios1: uuid=9f3a5887-716b-460e-8050-f2dd348b2bda
sockets: 2
tablet: 0

What I want to try is moving this VM to another cluster (still on 6.4 + Ceph Octopus), in order to see whether the issue is related to PVE 7 (kernel/QEMU/Ceph).
 
We also noticed noticeably higher CPU usage and more iowait with PVE 7.0 / Ceph Pacific.
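
As a rough way to compare iowait on a node before and after such a change, standard tooling is enough; a hypothetical invocation (iostat requires the sysstat package):

Code:
# per-device utilisation and wait times, refreshed every 2 seconds
iostat -x 2
# overall CPU breakdown, including the 'wa' (iowait) column
vmstat 2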

 
I now see aio=io_uring.

I guess it was threads previously.
Where can I change the aio value?
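
The aio mode is a per-disk option in the VM config; a minimal sketch using the scsi0 line from the config above (VMID 20089), repeating the existing options and adding aio=threads:

Code:
qm set 20089 --scsi0 kvm_pool:vm-20089-disk-0,cache=writeback,discard=on,aio=threads
# note: the new value is only picked up by a freshly started QEMU process
# (full VM stop/start or a migration)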
 
And basically, it cannot be done without restarting the VM?
 
Hi,

I set all VM disks to aio=threads.

I'll keep you posted.

I also saw this :(
 
I also saw this
Note that since the release of the kernel package pve-kernel-5.11.22-3-pve in version 5.11.22-6, the io_uring related issue, which was a kernel bug, has been fixed, so io_uring should not cause any crashes anymore.

Note also that switching from the 5.4 to the 5.11 kernel, and from QEMU 5.2 to 6.0 with io_uring, can also have some effects with regard to how the load is measured or where it shows up (user space vs. kernel space).
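
A quick way to check whether that fixed build is the one actually running on a node (a sketch; package and ABI names taken from the version list above):

Code:
# ABI of the currently booted kernel
uname -r
# installed package version for that ABI; should report 5.11.22-6 or newer,
# and the node must have been rebooted after that version was installed
dpkg -s pve-kernel-5.11.22-3-pve | grep Version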

You have not yet told us how you actually measure the performance regression, or how/where the freezes actually show up.
In a LAMP stack, would that be the client's browser for the LAMP applications? A browser frozen for over a minute seems more like a client issue than a (LAMP) server issue.
Is there higher latency when connecting to the HTTP servers, or lower bandwidth?
 
Hi,

Thanks for your answer.

First, aio=threads seems to fix the issue. No complaints this morning!

I was pretty confident that the issue was related to PVE 7. The complaints involved all of the relatively highly loaded LAMP servers on the upgraded cluster.
MySQL queries were incredibly slow and some transactions triggered deadlocks.
Zabbix monitoring of MySQL showed us what looked like periods of MySQL unavailability.
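
To narrow down such deadlocks, InnoDB keeps the most recent one in its status output; a hypothetical check from inside one of the affected LAMP guests:

Code:
# print the last deadlock InnoDB recorded (run inside the VM)
mysql -e "SHOW ENGINE INNODB STATUS\G" | grep -A20 "LATEST DETECTED DEADLOCK"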

On the same hypervisor:

Before the PVE 7 upgrade: (screenshot)

After the PVE 7 upgrade: (screenshot)


So yesterday I decided to move 2 VMs to another cluster with the same hardware, which was still on PVE 6.4.
Everything seems to work as expected this morning.

And I saw that PVE 7 now uses the new io_uring engine by default, so I also set aio=threads for all remaining PVE 7 VMs.
Only 2 issues this morning (maybe not related). Yesterday, it happened every 5 minutes for almost every LAMP VM on that cluster.

I still see somewhat higher iowait, but as you said, that may be related to how the load is measured with the new kernel/QEMU.
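
One way to double-check which aio mode a VM actually ends up with (illustrative, reusing the VMID from the config above):

Code:
# command line PVE would generate from the current config
qm showcmd 20089 | tr ',' '\n' | grep aio
# for VMs that are already running, inspect the live QEMU processes instead
ps -ef | grep -o 'aio=[a-z_]*' | sort | uniq -c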
 

Do you think it is possible to boot the old 5.4.128 kernel on one node?
I just want to check whether iowait is better.
 
Do you think it is possible to boot the old 5.4.128 kernel on one node?
Do you use ZFS, and have you already upgraded any rpool? If so, the older kernel may fail to import the rpool,
but no harm done; one can just boot the 5.11 kernel again.

I'd say that it's pretty safe to try booting into the older kernel, but I would keep the async I/O handler set to threads instead of io_uring, as the 5.4 kernel's io_uring support is quite a bit more limited compared to the 5.11 one.
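
For reference, a sketch of how the still-installed kernels can be listed and the older one booted once (assuming a GRUB-based setup; on UEFI/systemd-boot the entry is simply picked from the boot menu):

Code:
# show which 5.4 kernel packages are still installed
dpkg -l 'pve-kernel-5.4*' | grep '^ii'
# on GRUB systems, the 5.4.128-1-pve kernel can then be chosen from the
# "Advanced options" submenu at boot time; no config change needed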
 
