Possible bug on io_uring

freebee

Active Member
May 8, 2020
Hi.
Today the VMs on my server stopped.
The I/O delay was very high (89%) and the VMs were not accessible from outside (they showed as running in the panel but were not reachable).
I stopped all VMs and the I/O delay did not drop.
On dmesg I have this output:

[812387.833488] fwbr301i1: port 2(tap301i1) entered disabled state
[812387.901063] fwbr301i1: port 2(tap301i1) entered disabled state
[812687.849277] ------------[ cut here ]------------
[812687.849293] WARNING: CPU: 8 PID: 2735054 at fs/io_uring.c:8811 io_ring_exit_work+0xc6/0x650
[812687.849306] Modules linked in: cls_u32 act_police cls_basic sch_ingress sch_htb veth 8021q garp mrp tcp_diag inet_diag ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter sctp ip6_udp_tunnel udp_tunnel nf_tables bonding tls softdog nfnetlink_log nfnetlink intel_rapl_msr ipmi_ssif intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass mgag200 crct10dif_pclmul ghash_clmulni_intel drm_kms_helper aesni_intel cec crypto_simd rc_core cryptd i2c_algo_bit fb_sys_fops rapl syscopyarea sysfillrect sysimgblt intel_cstate serio_raw pcspkr ioatdma hpilo acpi_ipmi dca ipmi_si ipmi_devintf tpm_infineon ipmi_msghandler mac_hid acpi_power_meter zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic
[812687.849379] xor zstd_compress raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio crc32_pclmul psmouse pata_acpi lpc_ich uhci_hcd ehci_pci nvme ehci_hcd bnx2x hpsa nvme_core scsi_transport_sas mdio libcrc32c
[812687.849395] CPU: 8 PID: 2735054 Comm: kworker/u129:0 Tainted: P IO 5.13.19-6-pve #1
[812687.849398] Hardware name: HP ProLiant DL360p Gen8, BIOS P71 01/22/2018
[812687.849400] Workqueue: events_unbound io_ring_exit_work
[812687.849405] RIP: 0010:io_ring_exit_work+0xc6/0x650
[812687.849408] Code: 74 14 b9 01 00 00 00 4c 89 fa 48 c7 c6 e0 d0 b8 88 e8 ce 95 00 00 4c 89 f7 e8 b6 7b ff ff 48 8b 05 bf f5 86 01 49 39 c4 79 81 <0f> 0b e9 7a ff ff ff 48 c7 c2 d8 6d c7 8a 48 8d 7d b0 49 8d 5d 20
[812687.849410] RSP: 0018:ffffb1b7ff463dd8 EFLAGS: 00010293
[812687.849412] RAX: 000000010c1b0ccf RBX: ffff88afddbc7dc0 RCX: 0000000000000003
[812687.849414] RDX: 0000000000000000 RSI: ffff88afddbc7880 RDI: ffff88afddbc7c80
[812687.849415] RBP: ffffb1b7ff463e70 R08: 0000000000000001 R09: 0000000000000002
[812687.849416] R10: 0000000000000000 R11: 0000000000000001 R12: 000000010c1b0cca
[812687.849417] R13: ffff88afddbc7d90 R14: 0000000000000000 R15: ffff88afddbc7800
[812687.849419] FS: 0000000000000000(0000) GS:ffff88c65f200000(0000) knlGS:0000000000000000
[812687.849421] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[812687.849422] CR2: 00007f91045bb308 CR3: 00000004ff488004 CR4: 00000000001726e0
[812687.849424] Call Trace:
[812687.849427] <TASK>
[812687.849432] ? __switch_to+0x11d/0x460
[812687.849438] ? __switch_to_asm+0x36/0x70
[812687.849444] process_one_work+0x220/0x3c0
[812687.849449] worker_thread+0x53/0x420
[812687.849451] ? process_one_work+0x3c0/0x3c0
[812687.849452] kthread+0x12b/0x150
[812687.849457] ? set_kthread_struct+0x50/0x50
[812687.849459] ret_from_fork+0x22/0x30
[812687.849463] </TASK>
[812687.849464] ---[ end trace e251dc33bd633015 ]---
root@pv01:~# reboot
root@pv01:~#
root@pv01:~#
root@pv01:~#
root@pv01:~# echo 1 > /proc/sys/kernel/sysrq
root@pv01:~# echo b > /proc/sysrq-trigger

The last two commands force an emergency reboot via SysRq, because the normal reboot command did not work.
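For context: writing 1 to /proc/sys/kernel/sysrq enables all SysRq functions, and writing b to /proc/sysrq-trigger reboots the machine immediately, without syncing or unmounting disks, so it is a last resort. A minimal sketch of keeping SysRq enabled across reboots (the file name 99-sysrq.conf is just an example):

root@pv01:~# echo "kernel.sysrq = 1" > /etc/sysctl.d/99-sysrq.conf   # enable all SysRq functions at every boot
root@pv01:~# sysctl --system                                         # apply the setting without rebooting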

pve-manager/7.1-12/b3c09de3

Linux 5.13.19-6-pve #1 SMP PVE 5.13.19-15 (Tue, 29 Mar 2022 15:59:50 +0200)
 
Hi,
has this ever happened before, or only this once? You might want to consider upgrading to the 5.15 kernel (or even fully to Proxmox VE 7.2), as a lot of work still went into io_uring between 5.13 and 5.15.
 
Hi,
has this ever happened before, or only this once? You might want to consider upgrading to the 5.15 kernel (or even fully to Proxmox VE 7.2), as a lot of work still went into io_uring between 5.13 and 5.15.
Never happened before, only this once. Context: the server was under high load, at around 80% processor usage and 100%
I visited the link about upgrading the kernel to 5.15 and saw many related problems reported there.
Here in apt search I have 5.10 and 5.11. Can I use one of these two?
I edited all VMs and changed io_uring to threads (see the sketch below).
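For reference, roughly how that change looks from the CLI; the VM ID 301 and the volume local-lvm:vm-301-disk-0 are placeholders, so check the current disk line first and keep its existing options:

root@pv01:~# qm config 301 | grep scsi0                              # show the current disk settings
root@pv01:~# qm set 301 --scsi0 local-lvm:vm-301-disk-0,aio=threads  # switch async I/O from io_uring to threads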
 
I visited the link about upgrading the kernel to 5.15 and saw many related problems reported there.
Sorry, but there's not a single mention of io_uring issues in that linked thread, really nothing.
io_uring has received quite a few fixes in newer kernels; going back to even older ones will quite often make things worse on that front.

Please upgrade to 5.15; older kernels are just not supported anymore for Proxmox VE 7.x.
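On Proxmox VE 7.x the newer kernel is an opt-in meta-package; roughly (assuming pve-kernel-5.15 is available in your configured repositories):

root@pv01:~# apt update
root@pv01:~# apt install pve-kernel-5.15   # pulls in the latest 5.15 kernel
root@pv01:~# reboot                        # boot into the new kernel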

If you still run into trouble, you can always boot an older kernel if really wanted:
https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysboot_kernel_pin
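Pinning a previously working kernel, as described in that chapter, would look roughly like this (the version string is just an example taken from this thread):

root@pv01:~# proxmox-boot-tool kernel list                 # list the installed kernel versions
root@pv01:~# proxmox-boot-tool kernel pin 5.13.19-6-pve    # always boot this version until unpinned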
 
