Upgrade to 8.1.3 significantly increased IO delay

juppzupp

Member · Joined May 8, 2020
Hi,
this is somewhere between a comment and a question, as I'm still trying to identify the problem.
I had a machine on 7.4 whose root disk (Proxmox on ZFS, on an NVMe) started throwing SMART errors. I did a zpool replace onto a new NVMe connected via USB, swapped the disk into the machine, and it booted fine. Since the system was in maintenance anyway, I took the opportunity to upgrade to 8.1.3.
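For reference, the replacement was roughly the following sketch (pool name "rpool" and the device paths are placeholders, not my exact ones):

Code:
# find stable IDs for the failing and the new disk
ls -l /dev/disk/by-id/
# resilver onto the new NVMe
zpool replace rpool /dev/disk/by-id/nvme-OLD_DISK /dev/disk/by-id/nvme-NEW_DISK
# watch until the resilver completes
zpool status rpool

On a root pool the boot partitions also have to be recreated on the new disk (proxmox-boot-tool format/init), which I'm leaving out here.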
The system boots fine. However, I now experience high IO delay on the data disk, which is also ZFS, but SATA. I don't remember ever seeing IO delay before with the same hardware and containers running. The process in question writes at only about 300 KB/s.
Any ideas?
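In case it helps with debugging, this is roughly how I'm watching it ("tank" stands in for my data pool's actual name):

Code:
# per-vdev throughput and latency on the data pool, refreshed every 5 s
zpool iostat -vly tank 5
# device-level utilization and await times for the SATA disk
iostat -x 5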

I also see this message in the logs from time to time:
Code:
2023-12-23T12:30:04.825418+01:00 h3plus1 kernel: [499151.914375] __schedule+0x3fd/0x1450
2023-12-23T12:30:04.850365+01:00 h3plus1 kernel: [499151.939325] ? __pfx_autoremove_wake_function+0x10/0x10
2023-12-23T12:30:04.855675+01:00 h3plus1 kernel: [499151.944651] __cv_timedwait_io+0x19/0x30 [spl]
2023-12-23T12:30:04.895222+01:00 h3plus1 kernel: [499151.984201] thread_generic_wrapper+0x5c/0x70 [spl]
2023-12-23T12:30:04.907332+01:00 h3plus1 kernel: [499151.996308] ret_from_fork+0x44/0x70
2023-12-23T12:30:04.972083+01:00 h3plus1 kernel: [499152.060991] cv_wait_common+0x109/0x140 [spl]
2023-12-23T12:30:04.976584+01:00 h3plus1 kernel: [499152.065491] ? __pfx_autoremove_wake_function+0x10/0x10
2023-12-23T12:30:05.007805+01:00 h3plus1 kernel: [499152.096658] do_syscall_64+0x58/0x90
2023-12-23T12:30:05.020304+01:00 h3plus1 kernel: [499152.109152] ? irqentry_exit_to_user_mode+0x17/0x20
2023-12-23T12:30:05.029225+01:00 h3plus1 kernel: [499152.118044] ? exc_page_fault+0x94/0x1b0
2023-12-23T12:30:05.033120+01:00 h3plus1 kernel: [499152.122088] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
2023-12-23T12:30:05.042042+01:00 h3plus1 kernel: [499152.131008] RSP: 002b:00007f79d5eb1920 EFLAGS: 00000202 ORIG_RAX: 000000000000004b
2023-12-23T12:30:05.049715+01:00 h3plus1 kernel: [499152.138685] RAX: ffffffffffffffda RBX: 00007f7a902f0090 RCX: 00007f7ab54cc77b
2023-12-23T12:30:05.056998+01:00 h3plus1 kernel: [499152.145958] RDX: 0000000000000000 RSI: 0000000000000081 RDI: 000000000000002f
2023-12-23T12:30:05.071485+01:00 h3plus1 kernel: [499152.160485] R10: 0000000000000000 R11: 0000000000000202 R12: 00007f7a902f00b0
2023-12-23T12:30:05.078766+01:00 h3plus1 kernel: [499152.167730] R13: 00007f7a902f00e8 R14: 00007f7ab54457d0 R15: 00007f7aa31730d0
2023-12-23T12:30:05.085992+01:00 h3plus1 kernel: [499152.174975] </TASK>
2023-12-23T12:32:05.636707+01:00 h3plus1 kernel: [499272.723474] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2023-12-23T12:32:05.644656+01:00 h3plus1 kernel: [499272.731413] task:TFWSync state:D stack:0 pid:3026791 ppid:2130917 flags:0x00004002
2023-12-23T12:32:05.655927+01:00 h3plus1 kernel: [499272.742666] <TASK>
2023-12-23T12:32:05.678958+01:00 h3plus1 kernel: [499272.765701] ? __pfx_autoremove_wake_function+0x10/0x10
2023-12-23T12:32:05.684291+01:00 h3plus1 kernel: [499272.771027] __cv_wait+0x15/0x30 [spl]
2023-12-23T12:32:05.693077+01:00 h3plus1 kernel: [499272.774923] zil_commit_impl+0x2d0/0x1260 [zfs]
2023-12-23T12:32:05.718533+01:00 h3plus1 kernel: [499272.805274] ? do_syscall_64+0x67/0x90
2023-12-23T12:32:05.727386+01:00 h3plus1 kernel: [499272.814129] ? irqentry_exit+0x43/0x50
2023-12-23T12:32:05.731245+01:00 h3plus1 kernel: [499272.817979] ? exc_page_fault+0x94/0x1b0
2023-12-23T12:32:05.751843+01:00 h3plus1 kernel: [499272.838578] RAX: ffffffffffffffda RBX: 00007f7a902f0090 RCX: 00007f7ab54cc77b
2023-12-23T12:32:05.759094+01:00 h3plus1 kernel: [499272.845829] RDX: 0000000000000000 RSI: 0000000000000081 RDI: 000000000000002f
2023-12-23T12:32:05.766341+01:00 h3plus1 kernel: [499272.853078] RBP: 00007f7a902f00b1 R08: 0000000000000000 R09: 0000000000000000
2023-12-23T12:32:05.773573+01:00 h3plus1 kernel: [499272.860307] R10: 0000000000000000 R11: 0000000000000202 R12: 00007f7a902f00b0
2023-12-23T12:32:05.780822+01:00 h3plus1 kernel: [499272.867557] R13: 00007f7a902f00e8 R14: 00007f7ab54457d0 R15: 00007f7aa31730d0
2023-12-23T12:32:05.788076+01:00 h3plus1 kernel: [499272.874816] </TASK>
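The zil_commit_impl frame in the trace suggests the TFWSync task is blocked waiting for a ZIL commit, i.e. on synchronous writes. A quick sanity check on the sync settings ("tank/data" is a placeholder for the dataset the process writes to):

Code:
# show whether the dataset forces or inherits synchronous writes
zfs get sync,logbias tank/data

If sync=always is set anywhere along the path, even a ~300 KB/s writer can stall behind a slow SATA device.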
 

Attachment: Screenshot 2023-12-23 17.36.51.png
We ran into a similar issue on some nodes after upgrading to 8.1.3 (Linux 6.5.11-7-pve).
  • Node1:
    Controller: Adaptec Series 8 12G SAS/PCIe 3
    Disks: 8x ata-Samsung_SSD_850_PRO_1TB
    Using ZFS
  • Node2:
    Controller: Adaptec Series 8 12G SAS/PCIe 3
Disks: 8x Intel SSDSC2KB960G
    Using LVM
In our case it is probably an issue related to the HW controller.


After pinning the kernel to an older version the I/O seems to be OK again:
Code:
pve-efiboot-tool kernel pin 6.2.16-20-pve   # legacy alias of proxmox-boot-tool
reboot
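To verify the pin took effect after the reboot (the list output may look slightly different depending on the proxmox-boot-tool version):

Code:
proxmox-boot-tool kernel list   # the pinned version should be marked as pinned
uname -r                        # should now report 6.2.16-20-pve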
 
