Upgrade to 8.1.3 drastically increased IO delay

juppzupp

Member
May 8, 2020
Hi,
this is somewhere between a comment and a question, as I'm still trying to identify the problem.
I had a machine on 7.4 whose root disk started throwing SMART errors.
Proxmox runs on ZFS on an NVMe; I did a `zpool replace` onto a new NVMe connected via USB.
I swapped the disk and it booted fine. Since the system was in maintenance anyway, I took the opportunity to upgrade to 8.1.3.
The system boots fine. However, I now experience high IO delay on the data disk, which is also ZFS, but SATA.
I don't remember seeing that before; in fact, I don't remember seeing any IO delay at all with the same hardware and containers running.
The process in question is writing at about 300 K/second.
Any ideas?
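One way to put a number on the IO delay independently of the Proxmox graphs is to sample the iowait counter from `/proc/stat` directly. A minimal sketch, assuming a Linux host where field 6 of the aggregate `cpu` line is iowait and fields 2-9 are the user-through-steal time counters:

```shell
#!/bin/sh
# Sample the aggregate "cpu" line twice, one second apart, and report
# iowait as a share of total CPU time over the interval.
snap() { awk '/^cpu /{print $6, $2+$3+$4+$5+$6+$7+$8+$9}' /proc/stat; }
set -- $(snap); w1=$1; t1=$2
sleep 1
set -- $(snap); w2=$1; t2=$2
awk -v dw=$((w2 - w1)) -v dt=$((t2 - t1)) \
    'BEGIN { printf "iowait over interval: %.1f%%\n", (dt > 0) ? 100 * dw / dt : 0 }'
```

If this stays consistently high while the 300 K/s writer runs, `iostat -x 1` (from the sysstat package) or `zpool iostat -v 1` can then show which disk is actually saturating.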

I also see this message from time to time:
Code:
2023-12-23T12:30:04.825418+01:00 h3plus1 kernel: [499151.914375] __schedule+0x3fd/0x1450
2023-12-23T12:30:04.850365+01:00 h3plus1 kernel: [499151.939325] ? __pfx_autoremove_wake_function+0x10/0x10
2023-12-23T12:30:04.855675+01:00 h3plus1 kernel: [499151.944651] __cv_timedwait_io+0x19/0x30 [spl]
2023-12-23T12:30:04.895222+01:00 h3plus1 kernel: [499151.984201] thread_generic_wrapper+0x5c/0x70 [spl]
2023-12-23T12:30:04.907332+01:00 h3plus1 kernel: [499151.996308] ret_from_fork+0x44/0x70
2023-12-23T12:30:04.972083+01:00 h3plus1 kernel: [499152.060991] cv_wait_common+0x109/0x140 [spl]
2023-12-23T12:30:04.976584+01:00 h3plus1 kernel: [499152.065491] ? __pfx_autoremove_wake_function+0x10/0x10
2023-12-23T12:30:05.007805+01:00 h3plus1 kernel: [499152.096658] do_syscall_64+0x58/0x90
2023-12-23T12:30:05.020304+01:00 h3plus1 kernel: [499152.109152] ? irqentry_exit_to_user_mode+0x17/0x20
2023-12-23T12:30:05.029225+01:00 h3plus1 kernel: [499152.118044] ? exc_page_fault+0x94/0x1b0
2023-12-23T12:30:05.033120+01:00 h3plus1 kernel: [499152.122088] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
2023-12-23T12:30:05.042042+01:00 h3plus1 kernel: [499152.131008] RSP: 002b:00007f79d5eb1920 EFLAGS: 00000202 ORIG_RAX: 000000000000004b
2023-12-23T12:30:05.049715+01:00 h3plus1 kernel: [499152.138685] RAX: ffffffffffffffda RBX: 00007f7a902f0090 RCX: 00007f7ab54cc77b
2023-12-23T12:30:05.056998+01:00 h3plus1 kernel: [499152.145958] RDX: 0000000000000000 RSI: 0000000000000081 RDI: 000000000000002f
2023-12-23T12:30:05.071485+01:00 h3plus1 kernel: [499152.160485] R10: 0000000000000000 R11: 0000000000000202 R12: 00007f7a902f00b0
2023-12-23T12:30:05.078766+01:00 h3plus1 kernel: [499152.167730] R13: 00007f7a902f00e8 R14: 00007f7ab54457d0 R15: 00007f7aa31730d0
2023-12-23T12:30:05.085992+01:00 h3plus1 kernel: [499152.174975] </TASK>
2023-12-23T12:32:05.636707+01:00 h3plus1 kernel: [499272.723474] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2023-12-23T12:32:05.644656+01:00 h3plus1 kernel: [499272.731413] task:TFWSync state:D stack:0 pid:3026791 ppid:2130917 flags:0x00004002
2023-12-23T12:32:05.655927+01:00 h3plus1 kernel: [499272.742666] <TASK>
2023-12-23T12:32:05.678958+01:00 h3plus1 kernel: [499272.765701] ? __pfx_autoremove_wake_function+0x10/0x10
2023-12-23T12:32:05.684291+01:00 h3plus1 kernel: [499272.771027] __cv_wait+0x15/0x30 [spl]
2023-12-23T12:32:05.693077+01:00 h3plus1 kernel: [499272.774923] zil_commit_impl+0x2d0/0x1260 [zfs]
2023-12-23T12:32:05.718533+01:00 h3plus1 kernel: [499272.805274] ? do_syscall_64+0x67/0x90
2023-12-23T12:32:05.727386+01:00 h3plus1 kernel: [499272.814129] ? irqentry_exit+0x43/0x50
2023-12-23T12:32:05.731245+01:00 h3plus1 kernel: [499272.817979] ? exc_page_fault+0x94/0x1b0
2023-12-23T12:32:05.751843+01:00 h3plus1 kernel: [499272.838578] RAX: ffffffffffffffda RBX: 00007f7a902f0090 RCX: 00007f7ab54cc77b
2023-12-23T12:32:05.759094+01:00 h3plus1 kernel: [499272.845829] RDX: 0000000000000000 RSI: 0000000000000081 RDI: 000000000000002f
2023-12-23T12:32:05.766341+01:00 h3plus1 kernel: [499272.853078] RBP: 00007f7a902f00b1 R08: 0000000000000000 R09: 0000000000000000
2023-12-23T12:32:05.773573+01:00 h3plus1 kernel: [499272.860307] R10: 0000000000000000 R11: 0000000000000202 R12: 00007f7a902f00b0
2023-12-23T12:32:05.780822+01:00 h3plus1 kernel: [499272.867557] R13: 00007f7a902f00e8 R14: 00007f7ab54457d0 R15: 00007f7aa31730d0
2023-12-23T12:32:05.788076+01:00 h3plus1 kernel: [499272.874816] </TASK>
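The second trace is blocked in `zil_commit_impl` from the `zfs` module, i.e. a task in `D` state waiting for a synchronous ZFS intent log commit to finish. To see which tasks keep tripping the hung-task detector, the reports can be tallied straight from the kernel log. A sketch, assuming syslog lines in the format quoted above and the Debian-default log path `/var/log/kern.log`:

```shell
#!/bin/sh
# Count blocked-task reports per task name from hung-task messages such as
# "task:TFWSync state:D stack:0 pid:..." in the kernel log.
grep -ho 'task:[^ ]*' /var/log/kern.log 2>/dev/null \
  | sort | uniq -c | sort -rn
```

On systemd hosts, `journalctl -k | grep -o 'task:[^ ]*'` gives the same information without depending on the log file path.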
 

Attachments

  • Screenshot 2023-12-23 17.36.51.png (103.1 KB)
We ran into a similar issue after upgrading some nodes to 8.1.3 (Linux 6.5.11-7-pve).
  • Node1:
    Controller: Adaptec Series 8 12G SAS/PCIe 3
    Disks: 8x ata-Samsung_SSD_850_PRO_1TB
    Using ZFS
  • Node2:
    Controller: Adaptec Series 8 12G SAS/PCIe 3
    Disks: 8x Intel SSDSC2KB960G
    Using LVM
In our case, it's probably an issue related to the hardware controller.


After pinning the kernel to an older version, the I/O seems to be OK again:
Code:
pve-efiboot-tool kernel pin 6.2.16-20-pve
reboot
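After the reboot it is worth confirming the node actually came up on the pinned kernel. A minimal sketch, assuming the version pinned above (`6.2.16-20-pve`):

```shell
#!/bin/sh
# Check that the running kernel matches the pinned version
# (replace 6.2.16-20-pve with whatever version you pinned).
pinned="6.2.16-20-pve"
running="$(uname -r)"
if [ "$running" = "$pinned" ]; then
    echo "OK: running pinned kernel $running"
else
    echo "WARNING: running $running, expected $pinned"
fi
```

`proxmox-boot-tool kernel list` (of which `pve-efiboot-tool` is the legacy name) should also report which kernel is pinned.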
 