Upgrade to 8.1.3 significantly increased IO delay

juppzupp

Member · Joined May 8, 2020
Hi,
this is somewhere between a comment and a question, as I'm still trying to identify the problem.
I had a machine on 7.4 whose root disk (Proxmox on ZFS, on an NVMe) started throwing SMART errors. I did a zpool replace onto a new NVMe connected via USB, swapped the disk into the machine, and it booted fine. Since the system was in maintenance anyway, I took the opportunity to upgrade to 8.1.3.
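For reference, the replacement was roughly the following sketch (pool name "rpool" and the device paths are placeholders, not my exact ones):

Code:
# find stable IDs for the failing and the new disk
ls -l /dev/disk/by-id/
# resilver onto the new NVMe
zpool replace rpool /dev/disk/by-id/nvme-OLD_DISK /dev/disk/by-id/nvme-NEW_DISK
# watch until the resilver completes
zpool status rpool

On a root pool the boot partitions also have to be recreated on the new disk (proxmox-boot-tool format/init), which I'm leaving out here.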
The system boots fine. However, I now experience high IO delay on the data disk, which is also ZFS, but SATA. I don't remember ever seeing IO delay before with the same hardware and containers running. The process in question writes at only about 300 KB/s.
Any ideas?
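In case it helps with debugging, this is roughly how I'm watching it ("tank" stands in for my data pool's actual name):

Code:
# per-vdev throughput and latency on the data pool, refreshed every 5 s
zpool iostat -vly tank 5
# device-level utilization and await times for the SATA disk
iostat -x 5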

I also see this message in the logs from time to time:
Code:
2023-12-23T12:30:04.825418+01:00 h3plus1 kernel: [499151.914375] __schedule+0x3fd/0x1450
2023-12-23T12:30:04.850365+01:00 h3plus1 kernel: [499151.939325] ? __pfx_autoremove_wake_function+0x10/0x10
2023-12-23T12:30:04.855675+01:00 h3plus1 kernel: [499151.944651] __cv_timedwait_io+0x19/0x30 [spl]
2023-12-23T12:30:04.895222+01:00 h3plus1 kernel: [499151.984201] thread_generic_wrapper+0x5c/0x70 [spl]
2023-12-23T12:30:04.907332+01:00 h3plus1 kernel: [499151.996308] ret_from_fork+0x44/0x70
2023-12-23T12:30:04.972083+01:00 h3plus1 kernel: [499152.060991] cv_wait_common+0x109/0x140 [spl]
2023-12-23T12:30:04.976584+01:00 h3plus1 kernel: [499152.065491] ? __pfx_autoremove_wake_function+0x10/0x10
2023-12-23T12:30:05.007805+01:00 h3plus1 kernel: [499152.096658] do_syscall_64+0x58/0x90
2023-12-23T12:30:05.020304+01:00 h3plus1 kernel: [499152.109152] ? irqentry_exit_to_user_mode+0x17/0x20
2023-12-23T12:30:05.029225+01:00 h3plus1 kernel: [499152.118044] ? exc_page_fault+0x94/0x1b0
2023-12-23T12:30:05.033120+01:00 h3plus1 kernel: [499152.122088] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
2023-12-23T12:30:05.042042+01:00 h3plus1 kernel: [499152.131008] RSP: 002b:00007f79d5eb1920 EFLAGS: 00000202 ORIG_RAX: 000000000000004b
2023-12-23T12:30:05.049715+01:00 h3plus1 kernel: [499152.138685] RAX: ffffffffffffffda RBX: 00007f7a902f0090 RCX: 00007f7ab54cc77b
2023-12-23T12:30:05.056998+01:00 h3plus1 kernel: [499152.145958] RDX: 0000000000000000 RSI: 0000000000000081 RDI: 000000000000002f
2023-12-23T12:30:05.071485+01:00 h3plus1 kernel: [499152.160485] R10: 0000000000000000 R11: 0000000000000202 R12: 00007f7a902f00b0
2023-12-23T12:30:05.078766+01:00 h3plus1 kernel: [499152.167730] R13: 00007f7a902f00e8 R14: 00007f7ab54457d0 R15: 00007f7aa31730d0
2023-12-23T12:30:05.085992+01:00 h3plus1 kernel: [499152.174975] </TASK>
2023-12-23T12:32:05.636707+01:00 h3plus1 kernel: [499272.723474] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2023-12-23T12:32:05.644656+01:00 h3plus1 kernel: [499272.731413] task:TFWSync state:D stack:0 pid:3026791 ppid:2130917 flags:0x00004002
2023-12-23T12:32:05.655927+01:00 h3plus1 kernel: [499272.742666] <TASK>
2023-12-23T12:32:05.678958+01:00 h3plus1 kernel: [499272.765701] ? __pfx_autoremove_wake_function+0x10/0x10
2023-12-23T12:32:05.684291+01:00 h3plus1 kernel: [499272.771027] __cv_wait+0x15/0x30 [spl]
2023-12-23T12:32:05.693077+01:00 h3plus1 kernel: [499272.774923] zil_commit_impl+0x2d0/0x1260 [zfs]
2023-12-23T12:32:05.718533+01:00 h3plus1 kernel: [499272.805274] ? do_syscall_64+0x67/0x90
2023-12-23T12:32:05.727386+01:00 h3plus1 kernel: [499272.814129] ? irqentry_exit+0x43/0x50
2023-12-23T12:32:05.731245+01:00 h3plus1 kernel: [499272.817979] ? exc_page_fault+0x94/0x1b0
2023-12-23T12:32:05.751843+01:00 h3plus1 kernel: [499272.838578] RAX: ffffffffffffffda RBX: 00007f7a902f0090 RCX: 00007f7ab54cc77b
2023-12-23T12:32:05.759094+01:00 h3plus1 kernel: [499272.845829] RDX: 0000000000000000 RSI: 0000000000000081 RDI: 000000000000002f
2023-12-23T12:32:05.766341+01:00 h3plus1 kernel: [499272.853078] RBP: 00007f7a902f00b1 R08: 0000000000000000 R09: 0000000000000000
2023-12-23T12:32:05.773573+01:00 h3plus1 kernel: [499272.860307] R10: 0000000000000000 R11: 0000000000000202 R12: 00007f7a902f00b0
2023-12-23T12:32:05.780822+01:00 h3plus1 kernel: [499272.867557] R13: 00007f7a902f00e8 R14: 00007f7ab54457d0 R15: 00007f7aa31730d0
2023-12-23T12:32:05.788076+01:00 h3plus1 kernel: [499272.874816] </TASK>
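The zil_commit_impl frame in the trace suggests the TFWSync task is blocked waiting for a ZIL commit, i.e. on synchronous writes. A quick sanity check on the sync settings ("tank/data" is a placeholder for the dataset the process writes to):

Code:
# show whether the dataset forces or inherits synchronous writes
zfs get sync,logbias tank/data

If sync=always is set anywhere along the path, even a ~300 KB/s writer can stall behind a slow SATA device.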
 

Attachment: Screenshot 2023-12-23 17.36.51.png
We ran into a similar issue on some nodes after upgrading to 8.1.3 (Linux 6.5.11-7-pve).
  • Node1:
    Controller: Adaptec Series 8 12G SAS/PCIe 3
    Disks: 8x ata-Samsung_SSD_850_PRO_1TB
    Using ZFS
  • Node2:
    Controller: Adaptec Series 8 12G SAS/PCIe 3
Disks: 8x Intel SSDSC2KB960G
    Using LVM
In our case it is probably an issue related to the HW controller.


After pinning the kernel to an older version the I/O seems to be OK again:
Code:
pve-efiboot-tool kernel pin 6.2.16-20-pve   # legacy alias of proxmox-boot-tool
reboot
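To verify the pin took effect after the reboot (the list output may look slightly different depending on the proxmox-boot-tool version):

Code:
proxmox-boot-tool kernel list   # the pinned version should be marked as pinned
uname -r                        # should now report 6.2.16-20-pve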
 
