Tainted Kernel P IO 5.4.44-2-pve

HiltonT

New Member
Jul 12, 2020
Fresh install of the 6.2-4 ISO on a Dell R510, freshly updated through the web interface to 6.2-10 with the no-subscription updates (the CLI equivalent is sketched below). ZFS mirror, one Win10 (2004) guest. When the error below occurs, there is a "?" over everything under the Datacenter for this (single) node, the VM and the storage...
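For reference, that no-subscription update amounts to roughly the following on the command line (a sketch, assuming PVE 6.x on Debian Buster):

# enable the no-subscription repository
echo "deb http://download.proxmox.com/debian/pve buster pve-no-subscription" > /etc/apt/sources.list.d/pve-no-subscription.list

# refresh the package lists and upgrade the node
apt update
apt dist-upgrade

The error: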

Jul 15 18:08:00 qrk-pve-1 systemd[1]: Starting Proxmox VE replication runner...
Jul 15 18:08:01 qrk-pve-1 systemd[1]: pvesr.service: Succeeded.
Jul 15 18:08:01 qrk-pve-1 systemd[1]: Started Proxmox VE replication runner.
Jul 15 18:08:32 qrk-pve-1 kernel: INFO: task lvs:16795 blocked for more than 966 seconds.
Jul 15 18:08:32 qrk-pve-1 kernel: Tainted: P IO 5.4.44-2-pve #1
Jul 15 18:08:32 qrk-pve-1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 15 18:08:32 qrk-pve-1 kernel: lvs D 0 16795 1665 0x00000000
Jul 15 18:08:32 qrk-pve-1 kernel: Call Trace:
Jul 15 18:08:32 qrk-pve-1 kernel: __schedule+0x2e6/0x6f0
Jul 15 18:08:32 qrk-pve-1 kernel: schedule+0x33/0xa0
Jul 15 18:08:32 qrk-pve-1 kernel: schedule_timeout+0x205/0x300
Jul 15 18:08:32 qrk-pve-1 kernel: ? ttwu_do_activate+0x5a/0x70
Jul 15 18:08:32 qrk-pve-1 kernel: wait_for_completion+0xb7/0x140
Jul 15 18:08:32 qrk-pve-1 kernel: ? wake_up_q+0x80/0x80
Jul 15 18:08:32 qrk-pve-1 kernel: __flush_work+0x131/0x1e0
Jul 15 18:08:32 qrk-pve-1 kernel: ? worker_detach_from_pool+0xb0/0xb0
Jul 15 18:08:32 qrk-pve-1 kernel: ? work_busy+0x90/0x90
Jul 15 18:08:32 qrk-pve-1 kernel: __cancel_work_timer+0x115/0x190
Jul 15 18:08:32 qrk-pve-1 kernel: ? exact_lock+0x11/0x20
Jul 15 18:08:32 qrk-pve-1 kernel: ? kobj_lookup+0xec/0x160
Jul 15 18:08:32 qrk-pve-1 kernel: cancel_delayed_work_sync+0x13/0x20
Jul 15 18:08:32 qrk-pve-1 kernel: disk_block_events+0x78/0x80
Jul 15 18:08:32 qrk-pve-1 kernel: __blkdev_get+0x72/0x560
Jul 15 18:08:32 qrk-pve-1 kernel: blkdev_get+0xe0/0x140
Jul 15 18:08:32 qrk-pve-1 kernel: ? blkdev_get_by_dev+0x50/0x50
Jul 15 18:08:32 qrk-pve-1 kernel: blkdev_open+0x87/0xa0
Jul 15 18:08:32 qrk-pve-1 kernel: do_dentry_open+0x143/0x3a0
Jul 15 18:08:32 qrk-pve-1 kernel: vfs_open+0x2d/0x30
Jul 15 18:08:32 qrk-pve-1 kernel: path_openat+0x2e9/0x16f0
Jul 15 18:08:32 qrk-pve-1 kernel: ? aio_read+0xfe/0x150
Jul 15 18:08:32 qrk-pve-1 kernel: ? __do_page_fault+0x250/0x4c0
Jul 15 18:08:32 qrk-pve-1 kernel: do_filp_open+0x93/0x100
Jul 15 18:08:32 qrk-pve-1 kernel: ? __alloc_fd+0x46/0x150
Jul 15 18:08:32 qrk-pve-1 kernel: do_sys_open+0x177/0x280
Jul 15 18:08:32 qrk-pve-1 kernel: ? __x64_sys_io_submit+0xa9/0x190
Jul 15 18:08:32 qrk-pve-1 kernel: __x64_sys_openat+0x20/0x30
Jul 15 18:08:32 qrk-pve-1 kernel: do_syscall_64+0x57/0x190
Jul 15 18:08:32 qrk-pve-1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 15 18:08:32 qrk-pve-1 kernel: RIP: 0033:0x7f3aa52d91ae
Jul 15 18:08:32 qrk-pve-1 kernel: Code: Bad RIP value.
Jul 15 18:08:32 qrk-pve-1 kernel: RSP: 002b:00007ffd43da9b90 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Jul 15 18:08:32 qrk-pve-1 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3aa52d91ae
Jul 15 18:08:32 qrk-pve-1 kernel: RDX: 0000000000044000 RSI: 0000555b2ccf7e08 RDI: 00000000ffffff9c
Jul 15 18:08:32 qrk-pve-1 kernel: RBP: 00007ffd43da9cf0 R08: 0000555b2cf21000 R09: 0000000000000000
Jul 15 18:08:32 qrk-pve-1 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd43daae6f
Jul 15 18:08:32 qrk-pve-1 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Jul 15 18:08:32 qrk-pve-1 kernel: INFO: task lvs:32164 blocked for more than 120 seconds.
Jul 15 18:08:32 qrk-pve-1 kernel: Tainted: P IO 5.4.44-2-pve #1
Jul 15 18:08:32 qrk-pve-1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 15 18:08:32 qrk-pve-1 kernel: lvs D 0 32164 1690 0x00000000
Jul 15 18:08:32 qrk-pve-1 kernel: Call Trace:
Jul 15 18:08:32 qrk-pve-1 kernel: __schedule+0x2e6/0x6f0
Jul 15 18:08:32 qrk-pve-1 kernel: ? __switch_to_asm+0x34/0x70
Jul 15 18:08:32 qrk-pve-1 kernel: schedule+0x33/0xa0
Jul 15 18:08:32 qrk-pve-1 kernel: schedule_preempt_disabled+0xe/0x10
Jul 15 18:08:32 qrk-pve-1 kernel: __mutex_lock.isra.10+0x2c9/0x4c0
Jul 15 18:08:32 qrk-pve-1 kernel: __mutex_lock_slowpath+0x13/0x20
Jul 15 18:08:32 qrk-pve-1 kernel: mutex_lock+0x2c/0x30
Jul 15 18:08:32 qrk-pve-1 kernel: disk_block_events+0x31/0x80
Jul 15 18:08:32 qrk-pve-1 kernel: __blkdev_get+0x72/0x560
Jul 15 18:08:32 qrk-pve-1 kernel: blkdev_get+0xe0/0x140
Jul 15 18:08:32 qrk-pve-1 kernel: ? blkdev_get_by_dev+0x50/0x50
Jul 15 18:08:32 qrk-pve-1 kernel: blkdev_open+0x87/0xa0
Jul 15 18:08:32 qrk-pve-1 kernel: do_dentry_open+0x143/0x3a0
Jul 15 18:08:32 qrk-pve-1 kernel: vfs_open+0x2d/0x30
Jul 15 18:08:32 qrk-pve-1 kernel: path_openat+0x2e9/0x16f0
Jul 15 18:08:32 qrk-pve-1 kernel: ? filename_lookup.part.60+0xe0/0x170
Jul 15 18:08:32 qrk-pve-1 kernel: do_filp_open+0x93/0x100
Jul 15 18:08:32 qrk-pve-1 kernel: ? __alloc_fd+0x46/0x150
Jul 15 18:08:32 qrk-pve-1 kernel: do_sys_open+0x177/0x280
Jul 15 18:08:32 qrk-pve-1 kernel: __x64_sys_openat+0x20/0x30
Jul 15 18:08:32 qrk-pve-1 kernel: do_syscall_64+0x57/0x190
Jul 15 18:08:32 qrk-pve-1 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul 15 18:08:32 qrk-pve-1 kernel: RIP: 0033:0x7fd73d3201ae
Jul 15 18:08:32 qrk-pve-1 kernel: Code: Bad RIP value.
Jul 15 18:08:32 qrk-pve-1 kernel: RSP: 002b:00007ffeb1949350 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
Jul 15 18:08:32 qrk-pve-1 kernel: RAX: ffffffffffffffda RBX: 00007ffeb194aed4 RCX: 00007fd73d3201ae
Jul 15 18:08:32 qrk-pve-1 kernel: RDX: 0000000000044000 RSI: 0000564c06b09b68 RDI: 00000000ffffff9c
Jul 15 18:08:32 qrk-pve-1 kernel: RBP: 00007ffeb19494b0 R08: 0000564c06b76b90 R09: 0000000000000000
Jul 15 18:08:32 qrk-pve-1 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffeb194af22
Jul 15 18:08:32 qrk-pve-1 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000

And this error repeats...
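Side note: those messages come from the kernel's hung-task detector, the same one the "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" line above refers to. Its timeout, and any further occurrences, can be checked like this (a sketch):

# timeout before a stuck (D-state) task is reported; the default is 120s
sysctl kernel.hung_task_timeout_secs

# follow the kernel log live for the next occurrence
journalctl -k -f | grep -i "blocked for more"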
 
The error you are seeing is not the "tainted" part, but the "INFO: task lvs:16795 blocked for more than 966 seconds."

This means that a task is waiting for some resource from the kernel to become available, and that is not happening. Since it's lvs (the LVM volume-listing command) in your case, I'm going to take a guess and say some hard drive is faulty or misconfigured.
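One quick way to check that guess is to look for tasks stuck in uninterruptible sleep (state D, like the hung lvs above) and to pull SMART data from the suspect disks - a sketch, the device name is just an example:

# list D-state tasks and what they are waiting on
ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /^D/'

# SMART health for a suspect disk (replace /dev/sda with the real device)
smartctl -a /dev/sda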
 
I had assumed that this was a failing HDD issue - I'm heading to the location where this server is this afternoon and will be binning the existing HDDs and replacing them all. They are old and small, and I wouldn't be surprised if they were failing. The LVM is on the PERC 630 controller in a RAID-1 array, but as we know, hardware RAID isn't all that good at telling us about failing drives, unlike ZFS (obviously, the ZFS drives aren't in any RAID configuration) - see the checks below.
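For comparison, ZFS surfaces per-device read/write/checksum errors directly, and SMART data can usually still be read through a PERC (MegaRAID-based) controller - a sketch, where the pool name and the megaraid device number are examples and will differ per setup:

# per-device error counters for the ZFS mirror
zpool status -v rpool

# SMART data for a disk sitting behind the PERC; ",0" is the controller's
# device number for that disk
smartctl -a -d megaraid,0 /dev/sda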

I'll do a fresh Proxmox install on this box this evening and see how it goes from there.