Kernel Oops with kworker getting tainted.

I did some additional digging on this, since my PVE hosts were already mounting the NFS share as v3. I had a container on the offending box that was mounting a share using NFS v4. I modified this mount to use v3 and then updated to the latest 5.3.18-1 kernel, and everything has been running smoothly for 24 hours now. I will be interested to track this issue to see when v4 becomes workable again.
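
For anyone wanting to check which mounts on a host or container still use NFS v4, here is a minimal Python sketch that scans text in the `/proc/mounts` format. It is not from the original posts: the function name `find_nfs4_mounts`, the sample addresses and the paths are made up for illustration. It assumes the usual Linux behaviour where v4 mounts are listed with filesystem type `nfs4`, and it also checks for type `nfs` with a `vers=4...` option as a precaution.

```python
def find_nfs4_mounts(mounts_text):
    """Return (mount_point, source) pairs for mounts that look like NFS v4."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        source, mount_point, fstype, options = fields[:4]
        opts = options.split(",")
        is_v4 = fstype == "nfs4" or (
            fstype == "nfs"
            and any(o.startswith(("vers=4", "nfsvers=4")) for o in opts)
        )
        if is_v4:
            found.append((mount_point, source))
    return found


# Hypothetical sample in /proc/mounts format (addresses and paths are invented).
SAMPLE = """\
192.0.2.10:/export/backup /mnt/pve/backup nfs rw,relatime,vers=3,proto=tcp 0 0
192.0.2.10:/export/data /mnt/data nfs4 rw,relatime,vers=4.2,proto=tcp 0 0
"""

for mount_point, source in find_nfs4_mounts(SAMPLE):
    print(f"{mount_point} <- {source} (NFS v4)")
```

On a real system you would pass the contents of `/proc/mounts` instead of `SAMPLE`. Moving a reported mount to v3 is normally done with the `vers=3` mount option described in nfs(5).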

After disabling the NFSv4 mount, I also had no issues.
 
We also hit this problem:

Code:
Feb 25 01:49:57 hv-vm-01 kernel:[123814.413163] #PF: supervisor read access in kernel mode
Feb 25 01:49:57 hv-vm-01 kernel:[123814.413735] #PF: error_code(0x0000) - not-present page
Feb 25 01:49:57 hv-vm-01 kernel:[123814.414312] PGD 0 P4D 0
Feb 25 01:49:57 hv-vm-01 kernel:[123814.414989] Oops: 0000 [#1] SMP PTI
Feb 25 01:49:57 hv-vm-01 kernel:[123814.415502] CPU: 28 PID: 19508 Comm: kworker/28:1 Tainted: P           O      5.3.13-3-pve #1
Feb 25 01:49:57 hv-vm-01 kernel:[123814.416188] Hardware name: Cisco Systems Inc UCSC-C220-M4S/UCSC-C220-M4S, BIOS C220M4.3.0.4b.0.0610182306 06/10/2018
Feb 25 01:49:57 hv-vm-01 kernel:[123814.417476] RIP: 0010:keyring_gc_check_iterator+0x30/0x40
Feb 25 01:49:57 hv-vm-01 kernel:[123814.418235] Code: 48 83 e7 fc b8 01 00 00 00 48 89 e5 f6 87 80 00 00 00 21 75 19 48 8b 57 58 48 39 16 7c 05 48 85 d2 7f 0b 48 8b 87 a0 00 00 00 <0f> b6 40 14 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
Feb 25 01:49:57 hv-vm-01 kernel:[123814.424717] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 25 01:49:57 hv-vm-01 kernel:[123814.425422] CR2: 0000000000000014 CR3: 00000007b78d4006 CR4: 00000000003626e0
Feb 25 01:49:57 hv-vm-01 kernel:[123814.426132] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 25 01:49:57 hv-vm-01 kernel:[123814.426771] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Feb 25 01:49:57 hv-vm-01 kernel:[123814.427396] Call Trace:
Feb 25 01:49:57 hv-vm-01 kernel:[123814.428123]  assoc_array_subtree_iterate+0x5c/0x100
Feb 25 01:49:57 hv-vm-01 kernel:[123814.428919]  assoc_array_iterate+0x19/0x20
Feb 25 01:49:57 hv-vm-01 kernel:[123814.429506]  keyring_gc+0x43/0x80
Feb 25 01:49:57 hv-vm-01 kernel:[123814.430210]  key_garbage_collector+0x35a/0x400
Feb 25 01:49:57 hv-vm-01 kernel:[123814.430863]  process_one_work+0x20f/0x3d0
Feb 25 01:49:57 hv-vm-01 kernel:[123814.431496]  worker_thread+0x34/0x400
Feb 25 01:49:57 hv-vm-01 kernel:[123814.529667] R13: ffffffff9d427dd0 R14: ffff8d6d818d5b00 R15: ffff8d67727be6d0
Feb 25 01:49:57 hv-vm-01 kernel:[123814.529669] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 25 01:49:57 hv-vm-01 kernel:[123814.529671] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000


Feb 28 22:01:35 hv-vm-01 kernel:[306609.246582] BUG: kernel NULL pointer dereference, address: 0000000000000014
Feb 28 22:01:35 hv-vm-01 kernel:[306609.246649] #PF: error_code(0x0000) - not-present page
Feb 28 22:01:35 hv-vm-01 kernel:[306609.246666] PGD 0 P4D 0
Feb 28 22:01:35 hv-vm-01 kernel:[306609.246693] CPU: 7 PID: 15381 Comm: kworker/7:3 Tainted: P           O      5.3.13-3-pve #1
Feb 28 22:01:35 hv-vm-01 kernel:[306609.246755] Workqueue: events key_garbage_collector
Feb 28 22:01:35 hv-vm-01 kernel:[306609.246793] Code: 48 83 e7 fc b8 01 00 00 00 48 89 e5 f6 87 80 00 00 00 21 75 19 48 8b 57 58 48 39 16 7c 05 48 85 d2 7f 0b 48 8b 87 a0 00 00 00 <0f> b6 40 14 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
Feb 28 22:01:35 hv-vm-01 kernel:[306609.246883] RAX: 0000000000000000 RBX: ffff89143b936480 RCX: ffffaf9b89e5be20
Feb 28 22:01:35 hv-vm-01 kernel:[306609.246991] RBP: ffffaf9b89e5bdb8 R08: 0000000000000010 R09: 0000000000000000
Feb 28 22:01:35 hv-vm-01 kernel:[306609.247037] R13: ffffffff9bc27dd0 R14: ffff891425af1400 R15: ffff89143b936490


Though I'm not sure whether it is NFS related.
We downgraded to the stable 5.0 kernel and so far have had no problems.
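
For comparing traces across several hosts, a small script can pull out the fields that identify this particular crash: the faulting function, the fault address, the kernel version and the workqueue function. The following Python is only a sketch. The regular expressions were written against the lines quoted in this thread and may need adjusting for other log formats, and the name `summarize_oops` is illustrative.

```python
import re

# Patterns written against the log lines quoted in this thread.
RIP_RE = re.compile(r"RIP: [0-9a-f]{4}:(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")
BUG_RE = re.compile(r"NULL pointer dereference, address: ([0-9a-f]+)")
CPU_RE = re.compile(r"Comm: (\S+) Tainted: .*?(\d+\.\d+\.\d+\S*) #")
WQ_RE = re.compile(r"Workqueue: \S+ (\w+)")


def summarize_oops(log_text):
    """Collect the identifying fields of any oops found in log_text."""
    summary = {"rip": set(), "fault_addr": set(), "comm": set(),
               "kernel": set(), "workqueue_fn": set()}
    for line in log_text.splitlines():
        m = RIP_RE.search(line)
        if m:
            summary["rip"].add(m.group(1))
        m = BUG_RE.search(line)
        if m:
            summary["fault_addr"].add(int(m.group(1), 16))
        m = CPU_RE.search(line)
        if m:
            summary["comm"].add(m.group(1))
            summary["kernel"].add(m.group(2))
        m = WQ_RE.search(line)
        if m:
            summary["workqueue_fn"].add(m.group(1))
    return summary


# Four lines copied from the traces posted above.
SAMPLE_LOG = """\
Feb 25 01:49:57 hv-vm-01 kernel:[123814.415502] CPU: 28 PID: 19508 Comm: kworker/28:1 Tainted: P           O      5.3.13-3-pve #1
Feb 25 01:49:57 hv-vm-01 kernel:[123814.417476] RIP: 0010:keyring_gc_check_iterator+0x30/0x40
Feb 28 22:01:35 hv-vm-01 kernel:[306609.246582] BUG: kernel NULL pointer dereference, address: 0000000000000014
Feb 28 22:01:35 hv-vm-01 kernel:[306609.246755] Workqueue: events key_garbage_collector
"""

for field, values in summarize_oops(SAMPLE_LOG).items():
    print(field, sorted(values))
```

If two hosts report the same `rip` symbol and fault address on the same kernel build, they are probably hitting the same bug, which makes it easier to compare notes in a thread like this one.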
 
I had issues with multiple hosts; after changing to NFS v3, there were no more issues.
 
