BUG: Unable to handle kernel paging request

Oct 7, 2018
Hi, I started using Proxmox about 6 months ago and have been loving it. Last night, I found that my Plex container was not responding. All other VMs and containers (only 2 others on the machine) were fine. I was able to log in via SSH and found the following in the journal:

Code:
Jan 13 01:24:13 pve1 kernel: BUG: unable to handle kernel paging request at ffffffffc03c3164
Jan 13 01:24:13 pve1 kernel: IP: avl_insert+0x4b/0xd0 [zavl]
Jan 13 01:24:13 pve1 kernel: PGD 49e40e067 P4D 49e40e067 PUD 49e410067 PMD 6312fd067 PTE 62e838061
Jan 13 01:24:13 pve1 kernel: Oops: 0003 [#1] SMP PTI
Jan 13 01:24:13 pve1 kernel: Modules linked in: tcp_diag inet_diag binfmt_misc veth ip_set ip6table_filter ip6_tables cmac arc4 md4 nls_utf8 cifs ccm fscache iptable_filter softdog nfnetlink_log nfnetlink ast intel_powerclamp ttm coretem
Jan 13 01:24:13 pve1 kernel:  e1000e(O) i2c_i801 libahci ptp pps_core
Jan 13 01:24:13 pve1 kernel: CPU: 14 PID: 9883 Comm: Plex Media Serv Tainted: P        W IO     4.15.18-7-pve #1
Jan 13 01:24:13 pve1 kernel: Hardware name: ASUS RS700-E6-RS4/Z8PS-D12-1U, BIOS 0401    01/22/2009
Jan 13 01:24:13 pve1 kernel: RIP: 0010:avl_insert+0x4b/0xd0 [zavl]
Jan 13 01:24:13 pve1 kernel: RSP: 0018:ffffb98d6a11bce0 EFLAGS: 00010282
Jan 13 01:24:13 pve1 kernel: RAX: 0000000000000000 RBX: ffffa09b6cd21600 RCX: ffffffffc03c3165
Jan 13 01:24:13 pve1 kernel: RDX: 0000000000000000 RSI: ffffa09b6cd21608 RDI: ffffa096bf2af948
Jan 13 01:24:13 pve1 kernel: RBP: ffffb98d6a11bd30 R08: ffffffffc03c3164 R09: ffffa09873807180
Jan 13 01:24:13 pve1 kernel: R10: ffffa09b6cd21600 R11: ffffa0981d6071c0 R12: ffffa096bf2af918
Jan 13 01:24:13 pve1 kernel: R13: ffffffffc03c315c R14: 0000000000000000 R15: 0000000000000000
Jan 13 01:24:13 pve1 kernel: FS:  00007fd795bff700(0000) GS:ffffa09873dc0000(0000) knlGS:0000000000000000
Jan 13 01:24:13 pve1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 13 01:24:13 pve1 kernel: CR2: ffffffffc03c3164 CR3: 00000005f56b6000 CR4: 00000000000026e0
Jan 13 01:24:13 pve1 kernel: Call Trace:
Jan 13 01:24:13 pve1 kernel:  ? zfs_range_lock+0x4bf/0x5c0 [zfs]
Jan 13 01:24:13 pve1 kernel:  ? spl_kmem_zalloc+0xa4/0x190 [spl]
Jan 13 01:24:13 pve1 kernel:  ? spl_kmem_zalloc+0xa4/0x190 [spl]
Jan 13 01:24:13 pve1 kernel:  zfs_get_data+0x1b5/0x2a0 [zfs]
Jan 13 01:24:13 pve1 kernel:  zil_commit.part.14+0x451/0x8b0 [zfs]
Jan 13 01:24:13 pve1 kernel:  zil_commit+0x17/0x20 [zfs]
Jan 13 01:24:13 pve1 kernel:  zfs_fsync+0x77/0xf0 [zfs]
Jan 13 01:24:13 pve1 kernel:  zpl_fsync+0x68/0xa0 [zfs]
Jan 13 01:24:13 pve1 kernel:  vfs_fsync_range+0x51/0xb0
Jan 13 01:24:13 pve1 kernel:  do_fsync+0x3d/0x70
Jan 13 01:24:13 pve1 kernel:  SyS_fdatasync+0x13/0x20
Jan 13 01:24:13 pve1 kernel:  do_syscall_64+0x73/0x130
Jan 13 01:24:13 pve1 kernel:  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Jan 13 01:24:13 pve1 kernel: RIP: 0033:0x7fd7ec76e2e7
Jan 13 01:24:13 pve1 kernel: RSP: 002b:00007fd795bfd7e0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
Jan 13 01:24:13 pve1 kernel: RAX: ffffffffffffffda RBX: 000000000000000a RCX: 00007fd7ec76e2e7
Jan 13 01:24:13 pve1 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000a
Jan 13 01:24:13 pve1 kernel: RBP: 00007fd795bfd820 R08: 0000000000000000 R09: 0000000000000022
Jan 13 01:24:13 pve1 kernel: R10: 00007fd795bfe740 R11: 0000000000000293 R12: 00000000000000e4
Jan 13 01:24:13 pve1 kernel: R13: 0000000000f55608 R14: 0000000000000001 R15: 0000000000989680
Jan 13 01:24:13 pve1 kernel: Code: 89 c1 83 e0 04 48 83 c9 01 48 09 c8 4d 85 c0 48 c7 06 00 00 00 00 48 c7 46 08 00 00 00 00 48 89 46 10 0f 84 84 00 00 00 48 63 c2 <49> 89 34 c0 49 8b 50 10 8b 04 85 70 71 42 c0 89 d1 83 e1 03 83
Jan 13 01:24:13 pve1 kernel: RIP: avl_insert+0x4b/0xd0 [zavl] RSP: ffffb98d6a11bce0
Jan 13 01:24:13 pve1 kernel: CR2: ffffffffc03c3164
Jan 13 01:24:13 pve1 kernel: ---[ end trace 2d3ba7e57c5ba825 ]---

It looks like there was an issue with ZFS, perhaps triggered by Plex? The only way I could recover the container was a hard reset of the system; container shutdown/stop and host restart/shutdown did not work.
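
For anyone reading the trace, here is a minimal sketch that decodes the number after "Oops:" and the PTE value from the PGD/PTE line. It assumes the standard x86 page-fault error-code layout and the x86-64 page-table entry layout; the helper names `decode_oops_code` and `pte_is_writable` are made up for illustration and are not part of any kernel tooling.

```python
# Sketch only: decodes the x86 page-fault error code shown after "Oops:".
# Helper names are hypothetical; bit meanings follow the x86 architecture.
PF_BITS = [
    (0x01, "protection violation (page was present)", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in page table", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(code):
    """Return a human-readable description for each relevant error-code bit."""
    out = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


def pte_is_writable(pte):
    """Bit 1 of an x86-64 page-table entry is the R/W (writable) flag."""
    return bool(pte & 0x2)


# Values taken from the trace above: "Oops: 0003" and "PTE 62e838061".
print(decode_oops_code(0x0003))
print(pte_is_writable(0x62e838061))
```

Read this way, `0003` means a kernel-mode write to a page that is present, and the PTE has its writable bit clear, so the faulting instruction in avl_insert appears to be writing to a read-only mapping. That is only a reading of the trace, not a diagnosis of the root cause.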
 
Jan 26, 2019
Hi,

I am running into the very same issue for the second time now. I'm not sure which version I was running when it first happened in mid-January 2019. I then upgraded to
Linux node 4.15.18-9-pve #1 SMP PVE 4.15.18-30 (Thu, 15 Nov 2018 13:32:46 +0100) x86_64 GNU/Linux
but it happened again:

Code:
[125945.321160] BUG: unable to handle kernel paging request at ffffffffc07d3164
[125945.321185] IP: avl_insert+0x4b/0xd0 [zavl]
[125945.321195] PGD eb420e067 P4D eb420e067 PUD eb4210067 PMD fdc5df067 PTE fe3edd061
[125945.321240] Oops: 0003 [#1] SMP PTI
[125945.321264] Modules linked in: tcp_diag udp_diag inet_diag xt_multiport binfmt_misc veth ip_set ip6table_filter ip6_tables iptable_filter openvswitch nsh nf_conntrack_ipv6 nf_nat_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_defrag_ipv6 nf_nat nf_conntrack softdog nfnetlink_log nfnetlink dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm zfs(PO) irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel zunicode(PO) aes_x86_64 crypto_simd glue_helper zavl(PO) cryptd icp(PO) intel_cstate i915 intel_rapl_perf drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt snd_pcm snd_timer snd soundcore mei_me mei ie31200_edac pcspkr shpchp intel_pch_thermal serio_raw wmi video acpi_pad
[125945.321457]  mac_hid nfsd auth_rpcgss nfs_acl lockd grace sunrpc zcommon(PO) znvpair(PO) spl(O) vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 xfs libcrc32c btrfs xor zstd_compress raid6_pq uas usb_storage hid_generic psmouse e1000e(O) igb(O) i2c_i801 dca ptp pps_core ahci usbhid libahci hid megaraid_sas
[125945.321555] CPU: 5 PID: 3568354 Comm: Plex Media Serv Tainted: P           O     4.15.18-9-pve #1
[125945.321573] Hardware name: FUJITSU D3417-B1/D3417-B1, BIOS V5.0.0.11 R1.24.0 for D3417-B1x                    11/17/2017
[125945.321596] RIP: 0010:avl_insert+0x4b/0xd0 [zavl]
[125945.321607] RSP: 0018:ffffb153e9bd3ce0 EFLAGS: 00010282
[125945.321619] RAX: 0000000000000000 RBX: ffff9b358f37d400 RCX: ffffffffc07d3165
[125945.321644] RDX: 0000000000000000 RSI: ffff9b358f37d408 RDI: ffff9b2eb3cd70f8
[125945.321659] RBP: ffffb153e9bd3d30 R08: ffffffffc07d3164 R09: ffff9b366d803200
[125945.321674] R10: ffff9b358f37d400 R11: ffff9b365e1f7968 R12: ffff9b2eb3cd70c8
[125945.321690] R13: ffffffffc07d315c R14: 0000000000000000 R15: 0000000000000000
[125945.321705] FS:  00007fd7be3fc700(0000) GS:ffff9b36ae540000(0000) knlGS:0000000000000000
[125945.321722] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[125945.321735] CR2: ffffffffc07d3164 CR3: 0000000ece232006 CR4: 00000000003626e0
[125945.321751] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[125945.321766] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[125945.321782] Call Trace:
[125945.321816]  ? zfs_range_lock+0x4bf/0x5c0 [zfs]
[125945.321830]  ? spl_kmem_zalloc+0xa4/0x190 [spl]
[125945.321843]  ? spl_kmem_zalloc+0xa4/0x190 [spl]
[125945.321874]  zfs_get_data+0x1b5/0x2a0 [zfs]
[125945.321904]  zil_commit.part.14+0x451/0x8b0 [zfs]
[125945.321935]  zil_commit+0x17/0x20 [zfs]
[125945.321963]  zfs_fsync+0x77/0xf0 [zfs]
[125945.321991]  zpl_fsync+0x68/0xa0 [zfs]
[125945.322002]  vfs_fsync_range+0x51/0xb0
[125945.322012]  do_fsync+0x3d/0x70
[125945.322021]  SyS_fdatasync+0x13/0x20
[125945.322031]  do_syscall_64+0x73/0x130
[125945.322041]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[125945.322053] RIP: 0033:0x7fd7f06e881d
[125945.322063] RSP: 002b:00007fd7be3f9ca0 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[125945.322092] RAX: ffffffffffffffda RBX: 00007fd7ec96ad88 RCX: 00007fd7f06e881d
[125945.322107] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000b
[125945.322123] RBP: 00007fd7be3f9cd0 R08: 00007fd7ef4cf838 R09: 0000000000000022
[125945.322151] R10: 0000000000000000 R11: 0000000000000293 R12: 00007fd7cf029fc0
[125945.322179] R13: 00007fd7be3fba50 R14: 0000000000000000 R15: 00007fd7eca14800
[125945.322208] Code: 89 c1 83 e0 04 48 83 c9 01 48 09 c8 4d 85 c0 48 c7 06 00 00 00 00 48 c7 46 08 00 00 00 00 48 89 46 10 0f 84 84 00 00 00 48 63 c2 <49> 89 34 c0 49 8b 50 10 8b 04 85 70 21 ea c0 89 d1 83 e1 03 83
[125945.322289] RIP: avl_insert+0x4b/0xd0 [zavl] RSP: ffffb153e9bd3ce0
[125945.322303] CR2: ffffffffc07d3164
[125945.322312] ---[ end trace 5aebb9a96aaaa55e ]---

I am upgrading to 4.15.18-10-pve now, but it seems the issue has been present since at least 4.15.18-7-pve, as reported by Elliott.

Interestingly, it is the very same combination of Plex and ZFS, though the Plex process is probably just the one that happens to hit the issue. I will need to do a hard reboot now, as the sync never returns and the process stays in state D in the process list.
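
To compare reports, a rough sketch like the following pulls the faulting address out of each "BUG:" line and keeps only the offset within the 4 KiB page. The function name `fault_page_offsets` is hypothetical, and the sample holds just the two "BUG:" lines quoted so far in this thread.

```python
import re

# Matches the faulting address in both journal and dmesg style lines.
FAULT_RE = re.compile(r"unable to handle kernel paging request at ([0-9a-f]{16})")


def fault_page_offsets(log_text):
    """Return the low 12 bits (offset within a 4 KiB page) of each fault address."""
    return [int(m.group(1), 16) & 0xFFF for m in FAULT_RE.finditer(log_text)]


sample = """\
Jan 13 01:24:13 pve1 kernel: BUG: unable to handle kernel paging request at ffffffffc03c3164
[125945.321160] BUG: unable to handle kernel paging request at ffffffffc07d3164
"""

print([hex(offset) for offset in fault_page_offsets(sample)])
```

Both addresses share the same page offset (0x164) even though the upper bits differ between the two machines. That fits with both traces hitting the same spot, but it does not prove it.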

Thanks and best,
Torben
 

rxn123talon

New Member
Jan 26, 2019
I am seeing the same thing on mine: Plex on ZFS, with the container locking up and requiring a hard reset.

Jan 25 04:58:13 plexserver kernel: [266258.728238] PGD 20220e067 P4D 20220e067 PUD 202210067 PMD 413378067 PTE 41a50a061
Jan 25 04:58:13 plexserver kernel: [266258.728306] CPU: 0 PID: 16530 Comm: Plex Media Serv Tainted: P O 4.15.18-9-pve #1
Jan 25 04:58:13 plexserver kernel: [266258.728311] RIP: 0010:avl_insert+0x4b/0xd0 [zavl]
Jan 25 04:58:13 plexserver kernel: [266258.728316] RDX: 0000000000000000 RSI: ffff8e591a350208 RDI: ffff8e572823d808
Jan 25 04:58:13 plexserver kernel: [266258.728330] R13: ffffffffc045a15c R14: 0000000000000000 R15: 0000000000000000
Jan 25 04:58:13 plexserver kernel: [266258.728337] Call Trace:
Jan 25 04:58:13 plexserver kernel: [266258.728401] zfs_get_data+0x1b5/0x2a0 [zfs]
Jan 25 04:58:13 plexserver kernel: [266258.728485] vfs_fsync_range+0x51/0xb0
Jan 25 04:58:13 plexserver kernel: [266258.728496] RIP: 0033:0x7f86686e6247
Jan 25 04:58:13 plexserver kernel: [266258.728504] R10: 00007f866e9fefcf R11: 0000000000000293 R12: 00007f86340b9480
Jan 25 04:58:13 plexserver kernel: [266258.728533] ---[ end trace 4b1dad477f264a8e ]---

Currently on Linux 4.15.18-9-pve #1 SMP PVE 4.15.18-30.
 
Oct 7, 2018
I just got this again (it hasn't happened since my initial post on Oct 7, 2018). I've updated the PVE kernel since the initial post, but it looks like there is another update waiting for me. I'm on 4.15.18-11-pve, as shown in the trace here:

Code:
Mar 20 04:24:51 pve1 kernel: BUG: unable to handle kernel paging request at ffffffffc0232164
Mar 20 04:24:51 pve1 kernel: IP: avl_insert+0x4b/0xd0 [zavl]
Mar 20 04:24:51 pve1 kernel: PGD 55be0e067 P4D 55be0e067 PUD 55be10067 PMD 62d226067 PTE 62c74c061
Mar 20 04:24:51 pve1 kernel: Oops: 0003 [#1] SMP PTI
Mar 20 04:24:51 pve1 kernel: Modules linked in: tcp_diag inet_diag binfmt_misc veth ip_set ip6table_filter ip6_tables cmac arc4 md4 nls_utf8 cifs ccm fscache iptable_filter softdog nfnetlink_log nfnetlink intel_powerclamp cor
Mar 20 04:24:51 pve1 kernel:  e1000e(O) i2c_i801 libahci ptp pps_core
Mar 20 04:24:51 pve1 kernel: CPU: 8 PID: 29287 Comm: Plex Media Serv Tainted: P          IO     4.15.18-11-pve #1
Mar 20 04:24:51 pve1 kernel: Hardware name: ASUS RS700-E6-RS4/Z8PS-D12-1U, BIOS 0401    01/22/2009
Mar 20 04:24:51 pve1 kernel: RIP: 0010:avl_insert+0x4b/0xd0 [zavl]
Mar 20 04:24:51 pve1 kernel: RSP: 0018:ffffa5064c60bce0 EFLAGS: 00010282
Mar 20 04:24:51 pve1 kernel: RAX: 0000000000000000 RBX: ffff8d7043590300 RCX: ffffffffc0232165
Mar 20 04:24:51 pve1 kernel: RDX: 0000000000000000 RSI: ffff8d7043590308 RDI: ffff8d702756aa50
Mar 20 04:24:51 pve1 kernel: RBP: ffffa5064c60bd30 R08: ffffffffc0232164 R09: ffff8d70f3807180
Mar 20 04:24:51 pve1 kernel: R10: ffff8d7043590300 R11: ffff8d708b4b5be0 R12: ffff8d702756aa20
Mar 20 04:24:51 pve1 kernel: R13: ffffffffc023215c R14: 0000000000000000 R15: 0000000000000000
Mar 20 04:24:51 pve1 kernel: FS:  00007f33517fa700(0000) GS:ffff8d70f3d00000(0000) knlGS:0000000000000000
Mar 20 04:24:51 pve1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 20 04:24:51 pve1 kernel: CR2: ffffffffc0232164 CR3: 00000001b9184000 CR4: 00000000000026e0
Mar 20 04:24:51 pve1 kernel: Call Trace:
Mar 20 04:24:51 pve1 kernel:  ? zfs_range_lock+0x4bf/0x5c0 [zfs]
Mar 20 04:24:51 pve1 kernel:  ? spl_kmem_zalloc+0xa4/0x190 [spl]
Mar 20 04:24:51 pve1 kernel:  ? spl_kmem_zalloc+0xa4/0x190 [spl]
Mar 20 04:24:51 pve1 kernel:  zfs_get_data+0x1b5/0x2a0 [zfs]
Mar 20 04:24:51 pve1 kernel:  zil_commit.part.14+0x451/0x8b0 [zfs]
Mar 20 04:24:51 pve1 kernel:  zil_commit+0x17/0x20 [zfs]
Mar 20 04:24:51 pve1 kernel:  zfs_fsync+0x77/0xf0 [zfs]
Mar 20 04:24:51 pve1 kernel:  zpl_fsync+0x68/0xa0 [zfs]
Mar 20 04:24:51 pve1 kernel:  vfs_fsync_range+0x51/0xb0
Mar 20 04:24:51 pve1 kernel:  do_fsync+0x3d/0x70
Mar 20 04:24:51 pve1 kernel:  SyS_fsync+0x10/0x20
Mar 20 04:24:51 pve1 kernel:  do_syscall_64+0x73/0x130
Mar 20 04:24:51 pve1 kernel:  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Mar 20 04:24:51 pve1 kernel: RIP: 0033:0x7f3425869b07
Mar 20 04:24:51 pve1 kernel: RSP: 002b:00007f33517f8ba0 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
Mar 20 04:24:51 pve1 kernel: RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007f3425869b07
Mar 20 04:24:51 pve1 kernel: RDX: 0000000000000000 RSI: 0000000000000002 RDI: 000000000000000b
Mar 20 04:24:51 pve1 kernel: RBP: 00000000000002ab R08: 0000000000000000 R09: 00007f33980d7a50
Mar 20 04:24:51 pve1 kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00000000031aaa78
Mar 20 04:24:51 pve1 kernel: R13: 00000000000aec28 R14: 00000000031eea20 R15: 0000000000000000
Mar 20 04:24:51 pve1 kernel: Code: 89 c1 83 e0 04 48 83 c9 01 48 09 c8 4d 85 c0 48 c7 06 00 00 00 00 48 c7 46 08 00 00 00 00 48 89 46 10 0f 84 84 00 00 00 48 63 c2 <49> 89 34 c0 49 8b 50 10 8b 04 85 70 d1 24 c0 89 d1 83 e1 03
Mar 20 04:24:51 pve1 kernel: RIP: avl_insert+0x4b/0xd0 [zavl] RSP: ffffa5064c60bce0
Mar 20 04:24:51 pve1 kernel: CR2: ffffffffc0232164
Mar 20 04:24:51 pve1 kernel: ---[ end trace 9b38454b5bc9f325 ]---
 

Stoiko Ivanov

Proxmox Staff Member
Staff member
May 2, 2018
Mar 20 04:24:51 pve1 kernel: Hardware name: ASUS RS700-E6-RS4/Z8PS-D12-1U, BIOS 0401 01/22/2009
Hm, please upgrade your BIOS to a newer version (and install the intel-microcode/amd-microcode packages from non-free). Quite a few of the vulnerabilities that came up around last year (Spectre/Meltdown/...) required changes at levels where bugs in the firmware/BIOS surfaced and became visible.
 
Oct 7, 2018
Will do. I just upgraded BIOS on my other identical machine, so for now I'll move the Plex container over there. I'll report back if I see this again on the new BIOS rev. Thanks!
 

ggzengel

New Member
Jul 8, 2019
I got this bug too. It's fixed in github/zfs #8379.

I opened an issue to get this into 7.14: github.com/zfsonlinux/zfs/issues/9002

Is it possible to include this fix before 7.14? I need to use DRBD on ZFS.
 
