Upgrade to pve-kernel-5.13.19-3-pve with a nested container running wireguard, page fault/crash

PhilD_

Member
Jan 21, 2022
After updating to the latest kernel (pve-kernel-5.13.19-3-pve), I get a general protection fault/crash on the host once WireGuard starts up in a nested container.

Code:
Jan 20 14:07:08 pve kernel: [   44.267836] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
Jan 20 14:13:24 pve kernel: [  419.576661] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] SMP PTI
Jan 20 14:13:24 pve kernel: [  419.576768] RIP: 0010:get_page_from_freelist+0x174/0xd50
Jan 20 14:13:24 pve kernel: [  419.576820] RSP: 0018:ffffb2bc88287980 EFLAGS: 00010093
Jan 20 14:13:24 pve kernel: [  419.576921] R10: ffff8e7608209000 R11: 0000000000000000 R12: ffffb2bc88287a68
Jan 20 14:13:24 pve kernel: [  419.576980] CR2: 00007f4e9e447000 CR3: 000000018858c005 CR4: 00000000001706e0
Jan 20 14:13:24 pve kernel: [  419.577023]  __alloc_pages+0x17b/0x330
Jan 20 14:13:24 pve kernel: [  419.577074]  fuse_readdir_uncached+0x554/0x8e0
Jan 20 14:13:24 pve kernel: [  419.577124]  fuse_readdir+0x145/0x6c0
Jan 20 14:13:24 pve kernel: [  419.577166]  do_syscall_64+0x61/0xb0
Jan 20 14:13:24 pve kernel: [  419.577208]  entry_SYSCALL_64_after_hwframe+0x44/0xae
Jan 20 14:13:24 pve kernel: [  419.579115] RAX: ffffffffffffffda RBX: 00007f4e61d61280 RCX: 00007f4f109d843b
Jan 20 14:13:24 pve kernel: [  419.582382] R13: 00007f4ddc190110 R14: 0000000000000000 R15: 00007f4e61d61280
Jan 20 14:13:24 pve kernel: [  419.596376] ---[ end trace c27ac3b7956969c3 ]---
Jan 20 14:13:24 pve kernel: [  420.187478] RAX: ffffd465092c9ac8 RBX: 0000000000000000 RCX: dead000000000100
Jan 20 14:13:24 pve kernel: [  420.191460] R13: ffff8e77dffd4b80 R14: 0000000000000297 R15: ffff8e77cfd3b1e0
Jan 20 14:13:24 pve kernel: [  420.249813]  handle_mm_fault+0xda/0x2c0
Jan 20 14:13:24 pve kernel: [  420.254056]  asm_exc_page_fault+0x1e/0x30
Jan 20 14:13:24 pve kernel: [  420.256822] RAX: 0000000000000000 RBX: 00007f4e5fffe000 RCX: 0000000000001a68
Jan 20 14:13:24 pve kernel: [  420.258850] R13: 00007f4e6b530a68 R14: 00007f4e6b530a70 R15: 0000000000002000
Jan 20 14:13:24 pve kernel: [  420.267875] ---[ end trace c27ac3b7956969c4 ]---
Jan 20 14:13:24 pve kernel: [  420.283652] RAX: ffffd465092c9ac8 RBX: 0000000000000000 RCX: dead000000000100
Jan 20 14:13:24 pve kernel: [  420.285417] R10: ffff8e7608209000 R11: 0000000000000000 R12: ffffb2bc88287a68
Jan 20 14:13:24 pve kernel: [  420.287719] CR2: 00007f4e6b52f000 CR3: 000000018858c006 CR4: 00000000001706e0
Jan 20 14:13:28 pve kernel: [  423.491914] RIP: 0010:get_page_from_freelist+0x174/0xd50
Jan 20 14:13:28 pve kernel: [  423.494969] RAX: ffffd465092c9ac8 RBX: 0000000000000000 RCX: dead000000000100
Jan 20 14:13:28 pve kernel: [  423.496949] R10: ffffffff8ebd7f32 R11: 0000000000000000 R12: ffffb2bc8107fa88
Jan 20 14:13:28 pve kernel: [  423.498809] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 20 14:13:28 pve kernel: [  423.501088]  <TASK>
Jan 20 14:13:28 pve kernel: [  423.503542]  pagecache_get_page+0x2c2/0x560
Jan 20 14:13:28 pve kernel: [  423.505284]  fuse_file_write_iter+0x3de/0x430
Jan 20 14:13:28 pve kernel: [  423.507784]  ksys_write+0x67/0xe0
Jan 20 14:13:28 pve kernel: [  423.509718]  ? irqentry_exit+0x19/0x30
Jan 20 14:13:28 pve kernel: [  423.511366] RIP: 0033:0x7f01721e2fb3
Jan 20 14:13:28 pve kernel: [  423.513837] RDX: 0000000000000053 RSI: 000055bb920724a0 RDI: 0000000000000008
Jan 20 14:13:28 pve kernel: [  423.515501]  </TASK>
Jan 20 14:13:28 pve kernel: [  424.102491] RIP: 0010:get_page_from_freelist+0x174/0xd50
Jan 20 14:13:28 pve kernel: [  424.105638] RDX: dead000000000122 RSI: dead000000000100 RDI: 0000000000100cca
Jan 20 14:13:28 pve kernel: [  424.108063] FS:  00007f0171fcf280(0000) GS:ffff8e77cfd00000(0000) knlGS:0000000000000000
Jan 20 14:13:30 pve kernel: [  425.876205] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#4] SMP PTI
Jan 20 14:13:30 pve kernel: [  425.877504] Hardware name: Supermicro X10SLL-F/X10SLL-F, BIOS 3.3 03/06/2020
Jan 20 14:13:30 pve kernel: [  425.878107] RIP: 0010:get_page_from_freelist+0x174/0xd50
Jan 20 14:13:30 pve kernel: [  425.880400] RSP: 0000:ffffb2bc8861fb78 EFLAGS: 00010093
Jan 20 14:13:30 pve kernel: [  425.882075] RDX: dead000000000122 RSI: dead000000000100 RDI: 0000000000100cca
Jan 20 14:13:30 pve kernel: [  425.883333] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb2bc8861fc60
Jan 20 14:13:30 pve kernel: [  425.884865] FS:  00007fc11ee58280(0000) GS:ffff8e77cfd00000(0000) knlGS:0000000000000000
Jan 20 14:13:30 pve kernel: [  425.886234] CR2: 00007fff96fb3d08 CR3: 0000000248390003 CR4: 00000000001706e0
Jan 20 14:13:30 pve kernel: [  425.887902]  <TASK>
Jan 20 14:13:30 pve kernel: [  425.889767]  wp_page_copy+0x79/0x5d0
Jan 20 14:13:30 pve kernel: [  425.891926]  do_wp_page+0xef/0x300
Jan 20 14:13:30 pve kernel: [  425.893997]  do_user_addr_fault+0x1bb/0x660
Jan 20 14:13:30 pve kernel: [  425.895501]  ? asm_exc_page_fault+0x8/0x30
Jan 20 14:13:30 pve kernel: [  425.897336] Code: 00 00 48 8b 15 11 29 0f 00 f7 d8 41 bd ff ff ff ff 64 89 02 66 0f 1f 44 00 00 85 ed 0f 85 80 00 00 00 44 89 e6 bf 02 00 00 00 <e8> 3b 9c fb ff 44 89 e8 5d 41 5c 41 5d c3 66 90 e8 eb 8a fb ff e8
Jan 20 14:13:30 pve kernel: [  425.899543] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002
Jan 20 14:13:30 pve kernel: [  425.900866] R13: 0000000000003449 R14: 0000000000000001 R15: 0000000000000001
Jan 20 14:13:30 pve kernel: [  425.901712]  sysfillrect sysimgblt zzstd(O) intel_pch_thermal zlua(O) ie31200_edac zavl(PO) icp(PO) acpi_ipmi ipmi_si ipmi_devintf zcommon(PO) ipmi_msghandler znvpair(PO) spl(O) mac_hid vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi jc42 coretemp vfio_pci vfio_virqfd irqbypass vfio_iommu_type1 vfio drivetemp drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c mlx4_ib ib_uverbs mlx4_en ib_core hid_generic usbkbd usbmouse usbhid hid crc32_pclmul i2c_i801 i2c_smbus xhci_pci ahci xhci_pci_renesas libahci lpc_ich igb mpt3sas i2c_algo_bit ehci_pci dca raid_class e1000e mlx4_core xhci_hcd ehci_hcd scsi_transport_sas video
Jan 20 14:13:30 pve kernel: [  426.461171] Code: f9 48 c1 e2 04 49 8b 41 10 4c 01 fa 48 39 c2 0f 84 a6 02 00 00 48 be 00 01 00 00 00 00 ad de 49 8b 41 10 48 8b 08 48 8b 50 08 <48> 89 51 08 48 89 0a 48 b9 22 01 00 00 00 00 ad de 48 89 30 48 89
. . .

If I fall back to 5.13.19-2, I have no issues. I also ran memtest86 to completion with no errors, so I'm sort of stumped.
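
In case it helps anyone comparing traces: the excerpt above repeats the same kernel-side RIP several times, and a small script can confirm that across a full log. This is only a rough sketch, assuming the log is saved as plain text in the same journal format as above; the function name `summarize_oops` and the regexes are my own and not part of any Proxmox tooling.

```python
import re
from collections import Counter

# Kernel-mode faulting instruction, e.g.
# "RIP: 0010:get_page_from_freelist+0x174/0xd50".
# User-space RIPs ("RIP: 0033:...") are deliberately not matched.
RIP_RE = re.compile(r"RIP: 0010:(\S+)")

# Oops header, capturing the reported address and the oops counter "[#N]".
GPF_RE = re.compile(
    r"general protection fault.*address (0x[0-9a-f]+): \d+ \[#(\d+)\]"
)


def summarize_oops(lines):
    """Count kernel RIP locations and collect (oops number, address) pairs."""
    rips = Counter()
    faults = []
    for line in lines:
        rip = RIP_RE.search(line)
        if rip:
            rips[rip.group(1)] += 1
        gpf = GPF_RE.search(line)
        if gpf:
            faults.append((int(gpf.group(2)), gpf.group(1)))
    return rips, faults
```

Calling `summarize_oops(open("kern.log"))` (the file name is just a placeholder) returns a counter of faulting kernel functions and the list of fault addresses, which makes it easy to see whether every oops lands in the same place.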
 
Hmm, could you please post:
* the config of the container (I assume by 'nested container' you mean an LXC container with nesting enabled)
* what exactly is installed in the container (which container image is used, and which version of wireguard-tools)

You could also try installing the pve-kernel-5.15 meta-package. This will be the next kernel release for PVE, so you might want to give it a test anyway.
 
