Upgrade to pve-kernel-5.13.19-3-pve with a nested container running wireguard, page fault/crash

PhilD_

Member
Jan 21, 2022
I updated to the latest kernel (pve-kernel-5.13.19-3-pve), and as soon as WireGuard starts up in a nested container, the kernel logs general protection faults and the system crashes.

Code:
Jan 20 14:07:08 pve kernel: [   44.267836] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
Jan 20 14:13:24 pve kernel: [  419.576661] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#1] SMP PTI
Jan 20 14:13:24 pve kernel: [  419.576768] RIP: 0010:get_page_from_freelist+0x174/0xd50
Jan 20 14:13:24 pve kernel: [  419.576820] RSP: 0018:ffffb2bc88287980 EFLAGS: 00010093
Jan 20 14:13:24 pve kernel: [  419.576921] R10: ffff8e7608209000 R11: 0000000000000000 R12: ffffb2bc88287a68
Jan 20 14:13:24 pve kernel: [  419.576980] CR2: 00007f4e9e447000 CR3: 000000018858c005 CR4: 00000000001706e0
Jan 20 14:13:24 pve kernel: [  419.577023]  __alloc_pages+0x17b/0x330
Jan 20 14:13:24 pve kernel: [  419.577074]  fuse_readdir_uncached+0x554/0x8e0
Jan 20 14:13:24 pve kernel: [  419.577124]  fuse_readdir+0x145/0x6c0
Jan 20 14:13:24 pve kernel: [  419.577166]  do_syscall_64+0x61/0xb0
Jan 20 14:13:24 pve kernel: [  419.577208]  entry_SYSCALL_64_after_hwframe+0x44/0xae
Jan 20 14:13:24 pve kernel: [  419.579115] RAX: ffffffffffffffda RBX: 00007f4e61d61280 RCX: 00007f4f109d843b
Jan 20 14:13:24 pve kernel: [  419.582382] R13: 00007f4ddc190110 R14: 0000000000000000 R15: 00007f4e61d61280
Jan 20 14:13:24 pve kernel: [  419.596376] ---[ end trace c27ac3b7956969c3 ]---
Jan 20 14:13:24 pve kernel: [  420.187478] RAX: ffffd465092c9ac8 RBX: 0000000000000000 RCX: dead000000000100
Jan 20 14:13:24 pve kernel: [  420.191460] R13: ffff8e77dffd4b80 R14: 0000000000000297 R15: ffff8e77cfd3b1e0
Jan 20 14:13:24 pve kernel: [  420.249813]  handle_mm_fault+0xda/0x2c0
Jan 20 14:13:24 pve kernel: [  420.254056]  asm_exc_page_fault+0x1e/0x30
Jan 20 14:13:24 pve kernel: [  420.256822] RAX: 0000000000000000 RBX: 00007f4e5fffe000 RCX: 0000000000001a68
Jan 20 14:13:24 pve kernel: [  420.258850] R13: 00007f4e6b530a68 R14: 00007f4e6b530a70 R15: 0000000000002000
Jan 20 14:13:24 pve kernel: [  420.267875] ---[ end trace c27ac3b7956969c4 ]---
Jan 20 14:13:24 pve kernel: [  420.283652] RAX: ffffd465092c9ac8 RBX: 0000000000000000 RCX: dead000000000100
Jan 20 14:13:24 pve kernel: [  420.285417] R10: ffff8e7608209000 R11: 0000000000000000 R12: ffffb2bc88287a68
Jan 20 14:13:24 pve kernel: [  420.287719] CR2: 00007f4e6b52f000 CR3: 000000018858c006 CR4: 00000000001706e0
Jan 20 14:13:28 pve kernel: [  423.491914] RIP: 0010:get_page_from_freelist+0x174/0xd50
Jan 20 14:13:28 pve kernel: [  423.494969] RAX: ffffd465092c9ac8 RBX: 0000000000000000 RCX: dead000000000100
Jan 20 14:13:28 pve kernel: [  423.496949] R10: ffffffff8ebd7f32 R11: 0000000000000000 R12: ffffb2bc8107fa88
Jan 20 14:13:28 pve kernel: [  423.498809] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 20 14:13:28 pve kernel: [  423.501088]  <TASK>
Jan 20 14:13:28 pve kernel: [  423.503542]  pagecache_get_page+0x2c2/0x560
Jan 20 14:13:28 pve kernel: [  423.505284]  fuse_file_write_iter+0x3de/0x430
Jan 20 14:13:28 pve kernel: [  423.507784]  ksys_write+0x67/0xe0
Jan 20 14:13:28 pve kernel: [  423.509718]  ? irqentry_exit+0x19/0x30
Jan 20 14:13:28 pve kernel: [  423.511366] RIP: 0033:0x7f01721e2fb3
Jan 20 14:13:28 pve kernel: [  423.513837] RDX: 0000000000000053 RSI: 000055bb920724a0 RDI: 0000000000000008
Jan 20 14:13:28 pve kernel: [  423.515501]  </TASK>
Jan 20 14:13:28 pve kernel: [  424.102491] RIP: 0010:get_page_from_freelist+0x174/0xd50
Jan 20 14:13:28 pve kernel: [  424.105638] RDX: dead000000000122 RSI: dead000000000100 RDI: 0000000000100cca
Jan 20 14:13:28 pve kernel: [  424.108063] FS:  00007f0171fcf280(0000) GS:ffff8e77cfd00000(0000) knlGS:0000000000000000
Jan 20 14:13:30 pve kernel: [  425.876205] general protection fault, probably for non-canonical address 0xdead000000000108: 0000 [#4] SMP PTI
Jan 20 14:13:30 pve kernel: [  425.877504] Hardware name: Supermicro X10SLL-F/X10SLL-F, BIOS 3.3 03/06/2020
Jan 20 14:13:30 pve kernel: [  425.878107] RIP: 0010:get_page_from_freelist+0x174/0xd50
Jan 20 14:13:30 pve kernel: [  425.880400] RSP: 0000:ffffb2bc8861fb78 EFLAGS: 00010093
Jan 20 14:13:30 pve kernel: [  425.882075] RDX: dead000000000122 RSI: dead000000000100 RDI: 0000000000100cca
Jan 20 14:13:30 pve kernel: [  425.883333] R10: 0000000000000000 R11: 0000000000000000 R12: ffffb2bc8861fc60
Jan 20 14:13:30 pve kernel: [  425.884865] FS:  00007fc11ee58280(0000) GS:ffff8e77cfd00000(0000) knlGS:0000000000000000
Jan 20 14:13:30 pve kernel: [  425.886234] CR2: 00007fff96fb3d08 CR3: 0000000248390003 CR4: 00000000001706e0
Jan 20 14:13:30 pve kernel: [  425.887902]  <TASK>
Jan 20 14:13:30 pve kernel: [  425.889767]  wp_page_copy+0x79/0x5d0
Jan 20 14:13:30 pve kernel: [  425.891926]  do_wp_page+0xef/0x300
Jan 20 14:13:30 pve kernel: [  425.893997]  do_user_addr_fault+0x1bb/0x660
Jan 20 14:13:30 pve kernel: [  425.895501]  ? asm_exc_page_fault+0x8/0x30
Jan 20 14:13:30 pve kernel: [  425.897336] Code: 00 00 48 8b 15 11 29 0f 00 f7 d8 41 bd ff ff ff ff 64 89 02 66 0f 1f 44 00 00 85 ed 0f 85 80 00 00 00 44 89 e6 bf 02 00 00 00 <e8> 3b 9c fb ff 44 89 e8 5d 41 5c 41 5d c3 66 90 e8 eb 8a fb ff e8
Jan 20 14:13:30 pve kernel: [  425.899543] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002
Jan 20 14:13:30 pve kernel: [  425.900866] R13: 0000000000003449 R14: 0000000000000001 R15: 0000000000000001
Jan 20 14:13:30 pve kernel: [  425.901712]  sysfillrect sysimgblt zzstd(O) intel_pch_thermal zlua(O) ie31200_edac zavl(PO) icp(PO) acpi_ipmi ipmi_si ipmi_devintf zcommon(PO) ipmi_msghandler znvpair(PO) spl(O) mac_hid vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi jc42 coretemp vfio_pci vfio_virqfd irqbypass vfio_iommu_type1 vfio drivetemp drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c mlx4_ib ib_uverbs mlx4_en ib_core hid_generic usbkbd usbmouse usbhid hid crc32_pclmul i2c_i801 i2c_smbus xhci_pci ahci xhci_pci_renesas libahci lpc_ich igb mpt3sas i2c_algo_bit ehci_pci dca raid_class e1000e mlx4_core xhci_hcd ehci_hcd scsi_transport_sas video
Jan 20 14:13:30 pve kernel: [  426.461171] Code: f9 48 c1 e2 04 49 8b 41 10 4c 01 fa 48 39 c2 0f 84 a6 02 00 00 48 be 00 01 00 00 00 00 ad de 49 8b 41 10 48 8b 08 48 8b 50 08 <48> 89 51 08 48 89 0a 48 b9 22 01 00 00 00 00 ad de 48 89 30 48 89
. . .

If I fall back to 5.13.19-2, I have no issues. I ran a full memtest86 pass with no errors, so I'm sort of stumped.
 
Hmm, could you please:
* post the config of the container (I assume by 'nested container' you mean an LXC container with nesting enabled)?
* tell us what exactly is installed in the container (which container image is used, and which version of wireguard-tools)?
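
One possible way to collect the details requested above, run on the PVE host. This is only a sketch: the container ID 101 is a hypothetical placeholder, the `run_if_present` helper is made up for this example, and the `dpkg` query assumes a Debian-based container image.

```shell
#!/bin/sh
# Hypothetical container ID; replace with the ID of the affected container.
CTID="${CTID:-101}"

# Illustrative helper (not a Proxmox tool): run a command only if it exists,
# so the script can be pasted on a machine without the PVE tools installed.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "skipped (not found): $1"
    fi
}

# Container configuration, including any "features:" line such as nesting=1.
run_if_present pct config "$CTID"

# Package versions on the host, including the installed kernels.
run_if_present pveversion -v

# Inside the container: distribution and wireguard-tools version.
# Assumes a Debian-based image; other distributions need their own package query.
run_if_present pct exec "$CTID" -- cat /etc/os-release
run_if_present pct exec "$CTID" -- dpkg -l wireguard-tools
```

The output of these commands can be pasted into the thread as-is.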

You could also try installing the pve-kernel-5.15 meta-package. This will be the next kernel release for PVE, so you might want to test it anyway.
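
A minimal sketch of how the 5.15 test could look on the host. The `kernel_status` helper and the `INSTALL_515` switch are made up for this example, and the release string is assumed to match what `uname -r` prints for the kernel named in the thread title. Nothing is installed unless the switch is set.

```shell
#!/bin/sh
# Release string of the kernel reported as crashing in this thread
# (assumed to be what `uname -r` prints for pve-kernel-5.13.19-3-pve).
BAD_RELEASE="5.13.19-3-pve"

# Illustrative helper: classify a kernel release string relative to this report.
kernel_status() {
    case "$1" in
        "$BAD_RELEASE") echo "affected" ;;
        5.15.*)         echo "opt-in 5.15" ;;
        *)              echo "other" ;;
    esac
}

echo "running $(uname -r): $(kernel_status "$(uname -r)")"

# Install the 5.15 meta-package only when explicitly requested.
# Needs root; the new kernel is used after the next reboot.
if [ "${INSTALL_515:-0}" = "1" ]; then
    apt update && apt install pve-kernel-5.15
fi
```

After rebooting, running the script again without the switch should report the 5.15 release if the new kernel was booted.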