Morning,
We have a couple of PVE nodes. Our largest one has 256 GB of RAM and is heavily used. After a reboot we sit at around 60-65% memory utilization. Over the course of hours it rises to 85-90%, but 20+ GB of memory always stays free.
We now have LXC containers and VMs that fail to start. It seems to be memory related, but I have no clue where to check.
Code:
Sep 03 16:29:09 pve02 pvedaemon[1762147]: starting CT 108: UPID:pve02:001AE363:07104002:631364B5:vzstart:108:root@pam:
Sep 03 16:29:09 pve02 pvedaemon[3450029]: <root@pam> starting task UPID:pve02:001AE363:07104002:631364B5:vzstart:108:root@pam:
Sep 03 16:29:09 pve02 systemd[1]: Started PVE LXC Container: 108.
Sep 03 16:29:10 pve02 audit[1762165]: AVC apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-108_</var/lib/lxc>" pid=1762165 comm="apparmor_parser"
Sep 03 16:29:10 pve02 kernel: audit: type=1400 audit(1662215350.428:310): apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-108_</var/lib/lxc>" pid=1762165 comm="apparmor_parser"
Sep 03 16:29:10 pve02 kernel: lxc-start: page allocation failure: order:5, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=108,mems_allowed=0
Sep 03 16:29:10 pve02 kernel: CPU: 29 PID: 1762151 Comm: lxc-start Tainted: P O 5.15.35-1-pve #1
Sep 03 16:29:10 pve02 kernel: Hardware name: primeLine Solutions egino BTO/H12SSL-C, BIOS 2.1 06/02/2021
Sep 03 16:29:10 pve02 kernel: Call Trace:
Sep 03 16:29:10 pve02 kernel: <TASK>
Sep 03 16:29:10 pve02 kernel: dump_stack_lvl+0x4a/0x5f
Sep 03 16:29:10 pve02 kernel: dump_stack+0x10/0x12
Sep 03 16:29:10 pve02 kernel: warn_alloc+0x137/0x160
Sep 03 16:29:10 pve02 kernel: __alloc_pages_slowpath.constprop.0+0xdd0/0xe30
Sep 03 16:29:10 pve02 kernel: __alloc_pages+0x308/0x320
Sep 03 16:29:10 pve02 kernel: alloc_pages+0x9e/0x1e0
Sep 03 16:29:10 pve02 kernel: kmalloc_order+0x2f/0xc0
Sep 03 16:29:10 pve02 kernel: kmalloc_order_trace+0x1d/0x90
Sep 03 16:29:10 pve02 kernel: __kmalloc+0x2ad/0x330
Sep 03 16:29:10 pve02 kernel: veth_dev_init+0x88/0x120 [veth]
Sep 03 16:29:10 pve02 kernel: register_netdevice+0x118/0x660
Sep 03 16:29:10 pve02 kernel: ? get_random_bytes+0x43/0x90
Sep 03 16:29:10 pve02 kernel: veth_newlink+0x1a1/0x410 [veth]
Sep 03 16:29:10 pve02 kernel: __rtnl_newlink+0x76a/0xa20
Sep 03 16:29:10 pve02 kernel: ? dmu_object_size_from_db+0x6c/0x80 [zfs]
Sep 03 16:29:10 pve02 kernel: ? __cond_resched+0x1a/0x50
Sep 03 16:29:10 pve02 kernel: ? mutex_lock+0x13/0x40
Sep 03 16:29:10 pve02 kernel: ? __cond_resched+0x1a/0x50
Sep 03 16:29:10 pve02 kernel: ? get_partial_node.part.0+0xdf/0x230
Sep 03 16:29:10 pve02 kernel: rtnl_newlink+0x49/0x70
Sep 03 16:29:10 pve02 kernel: rtnetlink_rcv_msg+0x160/0x410
Sep 03 16:29:10 pve02 kernel: ? rtnl_calcit.isra.0+0x130/0x130
Sep 03 16:29:10 pve02 kernel: netlink_rcv_skb+0x55/0x100
Sep 03 16:29:10 pve02 kernel: rtnetlink_rcv+0x15/0x20
Sep 03 16:29:10 pve02 kernel: netlink_unicast+0x221/0x330
Sep 03 16:29:10 pve02 kernel: netlink_sendmsg+0x23f/0x4a0
Sep 03 16:29:10 pve02 kernel: sock_sendmsg+0x65/0x70
Sep 03 16:29:10 pve02 kernel: ____sys_sendmsg+0x257/0x2a0
Sep 03 16:29:10 pve02 kernel: ? import_iovec+0x31/0x40
Sep 03 16:29:10 pve02 kernel: ? sendmsg_copy_msghdr+0x7e/0xa0
Sep 03 16:29:10 pve02 kernel: ___sys_sendmsg+0x82/0xc0
Sep 03 16:29:10 pve02 kernel: ? wp_page_copy+0x2dc/0x570
Sep 03 16:29:10 pve02 kernel: ? do_wp_page+0xef/0x300
Sep 03 16:29:10 pve02 kernel: ? move_addr_to_user+0x4d/0xe0
Sep 03 16:29:10 pve02 kernel: ? __handle_mm_fault+0xc5a/0x15c0
Sep 03 16:29:10 pve02 kernel: __sys_sendmsg+0x62/0xb0
Sep 03 16:29:10 pve02 kernel: __x64_sys_sendmsg+0x1f/0x30
Sep 03 16:29:10 pve02 kernel: do_syscall_64+0x5c/0xc0
Sep 03 16:29:10 pve02 kernel: ? exit_to_user_mode_prepare+0x37/0x1b0
Sep 03 16:29:10 pve02 kernel: ? irqentry_exit_to_user_mode+0x9/0x20
Sep 03 16:29:10 pve02 kernel: ? irqentry_exit+0x19/0x30
Sep 03 16:29:10 pve02 kernel: ? exc_page_fault+0x89/0x160
Sep 03 16:29:10 pve02 kernel: ? asm_exc_page_fault+0x8/0x30
Sep 03 16:29:10 pve02 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Sep 03 16:29:10 pve02 kernel: RIP: 0033:0x7f5c89a28e13
Sep 03 16:29:10 pve02 kernel: Code: 8b 15 b9 91 00 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 89 54 24 1c 48
Sep 03 16:29:10 pve02 kernel: RSP: 002b:00007ffdf7f64f68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
Sep 03 16:29:10 pve02 kernel: RAX: ffffffffffffffda RBX: 000055e4c8f1a630 RCX: 00007f5c89a28e13
Sep 03 16:29:10 pve02 kernel: RDX: 0000000000004000 RSI: 00007ffdf7f64f90 RDI: 0000000000000008
Sep 03 16:29:10 pve02 kernel: RBP: 00007ffdf7f65020 R08: 000000000000000a R09: 0000000000000068
Sep 03 16:29:10 pve02 kernel: R10: 0000000000000004 R11: 0000000000000246 R12: 00007ffdf7f65150
Sep 03 16:29:10 pve02 kernel: R13: 000055e4c8f11a58 R14: 000055e4c8f17780 R15: 00007ffdf7f65020
Sep 03 16:29:10 pve02 kernel: </TASK>
Sep 03 16:29:10 pve02 kernel: Mem-Info:
Sep 03 16:29:10 pve02 kernel: active_anon:18215031 inactive_anon:8137544 isolated_anon:0
active_file:19425 inactive_file:16754 isolated_file:0
unevictable:38868 dirty:138 writeback:0
slab_reclaimable:255908 slab_unreclaimable:4421258
mapped:45804 shmem:29525 pagetables:80469 bounce:0
kernel_misc_reclaimable:0
free:8821935 free_pcp:215 free_cma:0
Sep 03 16:29:10 pve02 kernel: Node 0 active_anon:72860124kB inactive_anon:32550176kB active_file:77700kB inactive_file:67016kB unevictable:155472kB isolated(anon):0kB isolated(file):0kB mapped:183216kB dirty:1056kB writeback:0kB shmem:118100kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 3338240kB writeback_tmp:0kB kernel_stack:46784kB pagetables:321876kB all_unreclaimable? yes
Sep 03 16:29:10 pve02 kernel: Node 0 DMA free:11264kB min:0kB low:12kB high:24kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Sep 03 16:29:10 pve02 kernel: lowmem_reserve[]: 0 2551 257499 257499 257499
Sep 03 16:29:10 pve02 kernel: Node 0 DMA32 free:1018944kB min:668kB low:3280kB high:5892kB reserved_highatomic:2048KB active_anon:1301716kB inactive_anon:270164kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:2741616kB managed:2674808kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Sep 03 16:29:10 pve02 kernel: lowmem_reserve[]: 0 0 254947 254947 254947
Sep 03 16:29:10 pve02 kernel: Node 0 Normal free:34256288kB min:950460kB low:1211524kB high:1472588kB reserved_highatomic:0KB active_anon:71558408kB inactive_anon:32280012kB active_file:77700kB inactive_file:67016kB unevictable:155472kB writepending:1304kB present:265534464kB managed:261073064kB mlocked:155472kB bounce:0kB free_pcp:1612kB local_pcp:0kB free_cma:0kB
Sep 03 16:29:10 pve02 kernel: lowmem_reserve[]: 0 0 0 0 0
Sep 03 16:29:10 pve02 kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
Sep 03 16:29:10 pve02 kernel: Node 0 DMA32: 2012*4kB (UM) 714*8kB (UM) 1170*16kB (UMH) 1225*32kB (UMEH) 973*64kB (UME) 912*128kB (UME) 831*256kB (UME) 557*512kB (UME) 258*1024kB (UME) 3*2048kB (M) 0*4096kB = 1018944kB
Sep 03 16:29:10 pve02 kernel: Node 0 Normal: 769151*4kB (UME) 1821559*8kB (UME) 1037974*16kB (UME) 120*32kB (UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 34260500kB
Sep 03 16:29:10 pve02 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Sep 03 16:29:10 pve02 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Sep 03 16:29:10 pve02 kernel: 69988 total pagecache pages
Sep 03 16:29:10 pve02 kernel: 0 pages in swap cache
Sep 03 16:29:10 pve02 kernel: Swap cache stats: add 0, delete 0, find 0/0
Sep 03 16:29:10 pve02 kernel: Free swap = 0kB
Sep 03 16:29:10 pve02 kernel: Total swap = 0kB
Sep 03 16:29:10 pve02 kernel: 67073019 pages RAM
Sep 03 16:29:10 pve02 kernel: 0 pages HighMem/MovableOnly
Sep 03 16:29:10 pve02 kernel: 1132211 pages reserved
Sep 03 16:29:10 pve02 kernel: 0 pages hwpoisoned
Sep 03 16:29:10 pve02 pvedaemon[1762147]: startup for container '108' failed
Sep 03 16:29:10 pve02 pvedaemon[3450029]: <root@pam> end task UPID:pve02:001AE363:07104002:631364B5:vzstart:108:root@pam: startup for container '108' failed
Sep 03 16:29:10 pve02 audit[1762176]: AVC apparmor="STATUS" operation="profile_remove" profile="/usr/bin/lxc-start" name="lxc-108_</var/lib/lxc>" pid=1762176 comm="apparmor_parser"
Sep 03 16:29:10 pve02 kernel: audit: type=1400 audit(1662215350.712:311): apparmor="STATUS" operation="profile_remove" profile="/usr/bin/lxc-start" name="lxc-108_</var/lib/lxc>" pid=1762176 comm="apparmor_parser"
Sep 03 16:29:11 pve02 systemd[1]: pve-container@108.service: Main process exited, code=exited, status=1/FAILURE
Sep 03 16:29:11 pve02 systemd[1]: pve-container@108.service: Failed with result 'exit-code'.
Sep 03 16:29:16 pve02 pvedaemon[1762970]: starting CT 108: UPID:pve02:001AE69A:071042D7:631364BC:vzstart:108:root@pam:
Sep 03 16:29:16 pve02 pvedaemon[3409093]: <root@pam> starting task UPID:pve02:001AE69A:071042D7:631364BC:vzstart:108:root@pam:
Sep 03 16:29:17 pve02 systemd[1]: Started PVE LXC Container: 108.
Sep 03 16:29:17 pve02 audit[1762984]: AVC apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-108_</var/lib/lxc>" pid=1762984 comm="apparmor_parser"
Sep 03 16:29:17 pve02 kernel: audit: type=1400 audit(1662215357.684:312): apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-108_</var/lib/lxc>" pid=1762984 comm="apparmor_parser"
Sep 03 16:29:17 pve02 pvedaemon[1762970]: startup for container '108' failed
Sep 03 16:29:17 pve02 pvedaemon[3409093]: <root@pam> end task UPID:pve02:001AE69A:071042D7:631364BC:vzstart:108:root@pam: startup for container '108' failed
Sep 03 16:29:17 pve02 audit[1762994]: AVC apparmor="STATUS" operation="profile_remove" profile="/usr/bin/lxc-start" name="lxc-108_</var/lib/lxc>" pid=1762994 comm="apparmor_parser"
Sep 03 16:29:18 pve02 kernel: audit: type=1400 audit(1662215357.912:313): apparmor="STATUS" operation="profile_remove" profile="/usr/bin/lxc-start" name="lxc-108_</var/lib/lxc>" pid=1762994 comm="apparmor_parser"
Sep 03 16:29:18 pve02 systemd[1]: pve-container@108.service: Main process exited, code=exited, status=1/FAILURE
Sep 03 16:29:18 pve02 systemd[1]: pve-container@108.service: Failed with result 'exit-code'.
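In case it helps whoever reads the dump: below is a minimal sketch for tallying the per-order free lists that the kernel prints after a page allocation failure. The helper names (`parse_buddy_line`, `blocks_for_order`) are my own, not part of PVE or any kernel tool, and I am assuming 4 kB pages, so the failed `order:5` request would need one contiguous 128 kB block. It only looks at the "Node 0 Normal" line copied from the log above and says nothing about the other zones.

```python
import re

def parse_buddy_line(line):
    """Map block size in kB -> number of free blocks, from a 'Node N <zone>: a*4kB ...' line."""
    return {int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", line)}

def blocks_for_order(free_lists, order, page_kb=4):
    """Count free blocks that are large enough for one allocation of the given order."""
    needed_kb = page_kb << order  # order 5 with 4 kB pages -> 128 kB
    return sum(n for size, n in free_lists.items() if size >= needed_kb)

# "Node 0 Normal" line copied from the log above (timestamp prefix dropped).
normal = ("Node 0 Normal: 769151*4kB (UME) 1821559*8kB (UME) 1037974*16kB (UME) "
          "120*32kB (UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB "
          "0*4096kB = 34260500kB")

free = parse_buddy_line(normal)
total_free_kb = sum(size * n for size, n in free.items())

print("free kB in zone:", total_free_kb)                     # 34260500
print("blocks usable for order 5:", blocks_for_order(free, 5))  # 0
```

If I read this right, the Normal zone has roughly 32 GB free, but none of it in blocks of 64 kB or larger, which would match "plenty of free memory, allocation still fails". I may well be misreading it, though.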
Sometimes a container starts fine after a couple of tries. Sometimes it helps to remove the NIC, but later on starting WITH the NIC works just fine.
I'm kind of at a loss here and would greatly appreciate any hints on where to look.
Thanks
Marie.