Hi
I have a 3-node cluster as a development environment; all the nodes are basically Dell R620s with 200GB RAM and 48 cores. I recently found that on one of the nodes, the OOM killer always kills the same daemon running inside an LXC container, and also a VM.
The host seems to have plenty of free RAM: the memory summary graph in the web UI spikes up to 100GB while showing nearly 80GB free.
The memory section of the OOM report is below, but I can't make sense of it, apart from seeing that the host ran out of swap.
Can anyone give me a hint? It doesn't look to me like a lack of free RAM, but free -hm shows the empty memory as shared:
root@proxmox-1:~# free -hm
               total        used        free      shared  buff/cache   available
Mem:           173Gi        73Gi       961Mi        90Gi        98Gi       7.9Gi
Swap:          8.0Gi       3.2Gi       4.8Gi
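A sketch of why the "shared" column matters, assuming a recent procps-ng free (which reports "shared" as Shmem and "buff/cache" as Buffers + Cached + SReclaimable from /proc/meminfo). The kernel counts shmem pages (tmpfs, /dev/shm, shared anonymous mappings) inside Cached, but reclaim can only move them to swap, not drop them. The helper names parse_meminfo and shared_vs_cache are hypothetical; the script just reads /proc/meminfo and splits buff/cache into shmem and everything else.

```python
import os

KB_PER_GIB = 1024 * 1024


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; most values are in kB."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values


def shared_vs_cache(info: dict[str, int]) -> dict[str, float]:
    """Approximate free's shared/buff-cache/available columns in GiB."""
    # Assumption: buff/cache = Buffers + Cached + SReclaimable (procps-ng behaviour).
    buff_cache = info["Buffers"] + info["Cached"] + info["SReclaimable"]
    shmem = info["Shmem"]  # free's "shared" column; already included in Cached
    return {
        "shared": shmem / KB_PER_GIB,
        "buff_cache": buff_cache / KB_PER_GIB,
        # Shmem pages can only leave RAM via swap, so this is a rough ceiling
        # on the part of buff/cache that could be dropped without swapping.
        "cache_without_shmem": (buff_cache - shmem) / KB_PER_GIB,
        "available": info["MemAvailable"] / KB_PER_GIB,
    }


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as fh:
        summary = shared_vs_cache(parse_meminfo(fh.read()))
    for name, gib in summary.items():
        print(f"{name:>20}: {gib:8.1f} GiB")
```

With the free -hm figures above, 98Gi of buff/cache minus 90Gi shared leaves roughly 8Gi, which lines up with the 7.9Gi "available". That suggests the memory that looks free is mostly shmem, which the kernel cannot simply discard.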
OOM report:
Jan 20 04:13:50 proxmox-1 kernel: [86074.425454] Mem-Info:
Jan 20 04:13:50 proxmox-1 kernel: [86074.425466] active_anon:29519471 inactive_anon:14942755 isolated_anon:0
Jan 20 04:13:50 proxmox-1 kernel: [86074.425466] active_file:532 inactive_file:880 isolated_file:0
Jan 20 04:13:50 proxmox-1 kernel: [86074.425466] unevictable:46700 dirty:0 writeback:0 unstable:0
Jan 20 04:13:50 proxmox-1 kernel: [86074.425466] slab_reclaimable:229939 slab_unreclaimable:152941
Jan 20 04:13:50 proxmox-1 kernel: [86074.425466] mapped:15175352 shmem:22540992 pagetables:105914 bounce:0
Jan 20 04:13:50 proxmox-1 kernel: [86074.425466] free:121030 free_pcp:1231 free_cma:0
Jan 20 04:13:50 proxmox-1 kernel: [86074.425471] Node 0 active_anon:44693200kB inactive_anon:51957492kB active_file:316kB inactive_file:528kB unevictable:176632kB isolated(anon):0kB isolated(file):0kB mapped:47389748kB dirty:0kB writeback:0kB shmem:47394372kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 22093824kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
Jan 20 04:13:50 proxmox-1 kernel: [86074.425476] Node 1 active_anon:73384684kB inactive_anon:7813528kB active_file:1812kB inactive_file:2992kB unevictable:10168kB isolated(anon):0kB isolated(file):0kB mapped:13311660kB dirty:0kB writeback:0kB shmem:42769596kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1910784kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Jan 20 04:13:50 proxmox-1 kernel: [86074.425478] Node 0 DMA free:15896kB min:4kB low:16kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15980kB managed:15896kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jan 20 04:13:50 proxmox-1 kernel: [86074.425483] lowmem_reserve[]: 0 1882 96596 96596 96596
Jan 20 04:13:50 proxmox-1 kernel: [86074.425488] Node 0 DMA32 free:379812kB min:956kB low:2880kB high:4804kB active_anon:1571916kB inactive_anon:5156kB active_file:4kB inactive_file:0kB unevictable:0kB writepending:0kB present:2034624kB managed:1969088kB mlocked:0kB kernel_stack:0kB pagetables:2536kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jan 20 04:13:50 proxmox-1 kernel: [86074.425493] lowmem_reserve[]: 0 0 94714 94714 94714
Jan 20 04:13:50 proxmox-1 kernel: [86074.425497] Node 0 Normal free:47548kB min:48160kB low:145144kB high:242128kB active_anon:43121284kB inactive_anon:51952336kB active_file:312kB inactive_file:528kB unevictable:176632kB writepending:0kB present:98566144kB managed:96987768kB mlocked:176632kB kernel_stack:9112kB pagetables:208888kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jan 20 04:13:50 proxmox-1 kernel: [86074.425502] lowmem_reserve[]: 0 0 0 0 0
Jan 20 04:13:50 proxmox-1 kernel: [86074.425506] Node 1 Normal free:40864kB min:40984kB low:123520kB high:206056kB active_anon:73384684kB inactive_anon:7813528kB active_file:1812kB inactive_file:2992kB unevictable:10168kB writepending:0kB present:83886080kB managed:82544048kB mlocked:10168kB kernel_stack:9480kB pagetables:212232kB bounce:0kB free_pcp:4928kB local_pcp:252kB free_cma:0kB
Jan 20 04:13:50 proxmox-1 kernel: [86074.425511] lowmem_reserve[]: 0 0 0 0 0
Jan 20 04:13:50 proxmox-1 kernel: [86074.425514] Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15896kB
Jan 20 04:13:50 proxmox-1 kernel: [86074.425526] Node 0 DMA32: 105*4kB (UME) 112*8kB (UMH) 124*16kB (UEH) 134*32kB (UE) 138*64kB (UMEH) 145*128kB (UME) 125*256kB (UEH) 117*512kB (UMEH) 75*1024kB (UEH) 30*2048kB (ME) 28*4096kB (M) = 379812kB
Jan 20 04:13:50 proxmox-1 kernel: [86074.425540] Node 0 Normal: 641*4kB (UM) 395*8kB (UMEH) 1676*16kB (UMEH) 469*32kB (UME) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 47548kB
Jan 20 04:13:50 proxmox-1 kernel: [86074.425549] Node 1 Normal: 149*4kB (UMEH) 1899*8kB (UEH) 950*16kB (UEH) 276*32kB (UEH) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 39820kB
Jan 20 04:13:50 proxmox-1 kernel: [86074.425560] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jan 20 04:13:50 proxmox-1 kernel: [86074.425562] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 20 04:13:50 proxmox-1 kernel: [86074.425564] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jan 20 04:13:50 proxmox-1 kernel: [86074.425565] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jan 20 04:13:50 proxmox-1 kernel: [86074.425566] 22575802 total pagecache pages
Jan 20 04:13:50 proxmox-1 kernel: [86074.425570] 29334 pages in swap cache
Jan 20 04:13:50 proxmox-1 kernel: [86074.425571] Swap cache stats: add 3142908, delete 3113488, find 2096445/2333307
Jan 20 04:13:50 proxmox-1 kernel: [86074.425572] Free swap = 0kB
Jan 20 04:13:50 proxmox-1 kernel: [86074.425573] Total swap = 8388604kB
Jan 20 04:13:50 proxmox-1 kernel: [86074.425574] 46125707 pages RAM
Jan 20 04:13:50 proxmox-1 kernel: [86074.425575] 0 pages HighMem/MovableOnly
Jan 20 04:13:50 proxmox-1 kernel: [86074.425576] 746507 pages reserved
Jan 20 04:13:50 proxmox-1 kernel: [86074.425576] 0 pages cma reserved
Jan 20 04:13:50 proxmox-1 kernel: [86074.425577] 0 pages hwpoisoned
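A rough way to read the Mem-Info block: the summary counters (active_anon, shmem, free, etc.) are in pages, while the per-zone lines are in kB. The sketch below converts a few counters to GiB, assuming 4 KiB pages (x86_64), and compares each zone's free memory with its min/low/high watermarks. The values are copied from the report above; the helper names are hypothetical, and the classification is a simplification, since the kernel's actual watermark test (__zone_watermark_ok in mm/page_alloc.c) also takes the allocation order and lowmem_reserve into account.

```python
import re

PAGE_KB = 4  # assumption: x86_64 with 4 KiB base pages
KB_PER_GIB = 1024 * 1024

# Page counts copied from the Mem-Info summary in the OOM report (units: pages).
MEMINFO_PAGES = {
    "active_anon": 29519471,
    "inactive_anon": 14942755,
    "active_file": 532,
    "inactive_file": 880,
    "shmem": 22540992,
    "mapped": 15175352,
    "free": 121030,
}

# Per-zone lines from the same report, abridged to the fields used here (units: kB).
ZONE_LINES = [
    "Node 0 DMA32 free:379812kB min:956kB low:2880kB high:4804kB",
    "Node 0 Normal free:47548kB min:48160kB low:145144kB high:242128kB",
    "Node 1 Normal free:40864kB min:40984kB low:123520kB high:206056kB",
]

ZONE_RE = re.compile(
    r"(Node \d+ \w+) free:(\d+)kB min:(\d+)kB low:(\d+)kB high:(\d+)kB"
)


def pages_to_gib(pages: int) -> float:
    """Convert a page count to GiB under the 4 KiB page assumption."""
    return pages * PAGE_KB / KB_PER_GIB


def zone_status(line: str) -> tuple[str, str]:
    """Classify a zone by comparing its free kB against its watermarks.

    Simplified: ignores allocation order and lowmem_reserve.
    """
    m = ZONE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised zone line: {line!r}")
    name = m.group(1)
    free, wmin, low, high = (int(g) for g in m.groups()[1:])
    if free < wmin:
        state = "below min"
    elif free < low:
        state = "below low"
    elif free < high:
        state = "below high"
    else:
        state = "ok"
    return name, state


if __name__ == "__main__":
    for field, pages in MEMINFO_PAGES.items():
        print(f"{field:>14}: {pages_to_gib(pages):7.2f} GiB")
    for line in ZONE_LINES:
        name, state = zone_status(line)
        print(f"{name}: {state}")
```

With these numbers, shmem alone is about 86 GiB and anonymous memory (whose LRU lists also hold the shmem pages) is about 170 GiB of roughly 176 GiB RAM, while the file LRU is only a few MiB. Both Normal zones sit just below their min watermark, which suggests the kernel had nothing cheap left to reclaim once swap was full.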