Proxmox 4.4.5 kernel: Out of memory: Kill process 8543 (kvm) score or sacrifice child

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
7,216
1,327
164
Hi there!

We have the same issue on our server cluster.
Fabian - is this bug fixed in pve-kernel-4.4.35-2-pve, which is available via dist-upgrade, or is it better to install the version you built?
Thanks in advance!

like I said a couple posts above, the released kernel (pve-kernel-4.4.35-2-pve with version 4.4.35-78) is identical to the test kernel (pve-kernel-4.4.35-2-pve with version 4.4.35-78~test1) I posted earlier on in this thread - the only thing that changed is the version number.
 

snpz

Active Member
Mar 18, 2013
33
4
28
Sorry - a bit misunderstood :)
Thanks a lot - will make a kernel upgrade tonight.
 

fabian

Ubuntu decided to go a different route for fixing this issue, and cherry-picked a bigger patch series from more recent upstream kernels instead of reverting the original buggy cherry-pick. there is now a 4.4.44 based kernel which includes this series available via pvetest - it would be great if people that were affected by the original OOM issue could give it a test-drive to check for any potential regressions:

http://download.proxmox.com/debian/...4/pve-kernel-4.4.44-1-pve_4.4.44-83_amd64.deb
http://download.proxmox.com/debian/...4/pve-kernel-4.4.44-1-pve_4.4.44-83.changelog

thanks in advance!
 

fabian

It is already on pve-no-subscription since Thursday, with no reports so far.
 

plokker

New Member
Feb 15, 2017
3
0
1
26
Hello fabian, we have hit the problem on two consecutive days, on two different nodes, while the scheduled Proxmox backup was running for two VMs. We are on kernel 4.4.44-1 with all Proxmox packages up to date.

Mar 28 03:13:06 i15 kernel: [1379745.854937] kvm invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
Mar 28 03:13:06 i15 kernel: [1379745.854974] kvm cpuset=/ mems_allowed=0
Mar 28 03:13:06 i15 kernel: [1379745.854996] CPU: 5 PID: 3697 Comm: kvm Tainted: P IO 4.4.44-1-pve #1
Mar 28 03:13:06 i15 kernel: [1379745.855030] Hardware name: MSI MS-7522/MSI X58 Pro (MS-7522) , BIOS V8.14B8 11/09/2012
Mar 28 03:13:06 i15 kernel: [1379745.855067] 0000000000000286 00000000a88c1d71 ffff88059b8efb70 ffffffff813fa0d3
Mar 28 03:13:06 i15 kernel: [1379745.855105] ffff88059b8efd40 0000000000000000 ffff88059b8efbd8 ffffffff8120b23b
Mar 28 03:13:06 i15 kernel: [1379745.855143] 0000000000000fdf 0000000000000000 0000000000000000 0000000000000000
Mar 28 03:13:06 i15 kernel: [1379745.855181] Call Trace:
Mar 28 03:13:06 i15 kernel: [1379745.855201] [<ffffffff813fa0d3>] dump_stack+0x63/0x90
Mar 28 03:13:06 i15 kernel: [1379745.855222] [<ffffffff8120b23b>] dump_header+0x67/0x1d5
Mar 28 03:13:06 i15 kernel: [1379745.855244] [<ffffffff81192785>] oom_kill_process+0x205/0x3c0
Mar 28 03:13:06 i15 kernel: [1379745.855266] [<ffffffff81192bd7>] out_of_memory+0x237/0x4a0
Mar 28 03:13:06 i15 kernel: [1379745.855287] [<ffffffff81198ef8>] __alloc_pages_nodemask+0xcc8/0xe80
Mar 28 03:13:06 i15 kernel: [1379745.855309] [<ffffffff811990fb>] alloc_kmem_pages_node+0x4b/0xd0
Mar 28 03:13:06 i15 kernel: [1379745.855332] [<ffffffff8107f053>] copy_process+0x1c3/0x1c00
Mar 28 03:13:06 i15 kernel: [1379745.855354] [<ffffffff812240a0>] ? poll_select_copy_remaining+0x140/0x140
Mar 28 03:13:06 i15 kernel: [1379745.855378] [<ffffffff8125cd47>] ? eventfd_ctx_read+0x67/0x240
Mar 28 03:13:06 i15 kernel: [1379745.855400] [<ffffffff810ace80>] ? wake_up_q+0x70/0x70
Mar 28 03:13:06 i15 kernel: [1379745.855420] [<ffffffff81080c20>] _do_fork+0x80/0x360
Mar 28 03:13:06 i15 kernel: [1379745.855442] [<ffffffff8109183f>] ? sigprocmask+0x6f/0xa0
Mar 28 03:13:06 i15 kernel: [1379745.855462] [<ffffffff81080fa9>] SyS_clone+0x19/0x20
Mar 28 03:13:06 i15 kernel: [1379745.855484] [<ffffffff818602f6>] entry_SYSCALL_64_fastpath+0x16/0x75
Mar 28 03:13:06 i15 kernel: [1379745.855506] Mem-Info:
Mar 28 03:13:06 i15 kernel: [1379745.855523] active_anon:3511355 inactive_anon:393624 isolated_anon:0
Mar 28 03:13:06 i15 kernel: [1379745.855523] active_file:960940 inactive_file:972546 isolated_file:0
Mar 28 03:13:06 i15 kernel: [1379745.855523] unevictable:4294 dirty:23 writeback:43585 unstable:68858
Mar 28 03:13:06 i15 kernel: [1379745.855523] slab_reclaimable:110424 slab_unreclaimable:89953
Mar 28 03:13:06 i15 kernel: [1379745.855523] mapped:29862 shmem:120464 pagetables:14066 bounce:0
Mar 28 03:13:06 i15 kernel: [1379745.855523] free:42455 free_pcp:0 free_cma:0
Mar 28 03:13:06 i15 kernel: [1379745.855639] Node 0 DMA free:15896kB min:40kB low:48kB high:60kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Mar 28 03:13:06 i15 kernel: [1379745.855766] lowmem_reserve[]: 0 2941 24059 24059 24059
Mar 28 03:13:06 i15 kernel: [1379745.855793] Node 0 DMA32 free:92644kB min:8256kB low:10320kB high:12384kB active_anon:1150416kB inactive_anon:385980kB active_file:628048kB inactive_file:643052kB unevictable:2460kB isolated(anon):0kB isolated(file):0kB present:3120640kB managed:3039712kB mlocked:2460kB dirty:0kB writeback:16836kB mapped:14044kB shmem:64488kB slab_reclaimable:59892kB slab_unreclaimable:42760kB kernel_stack:2352kB pagetables:4604kB unstable:27292kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Mar 28 03:13:06 i15 kernel: [1379745.855939] lowmem_reserve[]: 0 0 21117 21117 21117
Mar 28 03:13:06 i15 kernel: [1379745.855965] Node 0 Normal free:61280kB min:59280kB low:74100kB high:88920kB active_anon:12895004kB inactive_anon:1188516kB active_file:3215712kB inactive_file:3247132kB unevictable:14716kB isolated(anon):0kB isolated(file):0kB present:22020096kB managed:21624484kB mlocked:14716kB dirty:92kB writeback:157504kB mapped:105404kB shmem:417368kB slab_reclaimable:381804kB slab_unreclaimable:317052kB kernel_stack:5504kB pagetables:51660kB unstable:248140kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Mar 28 03:13:06 i15 kernel: [1379745.856129] lowmem_reserve[]: 0 0 0 0 0
Mar 28 03:13:06 i15 kernel: [1379745.856154] Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15896kB
Mar 28 03:13:06 i15 kernel: [1379745.856221] Node 0 DMA32: 461*4kB (UME) 11415*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 93164kB
Mar 28 03:13:06 i15 kernel: [1379745.856280] Node 0 Normal: 1818*4kB (UMEH) 6657*8kB (UMH) 23*16kB (H) 11*32kB (H) 8*64kB (H) 2*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 62016kB
Mar 28 03:13:06 i15 kernel: [1379745.856358] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Mar 28 03:13:06 i15 kernel: [1379745.856393] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Mar 28 03:13:06 i15 kernel: [1379745.856428] 2068535 total pagecache pages
Mar 28 03:13:06 i15 kernel: [1379745.856446] 13470 pages in swap cache
Mar 28 03:13:06 i15 kernel: [1379745.856463] Swap cache stats: add 321655, delete 308185, find 14272247/14288282
Mar 28 03:13:06 i15 kernel: [1379745.856496] Free swap = 3108276kB
Mar 28 03:13:06 i15 kernel: [1379745.856513] Total swap = 4194300kB
Mar 28 03:13:06 i15 kernel: [1379745.856530] 6289179 pages RAM
Mar 28 03:13:06 i15 kernel: [1379745.856546] 0 pages HighMem/MovableOnly
Mar 28 03:13:06 i15 kernel: [1379745.856564] 119156 pages reserved
Mar 28 03:13:06 i15 kernel: [1379745.856581] 0 pages cma reserved
Mar 28 03:13:06 i15 kernel: [1379745.856598] 0 pages hwpoisoned
Mar 28 03:13:06 i15 kernel: [1379745.856615] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Mar 28 03:13:06 i15 kernel: [1379745.856653] [ 1506] 0 1506 27745 16857 61 3 36 0 systemd-journal
Mar 28 03:13:06 i15 kernel: [1379745.856690] [ 1515] 0 1515 10451 557 22 3 248 -1000 systemd-udevd
Mar 28 03:13:06 i15 kernel: [1379745.856729] [ 1968] 0 1968 27563 4277 28 3 0 -1000 dmeventd
Mar 28 03:13:06 i15 kernel: [1379745.856765] [ 2084] 105 2084 25011 527 21 3 46 0 systemd-timesyn
Mar 28 03:13:06 i15 kernel: [1379745.856803] [ 2452] 0 2452 9270 595 23 3 66 0 rpcbind
Mar 28 03:13:06 i15 kernel: [1379745.856839] [ 2485] 103 2485 9320 532 23 3 147 0 rpc.statd
Mar 28 03:13:06 i15 kernel: [1379745.856875] [ 2499] 0 2499 5839 275 16 3 52 0 rpc.idmapd
Mar 28 03:13:06 i15 kernel: [1379745.856911] [ 2931] 0 2931 3015 486 12 3 633 0 haveged
Mar 28 03:13:06 i15 kernel: [1379745.856947] [ 2933] 0 2933 4756 383 15 3 37 0 atd
Mar 28 03:13:06 i15 kernel: [1379745.856982] [ 2935] 0 2935 13796 1219 31 3 138 -1000 sshd
Mar 28 03:13:06 i15 kernel: [1379745.857018] [ 2937] 0 2937 3790 378 13 3 38 0 cgmanager
Mar 28 03:13:06 i15 kernel: [1379745.857054] [ 2939] 0 2939 40192 334 14 4 52 0 lxcfs
Mar 28 03:13:06 i15 kernel: [1379745.857092] [ 2943] 0 2943 1022 137 6 3 18 -1000 watchdog-mux
Mar 28 03:13:06 i15 kernel: [1379745.857129] [ 2955] 0 2955 8389 202 22 3 71 0 zed
Mar 28 03:13:06 i15 kernel: [1379745.857164] [ 2959] 0 2959 6086 561 16 3 123 0 smartd
Mar 28 03:13:06 i15 kernel: [1379745.857202] [ 2963] 0 2963 7089 581 20 3 38 0 systemd-logind
Mar 28 03:13:06 i15 kernel: [1379745.857239] [ 2998] 102 2998 10560 578 25 3 65 -900 dbus-daemon
Mar 28 03:13:06 i15 kernel: [1379745.857275] [ 3024] 0 3024 153293 651 61 4 116 0 rrdcached
Mar 28 03:13:06 i15 kernel: [1379745.857311] [ 3039] 0 3039 6096 663 16 3 39 0 ksmtuned
Mar 28 03:13:06 i15 kernel: [1379745.857347] [ 3058] 0 3058 3949 920 13 3 55 0 tincd
Mar 28 03:13:06 i15 kernel: [1379745.857383] [ 3061] 0 3061 5260 0 13 3 42 0 daemon
Mar 28 03:13:06 i15 kernel: [1379745.857418] [ 3062] 0 3062 1084 391 8 4 18 0 aacraid-statusd
Mar 28 03:13:06 i15 kernel: [1379745.857456] [ 3105] 111 3105 8346 898 21 3 117 0 ntpd
Mar 28 03:13:06 i15 kernel: [1379745.857491] [ 3106] 112 3106 5866 453 15 3 121 0 nrpe
Mar 28 03:13:06 i15 kernel: [1379745.857526] [ 3131] 0 3131 64668 794 28 4 122 0 rsyslogd
Mar 28 03:13:06 i15 kernel: [1379745.857562] [ 3147] 0 3147 4213 407 12 3 32 0 agetty
Mar 28 03:13:06 i15 kernel: [1379745.857598] [ 3176] 0 3176 10127 375 25 3 119 0 lxc-monitord
Mar 28 03:13:06 i15 kernel: [1379745.857634] [ 3189] 0 3189 184001 14808 135 3 710 0 pmxcfs
Mar 28 03:13:06 i15 kernel: [1379745.857670] [ 3229] 0 3229 9042 847 22 4 116 0 master
Mar 28 03:13:06 i15 kernel: [1379745.857705] [ 3231] 110 3231 9598 814 23 3 73 0 qmgr
Mar 28 03:13:06 i15 kernel: [1379745.857741] [ 3249] 0 3249 7485 619 20 3 35 0 cron
Mar 28 03:13:06 i15 kernel: [1379745.857777] [ 3288] 0 3288 65021 6306 122 3 10188 0 pve-firewall
Mar 28 03:13:06 i15 kernel: [1379745.857813] [ 3297] 0 3297 64607 7453 121 3 8736 0 pvestatd
Mar 28 03:13:06 i15 kernel: [1379745.857849] [ 3301] 0 3301 81769 1349 146 3 20777 0 pvedaemon
Mar 28 03:13:06 i15 kernel: [1379745.857886] [ 3302] 0 3302 84327 2453 156 3 20605 0 pvedaemon worke
Mar 28 03:13:06 i15 kernel: [1379745.857923] [ 3303] 0 3303 84393 5664 156 3 17451 0 pvedaemon worke
Mar 28 03:13:06 i15 kernel: [1379745.857959] [ 3304] 0 3304 84319 5567 156 3 17513 0 pvedaemon worke
Mar 28 03:13:06 i15 kernel: [1379745.857996] [ 3309] 0 3309 66953 2474 124 4 15912 0 pve-ha-crm
Mar 28 03:13:06 i15 kernel: [1379745.858032] [ 3321] 0 3321 66854 6752 126 4 11576 0 pve-ha-lrm
Mar 28 03:13:06 i15 kernel: [1379745.858069] [ 3328] 0 3328 7292 1073 19 3 74 0 openvpn
Mar 28 03:13:06 i15 kernel: [1379745.858104] [ 3410] 0 3410 1019402 717533 1910 7 93702 0 kvm
Mar 28 03:13:06 i15 kernel: [1379745.858143] [ 3477] 0 3477 1305104 1038668 2447 9 32191 0 kvm
Mar 28 03:13:06 i15 kernel: [1379745.859820] [ 3519] 0 3519 1301016 1049680 2424 8 15163 0 kvm
Mar 28 03:13:06 i15 kernel: [1379745.859857] [ 3587] 0 3587 755209 544031 1397 8 6674 0 kvm
Mar 28 03:13:06 i15 kernel: [1379745.859893] [ 3628] 0 3628 1014285 335006 1004 8 15182 0 kvm
Mar 28 03:13:06 i15 kernel: [1379745.859928] [ 3697] 0 3697 765451 540862 1418 7 8959 0 kvm
Mar 28 03:13:06 i15 kernel: [1379745.859965] [30011] 0 30011 100062 38994 162 3 0 0 corosync
Mar 28 03:13:06 i15 kernel: [1379745.860001] [ 3099] 33 3099 83456 22267 149 3 0 0 pveproxy
Mar 28 03:13:06 i15 kernel: [1379745.860037] [ 3100] 33 3100 84041 22977 156 3 0 0 pveproxy worker
Mar 28 03:13:06 i15 kernel: [1379745.860074] [ 3101] 33 3101 84041 23012 156 3 0 0 pveproxy worker
Mar 28 03:13:06 i15 kernel: [1379745.860111] [ 3102] 33 3102 84041 22984 156 3 0 0 pveproxy worker
Mar 28 03:13:06 i15 kernel: [1379745.860148] [ 3124] 33 3124 82665 21998 148 3 0 0 spiceproxy
Mar 28 03:13:06 i15 kernel: [1379745.860184] [ 3125] 33 3125 83357 22550 154 3 0 0 spiceproxy work
Mar 28 03:13:06 i15 kernel: [1379745.860221] [ 3153] 0 3153 22988 307 16 3 0 0 pvefw-logger
Mar 28 03:13:06 i15 kernel: [1379745.860258] [25571] 110 25571 9558 970 23 3 0 0 pickup
Mar 28 03:13:06 i15 kernel: [1379745.860294] [ 4402] 0 4402 13806 683 31 3 10 0 cron
Mar 28 03:13:06 i15 kernel: [1379745.860330] [ 4403] 0 4403 1084 188 8 3 0 0 sh
Mar 28 03:13:06 i15 kernel: [1379745.860365] [ 4404] 0 4404 66350 19606 132 3 0 0 vzdump
Mar 28 03:13:06 i15 kernel: [1379745.860401] [ 4420] 0 4420 67116 18793 127 3 0 0 task UPID:i15:0
Mar 28 03:13:06 i15 kernel: [1379745.860440] [12732] 0 12732 1156 350 8 3 0 0 gzip
Mar 28 03:13:06 i15 kernel: [1379745.860476] [12959] 0 12959 2061 171 10 3 0 0 sleep
Mar 28 03:13:06 i15 kernel: [1379745.860512] [13130] 0 13130 2061 175 10 3 0 0 sleep
Mar 28 03:13:06 i15 kernel: [1379745.860548] [13187] 0 13187 22982 1500 49 3 0 0 sshd
Mar 28 03:13:06 i15 kernel: [1379745.860584] [13188] 104 13188 14132 795 30 3 0 0 sshd
Mar 28 03:13:06 i15 kernel: [1379745.860620] [13189] 0 13189 14132 1349 33 3 0 0 sshd
Mar 28 03:13:06 i15 kernel: [1379745.860655] Out of memory: Kill process 3477 (kvm) score 144 or sacrifice child
Mar 28 03:13:06 i15 kernel: [1379745.860716] Killed process 3477 (kvm) total-vm:5220416kB, anon-rss:4149860kB, file-rss:4812kB
 

fabian

but that looks like an actual OOM situation to me? do you have atop or similar resource monitoring tools running that could give you more insight into the memory situation before the OOM-killer got triggered?
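If nothing like atop is installed, a minimal sketch along these lines could log a few /proc/meminfo fields around the backup window. The field selection, the sampling interval and the function names are assumptions for illustration, not anything shipped with Proxmox:

```python
import time

# Fields to record; all of them are standard /proc/meminfo entries.
FIELDS = ("MemFree", "MemAvailable", "Buffers", "Cached", "SwapFree")


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # The first token after the colon is the number (kB for most fields).
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def sample(path="/proc/meminfo"):
    """Read the file once and return only the fields of interest."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return {field: info.get(field) for field in FIELDS}


def log_samples(count, interval_s=10, path="/proc/meminfo"):
    """Print `count` timestamped samples, `interval_s` seconds apart."""
    for _ in range(count):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, sample(path))
        time.sleep(interval_s)
```

Calling something like `log_samples(count=360, interval_s=10)` shortly before the scheduled backup and redirecting the output to a file would cover about an hour. atop records far more detail, so this is only a stopgap.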
 

mbarchein

Member
May 26, 2015
14
2
23
Hello, I'm plokker's colleague. The attached image shows the RAM graph for that host, which has 24GB of RAM installed. The kill is at the marked point. That VM has 4GB of RAM assigned. Thanks.
mem.png
 

mbarchein

The actual "free" output is:


$ free -h
             total       used       free     shared    buffers     cached
Mem:           23G        22G       607M       445M       8,4G       1,3G
-/+ buffers/cache:        13G        10G
Swap:         4,0G       783M       3,2G
 

fabian

Code:
Mar 28 03:13:06 i15 kernel: [1379745.855506] Mem-Info:
Mar 28 03:13:06 i15 kernel: [1379745.855523] active_anon:3511355 inactive_anon:393624 isolated_anon:0
Mar 28 03:13:06 i15 kernel: [1379745.855523] active_file:960940 inactive_file:972546 isolated_file:0
Mar 28 03:13:06 i15 kernel: [1379745.855523] unevictable:4294 dirty:23 writeback:43585 unstable:68858
Mar 28 03:13:06 i15 kernel: [1379745.855523] slab_reclaimable:110424 slab_unreclaimable:89953
Mar 28 03:13:06 i15 kernel: [1379745.855523] mapped:29862 shmem:120464 pagetables:14066 bounce:0
Mar 28 03:13:06 i15 kernel: [1379745.855523] free:42455 free_pcp:0 free_cma:0

Mar 28 03:13:06 i15 kernel: [1379745.855639] Node 0 DMA free:15896kB min:40kB low:48kB high:60kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Mar 28 03:13:06 i15 kernel: [1379745.855766] lowmem_reserve[]: 0 2941 24059 24059 24059
Mar 28 03:13:06 i15 kernel: [1379745.855793] Node 0 DMA32 free:92644kB min:8256kB low:10320kB high:12384kB active_anon:1150416kB inactive_anon:385980kB active_file:628048kB inactive_file:643052kB unevictable:2460kB isolated(anon):0kB isolated(file):0kB present:3120640kB managed:3039712kB mlocked:2460kB dirty:0kB writeback:16836kB mapped:14044kB shmem:64488kB slab_reclaimable:59892kB slab_unreclaimable:42760kB kernel_stack:2352kB pagetables:4604kB unstable:27292kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Mar 28 03:13:06 i15 kernel: [1379745.855939] lowmem_reserve[]: 0 0 21117 21117 21117
Mar 28 03:13:06 i15 kernel: [1379745.855965] Node 0 Normal free:61280kB min:59280kB low:74100kB high:88920kB active_anon:12895004kB inactive_anon:1188516kB active_file:3215712kB inactive_file:3247132kB unevictable:14716kB isolated(anon):0kB isolated(file):0kB present:22020096kB managed:21624484kB mlocked:14716kB dirty:92kB writeback:157504kB mapped:105404kB shmem:417368kB slab_reclaimable:381804kB slab_unreclaimable:317052kB kernel_stack:5504kB pagetables:51660kB unstable:248140kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Mar 28 03:13:06 i15 kernel: [1379745.856129] lowmem_reserve[]: 0 0 0 0 0
Mar 28 03:13:06 i15 kernel: [1379745.856154] Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15896kB
Mar 28 03:13:06 i15 kernel: [1379745.856221] Node 0 DMA32: 461*4kB (UME) 11415*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 93164kB
Mar 28 03:13:06 i15 kernel: [1379745.856280] Node 0 Normal: 1818*4kB (UMEH) 6657*8kB (UMH) 23*16kB (H) 11*32kB (H) 8*64kB (H) 2*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 62016kB

says you had about 170M actually free at the time of the OOM kill (note that OOM situations can happen rather fast..), but it seems like you still had cached stuff which should have been evicted. you are below the low watermark for "normal" memory though (which is basically the opposite of what the original reports had). what kind of storage are you using on this machine? are you backing up running VMs, or also stopped ones?
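The numbers above can be re-derived from the per-zone lines of the report. A minimal sketch, with the zone lines trimmed to the fields actually used and the helper names made up for illustration:

```python
import re

# Per-zone lines from the OOM report, trimmed to free and watermark fields.
ZONE_LINES = """\
Node 0 DMA free:15896kB min:40kB low:48kB high:60kB
Node 0 DMA32 free:92644kB min:8256kB low:10320kB high:12384kB
Node 0 Normal free:61280kB min:59280kB low:74100kB high:88920kB
"""

ZONE_RE = re.compile(
    r"Node \d+ (?P<zone>\w+) free:(?P<free>\d+)kB min:(?P<min>\d+)kB "
    r"low:(?P<low>\d+)kB high:(?P<high>\d+)kB"
)


def parse_zones(text):
    """Return {zone: {'free', 'min', 'low', 'high'}} with values in kB."""
    zones = {}
    for m in ZONE_RE.finditer(text):
        zones[m.group("zone")] = {
            key: int(m.group(key)) for key in ("free", "min", "low", "high")
        }
    return zones


def summarize(zones):
    """Sum free memory over all zones and list zones below their low watermark."""
    total_free = sum(z["free"] for z in zones.values())
    below_low = [name for name, z in zones.items() if z["free"] < z["low"]]
    return total_free, below_low


zones = parse_zones(ZONE_LINES)
total_free_kb, below_low = summarize(zones)
print(f"total free: {total_free_kb} kB")
print(f"zones below low watermark: {below_low}")
```

For this report the sum is 169820 kB (the "about 170M"), which also matches free:42455 pages at 4kB each, and only the Normal zone is below its low watermark (61280kB free against low:74100kB).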
 

mbarchein

Member
May 26, 2015
14
2
23

I'm backing up 6 running VMs and the storage is thin-lvm for every VM
 

Mohsen

New Member
Apr 30, 2017
2
0
1
31
Hello,

I have this issue too. I have only 4 machines running, with 1 GB of memory.

When I start one of the virtual machines (limited to 1 GB of memory and 1 CPU core), the whole server hangs.

The load average goes over 1000, 1xx, 1xx and I can't do anything.

Why can a machine do something like this? What is the point of the limits? :| The core limit and memory limit are not working :|

We never had any issue with OpenVZ, but I see this issue on every Proxmox 4.x I've used.

Code:
root@athena:~# pveversion -v
proxmox-ve: 4.4-84 (running kernel: 4.4.44-1-pve)
pve-manager: 4.4-13 (running version: 4.4-13/7ea56165)
pve-kernel-4.4.44-1-pve: 4.4.44-84
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-49
qemu-server: 4.0-110
pve-firmware: 1.1-11
libpve-common-perl: 4.0-94
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-99
pve-firewall: 2.0-33
pve-ha-manager: 1.0-40
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80

and the dmesg log

Code:
[78159.534754]  [<ffffffff813fa0d3>] dump_stack+0x63/0x90
[78159.534758]  [<ffffffff8120b23b>] dump_header+0x67/0x1d5
[78159.534762]  [<ffffffff81392bda>] ? apparmor_capable+0x1aa/0x1b0
[78159.534766]  [<ffffffff81192785>] oom_kill_process+0x205/0x3c0
[78159.534769]  [<ffffffff811fee1f>] ? mem_cgroup_iter+0x1cf/0x380
[78159.534772]  [<ffffffff81200de8>] mem_cgroup_out_of_memory+0x2a8/0x2f0
[78159.534776]  [<ffffffff81201b87>] mem_cgroup_oom_synchronize+0x347/0x360
[78159.534779]  [<ffffffff811fcbb0>] ? mem_cgroup_begin_page_stat+0x90/0x90
[78159.534781]  [<ffffffff81192e84>] pagefault_out_of_memory+0x44/0xc0
[78159.534784]  [<ffffffff8106af1f>] mm_fault_error+0x7f/0x160
[78159.534787]  [<ffffffff8106b723>] __do_page_fault+0x3e3/0x410
[78159.534790]  [<ffffffff81003885>] ? syscall_trace_enter_phase1+0xc5/0x140
[78159.534793]  [<ffffffff8106b772>] do_page_fault+0x22/0x30
[78159.534796]  [<ffffffff81862478>] page_fault+0x28/0x30
[78159.534798] Task in /lxc/101/ns killed as a result of limit of /lxc/101
[78159.534804] memory: usage 1048300kB, limit 1048576kB, failcnt 0
[78159.534806] memory+swap: usage 1048576kB, limit 1048576kB, failcnt 831765
[78159.534807] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[78159.534809] Memory cgroup stats for /lxc/101: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[78159.534821] Memory cgroup stats for /lxc/101/ns: cache:666628KB rss:381672KB rss_huge:0KB mapped_file:332KB dirty:0KB writeback:0KB swap:276KB inactive_anon:574828KB active_anon:471192KB inactive_file:652KB active_file:340KB unevictable:0KB
[78159.534832] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[78159.534907] [29745]     0 29745     8316      252      21       3       34             0 init
[78159.534910] [30012]     0 30012     4873       33      15       3       21             0 upstart-udev-br
[78159.534913] [30050]     0 30050    12321       94      28       3       14         -1000 systemd-udevd
[78159.534916] [30163]     0 30163     3823       56      13       3        0             0 upstart-file-br
[78159.534918] [30164]     0 30164     3851       72      13       3        0             0 upstart-socket-
[78159.534921] [30202]   102 30202    63964      139      27       3        0             0 rsyslogd
[78159.534923] [30506]     0 30506    15349      170      34       3        0         -1000 sshd
[78159.534926] [30518]     0 30518     5917       63      17       3        0             0 cron
[78159.534928] [31452]     0 31452   143221      113      39       4        0             0 nscd
[78159.534931] [31493]     0 31493     2996      149       8       2        0             0 nginx
[78159.534933] [31495]  1001 31495     4654     1807      12       3        0             0 nginx
[78159.534935] [31622]     0 31622     1146       29       8       3        0             0 getty
[78159.534938] [31624]     0 31624     1146       28       7       3        0             0 getty
[78159.534940] [31625]     0 31625     1146       28       8       3        0             0 getty
[78159.534943] [31944]     0 31944     3211      110      12       3        0             0 bash
[78159.534945] [32159]  1001 32159    11433     1392      24       3        0             0 ffmpeg
[78159.534947] [32160]  1001 32160    11514     1159      24       3        0             0 ffmpeg
[78159.534949] [32167]  1001 32167    11519     1501      25       3        0             0 ffmpeg
[78159.534952] [32313]  1001 32313    10805     1375      24       3        0             0 ffmpeg
[78159.534954] [32316]  1001 32316    11356     1923      24       3        0             0 ffmpeg
[78159.534956] [32319]  1001 32319    10624     1191      22       3        0             0 ffmpeg
[78159.534958] [32407]  1001 32407    11372     1942      25       3        0             0 ffmpeg
[78159.534960] [32433]  1001 32433    11446     2017      25       3        0             0 ffmpeg
[78159.534963] [32542]  1001 32542    10905     1502      25       3        0             0 ffmpeg
[78159.534965] [32630]  1001 32630    11006     1570      24       3        0             0 ffmpeg
[78159.534967] [32680]  1001 32680    10862     1375      26       3        0             0 ffmpeg
[78159.534969] [32718]  1001 32718    11089     1686      25       3        0             0 ffmpeg
[78159.534971] [32759]  1001 32759    10470     1011      24       3        0             0 ffmpeg
[78159.534974] [32764]  1001 32764    10556     1080      23       3        0             0 ffmpeg
[78159.534976] [  345]  1001   345    11351     1870      26       3        0             0 ffmpeg
[78159.534978] [  377]  1001   377    11628     1308      24       3        0             0 ffmpeg
[78159.534980] [  425]  1001   425    11518     1544      25       3        0             0 ffmpeg
[78159.534982] [  430]  1001   430    10517     1024      24       3        0             0 ffmpeg
[78159.534984] [  454]  1001   454    11650     1818      27       3        0             0 ffmpeg
[78159.534987] [  485]  1001   485    11808     2313      27       3        0             0 ffmpeg
[78159.534989] [  501]  1001   501    10665     1170      24       3        0             0 ffmpeg
[78159.534991] [  507]  1001   507    10259      818      23       3        0             0 ffmpeg
[78159.534993] [  589]  1001   589    10876     1383      25       3        0             0 ffmpeg
[78159.534995] [  596]  1001   596    10729     1234      24       3        0             0 ffmpeg
[78159.534997] [  598]  1001   598    10306      862      23       3        0             0 ffmpeg
[78159.535000] [  600]  1001   600    10575      942      25       3        0             0 ffmpeg
[78159.535002] [  636]  1001   636    11579     2094      27       3        0             0 ffmpeg
[78159.535004] [  647]  1001   647    10293      786      22       2        0             0 ffmpeg
[78159.535006] [  698]  1001   698    10300      792      21       2        0             0 ffmpeg
[78159.535008] [  767]  1001   767    10447     1031      24       3        0             0 ffmpeg
[78159.535011] [  785]  1001   785    10557     1129      25       3        0             0 ffmpeg
[78159.535013] [  791]  1001   791    10103      621      23       3        0             0 ffmpeg
[78159.535015] [  808]  1001   808    10351      913      25       3        0             0 ffmpeg
[78159.535017] [  821]  1001   821    10442     1038      26       3        0             0 ffmpeg
[78159.535019] [  826]  1001   826    11289     1876      25       3        0             0 ffmpeg
[78159.535021] [  848]  1001   848    11628     1323      26       3        0             0 ffmpeg
[78159.535024] [  881]  1001   881    10729     1252      25       3        0             0 ffmpeg
[78159.535026] [  909]  1001   909    10312      860      24       3        0             0 ffmpeg
[78159.535028] [  991]  1001   991    10787     1352      25       4        0             0 ffmpeg
[78159.535030] [ 1017]  1001  1017    11129     1659      25       3        0             0 ffmpeg
[78159.535032] [ 1033]  1001  1033    10625     1160      25       3        0             0 ffmpeg
[78159.535034] [ 1037]  1001  1037    10119      633      23       3        0             0 ffmpeg
[78159.535037] [ 1069]  1001  1069    10182      699      24       3        0             0 ffmpeg
[78159.535039] [ 1103]  1001  1103    10435     1008      24       3        0             0 ffmpeg
[78159.535041] [ 1107]  1001  1107    11870     2393      26       3        0             0 ffmpeg
[78159.535043] [ 1119]  1001  1119    11149     1666      25       3        0             0 ffmpeg
[78159.535045] [ 1148]  1001  1148    10436     1045      24       3        0             0 ffmpeg
[78159.535047] [ 1156]  1001  1156    12498     1850      25       3        0             0 ffmpeg
[78159.535050] [ 1194]  1001  1194    10418     1026      23       3        0             0 ffmpeg
[78159.535052] [ 1202]  1001  1202    10485     1070      24       3        0             0 ffmpeg
[78159.535054] [ 1274]  1001  1274    10443     1036      23       3        0             0 ffmpeg
[78159.535056] [ 1291]  1001  1291    11762     2277      26       3        0             0 ffmpeg
[78159.535058] [ 1299]  1001  1299    10453     1019      24       3        0             0 ffmpeg
[78159.535060] [ 1314]  1001  1314    10774     1323      25       3        0             0 ffmpeg
[78159.535063] [ 1321]  1001  1321    11762     2278      26       3        0             0 ffmpeg
[78159.535065] [ 1358]  1001  1358    11335     1813      26       3        0             0 ffmpeg
[78159.535067] [ 1367]  1001  1367    10713     1282      24       3        0             0 ffmpeg
[78159.535069] [ 1388]  1001  1388    11225     1669      24       3        0             0 ffmpeg
[78159.535071] [ 1578]  1001  1578    10623     1163      24       3        0             0 ffmpeg
[78159.535074] [ 1585]  1001  1585    12178     1860      25       3        0             0 ffmpeg
[78159.535076] [ 1599]  1001  1599    10706     1224      24       3        0             0 ffmpeg
[78159.535078] [ 1629]  1001  1629    10228      756      23       3        0             0 ffmpeg
[78159.535081] [ 1923]     0  1923     1115       25       8       3        0             0 sh
[78159.535083] [ 1973]  1001  1973    11313      960      23       3        0             0 ffmpeg
[78159.535085] [ 1974]  1001  1974    11280      956      23       3        0             0 ffmpeg
[78159.535087] [ 1975]  1001  1975    11384     1105      23       3        0             0 ffmpeg
[78159.535089] [ 2009]  1001  2009    11321      662      22       3        0             0 ffmpeg
[78159.535091] [ 2010]  1001  2010    11238      587      22       3        0             0 ffmpeg
[78159.535094] [ 2065]  1001  2065    10891      197      21       3        0             0 ffmpeg
[78159.535096] [ 2121]     0  2121    10687       92      26       3        0             0 cron
[78159.535099] [ 2122]     0  2122    10687       93      26       3        0             0 cron
[78159.535101] [ 2123]     0  2123    10687       92      26       3        0             0 cron
[78159.535104] [ 2124]     0  2124    10687       93      26       3        0             0 cron
[78159.535106] [ 2125]     0  2125    10687       92      26       3        0             0 cron
[78159.535108] [ 2126]     0  2126    10687       93      26       3        0             0 cron
[78159.535111] [ 2127]     0  2127    10687       93      26       3        0             0 cron
[78159.535113] [ 2137]  1001  2137     1115       20       8       3        0             0 sh
[78159.535115] [ 2138]  1001  2138     1114       18       8       3        0             0 sh
[78159.535117] [ 2139]  1001  2139     1115       19       8       3        0             0 sh
[78159.535119] [ 2140]  1001  2140     1115       21       9       3        0             0 sh
[78159.535122] [ 2142]  1001  2142     1115       21       8       3        0             0 sh
[78159.535124] [ 2143]  1001  2143     1115       20       8       3        0             0 sh
[78159.535127] [ 2145]  1001  2145     1115       20       8       3        0             0 sh
[78159.535129] [ 2147]  1001  2147    37508      502      68       4        0             0 php
[78159.535131] [ 2148]  1001  2148    37508      500      69       3        0             0 php
[78159.535133] [ 2149]  1001  2149    37508      499      69       3        0             0 php
[78159.535136] [ 2150]  1001  2150    35857      495      70       3        0             0 php
[78159.535138] [ 2151]  1001  2151    36730      497      68       3        0             0 php
[78159.535140] [ 2152]  1001  2152    36730      505      69       3        0             0 php
[78159.535142] [ 2153]  1001  2153    33994      145      59       4        0             0 php
[78159.535145] [ 2200]     0  2200    13753      125      30       3        0             0 sshd
[78159.535148] [ 2230]     0  2230     2193       22       8       3        0             0 grep
[78159.535150] [ 2252]     0  2252     3211      110      11       3        0             0 bash
[78159.535154] Memory cgroup out of memory: Kill process 1107 (ffmpeg) score 9 or sacrifice child
[78159.535243] Killed process 1107 (ffmpeg) total-vm:47480kB, anon-rss:9572kB, file-rss:0kB
[78160.277148] ffmpeg invoked oom-killer: gfp_mask=0x24000c0, order=0, oom_score_adj=0
[78160.277152] ffmpeg cpuset=ns mems_allowed=0
[78160.277157] CPU: 3 PID: 32542 Comm: ffmpeg Tainted: G           O    4.4.44-1-pve #1
[78160.277159] Hardware name: Supermicro H8SME/H8SME, BIOS 1.0a       05/22/2013
[78160.277161]  0000000000000286 0000000081d1b505 ffff88040a98bc90 ffffffff813fa0d3
[78160.277163]  ffff88040a98bd68 ffff8803b1914c00 ffff88040a98bcf8 ffffffff8120b23b
[78160.277165]  0000000000000000 ffff8804088adc00 ffff88038dd1f000 ffff88040a98bce8
[78160.277167] Call Trace:
[78160.277173]  [<ffffffff813fa0d3>] dump_stack+0x63/0x90
[78160.277177]  [<ffffffff8120b23b>] dump_header+0x67/0x1d5
[78160.277180]  [<ffffffff81392bda>] ? apparmor_capable+0x1aa/0x1b0
[78160.277183]  [<ffffffff81192785>] oom_kill_process+0x205/0x3c0
[78160.277186]  [<ffffffff811fee1f>] ? mem_cgroup_iter+0x1cf/0x380
[78160.277188]  [<ffffffff81200de8>] mem_cgroup_out_of_memory+0x2a8/0x2f0
[78160.277191]  [<ffffffff81201b87>] mem_cgroup_oom_synchronize+0x347/0x360
[78160.277193]  [<ffffffff811fcbb0>] ? mem_cgroup_begin_page_stat+0x90/0x90
[78160.277195]  [<ffffffff81192e84>] pagefault_out_of_memory+0x44/0xc0
[78160.277198]  [<ffffffff8106af1f>] mm_fault_error+0x7f/0x160
[78160.277200]  [<ffffffff8106b723>] __do_page_fault+0x3e3/0x410
[78160.277202]  [<ffffffff8106b772>] do_page_fault+0x22/0x30
[78160.277204]  [<ffffffff81862478>] page_fault+0x28/0x30
[78160.277206] Task in /lxc/101/ns killed as a result of limit of /lxc/101
[78160.277210] memory: usage 1048300kB, limit 1048576kB, failcnt 0
[78160.277212] memory+swap: usage 1048576kB, limit 1048576kB, failcnt 840543
[78160.277213] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[78160.277214] Memory cgroup stats for /lxc/101: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[78160.277226] Memory cgroup stats for /lxc/101/ns: cache:675532KB rss:372768KB rss_huge:0KB mapped_file:64KB dirty:0KB writeback:0KB swap:276KB inactive_anon:578516KB active_anon:468056KB inactive_file:304KB active_file:76KB unevictable:0KB
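
A note on reading the task table above: the `total_vm` and `rss` columns are in pages, not kB. A minimal sketch, assuming 4 kB pages, that converts the row for the killed process (pid 1107) and compares it with the "Killed process" line. The row is copied from the log; the helper `parse_task_row` is hypothetical and only for illustration.

```python
PAGE_KB = 4  # assumption: 4 kB pages

# Row copied from the task table; the columns after the bracket are
# uid, tgid, total_vm, rss, nr_ptes, nr_pmds, swapents, oom_score_adj, name.
ROW = "[ 1107]  1001  1107    11870     2393      26       3        0             0 ffmpeg"


def parse_task_row(line):
    """Split one OOM task-table row and convert page counts to kB."""
    pid_part, rest = line.split("]", 1)
    fields = rest.split()
    return {
        "pid": int(pid_part.strip("[ ")),
        "total_vm_kb": int(fields[2]) * PAGE_KB,
        "rss_kb": int(fields[3]) * PAGE_KB,
        "name": fields[-1],
    }


task = parse_task_row(ROW)
print(task)
```

The converted values line up with the log's own summary for that process (total-vm:47480kB, anon-rss:9572kB), so a single ffmpeg process holds under 10 MB of resident memory. It is the number of processes that fills the container's 1 GB.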

The main node has been overloaded and has rebooted many times. How can I resolve the issue?

I would appreciate it if you could provide a fix.

Thank you
 

Mohsen

New Member
Apr 30, 2017
2
0
1
31
Hi,
such high loads normally indicate IO problems - the system isn't able to do IO for the number of processes that the load shows.

Any hints in the logfile or with atop?

BTW, do you think the number of ffmpeg processes is ok?

Udo


You're right about the I/O. That virtual machine can't handle this huge number of hits.

What I want to ask is: how can I enforce limits on LXC virtual machines?

I used "lxc.cgroup.pids.max: 200" on this machine, along with limited memory and limited CPU cores, but nothing changed. As for the number of ffmpeg processes: yes, this is normal. We have many training clips on our website and they get a huge number of hits.

I'm just asking why an LXC machine with limited resources can break the main node!

We never saw anything like this on OpenVZ. If a VPS had heavy load, an attack, or a traffic spike, only that single machine was overloaded, not the whole node.

Also, about the OOM killer bug: we only gave 4 GB of memory to the 4 virtual machines hosted on the main node, but the main server, with 32 GB of memory, freezes and runs out of memory many times :| because of this virtual machine "101".

So I only need a way to completely limit the virtual machines' resources. I don't want a fork bomb in a virtual machine to be able to overload the main node.

Thank you
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
7,216
1,327
164
you are misreading the log you posted - the OOM-killer is triggered to kill a task in the LXC container 101 because that container is over its limit. this is not an error, but the memory limit working as intended. if your node is crashing, you should investigate the logs to find the reason.
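
To see this in the numbers: a minimal sketch that parses the three cgroup counter lines from the posted dmesg output and reports which counters have reached their limit. The lines are copied from the log (timestamps dropped); the regular expression and variable names are assumptions for illustration.

```python
import re

# Counter lines copied from the dmesg output for /lxc/101 (timestamps removed).
LOG = """\
memory: usage 1048300kB, limit 1048576kB, failcnt 0
memory+swap: usage 1048576kB, limit 1048576kB, failcnt 831765
kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
"""

PATTERN = re.compile(r"^([\w+]+): usage (\d+)kB, limit (\d+)kB, failcnt (\d+)$")

counters = {}
for line in LOG.splitlines():
    match = PATTERN.match(line)
    if match:
        name, usage, limit, failcnt = match.groups()
        counters[name] = {
            "usage": int(usage),
            "limit": int(limit),
            "failcnt": int(failcnt),
        }

for name, c in counters.items():
    print(f"{name}: {c['limit'] - c['usage']} kB below limit, failcnt={c['failcnt']}")

# Counters whose usage has reached the configured limit.
at_limit = [name for name, c in counters.items() if c["usage"] >= c["limit"]]
print("at limit:", at_limit)
```

The memory+swap counter sits exactly at its 1048576 kB (1 GiB) limit with a large failcnt, which is the condition that makes the kernel pick a task inside the container to kill.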
 
