Proxmox 4.4.5 kernel: Out of memory: Kill process 8543 (kvm) score or sacrifice child

Discussion in 'Proxmox VE: Installation and configuration' started by ozgurerdogan, Jan 1, 2017.

  1. ozgurerdogan

    ozgurerdogan Member

    Joined:
    May 2, 2010
    Messages:
    353
    Likes Received:
    0
    I have enough RAM, but one KVM guest stops and I see these messages in syslog:

    Code:
    Jan 01 01:34:01 vztlfr6 kernel: sh invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
    Jan 01 01:34:01 vztlfr6 kernel: sh cpuset=/ mems_allowed=0
    Jan 01 01:34:01 vztlfr6 kernel: CPU: 3 PID: 4117 Comm: sh Tainted: G IO 4.4.35-1-pve #1
    Jan 01 01:34:01 vztlfr6 kernel: Hardware name: Supermicro X8STi/X8STi, BIOS 2.0 09/17/10
    Jan 01 01:34:01 vztlfr6 kernel: 0000000000000286 000000004afdee85 ffff88000489fb50 ffffffff813f9743
    Jan 01 01:34:01 vztlfr6 kernel: ffff88000489fd40 0000000000000000 ffff88000489fbb8 ffffffff8120adcb
    Jan 01 01:34:01 vztlfr6 kernel: ffff88040f2dada0 ffffea0004f99300 0000000100000001 0000000000000000
    Jan 01 01:34:01 vztlfr6 kernel: Call Trace:
    Jan 01 01:34:01 vztlfr6 kernel: [<ffffffff813f9743>] dump_stack+0x63/0x90
    Jan 01 01:34:01 vztlfr6 kernel: [<ffffffff8120adcb>] dump_header+0x67/0x1d5
    Jan 01 01:34:01 vztlfr6 kernel: [<ffffffff811925c5>] oom_kill_process+0x205/0x3c0
    Jan 01 01:34:01 vztlfr6 kernel: [<ffffffff81192a17>] out_of_memory+0x237/0x4a0
    Jan 01 01:34:01 vztlfr6 kernel: [<ffffffff81198d0e>] __alloc_pages_nodemask+0xcee/0xe20
    Jan 01 01:34:01 vztlfr6 kernel: [<ffffffff81198e8b>] alloc_kmem_pages_node+0x4b/0xd0
    Jan 01 01:34:01 vztlfr6 kernel: [<ffffffff8107f053>] copy_process+0x1c3/0x1c00
    Jan 01 01:34:01 vztlfr6 kernel: [<ffffffff813941b0>] ? apparmor_file_alloc_security+0x60/0x240
    Jan 01 01:34:01 vztlfr6 kernel: [<ffffffff813494b3>] ? security_file_alloc+0x33/0x50
    Jan 01 01:34:01 vztlfr6 kernel: [<ffffffff81080c20>] _do_fork+0x80/0x360
    Jan 01 01:34:01 vztlfr6 kernel: [<ffffffff810917ff>] ? sigprocmask+0x6f/0xa0
    Jan 01 01:34:01 vztlfr6 kernel: [<ffffffff81080fa9>] SyS_clone+0x19/0x20
    Jan 01 01:34:01 vztlfr6 kernel: [<ffffffff8185c276>] entry_SYSCALL_64_fastpath+0x16/0x75
    Jan 01 01:34:01 vztlfr6 kernel: Mem-Info:
    Jan 01 01:34:01 vztlfr6 kernel: active_anon:2535826 inactive_anon:377038 isolated_anon:0
    active_file:444477 inactive_file:444280 isolated_file:0
    unevictable:880 dirty:17 writeback:0 unstable:0
    slab_reclaimable:162931 slab_unreclaimable:58813
    mapped:20826 shmem:21040 pagetables:10173 bounce:0
    free:38866 free_pcp:111 free_cma:0
    Jan 01 01:34:01 vztlfr6 kernel: Node 0 DMA free:15852kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15968kB managed:15884kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
    Jan 01 01:34:01 vztlfr6 kernel: lowmem_reserve[]: 0 3454 15995 15995 15995
    Jan 01 01:34:01 vztlfr6 kernel: Node 0 DMA32 free:107940kB min:3492kB low:4364kB high:5236kB active_anon:1922068kB inactive_anon:480552kB active_file:383152kB inactive_file:382624kB unevictable:780kB isolated(anon):0kB isolated(file):0kB present:3644928kB managed:3564040kB mlocked:780kB dirty:8kB writeback:0kB mapped:20576kB shmem:21772kB slab_reclaimable:219488kB slab_unreclaimable:38272kB kernel_stack:528kB pagetables:8100kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
    Jan 01 01:34:01 vztlfr6 kernel: lowmem_reserve[]: 0 0 12541 12541 12541
    Jan 01 01:34:01 vztlfr6 kernel: Node 0 Normal free:31672kB min:12684kB low:15852kB high:19024kB active_anon:8221236kB inactive_anon:1027600kB active_file:1394756kB inactive_file:1394496kB unevictable:2740kB isolated(anon):0kB isolated(file):0kB present:13107200kB managed:12842072kB mlocked:2740kB dirty:60kB writeback:0kB mapped:62728kB shmem:62388kB slab_reclaimable:432236kB slab_unreclaimable:196980kB kernel_stack:4016kB pagetables:32592kB unstable:0kB bounce:0kB free_pcp:428kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:84 all_unreclaimable? no
    Jan 01 01:34:01 vztlfr6 kernel: lowmem_reserve[]: 0 0 0 0 0
    Jan 01 01:34:01 vztlfr6 kernel: Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15852kB
    Jan 01 01:34:01 vztlfr6 kernel: Node 0 DMA32: 826*4kB (UME) 13128*8kB (UME) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 108328kB
    Jan 01 01:34:01 vztlfr6 kernel: Node 0 Normal: 7676*4kB (UMEH) 86*8kB (UMEH) 5*16kB (H) 1*32kB (H) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 31504kB
    Jan 01 01:34:01 vztlfr6 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
    Jan 01 01:34:01 vztlfr6 kernel: 910214 total pagecache pages
    Jan 01 01:34:01 vztlfr6 kernel: 0 pages in swap cache
    Jan 01 01:34:01 vztlfr6 kernel: Swap cache stats: add 376, delete 376, find 0/0
    Jan 01 01:34:01 vztlfr6 kernel: Free swap = 1046044kB
    Jan 01 01:34:01 vztlfr6 kernel: Total swap = 1047548kB
    Jan 01 01:34:01 vztlfr6 kernel: 4192024 pages RAM
    Jan 01 01:34:01 vztlfr6 kernel: 0 pages HighMem/MovableOnly
    Jan 01 01:34:01 vztlfr6 kernel: 86525 pages reserved
    Jan 01 01:34:01 vztlfr6 kernel: 0 pages cma reserved
    Jan 01 01:34:01 vztlfr6 kernel: 0 pages hwpoisoned
    Jan 01 01:34:01 vztlfr6 kernel: [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
    Jan 01 01:34:01 vztlfr6 kernel: [ 297] 0 297 12434 4291 30 3 0 0 systemd-journal
    Jan 01 01:34:01 vztlfr6 kernel: [ 300] 0 300 10391 862 23 3 0 -1000 systemd-udevd
    Jan 01 01:34:01 vztlfr6 kernel: [ 570] 0 570 2511 29 9 3 0 0 rdnssd
    Jan 01 01:34:01 vztlfr6 kernel: [ 571] 104 571 4614 373 14 3 0 0 rdnssd
    Jan 01 01:34:01 vztlfr6 kernel: [ 576] 100 576 25011 594 20 3 0 0 systemd-timesyn
    Jan 01 01:34:01 vztlfr6 kernel: [ 1011] 0 1011 9270 666 23 3 0 0 rpcbind
    Jan 01 01:34:01 vztlfr6 kernel: [ 1028] 0 1028 1272 374 8 3 0 0 iscsid
    Jan 01 01:34:01 vztlfr6 kernel: [ 1029] 0 1029 1397 881 8 3 0 -17 iscsid
    Jan 01 01:34:01 vztlfr6 kernel: [ 1036] 107 1036 9320 721 22 3 0 0 rpc.statd
    Jan 01 01:34:01 vztlfr6 kernel: [ 1050] 0 1050 5839 49 16 3 0 0 rpc.idmapd
    Jan 01 01:34:01 vztlfr6 kernel: [ 1207] 0 1207 13796 1320 31 3 0 -1000 sshd
    Jan 01 01:34:01 vztlfr6 kernel: [ 1212] 0 1212 6146 916 17 3 0 0 smartd
    Jan 01 01:34:01 vztlfr6 kernel: [ 1214] 109 1214 191484 9777 72 3 0 0 named
    Jan 01 01:34:01 vztlfr6 kernel: [ 1216] 0 1216 58709 460 17 4 0 0 lxcfs
    Jan 01 01:34:01 vztlfr6 kernel: [ 1218] 0 1218 1022 161 7 3 0 -1000 watchdog-mux
    Jan 01 01:34:01 vztlfr6 kernel: [ 1219] 0 1219 4756 418 14 3 0 0 atd
    Jan 01 01:34:01 vztlfr6 kernel: [ 1222] 0 1222 5459 649 13 3 0 0 ksmtuned
    Jan 01 01:34:01 vztlfr6 kernel: [ 1227] 0 1227 4964 596 15 3 0 0 systemd-logind
    Jan 01 01:34:01 vztlfr6 kernel: [ 1235] 106 1235 10558 825 27 3 0 -900 dbus-daemon
    Jan 01 01:34:01 vztlfr6 kernel: [ 1271] 0 1271 206547 749 63 4 0 0 rrdcached
    Jan 01 01:34:01 vztlfr6 kernel: [ 1287] 0 1287 64668 822 28 3 0 0 rsyslogd
    Jan 01 01:34:01 vztlfr6 kernel: [ 1312] 0 1312 1064 386 8 3 0 0 acpid
    
    Jan 01 01:34:01 vztlfr6 kernel: Out of memory: Kill process 8543 (kvm) score 279 or sacrifice child
    Jan 01 01:34:01 vztlfr6 kernel: Killed process 8543 (kvm) total-vm:5808216kB, anon-rss:5007352kB, file-rss:10792kB
    Jan 01 01:34:01 vztlfr6 CRON[4094]: pam_unix(cron:session): session closed for user root
    Jan 01 01:34:02 vztlfr6 kernel: vmbr0: port 3(tap112i0) entered disabled state
     
  2. athompso

    athompso Member

    Joined:
    Sep 13, 2013
    Messages:
    127
    Likes Received:
    5
    +1 : I've been seeing this on a nightly basis, too, recently. Only since 4.4.x.
     
  3. spirit

    spirit Well-Known Member

    Joined:
    Apr 2, 2010
    Messages:
    2,998
    Likes Received:
    75
  4. mir

    mir Well-Known Member
    Proxmox VE Subscriber

    Joined:
    Apr 14, 2012
    Messages:
    3,400
    Likes Received:
    86
  5. athompso

    athompso Member

    Joined:
    Sep 13, 2013
    Messages:
    127
    Likes Received:
    5
    I do use ZFS, but I also have the ARC limited to 2GB or 4GB (on 16GB and 28GB servers respectively; I haven't seen the error on any of the 48GB nodes yet).
    I have been seriously suspicious of ZFS lately; its performance under heavy write conditions is utterly abysmal no matter what tweaking I do. Actually, I can get it to go fast by disabling the write throttle, but then the kernel crashes under heavy writes, so that's no better.

    In any case, this appears to be a regression, since it wasn't happening previously.
     
  6. ozgurerdogan

    ozgurerdogan Member

    Joined:
    May 2, 2010
    Messages:
    353
    Likes Received:
    0
    I do not use ZFS either. It is an issue with the OOM killer.
     
  7. spirit

    spirit Well-Known Member

    Joined:
    Apr 2, 2010
    Messages:
    2,998
    Likes Received:
    75
    Yes, sure, but what is eating your memory? Do you have a process list with memory usage from before the OOM occurred?
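
    (For example, a line like this in /etc/cron.d/ would capture a snapshot of the biggest memory consumers every 5 minutes; just a sketch, the path and interval are arbitrary:)
    Code:
    */5 * * * * root ps aux --sort=-rss | head -n 20 >> /var/log/mem-snapshots.log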
     
  8. absent

    absent New Member

    Joined:
    Jan 2, 2017
    Messages:
    1
    Likes Received:
    0
    ozgurerdogan, try setting:
    vm.min_free_kbytes = 131072
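
    For example, it can be tested at runtime first before making it permanent (a minimal sketch; only the value itself comes from the suggestion above):
    Code:
    # apply immediately (does not survive a reboot)
    sysctl -w vm.min_free_kbytes=131072
    # verify
    cat /proc/sys/vm/min_free_kbytes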
     
  9. ozgurerdogan

    ozgurerdogan Member

    Joined:
    May 2, 2010
    Messages:
    353
    Likes Received:
    0
    Thank you, I will give it a try. It mostly kills the KVM process when it is backing up. But even during backup, the system has at least 10% free memory.
     
  10. e100

    e100 Active Member
    Proxmox VE Subscriber

    Joined:
    Nov 6, 2010
    Messages:
    1,219
    Likes Received:
    17
    I recently upgraded the kernel from 4.4.13-2-pve to 4.4.35-1-pve.
    After the upgrade, Ceph OSDs would randomly get killed by the OOM killer even when there was plenty of RAM available.

    Typically, nearly all of the free RAM had been consumed by cache when the OOM event occurred.

    So far, since doing this, everything has been running stable:
    Code:
    echo 262144 > /proc/sys/vm/min_free_kbytes
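
    (For reference, a generic way to watch for that pattern before an OOM hits; a sketch, nothing specific to this setup:)
    Code:
    watch -n 10 'free -m; grep -E "MemFree|^Cached" /proc/meminfo; cat /proc/sys/vm/min_free_kbytes'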
     
  11. whitewater

    whitewater Member

    Joined:
    Nov 26, 2012
    Messages:
    102
    Likes Received:
    0
    Hello,
    I have had the same problem for a few days. A KVM guest (Windows 2012 R2) is killed by the OOM killer.
    The Proxmox host has 32 GB of RAM, with 20 GB free.
    Storage is DRBD 8.4 (compiled with http://coolsoft.altervista.org/it/b...rnel-panic-downgrade-drbd-resources-drbd-9-84).
    Kernel 4.4.35-1-pve.

    I have three sites; the difference between them is the amount of memory:
    32 GB (with the problem), 64 GB and 128 GB (without the problem).
    Same kernel.
    I don't use ZFS or Ceph.
    I use NFS storage for backups.

    I have seen a VM killed once when a backup was starting, and at other times during the day with no particular activity.

    I have encountered this problem since I updated Proxmox to this kernel.
    Today I migrated the VM to another host to see what happens.
    Maybe I will test with the kernel version before 4.4.35-1-pve.
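
    For example, reinstalling an older build would look something like this (just a sketch, assuming the older package is still available in the Proxmox repository; the version is the one mentioned earlier in this thread):
    Code:
    apt-get install pve-kernel-4.4.13-2-pve
    # then pick that kernel from the GRUB menu at the next boot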

    For this
    Code:
    echo 262144 > /proc/sys/vm/min_free_kbytes
    does it have to be done at every boot?
     
  12. udo

    udo Well-Known Member
    Proxmox VE Subscriber

    Joined:
    Apr 22, 2009
    Messages:
    5,392
    Likes Received:
    101
    Hi,
    you can put this in /etc/sysctl.d/pve.conf (or /etc/sysctl.d/90-my.conf) like:
    Code:
    vm.swappiness = 1
    vm.min_free_kbytes = 262144
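
    These take effect automatically at the next boot; to load them immediately without rebooting, something like this should work (a sketch, assuming the file name above):
    Code:
    sysctl --system                    # reload all /etc/sysctl.d/*.conf files
    # or only the new file:
    sysctl -p /etc/sysctl.d/pve.conf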
    
    Udo
     
  13. whitewater

    whitewater Member

    Joined:
    Nov 26, 2012
    Messages:
    102
    Likes Received:
    0
    OK, thank you Udo. I will test it if needed, after testing another kernel version (I think).
     
  14. ozgurerdogan

    ozgurerdogan Member

    Joined:
    May 2, 2010
    Messages:
    353
    Likes Received:
    0
    This seems to fix the OOM issue for now, but one of my nodes is having a similar problem. During or right after a backup of all VMs, the KVM guest loses its disk connection and I have to drop the cache with echo 1 > /proc/sys/vm/drop_caches. So how can I get the cache usage under control?
    If I back up only that VM, it does not lose the connection to its KVM disk. This only happens when all 4 VMs are backed up.
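
    (For what it's worth, the commonly recommended form of that command flushes dirty pages to disk first; a generic note, nothing specific to this setup:)
    Code:
    sync; echo 1 > /proc/sys/vm/drop_caches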
     
  15. opty

    opty New Member

    Joined:
    Apr 5, 2012
    Messages:
    20
    Likes Received:
    0
    I encountered the problem last night on 2 of my servers. It was also during a backup, and I do not use ZFS or Ceph.

    One of those servers worked perfectly with kernel 4.4.35 from 2016-12-20 until this minor upgrade:

    Code:
    Start-Date: 2017-01-03 08:14:02
    Commandline: apt-get dist-upgrade
    Upgrade: libpve-common-perl:amd64 (4.0-84, 4.0-85), pve-kernel-4.4.35-1-pve:amd64 (4.4.35-76, 4.4.35-77), libpve-storage-perl:amd64 (4.0-70, 4.0-71), pve-manager:amd64 (4.4-2, 4.4-5), libgd3:amd64 (2.1.0-5+deb8u7, 2.1.0-5+deb8u8), lxcfs:amd64 (2.0.5-pve1, 2.0.5-pve2), pve-qemu-kvm:amd64 (2.7.0-9, 2.7.0-10), pve-container:amd64 (1.0-89, 1.0-90), lxc-pve:amd64 (2.0.6-2, 2.0.6-5), proxmox-ve:amd64 (4.4-76, 4.4-77)
    End-Date: 2017-01-03 08:15:08

    Please note the 4.4.35-76 to 4.4.35-77 kernel upgrade. Since I did not see any mention of OOM modifications in the kernel.org changelogs, is that a custom Proxmox patch? Or is it something related to a change in backup behaviour? Please also note that the OOM kill happened while backing up an LXC container.
     
  16. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    2,606
    Likes Received:
    368
    There have been some OOM-related cherry-picks from 4.7 into the Ubuntu kernel to fix https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1647400 ; those might be at fault:
    https://git.kernel.org/cgit/linux/k.../?id=0a0337e0d1d134465778a16f5cbea95086e8e9e0
    https://git.kernel.org/cgit/linux/k.../?id=ede37713737834d98ec72ed299a305d53e909f73
     
  17. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    2,606
    Likes Received:
    368
    I can't reproduce this issue so far (even with very high memory pressure and load), so any more information to narrow down the contributing factors would help:
    • used hardware
    • used storage plugins
    • memory and swap sizes
    • circumstances triggering the OOM, ideally together with system logs and fine-grained atop or similar data (one way to collect this is sketched below)

    edit: I can trigger the OOM killer and produce the stack trace mentioned earlier in this thread, but only when disabling swap and having less than a few hundred MB of actual free memory, i.e. the very situation where the OOM killer has to act to prevent a total system crash. Are you sure that you are not simply running out of memory?
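
    (One way to collect that kind of fine-grained data is atop's raw logging; a sketch, with the interval and file path chosen arbitrarily:)
    Code:
    # record a system/process sample every 30 seconds into a raw file
    atop -w /var/log/atop_oom.raw 30
    # after an OOM event, replay the recording with:
    atop -r /var/log/atop_oom.raw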
     
    #17 fabian, Jan 4, 2017
    Last edited: Jan 4, 2017
  18. whitewater

    whitewater Member

    Joined:
    Nov 26, 2012
    Messages:
    102
    Likes Received:
    0
    Hello Fabian. For me:
    Motherboard: Supermicro X9DR3-F.
    Storage: DRBD v8.4 (compiled from the link mentioned above) for VMs.
    NFS on a Synology RS2212 for backups.
    Memory & swap size: 32 GB & 31 GB. Memtest OK.

    Here are some files attached.

    Only the VM that was killed was running on the node concerned.
     

    Attached Files:

  19. udo

    udo Well-Known Member
    Proxmox VE Subscriber

    Joined:
    Apr 22, 2009
    Messages:
    5,392
    Likes Received:
    101
    Hi,
    what does "cat /proc/sys/vm/swappiness" show on the affected systems? Perhaps 0 instead of 1?

    Udo
     
  20. whitewater

    whitewater Member

    Joined:
    Nov 26, 2012
    Messages:
    102
    Likes Received:
    0
    Hi Udo, it is 60:
    Code:
    root@mtp-prox02:~# cat /proc/sys/vm/swappiness
    60
    I checked this on several Proxmox hosts. All show 60.
     
