SLUB: Unable to allocate memory on node

RobFantini

I just had a node go red.

The VMs were off the network.

From the logs:
Code:
Dec 13 10:50:01 sys3 CRON[26588]: (root) CMD ( pve-zsync sync --limit 10000 --source 4444 --dest 10.2.2.181:tank/pve-zsync-bkup --name pro4     --maxsnap 200 --method ssh)
Dec 13 10:50:01 dell1 CRON[14260]: (root) CMD (pve-zsync sync --source 105 --dest tank/pve-zsync-bkup --name imap --maxsnap 101 --method local)
Dec 13 10:50:03 dell1 kernel: [126488.191024] SLUB: Unable to allocate memory on node -1 (gfp=0xd0)
Dec 13 10:50:03 dell1 kernel: [126488.191033]   cache: kmalloc-4096(4:107), object size: 4096, buffer size: 4096, default order: 3, min order: 0
Dec 13 10:50:03 dell1 kernel: [126488.191038]   node 0: slabs: 14063, objs: 108661, free: 0
Dec 13 10:50:03 dell1 kernel: [126488.294658]   cache: kmalloc-4096(4:107), object size: 4096, buffer size: 4096, default order: 3, min order: 0
Dec 13 10:50:03 dell1 kernel: [126488.395777] SLUB: Unable to allocate memory on node -1 (gfp=0xd0)
Dec 13 10:50:03 dell1 kernel: [126488.395789]   node 0: slabs: 14063, objs: 108661, free: 0
Dec 13 10:50:03 dell1 kernel: [126488.479726]   cache: kmalloc-4096(4:107), object size: 4096, buffer size: 4096, default order: 3, min order: 0
Dec 13 10:50:04 dell1 kernel: [126489.916747] Possible memory allocation deadlock: size=80 lflags=0xc210
Dec 13 10:50:04 dell1 kernel: [126489.916772]  0000000000000000 000000000000c210 ffff8803ee33b3b8 ffffffffc0097bfb
Dec 13 10:50:04 dell1 kernel: [126489.916818]  [<ffffffffc0097bfb>] spl_kmem_zalloc+0x17b/0x180 [spl]
Dec 13 10:50:04 dell1 kernel: [126489.917030]  [<ffffffff8139787c>] ? generic_make_request_checks+0x1dc/0x3a0
Dec 13 10:50:04 dell1 kernel: [126489.917061]  [<ffffffff81397be6>] submit_bio+0x76/0x180
Dec 13 10:50:04 dell1 kernel: [126489.917093]  [<ffffffff81195802>] pageout.isra.40+0x182/0x270
Dec 13 10:50:04 dell1 kernel: [126489.917127]  [<ffffffff81198ece>] shrink_lruvec+0x5fe/0x7f0
Dec 13 10:50:04 dell1 kernel: [126489.917162]  [<ffffffff811999b4>] try_to_free_mem_cgroup_pages+0xb4/0x140
Dec 13 10:50:04 dell1 kernel: [126489.917194]  [<ffffffff81182717>] add_to_page_cache_lru+0x37/0x90
Dec 13 10:50:04 dell1 kernel: [126489.917226]  [<ffffffff810e93b2>] ? set_cpu_itimer+0x132/0x220
Dec 13 10:50:04 dell1 kernel: [126489.917266] Possible memory allocation deadlock: size=80 lflags=0xc210

A little before that in the log, the issues started here:
Code:
Dec 13 10:41:26 dell1 kernel: [125970.997643] TCP: request_sock_TCP: Possible SYN flooding on port 7002. Sending cookies.  Check SNMP counters.
Dec 13 10:42:24 dell1 kernel: [126028.917202] Possible memory allocation deadlock: size=216 lflags=0xc210
Dec 13 10:42:24 dell1 kernel: [126028.917233]  0000000000000000 000000000000c210 ffff8807c3007458 ffffffffc0097bfb
Dec 13 10:42:24 dell1 kernel: [126028.917276]  [<ffffffffc0097bfb>] spl_kmem_zalloc+0x17b/0x180 [spl]
Dec 13 10:42:24 dell1 kernel: [126028.917487]  [<ffffffffc02e78c6>] zvol_request+0x226/0x680 [zfs]
Dec 13 10:42:24 dell1 kernel: [126028.917520]  [<ffffffff81397be6>] submit_bio+0x76/0x180
Dec 13 10:42:24 dell1 kernel: [126028.917551]  [<ffffffff81195802>] pageout.isra.40+0x182/0x270
Dec 13 10:42:24 dell1 kernel: [126028.917582]  [<ffffffff810a5e00>] ? try_to_wake_up+0x180/0x340
Dec 13 10:42:24 dell1 kernel: [126028.917613]  [<ffffffff811f15ae>] try_charge+0x18e/0x720
Dec 13 10:42:24 dell1 kernel: [126028.917774]  [<ffffffff81327f33>] ? security_file_permission+0xa3/0xc0
Dec 13 10:42:24 dell1 kernel: [126028.917790]  [<ffffffff810675dd>] __do_page_fault+0x19d/0x410
Dec 13 10:42:24 dell1 kernel: [126028.917804]  [<ffffffff81809f48>] page_fault+0x28/0x30
Dec 13 10:42:26 dell1 kernel: [126031.251186] Possible memory allocation deadlock: size=224 lflags=0x4210
Dec 13 10:42:26 dell1 kernel: [126031.251205] Hardware name: Dell Inc. PowerEdge R720/0C4Y3R, BIOS 2.5.2 01/28/2015
Dec 13 10:42:26 dell1 kernel: [126031.251224]  00011200ffffffff 0000000000000296 ffff8809d3a23398 ffff8803f6e20a00
Dec 13 10:42:26 dell1 kernel: [126031.251262]  [<ffffffffc0097a74>] spl_kmem_alloc+0x184/0x190 [spl]
Dec 13 10:42:26 dell1 kernel: [126031.251441]  [<ffffffffc02e786c>] zvol_request+0x1cc/0x680 [zfs]
Dec 13 10:42:26 dell1 kernel: [126031.251465]  [<ffffffff81397b2e>] generic_make_request+0xee/0x130
Dec 13 10:42:26 dell1 kernel: [126031.251487]  [<ffffffff811cb4a0>] ? __frontswap_store+0x90/0x120
Dec 13 10:42:26 dell1 kernel: [126031.251510]  [<ffffffff81197748>] shrink_page_list+0x408/0x780
Dec 13 10:42:26 dell1 kernel: [126031.251535]  [<ffffffff81198ece>] shrink_lruvec+0x5fe/0x7f0
Dec 13 10:42:26 dell1 kernel: [126031.251558]  [<ffffffff811994e2>] do_try_to_free_pages+0x172/0x440
Dec 13 10:42:26 dell1 kernel: [126031.251580]  [<ffffffff811f237e>] mem_cgroup_try_charge+0x8e/0xf0
Dec 13 10:42:26 dell1 kernel: [126031.251600]  [<ffffffff81184186>] filemap_fault+0x1b6/0x3e0
Dec 13 10:42:26 dell1 kernel: [126031.251621]  [<ffffffff811b4830>] handle_mm_fault+0xfc0/0x1840
Dec 13 10:42:26 dell1 kernel: [126031.251643]  [<ffffffff81067872>] do_page_fault+0x22/0x30
Dec 13 10:42:26 dell1 kernel: [126031.251681] CPU: 3 PID: 32519 Comm: mysqld Tainted: P           O    4.2.6-1-pve #1
Dec 13 10:42:26 dell1 kernel: [126031.251694]  0000000000000000 0000000000004210 ffff8809d3a233a8 ffffffffc0097a74
Dec 13 10:42:26 dell1 kernel: [126031.251713]  [<ffffffff81801028>] dump_stack+0x45/0x57
Dec 13 10:42:26 dell1 kernel: [126031.251807]  [<ffffffff813969ff>] ? part_round_stats+0x4f/0x60
Dec 13 10:42:26 dell1 kernel: [126031.252023]  [<ffffffff81184779>] ? mempool_alloc+0x69/0x170
Dec 13 10:42:26 dell1 kernel: [126031.252097]  [<ffffffff81397be6>] submit_bio+0x76/0x180
Dec 13 10:42:26 dell1 kernel: [126031.252104]  [<ffffffff811c5f5e>] __swap_writepage+0x22e/0x270
Dec 13 10:42:26 dell1 kernel: [126031.252110]  [<ffffffff811cb4a0>] ? __frontswap_store+0x90/0x120
Dec 13 10:42:26 dell1 kernel: [126031.252134]  [<ffffffff81197748>] shrink_page_list+0x408/0x780
Dec 13 10:42:26 dell1 kernel: [126031.252161]  [<ffffffff81198ece>] shrink_lruvec+0x5fe/0x7f0
Dec 13 10:42:26 dell1 kernel: [126031.252184]  [<ffffffff811994e2>] do_try_to_free_pages+0x172/0x440
Dec 13 10:42:26 dell1 kernel: [126031.252247]  [<ffffffff811f237e>] mem_cgroup_try_charge+0x8e/0xf0
..

Dec 13 10:43:56 dell1 kernel: [126121.651232] INFO: task monit:7349 blocked for more than 120 seconds.
Dec 13 10:43:56 dell1 kernel: [126121.651397] monit           D ffff880feea56a00     0  7349      1 0x00000000

..

Dec 13 10:43:56 dell1 kernel: [126121.652403] INFO: task bc-server:14570 blocked for more than 120 seconds.
Dec 13 10:43:56 dell1 kernel: [126121.652676]  [<ffffffff81806df2>] rwsem_down_read_failed+0xf2/0x140
Dec 13 10:43:56 dell1 kernel: [126121.652686]  [<ffffffff813d6704>] call_rwsem_down_read_failed+0x14/0x30
Dec 13 10:43:56 dell1 kernel: [126121.652690]  [<ffffffff81806324>] ? down_read+0x24/0x30
Dec 13 10:43:56 dell1 kernel: [126121.652699]  [<ffffffff810677be>] __do_page_fault+0x37e/0x410
Dec 13 10:43:56 dell1 kernel: [126121.652706]  [<ffffffff818038ae>] ? __schedule+0x37e/0x950
Dec 13 10:43:56 dell1 kernel: [126121.652711]  [<ffffffff81067872>] do_page_fault+0x22/0x30
Dec 13 10:43:56 dell1 kernel: [126121.652715]  [<ffffffff81809f48>] page_fault+0x28/0x30
Dec 13 10:43:56 dell1 kernel: [126121.652719] INFO: task bc-server:14593 blocked for more than 120 seconds.
Dec 13 10:43:56 dell1 kernel: [126121.652764]       Tainted: P           O    4.2.6-1-pve #1
Dec 13 10:43:56 dell1 kernel: [126121.652805] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Code:
# pveversion -v
proxmox-ve: 4.1-26 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-1 (running version: 4.1-1/2f9650d4)
pve-kernel-4.2.6-1-pve: 4.2.6-26
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.2.3-1-pve: 4.2.3-18
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-29
qemu-server: 4.0-41
pve-firmware: 1.1-7
libpve-common-perl: 4.0-41
libpve-access-control: 4.0-10
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-17
pve-container: 1.0-32
pve-firewall: 2.0-14
pve-ha-manager: 1.0-14
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie

From inside an LXC container:
Code:
Dec 13 10:44:51 imap kernel: [126176.155922]  [<ffffffff8106735f>] mm_fault_error+0x7f/0x160

This system has 64 GB of ECC RAM.

I had to reboot to fix it.

Normal memory usage is 15 GB.

Now I'll make sure pve-zsync only runs one job at a time across the different systems.
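
One way to do that is to stagger the time slots in the cron file that pve-zsync maintains. This is only a sketch, reusing the jobs from the log above; the times are illustrative:
Code:
# /etc/cron.d/pve-zsync -- one time slot per job, so only one
# zfs send/receive should be active at any moment
0  */2 * * * root pve-zsync sync --source 105 --dest tank/pve-zsync-bkup --name imap --maxsnap 101 --method local
30 */2 * * * root pve-zsync sync --source 4444 --dest 10.2.2.181:tank/pve-zsync-bkup --name pro4 --maxsnap 200 --method ssh --limit 10000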

It looks like there is a ZFS / kernel bug ... I'll do more research. So far I've run into this: http://www.subly.me/articles/linux-zfs-oom.html . However, that person didn't have much memory to start with.

Any clues on preventing this?
 
To try to prevent the issue, I moved all pve-zsync jobs to one node, doing pulls instead of having each node send. pve-zsync seems to limit itself to one zfs send/receive at a time.
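
As a sketch of what the pull setup looks like on the backup node (the IPs and VMIDs below are placeholders, and this assumes pve-zsync's host:VMID source syntax):
Code:
# crontab on the single backup node only -- each job pulls over ssh
0 1 * * * root pve-zsync sync --source 10.2.2.21:105 --dest tank/pve-zsync-bkup --name imap --maxsnap 101 --method ssh
0 2 * * * root pve-zsync sync --source 10.2.2.22:4444 --dest tank/pve-zsync-bkup --name pro4 --maxsnap 200 --method ssh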

I assume that the out-of-memory issue had to do with ZFS.

Is there a way to limit ZFS's maximum memory usage? [ I'll research... ]
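
For what it's worth, ZFS on Linux exposes the ARC ceiling as the zfs_arc_max module parameter. A minimal sketch; the 8 GiB figure is just an example, not a tuned value for this host:
Code:
# /etc/modprobe.d/zfs.conf -- cap the ARC at 8 GiB (value in bytes, 8 * 1024^3)
options zfs zfs_arc_max=8589934592

# rebuild the initramfs so the limit survives reboots:
update-initramfs -u
# the same knob can also be changed at runtime, no reboot needed:
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max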
 

I think it is a bug. A few minutes ago I installed Proxmox 4.0 on a free node. There my CTs work normally and there are no troubles with memory.
 
For us, 4.1 solved some cluster issues, so I'm going to stay with 4.1. However, we are seeing out-of-memory issues when there is high disk load, like with backups, so we'll limit those until the issue is fixed.
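
One way to rein the backups in is through the bandwidth limits both tools already expose; the figures below are illustrative, not tuned values:
Code:
# /etc/vzdump.conf -- cap vzdump backup bandwidth (value in KB/s)
bwlimit: 40000

# pve-zsync takes an equivalent per-job flag (KB/s), as in the log above:
# pve-zsync sync --source 4444 --dest 10.2.2.181:tank/pve-zsync-bkup --limit 10000 ...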
 
Examples of OOM events; these are from a log-check email.
Code:
sys7 [ PVE host and an NFS backup target ]
/var/log/syslog
Dec 16 01:16:39 sys7 kernel: [58246.744973]  [<ffffffff8106735f>] mm_fault_error+0x7f/0x160
Dec 16 01:16:39 sys7 kernel: [58246.775265]  [<ffffffff8106735f>] mm_fault_error+0x7f/0x160
Dec 16 01:16:39 sys7 kernel: [58247.552665]  [<ffffffff8106735f>] mm_fault_error+0x7f/0x160
Dec 16 01:45:59 sys7 kernel: [60008.122087]  [<ffffffff8106735f>] mm_fault_error+0x7f/0x160

In a couple of LXC containers at sys7:
Code:
Dec 16 01:16:41 bc-sys2 kernel: [58249.374931] mv invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Dec 16 01:16:43 bc-sys2 kernel: [58251.310009] monit invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0

Dec 16 01:39:44 apt-cacher kernel: [59632.467134] init invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
 
