Hello,
I have been slowly but surely scaling up one of my nodes to find the limits of the hardware, so that I can settle on a sensible long-term load level for the server. I have hit various constraints along the way and am now stuck at 1350 containers of the Proxmox Ubuntu 18.04 container image, where starting further containers throws a seccomp error. A previous occurrence of this error was solved by setting the sysctl net.core.bpf_jit_limit = 3000000000, as described in the updated LXD production setup doc: https://linuxcontainers.org/lxd/docs/master/production-setup
However, now I am at 1350 containers, and containers are failing to start:
lxc-start 438 20200615053953.628 ERROR seccomp - seccomp.c:lxc_seccomp_load:1239 - Unknown error 524 - Error loading the seccomp policy
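For reference, 524 is not a userspace errno, which is why it prints as "Unknown error". As far as I can tell it matches the kernel-internal ENOTSUPP value from include/linux/errno.h. A minimal Python sketch of that lookup follows; the table is a hand-copied subset of the kernel header (an assumption worth checking against your kernel source), and `describe_errno` is just an illustrative helper name.

```python
import errno
import os

# Kernel-internal error codes that libc does not know about.
# Hand-copied subset of include/linux/errno.h (assumption: verify
# against the source tree of the kernel you are actually running).
KERNEL_INTERNAL_ERRNOS = {
    512: "ERESTARTSYS",
    513: "ERESTARTNOINTR",
    514: "ERESTARTNOHAND",
    515: "ENOIOCTLCMD",
    516: "ERESTART_RESTARTBLOCK",
    517: "EPROBE_DEFER",
    524: "ENOTSUPP",
}


def describe_errno(code: int) -> str:
    """Return a symbolic name for an errno, including kernel-internal ones."""
    if code in KERNEL_INTERNAL_ERRNOS:
        return f"{KERNEL_INTERNAL_ERRNOS[code]} (kernel-internal, unknown to libc)"
    if code in errno.errorcode:
        return f"{errno.errorcode[code]} ({os.strerror(code)})"
    return f"unknown errno {code}"


if __name__ == "__main__":
    # The code reported by lxc-start in the error above.
    print(describe_errno(524))
```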
However I think this is just a symptom of what I see in the syslog:
Jun 15 06:45:14 host kernel: vmap allocation for size 8192 failed: use vmalloc= to increase size
Jun 15 06:45:14 host kernel: lxc-start: vmalloc: allocation failure: 4096 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=ns,mems_allowed=0-1
Jun 15 06:45:14 host kernel: Call Trace:
Jun 15 06:45:14 host kernel: dump_stack+0x6d/0x9a
Jun 15 06:45:14 host kernel: warn_alloc.cold.119+0x7b/0xdd
Jun 15 06:45:14 host kernel: ? __get_vm_area_node+0x149/0x160
Jun 15 06:45:14 host kernel: ? bpf_jit_alloc_exec+0xe/0x10
Jun 15 06:45:14 host kernel: __vmalloc_node_range+0x1aa/0x270
Jun 15 06:45:14 host kernel: ? pcpu_block_refresh_hint+0xb0/0xf0
Jun 15 06:45:14 host kernel: ? bpf_jit_alloc_exec+0xe/0x10
Jun 15 06:45:14 host kernel: module_alloc+0x82/0xe0
Jun 15 06:45:14 host kernel: ? bpf_jit_alloc_exec+0xe/0x10
Jun 15 06:45:14 host kernel: bpf_jit_alloc_exec+0xe/0x10
Jun 15 06:45:14 host kernel: bpf_jit_binary_alloc+0x63/0xf0
Jun 15 06:45:14 host kernel: ? emit_mov_reg+0xf0/0xf0
Jun 15 06:45:14 host kernel: bpf_int_jit_compile+0x133/0x34d
Jun 15 06:45:14 host kernel: bpf_prog_select_runtime+0xcd/0x150
Jun 15 06:45:14 host kernel: bpf_prepare_filter+0x52e/0x5a0
Jun 15 06:45:14 host kernel: bpf_prog_create_from_user+0xc5/0x110
Jun 15 06:45:14 host kernel: ? hardlockup_detector_perf_cleanup.cold.9+0x1a/0x1a
Jun 15 06:45:14 host kernel: do_seccomp+0x2bf/0x8d0
Jun 15 06:45:14 host kernel: __x64_sys_seccomp+0x1a/0x20
Jun 15 06:45:14 host kernel: do_syscall_64+0x57/0x190
Jun 15 06:45:14 host kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 15 06:45:14 host kernel: RIP: 0033:0x7fbfc709bf59
Jun 15 06:45:14 host kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 07 6f 0c 00 f7 d8 64 89 01 48
Jun 15 06:45:14 host kernel: RSP: 002b:00007ffd36a591b8 EFLAGS: 00000246 ORIG_RAX: 000000000000013d
Jun 15 06:45:14 host kernel: RAX: ffffffffffffffda RBX: 00005597e2acf440 RCX: 00007fbfc709bf59
Jun 15 06:45:14 host kernel: RDX: 00005597e2ade6f0 RSI: 0000000000000000 RDI: 0000000000000001
Jun 15 06:45:14 host kernel: RBP: 00005597e2ade6f0 R08: 00005597e2acf440 R09: 00005597e2ac8cc0
Jun 15 06:45:14 host kernel: R10: 00005597e2ad34a0 R11: 0000000000000246 R12: 00007ffd36a5925c
Jun 15 06:45:14 host kernel: R13: 0000000000000000 R14: 00000000ffffffff R15: 00005597e2ac8cc0
Jun 15 06:45:14 host kernel: Mem-Info:
Jun 15 06:45:14 host kernel: active_anon:46934939 inactive_anon:84738556 isolated_anon:0
active_file:20479475 inactive_file:18648470 isolated_file:0
unevictable:223734 dirty:590 writeback:0 unstable:0
slab_reclaimable:6646485 slab_unreclaimable:25509665
mapped:5764741 shmem:53598 pagetables:2035581 bounce:0
free:35623875 free_pcp:138359 free_cma:0
Jun 15 06:45:14 host kernel: Node 0 active_anon:96891592kB inactive_anon:176347476kB active_file:42523196kB inactive_file:38214056kB unevictable:285892kB isolated(anon):0kB isolated(file):0kB mapped:11951496kB dirty:1392kB writeback:0kB shmem:78572kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Jun 15 06:45:14 host kernel: Node 1 active_anon:90848164kB inactive_anon:162606748kB active_file:39394704kB inactive_file:36379824kB unevictable:609044kB isolated(anon):0kB isolated(file):0kB mapped:11107468kB dirty:968kB writeback:0kB shmem:135820kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Jun 15 06:45:14 host kernel: Node 0 DMA free:15872kB min:0kB low:12kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15872kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jun 15 06:45:14 host kernel: lowmem_reserve[]: 0 2557 515793 515793 515793
Jun 15 06:45:14 host kernel: Node 0 DMA32 free:2626636kB min:220kB low:2836kB high:5452kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:2732964kB managed:2665112kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:1608kB local_pcp:0kB free_cma:0kB
Jun 15 06:45:14 host kernel: lowmem_reserve[]: 0 0 513236 513236 513236
Jun 15 06:45:14 host kernel: Node 0 Normal free:57810656kB min:44820kB low:570372kB high:1095924kB active_anon:96891592kB inactive_anon:176347476kB active_file:42523196kB inactive_file:38214056kB unevictable:285892kB writepending:1392kB present:533970944kB managed:525553736kB mlocked:285892kB kernel_stack:881128kB pagetables:4131924kB bounce:0kB free_pcp:280648kB local_pcp:1284kB free_cma:0kB
Jun 15 06:45:14 host kernel: lowmem_reserve[]: 0 0 0 0 0
Jun 15 06:45:14 host kernel: Node 1 Normal free:82042336kB min:45064kB low:573476kB high:1101888kB active_anon:90848164kB inactive_anon:162606748kB active_file:39394704kB inactive_file:36379824kB unevictable:609044kB writepending:968kB present:536866816kB managed:528422156kB mlocked:609044kB kernel_stack:973480kB pagetables:4010400kB bounce:0kB free_pcp:271176kB local_pcp:1472kB free_cma:0kB
Jun 15 06:45:14 host kernel: lowmem_reserve[]: 0 0 0 0 0
Jun 15 06:45:14 host kernel: Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 3*64kB (U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15872kB
Jun 15 06:45:14 host kernel: Node 0 DMA32: 5*4kB (UM) 3*8kB (M) 6*16kB (M) 6*32kB (M) 4*64kB (M) 6*128kB (M) 5*256kB (UM) 7*512kB (UM) 9*1024kB (UM) 7*2048kB (UM) 634*4096kB (M) = 2626636kB
Jun 15 06:45:14 host kernel: Node 0 Normal: 16997*4kB (UME) 19286*8kB (UM) 6398*16kB (UME) 2009*32kB (UME) 37*64kB (UME) 192*128kB (UME) 173*256kB (UM) 53*512kB (UM) 752*1024kB (UME) 356*2048kB (U) 13629*4096kB (M) = 57810820kB
Jun 15 06:45:14 host kernel: Node 1 Normal: 52417*4kB (UME) 37834*8kB (UME) 17518*16kB (UME) 25703*32kB (UME) 10200*64kB (UME) 6514*128kB (UME) 794*256kB (UME) 755*512kB (UE) 685*1024kB (UE) 156*2048kB (U) 18879*4096kB (M) = 82040852kB
Jun 15 06:45:14 host kernel: lxc-start: vmalloc: allocation failure: 4096 bytes, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=ns,mems_allowed=0-1
Jun 15 06:45:14 host kernel: Call Trace:
Jun 15 06:45:14 host kernel: dump_stack+0x6d/0x9a
Jun 15 06:45:14 host kernel: warn_alloc.cold.119+0x7b/0xdd
Jun 15 06:45:14 host kernel: ? __get_vm_area_node+0x149/0x160
Jun 15 06:45:14 host kernel: ? bpf_jit_alloc_exec+0xe/0x10
Jun 15 06:45:14 host kernel: __vmalloc_node_range+0x1aa/0x270
Jun 15 06:45:14 host kernel: ? pcpu_block_refresh_hint+0xb0/0xf0
Jun 15 06:45:14 host kernel: ? bpf_jit_alloc_exec+0xe/0x10
Jun 15 06:45:14 host kernel: module_alloc+0x82/0xe0
Jun 15 06:45:14 host kernel: ? bpf_jit_alloc_exec+0xe/0x10
Jun 15 06:45:14 host kernel: bpf_jit_alloc_exec+0xe/0x10
Jun 15 06:45:14 host kernel: bpf_jit_binary_alloc+0x63/0xf0
Jun 15 06:45:14 host kernel: ? emit_mov_reg+0xf0/0xf0
Jun 15 06:45:14 host kernel: bpf_int_jit_compile+0x133/0x34d
Jun 15 06:45:14 host kernel: bpf_prog_select_runtime+0xcd/0x150
Jun 15 06:45:14 host kernel: bpf_prepare_filter+0x52e/0x5a0
Jun 15 06:45:14 host kernel: bpf_prog_create_from_user+0xc5/0x110
Jun 15 06:45:14 host kernel: ? hardlockup_detector_perf_cleanup.cold.9+0x1a/0x1a
Jun 15 06:45:14 host kernel: do_seccomp+0x2bf/0x8d0
Jun 15 06:45:14 host kernel: __x64_sys_seccomp+0x1a/0x20
Jun 15 06:45:14 host kernel: do_syscall_64+0x57/0x190
Jun 15 06:45:14 host kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jun 15 06:45:14 host kernel: RIP: 0033:0x7fbfc709bf59
Jun 15 06:45:14 host kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 07 6f 0c 00 f7 d8 64 89 01 48
Jun 15 06:45:14 host kernel: RSP: 002b:00007ffd36a591b8 EFLAGS: 00000246 ORIG_RAX: 000000000000013d
Jun 15 06:45:14 host kernel: RAX: ffffffffffffffda RBX: 00005597e2acf440 RCX: 00007fbfc709bf59
Jun 15 06:45:14 host kernel: RDX: 00005597e2ade6f0 RSI: 0000000000000000 RDI: 0000000000000001
Jun 15 06:45:14 host kernel: RBP: 00005597e2ade6f0 R08: 00005597e2acf440 R09: 00005597e2ac8cc0
Jun 15 06:45:14 host kernel: R10: 00005597e2ad34a0 R11: 0000000000000246 R12: 00007ffd36a5925c
Jun 15 06:45:14 host kernel: R13: 0000000000000000 R14: 00000000ffffffff R15: 00005597e2ac8cc0
Jun 15 06:45:14 host kernel: Mem-Info:
Jun 15 06:45:14 host kernel: active_anon:46934939 inactive_anon:84738556 isolated_anon:0
active_file:20479475 inactive_file:18648470 isolated_file:0
unevictable:223734 dirty:590 writeback:0 unstable:0
slab_reclaimable:6646485 slab_unreclaimable:25509665
mapped:5764741 shmem:53598 pagetables:2035581 bounce:0
free:35623875 free_pcp:138359 free_cma:0
Jun 15 06:45:14 host kernel: Node 0 active_anon:96891592kB inactive_anon:176347476kB active_file:42523196kB inactive_file:38214056kB unevictable:285892kB isolated(anon):0kB isolated(file):0kB mapped:11951496kB dirty:1392kB writeback:0kB shmem:78572kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Jun 15 06:45:14 host kernel: Node 1 active_anon:90848164kB inactive_anon:162606748kB active_file:39394704kB inactive_file:36379824kB unevictable:609044kB isolated(anon):0kB isolated(file):0kB mapped:11107468kB dirty:968kB writeback:0kB shmem:135820kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
Jun 15 06:45:14 host kernel: Node 0 DMA free:15872kB min:0kB low:12kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15872kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jun 15 06:45:14 host kernel: lowmem_reserve[]: 0 2557 515793 515793 515793
Jun 15 06:45:14 host kernel: Node 0 DMA32 free:2626636kB min:220kB low:2836kB high:5452kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:2732964kB managed:2665112kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:1608kB local_pcp:0kB free_cma:0kB
Jun 15 06:45:14 host kernel: lowmem_reserve[]: 0 0 513236 513236 513236
Jun 15 06:45:14 host kernel: Node 0 Normal free:57810656kB min:44820kB low:570372kB high:1095924kB active_anon:96891592kB inactive_anon:176347476kB active_file:42523196kB inactive_file:38214056kB unevictable:285892kB writepending:1392kB present:533970944kB managed:525553736kB mlocked:285892kB kernel_stack:881128kB pagetables:4131924kB bounce:0kB free_pcp:280648kB local_pcp:1284kB free_cma:0kB
Jun 15 06:45:14 host kernel: lowmem_reserve[]: 0 0 0 0 0
Jun 15 06:45:14 host kernel: Node 1 Normal free:82042336kB min:45064kB low:573476kB high:1101888kB active_anon:90848164kB inactive_anon:162606748kB active_file:39394704kB inactive_file:36379824kB unevictable:609044kB writepending:968kB present:536866816kB managed:528422156kB mlocked:609044kB kernel_stack:973480kB pagetables:4010400kB bounce:0kB free_pcp:271176kB local_pcp:1472kB free_cma:0kB
Jun 15 06:45:14 host kernel: lowmem_reserve[]: 0 0 0 0 0
Jun 15 06:45:14 host kernel: Node 0 DMA: 2*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 3*64kB (U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15872kB
Jun 15 06:45:14 host kernel: Node 0 DMA32: 5*4kB (UM) 3*8kB (M) 6*16kB (M) 6*32kB (M) 4*64kB (M) 6*128kB (M) 5*256kB (UM) 7*512kB (UM) 9*1024kB (UM) 7*2048kB (UM) 634*4096kB (M) = 2626636kB
Jun 15 06:45:14 host kernel: Node 0 Normal: 16997*4kB (UME) 19286*8kB (UM) 6398*16kB (UME) 2009*32kB (UME) 37*64kB (UME) 192*128kB (UME) 173*256kB (UM) 53*512kB (UM) 752*1024kB (UME) 356*2048kB (U) 13629*4096kB (M) = 57810820kB
Jun 15 06:45:14 host kernel: Node 1 Normal: 52417*4kB (UME) 37834*8kB (UME) 17518*16kB (UME) 25703*32kB (UME) 10200*64kB (UME) 6514*128kB (UME) 794*256kB (UME) 755*512kB (UE) 685*1024kB (UE) 156*2048kB (U) 18879*4096kB (M) = 82040852kB
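To put the page counts from the Mem-Info block into more familiar units, here is a small Python sketch that converts them to GiB. It assumes the standard 4 KiB page size on x86_64; `pages_to_gib` is my own helper name, and the numbers are copied from the log above.

```python
PAGE_SIZE = 4096  # bytes; assumes 4 KiB pages (the x86_64 default)


def pages_to_gib(pages: int, page_size: int = PAGE_SIZE) -> float:
    """Convert a page count from the kernel Mem-Info dump to GiB."""
    return pages * page_size / 2**30


# Page counts copied from the Mem-Info block in the syslog excerpt.
mem_info_pages = {
    "active_anon": 46934939,
    "inactive_anon": 84738556,
    "active_file": 20479475,
    "inactive_file": 18648470,
    "slab_reclaimable": 6646485,
    "slab_unreclaimable": 25509665,
    "pagetables": 2035581,
    "free": 35623875,
}

for name, pages in mem_info_pages.items():
    print(f"{name:20s} {pages_to_gib(pages):8.1f} GiB")
```

By this conversion slab_unreclaimable is roughly 97 GiB and free memory roughly 136 GiB, so the node as a whole is not short of RAM at the moment the allocation fails.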
I will try to post the contents of /proc/meminfo once I am able to bring my node back up after some work, but VmallocUsed was showing around 22GB. I have searched for more information on the topic, but much of it concerns 32-bit constraints that are alleviated with a line in the GRUB boot loader configuration; I am unsure whether that applies here, as the 64-bit constraint is 34TB or so. The bpf_jit limit I increased previously seems to be involved. I make extensive use of swap (1-2TB) on a round-robin array of 8 NVMe drives, which allows my containers to dump most of their idle memory (after they have performed their workload), letting me make much better use of RAM. The containers remain performant, and I have done extensive IO tuning for my workload.
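In the meantime, this is roughly how I plan to capture the relevant numbers once the node is back up: a minimal Python sketch that pulls the Vmalloc fields out of /proc/meminfo and reads the current net.core.bpf_jit_limit from procfs. The helper names (`parse_meminfo`, `read_file`, `read_int`) are my own, and the script only reports values; it does not change anything.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def read_file(path: str) -> str:
    """Return the file contents, or an empty string if it cannot be read."""
    try:
        return Path(path).read_text()
    except OSError:
        return ""


def read_int(path: str):
    """Return the integer stored in a procfs file, or None if unavailable."""
    try:
        return int(read_file(path).strip())
    except ValueError:
        return None


if __name__ == "__main__":
    info = parse_meminfo(read_file("/proc/meminfo"))
    for key in ("VmallocTotal", "VmallocUsed", "VmallocChunk"):
        if key in info:
            # meminfo reports these fields in kB; 2**20 kB == 1 GiB.
            print(f"{key}: {info[key] / 2**20:.2f} GiB")

    limit = read_int("/proc/sys/net/core/bpf_jit_limit")
    if limit is not None:
        # The sysctl value is in bytes.
        print(f"net.core.bpf_jit_limit: {limit / 2**30:.2f} GiB")
```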
Does anyone have any advice as to where to look next?
Here are further details of my node:
Dual AMD Epyc 7742, 1TB RAM, 8TB SWAP (8x1TB NVME), 72x 2TB SSD
proxmox-ve: 6.2-1 (running kernel: 5.4.41-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-2
pve-kernel-helper: 6.2-2
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
pve-kernel-5.4.27-1-pve: 5.4.27-1
pve-kernel-5.4.24-1-pve: 5.4.24-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-5.3.13-3-pve: 5.3.13-3
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: not correctly installed
ifupdown2: 2.0.1-1+pve8
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-6
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1