abysmal write performance on ZFS lxc container? (now with kernel oops)

Neuer_User

Hello

I installed Proxmox VE 4.1 on an HP MicroServer with 12 GB RAM and a 3 TB HDD. I mainly use LXC containers and added a local ZFS storage for them.

I noticed extremely bad write performance in these containers. This morning I installed Webmin in a Debian container; the dpkg install of the single webmin.deb took an incredible 15 minutes. The system was mostly idle, but I/O wait was fully occupying one core.
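For reference, this is roughly how I watched the I/O wait (assuming the sysstat package is installed for iostat; plain vmstat works as well):
Code:
# extended per-device statistics, refreshed every 2 seconds
iostat -x 2
# alternatively, the "wa" column of vmstat shows the iowait percentage
vmstat 2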

Is there anything wrong with my setup, or is this abysmal write performance normal for an LXC container on ZFS?

Thanks

Michael

P.S.: Otherwise a very nice system. I very much like the LXC snapshots. Great feature.
 
Oops, a few minutes later I even got memory allocation failures!

Not good. Complete kernel oops: "SLUB: Unable to allocate memory on node -1" and "Possible memory allocation deadlock". The system is hanging (100% CPU usage). I guess I need to pull the plug.

Before that happened, about 4.3 GB of RAM was in use. Hmm...
 
After a hard reboot, this is what syslog had recorded:
Code:
Jan  5 09:14:08 proxmox kernel: [435636.004894] SLUB: Unable to allocate memory on node -1 (gfp=0xd0)
Jan  5 09:14:08 proxmox kernel: [435636.004899]   cache: kmalloc-128(6:136), object size: 128, buffer size: 128, default order: 0, min order: 0
Jan  5 09:14:08 proxmox kernel: [435636.004900]   node 0: slabs: 80, objs: 2560, free: 0
Jan  5 09:14:08 proxmox kernel: [435636.004903] vmalloc: allocation failure: 8920 bytes
Jan  5 09:14:08 proxmox kernel: [435636.004904] init: page allocation failure: order:0, mode:0xd2
Jan  5 09:14:08 proxmox kernel: [435636.004907] CPU: 1 PID: 12772 Comm: init Tainted: P           O    4.2.6-1-pve #1
Jan  5 09:14:08 proxmox kernel: [435636.004908] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 06/06/2014
Jan  5 09:14:08 proxmox kernel: [435636.004910]  0000000000000000 00000000b682673c ffff880296fa79c8 ffffffff81801028
Jan  5 09:14:08 proxmox kernel: [435636.004914]  0000000000000000 00000000000000d2 ffff880296fa7a58 ffffffff81187864
Jan  5 09:14:08 proxmox kernel: [435636.004916]  ffffffff81cb9218 ffff880296fa79e8 0000000000000018 ffff880296fa7a68
Jan  5 09:14:08 proxmox kernel: [435636.004918] Call Trace:
Jan  5 09:14:08 proxmox kernel: [435636.004925]  [<ffffffff81801028>] dump_stack+0x45/0x57
Jan  5 09:14:08 proxmox kernel: [435636.004929]  [<ffffffff81187864>] warn_alloc_failed+0xf4/0x150
Jan  5 09:14:08 proxmox kernel: [435636.004933]  [<ffffffff811c1018>] ? __get_vm_area_node+0x118/0x130
Jan  5 09:14:08 proxmox kernel: [435636.004935]  [<ffffffff811c2b5b>] __vmalloc_node_range+0x21b/0x2a0
Jan  5 09:14:08 proxmox kernel: [435636.004939]  [<ffffffff814c3849>] ? n_tty_open+0x19/0xe0
Jan  5 09:14:08 proxmox kernel: [435636.004941]  [<ffffffff811c2ea4>] vmalloc+0x54/0x60
Jan  5 09:14:08 proxmox kernel: [435636.004943]  [<ffffffff814c3849>] ? n_tty_open+0x19/0xe0
Jan  5 09:14:08 proxmox kernel: [435636.004945]  [<ffffffff814c3849>] n_tty_open+0x19/0xe0
Jan  5 09:14:08 proxmox kernel: [435636.004948]  [<ffffffff814c77a2>] tty_ldisc_open.isra.2+0x32/0x60
Jan  5 09:14:08 proxmox kernel: [435636.004951]  [<ffffffff814c8069>] tty_ldisc_setup+0x39/0x70
Jan  5 09:14:08 proxmox kernel: [435636.004953]  [<ffffffff814c167c>] tty_init_dev+0x9c/0x1b0
Jan  5 09:14:08 proxmox kernel: [435636.004956]  [<ffffffff8127ce57>] ? devpts_new_index+0x107/0x120
Jan  5 09:14:08 proxmox kernel: [435636.004958]  [<ffffffff814ca36f>] ptmx_open+0x8f/0x170
Jan  5 09:14:08 proxmox kernel: [435636.004961]  [<ffffffff81201a9f>] chrdev_open+0xbf/0x1b0
Jan  5 09:14:08 proxmox kernel: [435636.004965]  [<ffffffff811fa97f>] do_dentry_open+0x1cf/0x310
Jan  5 09:14:08 proxmox kernel: [435636.004967]  [<ffffffff812019e0>] ? cdev_put+0x30/0x30
Jan  5 09:14:08 proxmox kernel: [435636.004970]  [<ffffffff811fbd98>] vfs_open+0x58/0x60
Jan  5 09:14:08 proxmox kernel: [435636.004972]  [<ffffffff8120b6b0>] path_openat+0x210/0x14d0
Jan  5 09:14:08 proxmox kernel: [435636.004975]  [<ffffffff8120db9a>] do_filp_open+0x8a/0x100
Jan  5 09:14:08 proxmox kernel: [435636.004978]  [<ffffffff8121b456>] ? __alloc_fd+0x46/0x110
Jan  5 09:14:08 proxmox kernel: [435636.004981]  [<ffffffff811fc149>] do_sys_open+0x139/0x280
Jan  5 09:14:08 proxmox kernel: [435636.004984]  [<ffffffff81252ebb>] compat_SyS_open+0x1b/0x20
Jan  5 09:14:08 proxmox kernel: [435636.004987]  [<ffffffff8180a7ca>] ia32_do_call+0x1b/0x25
Jan  5 09:14:08 proxmox kernel: [435636.004989] Mem-Info:
Jan  5 09:14:08 proxmox kernel: [435636.004993] active_anon:221087 inactive_anon:218265 isolated_anon:0
Jan  5 09:14:08 proxmox kernel: [435636.004993]  active_file:37536 inactive_file:17174 isolated_file:0
Jan  5 09:14:08 proxmox kernel: [435636.004993]  unevictable:878 dirty:4 writeback:1 unstable:0
Jan  5 09:14:08 proxmox kernel: [435636.004993]  slab_reclaimable:16559 slab_unreclaimable:491800
Jan  5 09:14:08 proxmox kernel: [435636.004993]  mapped:38194 shmem:118374 pagetables:4802 bounce:0
Jan  5 09:14:08 proxmox kernel: [435636.004993]  free:1604374 free_pcp:361 free_cma:0
Jan  5 09:14:08 proxmox kernel: [435636.004996] Node 0 DMA free:15892kB min:16kB low:20kB high:24kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15892kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan  5 09:14:08 proxmox kernel: [435636.005002] lowmem_reserve[]: 0 2683 11828 11828
Jan  5 09:14:08 proxmox kernel: [435636.005004] Node 0 DMA32 free:1434696kB min:3156kB low:3944kB high:4732kB active_anon:201260kB inactive_anon:200928kB active_file:32924kB inactive_file:16460kB unevictable:752kB isolated(anon):0kB isolated(file):0kB present:2832272kB managed:2751976kB mlocked:752kB dirty:12kB writeback:4kB mapped:30328kB shmem:107864kB slab_reclaimable:13688kB slab_unreclaimable:475816kB kernel_stack:4656kB pagetables:3776kB unstable:0kB bounce:0kB free_pcp:748kB local_pcp:192kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan  5 09:14:08 proxmox kernel: [435636.005009] lowmem_reserve[]: 0 0 9144 9144
Jan  5 09:14:08 proxmox kernel: [435636.005011] Node 0 Normal free:4966908kB min:10752kB low:13440kB high:16128kB active_anon:683088kB inactive_anon:672132kB active_file:117220kB inactive_file:52236kB unevictable:2760kB isolated(anon):0kB isolated(file):0kB present:9568252kB managed:9363936kB mlocked:2760kB dirty:4kB writeback:0kB mapped:122448kB shmem:365632kB slab_reclaimable:52548kB slab_unreclaimable:1491384kB kernel_stack:18224kB pagetables:15432kB unstable:0kB bounce:0kB free_pcp:696kB local_pcp:200kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan  5 09:14:08 proxmox kernel: [435636.005015] lowmem_reserve[]: 0 0 0 0
Jan  5 09:14:08 proxmox kernel: [435636.005017] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15892kB
Jan  5 09:14:08 proxmox kernel: [435636.005026] Node 0 DMA32: 2*4kB (EM) 30*8kB (UEM) 110*16kB (EM) 635*32kB (EM) 278*64kB (EM) 2485*128kB (UEM) 1651*256kB (UEM) 743*512kB (UEM) 219*1024kB (UM) 18*2048kB (M) 3*4096kB (M) = 1434680kB
Jan  5 09:14:08 proxmox kernel: [435636.005036] Node 0 Normal: 103*4kB (UE) 6*8kB (EM) 6*16kB (UE) 307*32kB (UEM) 684*64kB (EM) 6083*128kB (UEM) 5581*256kB (UEM) 2966*512kB (UEM) 475*1024kB (UEM) 214*2048kB (UEM) 64*4096kB (UEM) = 4966924kB
Jan  5 09:14:08 proxmox kernel: [435636.005046] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
 
second part:
Code:
Jan  5 09:14:08 proxmox kernel: [435636.005047] 175899 total pagecache pages
Jan  5 09:14:08 proxmox kernel: [435636.005049] 2247 pages in swap cache
Jan  5 09:14:08 proxmox kernel: [435636.005050] Swap cache stats: add 49884, delete 47637, find 441438/447781
Jan  5 09:14:08 proxmox kernel: [435636.005051] Free swap  = 11397980kB
Jan  5 09:14:08 proxmox kernel: [435636.005052] Total swap = 11534332kB
Jan  5 09:14:08 proxmox kernel: [435636.005053] 3104125 pages RAM
Jan  5 09:14:08 proxmox kernel: [435636.005054] 0 pages HighMem/MovableOnly
Jan  5 09:14:08 proxmox kernel: [435636.005055] 71174 pages reserved
Jan  5 09:14:08 proxmox kernel: [435636.005055] 0 pages cma reserved
Jan  5 09:14:08 proxmox kernel: [435636.005056] 0 pages hwpoisoned
Jan  5 09:14:08 proxmox kernel: [435636.005058] tty_init_dev: ldisc open failed, clearing slot 2
Jan  5 09:14:08 proxmox kernel: [435636.005076] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
Jan  5 09:14:08 proxmox kernel: [435636.005120] IP: [<ffffffff8127ce7c>] devpts_kill_index+0xc/0x70
Jan  5 09:14:08 proxmox kernel: [435636.005148] PGD 240a9d067 PUD 0
Jan  5 09:14:08 proxmox kernel: [435636.005175] Oops: 0000 [#1] SMP
Jan  5 09:14:08 proxmox kernel: [435636.005202] Modules linked in: cfg80211 veth ip_set ip6table_filter ip6_tables softdog iptable_filter ip_tables x_tables nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 8021q garp mrp nfnetlink_log nfnetlink intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel gpio_ich kvm ipmi_ssif crct10dif_pclmul crc32_pclmul cryptd snd_pcm snd_timer hpilo snd psmouse soundcore pcspkr serio_raw shpchp 8250_fintek lpc_ich ipmi_si ipmi_msghandler ie31200_edac edac_core acpi_power_meter mac_hid vhost_net vhost macvtap macvlan autofs4 zfs(PO) zunicode(PO) zcommon(PO) znvpair(PO) spl(O) zavl(PO) uas usb_storage tg3 ptp pps_core ahci libahci
Jan  5 09:14:08 proxmox kernel: [435636.005692] CPU: 1 PID: 12772 Comm: init Tainted: P           O    4.2.6-1-pve #1
Jan  5 09:14:08 proxmox kernel: [435636.005728] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 06/06/2014
Jan  5 09:14:08 proxmox kernel: [435636.005752] task: ffff8803075ccc80 ti: ffff880296fa4000 task.ti: ffff880296fa4000
Jan  5 09:14:08 proxmox kernel: [435636.005788] RIP: 0010:[<ffffffff8127ce7c>]  [<ffffffff8127ce7c>] devpts_kill_index+0xc/0x70
Jan  5 09:14:08 proxmox kernel: [435636.005828] RSP: 0000:ffff880296fa7b38  EFLAGS: 00010286
Jan  5 09:14:08 proxmox kernel: [435636.005849] RAX: ffffffff814ca0a0 RBX: ffff88004aaa2400 RCX: 0000000000000030
Jan  5 09:14:08 proxmox kernel: [435636.005884] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
Jan  5 09:14:08 proxmox kernel: [435636.005920] RBP: ffff880296fa7b48 R08: 000000000000000a R09: 00000000fffffffe
Jan  5 09:14:08 proxmox kernel: [435636.005955] R10: 000000000002963b R11: 0000000000000002 R12: 00000000fffffff4
Jan  5 09:14:08 proxmox kernel: [435636.005989] R13: 0000000000000002 R14: ffff8800207cd000 R15: 0000000000000002
Jan  5 09:14:08 proxmox kernel: [435636.006025] FS:  0000000000000000(0000) GS:ffff88033a640000(0063) knlGS:00000000f755aa60
Jan  5 09:14:08 proxmox kernel: [435636.006061] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
Jan  5 09:14:08 proxmox kernel: [435636.006084] CR2: 0000000000000028 CR3: 0000000329eeb000 CR4: 00000000001406e0
Jan  5 09:14:08 proxmox kernel: [435636.006119] Stack:
Jan  5 09:14:08 proxmox kernel: [435636.006136]  ffff88004aaa2400 00000000fffffff4 ffff880296fa7b58 ffffffff814ca0b8
Jan  5 09:14:08 proxmox kernel: [435636.006186]  ffff880296fa7b78 ffffffff814c0b7b 000060fcc4809570 fffffffffffffff4
Jan  5 09:14:08 proxmox kernel: [435636.006235]  ffff880296fa7bb8 ffffffff814c16b4 ffff880296fa7bb8 ffffffff8127ce57
Jan  5 09:14:08 proxmox kernel: [435636.006284] Call Trace:
Jan  5 09:14:08 proxmox kernel: [435636.006304]  [<ffffffff814ca0b8>] pty_unix98_shutdown+0x18/0x20
Jan  5 09:14:08 proxmox kernel: [435636.006328]  [<ffffffff814c0b7b>] release_tty+0x3b/0x100
Jan  5 09:14:08 proxmox kernel: [435636.006351]  [<ffffffff814c16b4>] tty_init_dev+0xd4/0x1b0
Jan  5 09:14:08 proxmox kernel: [435636.006374]  [<ffffffff8127ce57>] ? devpts_new_index+0x107/0x120
Jan  5 09:14:08 proxmox kernel: [435636.006397]  [<ffffffff814ca36f>] ptmx_open+0x8f/0x170
Jan  5 09:14:08 proxmox kernel: [435636.006420]  [<ffffffff81201a9f>] chrdev_open+0xbf/0x1b0
Jan  5 09:14:08 proxmox kernel: [435636.006444]  [<ffffffff811fa97f>] do_dentry_open+0x1cf/0x310
Jan  5 09:14:08 proxmox kernel: [435636.006467]  [<ffffffff812019e0>] ? cdev_put+0x30/0x30
Jan  5 09:14:08 proxmox kernel: [435636.006490]  [<ffffffff811fbd98>] vfs_open+0x58/0x60
Jan  5 09:14:08 proxmox kernel: [435636.006513]  [<ffffffff8120b6b0>] path_openat+0x210/0x14d0
Jan  5 09:14:08 proxmox kernel: [435636.006536]  [<ffffffff8120db9a>] do_filp_open+0x8a/0x100
Jan  5 09:14:08 proxmox kernel: [435636.006559]  [<ffffffff8121b456>] ? __alloc_fd+0x46/0x110
Jan  5 09:14:08 proxmox kernel: [435636.006583]  [<ffffffff811fc149>] do_sys_open+0x139/0x280
Jan  5 09:14:08 proxmox kernel: [435636.006606]  [<ffffffff81252ebb>] compat_SyS_open+0x1b/0x20
Jan  5 09:14:08 proxmox kernel: [435636.006629]  [<ffffffff8180a7ca>] ia32_do_call+0x1b/0x25
Jan  5 09:14:08 proxmox kernel: [435636.006651] Code: 01 e8 19 8f 58 00 8b 45 e4 eb ad b8 fb ff ff ff eb a6 b8 ed ff ff ff eb 9f e8 e1 e5 df ff 90 0f 1f 44 00 00 55 48 89 e5 41 54 53 <48> 8b 47 28 41 89 f4 48 81 78 60 d1 1c 00 00 74 10 48 8b 05 54
Jan  5 09:14:08 proxmox kernel: [435636.006949] RIP  [<ffffffff8127ce7c>] devpts_kill_index+0xc/0x70
Jan  5 09:14:08 proxmox kernel: [435636.006976]  RSP <ffff880296fa7b38>
Jan  5 09:14:08 proxmox kernel: [435636.006995] CR2: 0000000000000028
Jan  5 09:14:08 proxmox kernel: [435636.007393] ---[ end trace ba6a25c0909e1a07 ]---
Jan  5 09:14:08 proxmox kernel: [435636.013390] ERST: [Firmware Warn]: Firmware does not respond in time.
 
It almost looks like a deadlock caused by the LXC limits: some ZFS work ends up being fully accounted against the memory and CPU settings of the container. So with a vCPU setting of 1 and a certain memory limit, it appears you can deadlock the container because it is not allowed to finish a ZFS transaction group, yet is not allowed to use more system resources due to its limits.

Because it is probably stuck in the ZFS block layer, this shows up as iowait.
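A minimal way to check this hypothesis would be to compare the container's cgroup usage with its limits on the host while it hangs (cgroup v1 paths as typically used on PVE 4.x; the exact path below is an assumption and may differ on your system):
Code:
CT=136   # example container ID, adjust to yours
cat /sys/fs/cgroup/memory/lxc/$CT/memory.usage_in_bytes       # current userspace + cache usage
cat /sys/fs/cgroup/memory/lxc/$CT/memory.limit_in_bytes       # configured memory limit
cat /sys/fs/cgroup/memory/lxc/$CT/memory.kmem.usage_in_bytes  # kernel memory charged to the container
If the usage sits right at the limit while everything is in iowait, that would support the theory.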
 
Thanks for the feedback.

This is really a serious problem, as it completely stalled the whole machine. The LXC container has two vCPUs and 1 GB of RAM allocated, which should be plenty for its normal workload (a simple file server), at least as long as file caching etc. is accounted to the host, which it probably is (under OpenVZ that was the case). In the end it is kernel memory, and it was the kernel with the ZFS filesystem that stalled.

Is there anything I can do to make sure this does not keep happening?

Also, should I file a bug somewhere? But where? ZFS or LXC?

Thanks

Michael
 
At this point, it is just a supposition. When the container is blocked, you could record the output of this command on the host:
Code:
lxc-info -n <containerid>

By default Proxmox containers do not account for kernel memory usage, so you should see 0 in the "KMem use" line of lxc-info, but maybe the ZFS cache counts towards userspace memory. If lxc-info does not show "Memory use" at the maximum, then perhaps something else is going on.
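If the host stays responsive for a while, something like this loop could record the values continuously, so the last readings before a stall are preserved (log path and interval are arbitrary examples):
Code:
while true; do
    date >> /root/ct-monitor.log
    lxc-info -n <containerid> >> /root/ct-monitor.log
    sleep 10
done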
 
Well, maybe I wasn't very clear here. The block was not just inside the container; it stalled the whole host machine. So unfortunately there is no chance to enter any commands if such a block happens again.

On the other hand, "By default Proxmox containers do not account for kernel memory usage, so you should see 0 in the KMem use line of lxc-info" is definitely not true. Even now the output of lxc-info is:
Code:
Name:           136
State:          RUNNING
PID:            3204
IP:             172.17.5.10
CPU use:        2724.95 seconds
BlkIO use:      109.16 MiB
Memory use:     513.34 MiB
KMem use:       301.49 MiB
Link:           veth136i0
TX bytes:      491.73 MiB
RX bytes:      1.55 GiB
Total bytes:   2.03 GiB

I just found something that may be of interest in the changelog of "pve-container":
Code:
pve-container (1.0-34) unstable; urgency=medium

  * Revert "set memory.kmem.limit_in_bytes"

 -- Proxmox Support Team <support@proxmox.com>  Thu, 17 Dec 2015 12:28:11 +0100
I will try to find out whether there is a bug report for that. It sounds related.

I am currently on 1.0-32, so I do not yet have that patch.
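Just as a sanity check (the path is an assumption based on the usual cgroup v1 layout), the current kmem limit of a running container can be read on the host. An unlimited cgroup reports a huge number, so a value near the container's RAM size would mean the limit set by the older pve-container is active:
Code:
cat /sys/fs/cgroup/memory/lxc/136/memory.kmem.limit_in_bytes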
 
Yeah, that sounds reasonable. That is probably the cause. It would be great if it is thus already fixed. I will update the host system and see whether it happens again.

Although the kernel oops was only a secondary effect. The primary effect was that unpacking a deb archive in the container took 15 minutes, with most of the time spent in I/O wait. The question is whether this will also be remedied, or whether it only fixes the secondary effect...
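To see whether the primary effect also goes away after the update, I guess a crude write test inside the container would be enough (temporary file name chosen arbitrarily; conv=fdatasync makes dd wait for the data to reach the disk before reporting a rate):
Code:
dd if=/dev/zero of=/tmp/ddtest bs=1M count=512 conv=fdatasync
rm /tmp/ddtest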
 
I can confirm this:
The primary effect was that unpacking a deb archive in the container took 15 minutes, with most of the time spent in I/O wait.
Processes in an LXC container on ZFS that write to disk and consume memory are stalled in the D state most of the time, e.g. cc1plus; compiling takes ages (a quick way to spot these is shown after the version list below). My pveversion -v output:
Code:
proxmox-ve: 5.1-42 (running kernel: 4.13.16-2-pve)
pve-manager: 5.1-49 (running version: 5.1-49/1e427a54)
pve-kernel-4.13: 5.1-44
pve-kernel-4.13.16-2-pve: 4.13.16-47
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-4-pve: 4.13.13-35
pve-kernel-4.13.13-2-pve: 4.13.13-33
pve-kernel-4.13.8-3-pve: 4.13.8-30
pve-kernel-4.13.8-1-pve: 4.13.8-27
pve-kernel-4.13.4-1-pve: 4.13.4-26
corosync: 2.4.2-pve3
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-30
libpve-guest-common-perl: 2.0-14
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-18
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-2
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
openvswitch-switch: 2.7.0-2
proxmox-widget-toolkit: 1.0-14
pve-cluster: 5.0-24
pve-container: 2.0-21
pve-docs: 5.1-17
pve-firewall: 3.0-7
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-2
qemu-server: 5.0-24
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.7-pve1~bpo9
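As mentioned above, a quick way to list the stalled processes (uninterruptible sleep, state D) together with the kernel function they are waiting in:
Code:
# keep the header line and any process whose state contains D
ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /D/'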
 
Please don't hijack old, unrelated threads just because the symptoms look similar.
 
