That command doesn't show processes from container 154. It's simply matching anything in the output of ps with the text "154" anywhere on the line.
Today we had a different issue.
We terminated CT 154, and the node went RED. CT 154 got deleted; the node and all other CTs are pinging fine.
There are still many processes for CT 154 running.
root@P158:~# ps aux | grep 154
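To list only the processes that really belong to a container, one option is to match on cgroup membership instead of grepping the ps output. The following is a minimal sketch, not a definitive method. It assumes container cgroup paths end in "/lxc/<CTID>" (possibly followed by a sub-path), which may differ between releases. The function name ct_pids and its optional second argument (an alternative proc root, there only to make testing easy) are hypothetical.

```shell
#!/bin/sh
# Sketch: print the PIDs whose cgroup path names the given container.
# ASSUMPTION: container cgroups look like ".../lxc/<CTID>" or ".../lxc/<CTID>/...".
# ct_pids and the proc-root argument are illustrative names, not an existing tool.
ct_pids() {
    ctid=$1
    proc_root=${2:-/proc}
    for f in "$proc_root"/[0-9]*/cgroup; do
        # Processes can exit while we iterate; skip unreadable entries.
        [ -r "$f" ] || continue
        # Require "/" or end of line after the ID so that CT 154
        # does not also match CT 1540.
        if grep -Eq "/lxc/${ctid}(/|\$)" "$f" 2>/dev/null; then
            dir=${f%/cgroup}
            echo "${dir##*/}"
        fi
    done
}
```

Possible usage on the node: for p in $(ct_pids 154); do ps -o pid=,user=,args= -p "$p"; done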
Correct, 4.14.20 (or later) completely resolves the issue.
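A quick way to check whether a node runs at least that version is to compare the base version of the running kernel against 4.14.20. This is only a sketch: it assumes GNU sort with the -V option is available, the helper names version_ge and kernel_base are hypothetical, and a distribution kernel may carry backported fixes, so the version number alone is not conclusive.

```shell
#!/bin/sh
# Sketch: compare kernel versions with GNU "sort -V" (ASSUMPTION: GNU coreutils).
# version_ge and kernel_base are illustrative helper names.

# version_ge A B: succeeds when version A is greater than or equal to version B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# kernel_base RELEASE: strip the packaging suffix, e.g. "4.13.13-6-pve" -> "4.13.13".
kernel_base() {
    echo "${1%%-*}"
}
```

Possible usage: version_ge "$(kernel_base "$(uname -r)")" 4.14.20 && echo "kernel is 4.14.20 or later"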
Will do. If it fails, I'll figure out exactly which version works but it may take a few days as I can only test in the evenings.
Could you please test 4.14 (the first release of the 4.14 kernel series) and report whether it works or not? If it does not, this trims down the range of potentially fixing commits quite a lot!
Thanks a lot, I'm hitting this bug too and was worried that I had done something wrong. Hopefully you can help pinpoint this annoying bug. Thanks a lot! (I only recently started using Proxmox, so this is not an upgrade bug for me, and it was getting frustrating not being able to use LXC containers...)
Will do. If it fails, I'll figure out exactly which version works but it may take a few days as I can only test in the evenings.
commit 84779085fa10014b9e8208d7e71b54bced73075c
Author: Vasily Averin <vvs@virtuozzo.com>
Date: Thu Nov 2 13:03:42 2017 +0300
lockd: lost rollback of set_grace_period() in lockd_down_net()
commit 3a2b19d1ee5633f76ae8a88da7bc039a5d1732aa upstream.
Commit efda760fe95ea ("lockd: fix lockd shutdown race") is incorrect,
it removes lockd_manager and disarm grace_period_end for init_net only.
If nfsd was started from another net namespace lockd_up_net() calls
set_grace_period() that adds lockd_manager into per-netns list
and queues grace_period_end delayed work.
These action should be reverted in lockd_down_net().
Otherwise it can lead to double list_add on after restart nfsd in netns,
and to use-after-free if non-disarmed delayed work will be executed after netns destroy.
Fixes: efda760fe95e ("lockd: fix lockd shutdown race")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mar 07 12:50:14 host kernel: ------------[ cut here ]------------
Mar 07 12:50:14 host kernel: kernel BUG at fs/nfs_common/grace.c:107!
Mar 07 12:50:14 host kernel: invalid opcode: 0000 [#1] SMP PTI
Mar 07 12:50:14 host kernel: Modules linked in: rpcsec_gss_krb5 nfsv4 nfsd auth_rpcgss veth rbd libceph nfsv3 nfs_acl nfs lockd grace fscache ip_set ip6table_filter ip6_tables xfs iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack softdog nfnetlink_log nfnetlink dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper ppdev hid_generic cryptd zfs(PO) zunicode(PO) zavl(PO) icp(PO) snd_pcm snd_timer snd soundcore pcspkr joydev input_leds serio_raw shpchp parport_pc parport qemu_fw_cfg mac_hid usbhid hid zcommon(PO) znvpair(PO) spl(O) vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp
Mar 07 12:50:14 host kernel: libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs xor raid6_pq psmouse virtio_net virtio_scsi floppy pata_acpi i2c_piix4
Mar 07 12:50:14 host kernel: CPU: 1 PID: 90 Comm: kworker/u4:2 Tainted: P O 4.13.13-6-pve #1
Mar 07 12:50:14 host kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.0-0-g63451fca13-prebuilt.qemu-project.org 04/01/2014
Mar 07 12:50:14 host kernel: Workqueue: netns cleanup_net
Mar 07 12:50:14 host kernel: task: ffff941fe5475f00 task.stack: ffffb9a181d30000
Mar 07 12:50:14 host kernel: RIP: 0010:grace_exit_net+0x24/0x30 [grace]
Mar 07 12:50:14 host kernel: RSP: 0000:ffffb9a181d33dc8 EFLAGS: 00010212
Mar 07 12:50:14 host kernel: RAX: ffff941fe6f209e0 RBX: ffff941f902aaf80 RCX: 0000000000000000
Mar 07 12:50:14 host kernel: RDX: ffff941f9010ed38 RSI: ffffffffc0ac1020 RDI: ffff941f902aaf80
Mar 07 12:50:14 host kernel: RBP: ffffb9a181d33dc8 R08: ffff941f9010e0c0 R09: 000000018015000d
Mar 07 12:50:14 host kernel: R10: ffffb9a181d33d18 R11: 0000000000000000 R12: ffffb9a181d33e20
Mar 07 12:50:14 host kernel: R13: ffffffffc0ac1018 R14: ffffffffc0ac1020 R15: 0000000000000000
Mar 07 12:50:14 host kernel: FS: 0000000000000000(0000) GS:ffff941fffd00000(0000) knlGS:0000000000000000
Mar 07 12:50:14 host kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 07 12:50:14 host kernel: CR2: 000056234982b078 CR3: 000000029700a002 CR4: 00000000003606e0
Mar 07 12:50:14 host kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 07 12:50:14 host kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 07 12:50:14 host kernel: Call Trace:
Mar 07 12:50:14 host kernel: ops_exit_list.isra.8+0x3b/0x70
Mar 07 12:50:14 host kernel: cleanup_net+0x1ca/0x2b0
Mar 07 12:50:14 host kernel: process_one_work+0x1ee/0x410
Mar 07 12:50:14 host kernel: worker_thread+0x4b/0x420
Mar 07 12:50:14 host kernel: kthread+0x10c/0x140
Mar 07 12:50:14 host kernel: ? process_one_work+0x410/0x410
Mar 07 12:50:14 host kernel: ? kthread_create_on_node+0x70/0x70
Mar 07 12:50:14 host kernel: ret_from_fork+0x35/0x40
Mar 07 12:50:14 host kernel: Code: 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 15 79 22 00 00 48 8b 87 88 12 00 00 55 48 89 e5 48 8b 04 d0 48 8b 10 48 39 d0 75 02 5d c3 <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 8b 15 49 22
Mar 07 12:50:14 host kernel: RIP: grace_exit_net+0x24/0x30 [grace] RSP: ffffb9a181d33dc8
Mar 07 12:50:14 host kernel: ---[ end trace ce4a24d79fcca3bb ]---
Seems not so easy for the Proxmox team, so I'll try to make this reproducible within VirtualBox or something like that.
Reproducing is easy.
Create 5 LXC CTs and run a cron job that stops and starts each CT every minute.
Within minutes you will see the issue.
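The stop/start cycle described above could be scripted roughly as follows. This is a sketch under assumptions: it uses the Proxmox pct tool (pct stop and pct start), the container IDs in the usage note are hypothetical, and the PCT variable exists only so the command can be swapped out for a dry run.

```shell
#!/bin/sh
# Sketch: stop and then start each container ID passed as an argument.
# ASSUMPTION: "pct" is the container CLI on the node; override PCT for a dry run.
PCT=${PCT:-pct}

cycle_containers() {
    for id in "$@"; do
        "$PCT" stop "$id"
        "$PCT" start "$id"
    done
}
```

Saved to a script that ends with a call such as cycle_containers 101 102 103 104 105 (hypothetical IDs), it can be run from a crontab entry with the schedule "* * * * *" to repeat every minute. Setting PCT=echo prints the commands instead of executing them.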
grep copy_net_ns /proc/*/stack
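The same check can be wrapped so that it prints the PID and command name of each affected task instead of raw stack lines. This is a minimal sketch: the function name stuck_in_copy_net_ns and its optional proc-root argument are hypothetical, and reading /proc/<pid>/stack normally requires root.

```shell
#!/bin/sh
# Sketch: list "PID COMM" for every task whose kernel stack mentions copy_net_ns.
# stuck_in_copy_net_ns and the proc-root argument are illustrative names.
# NOTE: /proc/<pid>/stack is usually readable by root only.
stuck_in_copy_net_ns() {
    proc_root=${1:-/proc}
    for f in "$proc_root"/[0-9]*/stack; do
        # Skip tasks that vanished or that we may not read.
        [ -r "$f" ] || continue
        if grep -q copy_net_ns "$f" 2>/dev/null; then
            dir=${f%/stack}
            name=$(cat "$dir/comm" 2>/dev/null)
            echo "${dir##*/} ${name:-?}"
        fi
    done
}
```

Run as root on the node: stuck_in_copy_net_ns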