LXC container reboot fails - LXC becomes unusable

I just updated the bug report https://bugzilla.proxmox.com/show_bug.cgi?id=1943 - sadly, I still could not reproduce the issue locally (despite sending out quite a bit of (fragmented) IPv6 traffic and restarting containers).
If possible, please provide the requested information in the bug report.
Thanks!
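For reference, a simple way to generate (fragmented) IPv6 traffic towards a test node looks roughly like this - the address below is a placeholder and this is only a sketch of the kind of traffic involved, not an exact reproduction recipe:
Code:
# a payload larger than the path MTU forces the sender to fragment the IPv6 packets;
# 2001:db8::10 is a placeholder address of the target node
ping6 -s 2000 -c 1000 2001:db8::10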
 
I noted this on the bug as well.

We (@seneca214 and I) were able to reproduce the bug with the ip6tables block in place, unfortunately. This time the spinlock showed up in a kernel trace for the IPv4 version of the same inet_frags_exit_net code path.

@seneca214 noted that there was a lot of mDNS broadcast traffic hitting this machine, so perhaps that is what triggers it.

I enabled the firewall at the cluster level, added the MDNS macro to the drop chain, and made sure the default action was to allow, so that we didn't lose access to anything else while testing this.
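For anyone wanting to replicate this setup, a minimal sketch of what such a cluster-level config in /etc/pve/firewall/cluster.fw can look like (not necessarily our exact rule set):
Code:
[OPTIONS]
enable: 1
# keep the default input policy permissive while testing
policy_in: ACCEPT

[RULES]
# drop incoming mDNS traffic using the built-in MDNS macro
IN MDNS(DROP)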
 
+1 me too.

I am not using the Proxmox firewall at all (it is disabled), and up to this point I had not seen this behavior. Some nodes work fine, some are seeing this issue. pveversion for all nodes:
Code:
proxmox-ve: 5.3-1 (running kernel: 4.15.18-11-pve)
pve-manager: 5.3-9 (running version: 5.3-9/ba817b29)
pve-kernel-4.15: 5.3-2
pve-kernel-4.15.18-11-pve: 4.15.18-33
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-7-pve: 4.15.18-27
ceph: 12.2.11-pve1
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-46
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-38
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-2
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-33
pve-container: 2.0-34
pve-docs: 5.3-2
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-17
pve-firmware: 2.0-6
pve-ha-manager: 2.0-6
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 3.10.1-1
qemu-server: 5.0-46
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1
 
@alexskysilk:
* Please provide the perf data, the workqueue trace, and the other information requested in:
https://bugzilla.proxmox.com/show_bug.cgi?id=1943#c4
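For anyone affected who wants to contribute data, a generic sketch of how such information can be captured (the exact commands requested are in comment #4 of the bug report; <PID> stands for the PID of the spinning kworker):
Code:
# find the kworker using 100% CPU and record a profile of it
top -b -n 1 | grep kworker
perf record -g -p <PID> -- sleep 30
perf report --stdio > perf-report.txt

# capture a short workqueue trace
echo 1 > /sys/kernel/debug/tracing/events/workqueue/enable
sleep 10
cat /sys/kernel/debug/tracing/trace > workqueue-trace.txt
echo 0 > /sys/kernel/debug/tracing/events/workqueue/enable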

I just updated the issue's summary and added a comment to clarify what the exact problem described in the issue is (a kworker spinning in inet_frags_exit_net), given that we had quite a few reports of other issues with the same symptoms (a kworker using 100% CPU, only fixable by a node reset).
 
Does your workaround with iptables still work and prevent the issue from occurring (for those users who tried to mitigate the issue with it)?

As written in the bug report (https://bugzilla.proxmox.com/show_bug.cgi?id=1943#c20), I was still not able to reproduce the issue locally, despite additionally introducing mDNS traffic into the test setup.
 
So far, we've been unable to reproduce the issue with any server that's been rebooted with the firewall rules in place.

If nothing else, this seems to greatly mitigate the issue.
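For reference, the kind of rules in question looks roughly like this (a sketch only - the exact workaround is described in the bug report and may differ):
Code:
# drop incoming mDNS (UDP port 5353) on both IPv4 and IPv6
iptables  -A INPUT -p udp --dport 5353 -j DROP
ip6tables -A INPUT -p udp --dport 5353 -j DROP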
 
Hey,

is there any bug fix for this yet? After a reboot of an LXC container, the host goes offline: one CPU stays at 100% and the panel always shows a question mark. Only a host reboot fixes it for a while, until the next container reboot.

I hope someone can help me!

Regards, Thorsten
 
Since the last update: in my latest tests with my CTs I have seen that if I reboot or shut down a CT from inside, everything hangs. You have to kill the LXC process manually and reboot the host. But if I shut down the CT from the PVE web interface, everything works fine. Tested it 20 times.
Code:
pve-manager/5.4-3/0a6eaa62 (running kernel: 4.15.18-12-pve)
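For reference, a sketch of the host-side commands for the shutdown and cleanup described above (101 stands for the affected container's VMID, <PID> for the process found):
Code:
pct shutdown 101      # clean shutdown from the host, as the web interface triggers it
pct stop 101          # hard stop if the clean shutdown hangs

# if the container is already stuck, find its LXC process and kill it manually;
# depending on the LXC version it shows up as lxc-start or "[lxc monitor]"
ps aux | grep -E "lxc.*101"
kill -9 <PID>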
 
When the kworker issue is present, we do see the web console show grey icons on all containers. This does sound like the same issue.
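A rough sketch of how to check whether a node is hitting the same issue (<PID> is the PID of the kworker at 100% CPU; sampling the stack a few times may be needed):
Code:
# find the spinning kworker
top -b -n 1 | grep kworker
# dump its kernel stack (as root) and look for inet_frags_exit_net
cat /proc/<PID>/stack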
 
Same problem here: (screenshot: abc.png)

kworker at 100%: (screenshot: kworker.png)

startup of the LXC failing: (screenshot: fail.png)


The topic title says "solved", where can I find the details how to resolve this issue on my system?

pveversion -V:
Code:
root@server:~# uname -a
Linux server 4.15.18-13-pve #1 SMP PVE 4.15.18-37 (Sat, 13 Apr 2019 21:09:15 +0200) x86_64 GNU/Linux
root@server:~# pveversion -V
proxmox-ve: 5.4-1 (running kernel: 4.15.18-13-pve)
pve-manager: 5.4-5 (running version: 5.4-5/c6fdb264)
pve-kernel-4.15: 5.4-1
pve-kernel-4.15.18-13-pve: 4.15.18-37
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.17-1-pve: 4.15.17-9
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-51
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-13
libpve-storage-perl: 5.0-41
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-26
pve-cluster: 5.0-36
pve-container: 2.0-37
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-20
pve-firmware: 2.0-6
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-3
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-50
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2


 
The topic title says "solved", where can I find the details how to resolve this issue on my system?

I have removed "Solved" from the title as the only solution is to manually install and maintain a 4.18+ kernel which isn't feasible / desirable for most users.
 
I do, however, have NFS resources mounted from within my containers. I read somewhere that this is not advised by the Proxmox team, so I should make use of bind mounts instead. Is there any evidence that NFS mounts are related to this issue?
 
I had the same problem. I fixed it by removing openvswitch and changing back to a Linux bridge.
It has been working so far.
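For anyone wanting to try the same, a minimal sketch of a plain Linux bridge in /etc/network/interfaces (addresses and the NIC name eno1 are placeholders):
Code:
auto vmbr0
iface vmbr0 inet static
        address 192.0.2.10
        netmask 255.255.255.0
        gateway 192.0.2.1
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0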
 
I do, however, have NFS resources mounted from within my containers. I read somewhere that this is not advised by the Proxmox team, so I should make use of bind mounts instead. Is there any evidence that NFS mounts are related to this issue?
I have seen the issue on a Proxmox node without any (client or server) NFS.
 
I have seen the issue on a Proxmox node without any (client or server) NFS.

Alright, thanks for posting.

However, just to be sure, I have now moved all NFS mounts to the Proxmox host and bind-mounted all of them into the individual LXCs.
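For reference, a sketch of how such a bind mount can be added (101, the host path and the mount point inside the CT are placeholders):
Code:
# the NFS share is mounted on the host at /mnt/nfs/downloads;
# expose it inside container 101 at /mnt/downloads
pct set 101 -mp0 /mnt/nfs/downloads,mp=/mnt/downloads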

I find it strange that Proxmox forces me to select one of the 'content types' (see screenshot below). An alternative is a systemd mount unit (see the sketch at the end of this post), but I'd prefer to do the NFS mounting via the GUI.
(screenshot: snip.png)
Now I've got this 'snippets' folder inside my downloads mount, but anyway, that's a different story.

I will update here when I encounter the issue again; let's see if this fixes mine.
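For completeness, the systemd mount unit alternative mentioned above could look roughly like this (server, export and paths are placeholders; the unit file name has to match the mount point, so /mnt/downloads becomes mnt-downloads.mount):
Code:
# /etc/systemd/system/mnt-downloads.mount
[Unit]
Description=NFS share for container bind mounts
After=network-online.target
Wants=network-online.target

[Mount]
What=192.0.2.10:/export/downloads
Where=/mnt/downloads
Type=nfs
Options=defaults,_netdev

[Install]
WantedBy=multi-user.target

# enable and start it with: systemctl enable --now mnt-downloads.mount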
 
Now I've got this 'snippets' folder inside my downloads mount, but anyway, that's a different story.

that's just a recent feature addition, no need to worry
 
that's just a recent feature addition, no need to worry

Thanks for the info. I'm actually not worried about that folder at all; I'm just questioning why the PVE team decided that you must select a content type when defining an NFS mount.
 
In the screenshot you are defining a storage, and the content type is used in other parts of Proxmox VE, e.g. for filter views or to decide whether CTs/VMs can be migrated there. If you want to use an NFS mount for other purposes or as a directory storage, you need to go through fstab.
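A minimal sketch of such an fstab entry (server, export and mount point are placeholders):
Code:
# /etc/fstab
192.0.2.10:/export/downloads  /mnt/downloads  nfs  defaults,_netdev  0  0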
 
