LXC container reboot fails - LXC becomes unusable

Stoiko Ivanov

Proxmox Staff Member
Staff member
May 2, 2018
I just updated the bug report https://bugzilla.proxmox.com/show_bug.cgi?id=1943 - sadly I still could not reproduce the issue locally (despite sending out quite a lot of (fragmented) IPv6 traffic and restarting containers).
If possible, please provide the requested information in the bug report.
Thanks!
 

foobar73

New Member
Jan 19, 2016
I noted this on the bug as well.

We (@seneca214 and I) were able to reproduce the bug with the ip6tables block in place, unfortunately. This time the spinlock was in a kernel trace with the IPv4 version of the same inet_frags_exit_net kernel function.

@seneca214 noted that there was a lot of mDNS broadcast traffic hitting this machine, so maybe that's what is triggering it.

I enabled the firewall at the cluster level, added the MDNS macro to the drop chain, and then made sure the default action was to allow so that we didn't lose access to anything else while testing this.
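For reference, this is roughly what that setup looks like in /etc/pve/firewall/cluster.fw - a minimal sketch assuming the built-in MDNS macro and a permissive default policy, not an exact copy of our file:
Code:
[OPTIONS]
# firewall enabled cluster-wide, default inbound policy left permissive
enable: 1
policy_in: ACCEPT

[RULES]
# drop multicast DNS (UDP 5353) via the predefined MDNS macro
IN MDNS(DROP)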
 

alexskysilk

Well-Known Member
Oct 16, 2015
Chatsworth, CA
www.skysilk.com
+1 me too.

I am not using the Proxmox firewall at all (it's disabled), and until now I had not seen this behavior. Some nodes work fine, some are seeing this issue. pveversion for all nodes:
Code:
proxmox-ve: 5.3-1 (running kernel: 4.15.18-11-pve)
pve-manager: 5.3-9 (running version: 5.3-9/ba817b29)
pve-kernel-4.15: 5.3-2
pve-kernel-4.15.18-11-pve: 4.15.18-33
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-7-pve: 4.15.18-27
ceph: 12.2.11-pve1
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-46
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-38
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-2
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-33
pve-container: 2.0-34
pve-docs: 5.3-2
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-17
pve-firmware: 2.0-6
pve-ha-manager: 2.0-6
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 3.10.1-1
qemu-server: 5.0-46
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1
 

Stoiko Ivanov

Proxmox Staff Member
Staff member
May 2, 2018
@alexskysilk:
* Please provide the perf data, workqueue trace, and other information as requested in:
https://bugzilla.proxmox.com/show_bug.cgi?id=1943#c4

I also updated the issue's summary and added a comment to clarify what the exact problem described in the issue is (a kworker spinning in inet_frags_exit_net), given that we had quite a few reports of other issues with the same symptoms (kworker using 100% CPU, only fixable by a node reset).
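For anyone unsure how to collect that, here is a rough sketch of the kind of commands involved (the authoritative steps are in the bug report comment; the PID below is a placeholder, and perf has to be installed separately):
Code:
# find the spinning kworker
top -b -n 1 | grep kworker
# record a call-graph profile of it for ~30 seconds, then inspect it
perf record -g -p <PID> -- sleep 30
perf report
# optionally capture workqueue tracepoints (stop with Ctrl-C)
echo 1 > /sys/kernel/debug/tracing/events/workqueue/enable
cat /sys/kernel/debug/tracing/trace_pipe > /tmp/workqueue.trace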
 

Stoiko Ivanov

Proxmox Staff Member
Staff member
May 2, 2018
For those users who tried to mitigate the issue with the iptables workaround: does it still work and prevent the issue from occurring?

As written in the bug report (https://bugzilla.proxmox.com/show_bug.cgi?id=1943#c20), I was still not able to reproduce the issue locally, despite additionally introducing mDNS traffic into the test setup.
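(For context, the host-level mDNS block being tested looks roughly like the following - just a sketch of the idea, individual setups may have applied it differently:)
Code:
iptables  -A INPUT -p udp --dport 5353 -j DROP
ip6tables -A INPUT -p udp --dport 5353 -j DROP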
 

seneca214

New Member
Dec 3, 2012
So far, we've been unable to reproduce the issue with any server that's been rebooted with the firewall rules in place.

If nothing else, this seems to greatly mitigate the issue.
 

fireon

Famous Member
Oct 25, 2010
Austria/Graz
iteas.at
Since the last update: in my latest tests with my CTs I have seen that if I reboot or shut down a CT from inside, everything hangs. You have to kill the LXC process manually and reboot the host (rough sketch below). But if I shut down the CT from the PVE web interface, everything works fine. Tested it 20 times.
Code:
pve-manager/5.4-3/0a6eaa62 (running kernel: 4.15.18-12-pve)
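A rough sketch of that manual cleanup (CT ID 101 is a placeholder):
Code:
pct list               # find the affected container
lxc-info -n 101 -p     # get the PID of the container's init process
kill -9 <PID>          # force-kill it, then reboot the host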
 

Stoiko Ivanov

Proxmox Staff Member
Staff member
May 2, 2018

seneca214

New Member
Dec 3, 2012
When the kworker issue is present we do see the web console show grey icons on all containers. This does sound like the same issue.
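A quick way to sanity-check that symptom - just a sketch, the CT ID is a placeholder:
Code:
top -b -n 1 | grep kworker    # a kworker stuck near 100% CPU
systemctl status pvestatd     # the daemon that reports guest status to the web UI
pct status 101                # hangs or errors here point to the same problem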
 

Kerel

New Member
Oct 24, 2018
NL
Same problem here:
[screenshot: abc.png]

kworker 100%:
[screenshot: kworker.png]

startup lxc failing:
[screenshot: fail.png]
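If it helps with debugging, starting the container in the foreground with debug logging usually gives more detail than the GUI (CT ID and log path are placeholders):
Code:
lxc-start -n 101 -F -l DEBUG -o /tmp/lxc-101.log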


The topic title says "solved"; where can I find the details on how to resolve this issue on my system?

pveversion -V:
Code:
root@server:~# uname -a
Linux server 4.15.18-13-pve #1 SMP PVE 4.15.18-37 (Sat, 13 Apr 2019 21:09:15 +0200) x86_64 GNU/Linux
root@server:~# pveversion -V
proxmox-ve: 5.4-1 (running kernel: 4.15.18-13-pve)
pve-manager: 5.4-5 (running version: 5.4-5/c6fdb264)
pve-kernel-4.15: 5.4-1
pve-kernel-4.15.18-13-pve: 4.15.18-37
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.17-1-pve: 4.15.17-9
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-51
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-13
libpve-storage-perl: 5.0-41
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-26
pve-cluster: 5.0-36
pve-container: 2.0-37
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-20
pve-firmware: 2.0-6
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 2.12.1-3
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-50
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2

 

denos

Member
Jul 27, 2015
The topic title says "solved"; where can I find the details on how to resolve this issue on my system?
I have removed "Solved" from the title, as the only solution is to manually install and maintain a 4.18+ kernel, which isn't feasible or desirable for most users.
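For those who want to try that route anyway, one possible approach (not officially supported, just a sketch) is installing an Ubuntu mainline kernel build by hand; note that mainline kernels do not ship the ZFS module that the PVE kernels bundle:
Code:
# download the matching linux-image/linux-modules .deb files from
# https://kernel.ubuntu.com/~kernel-ppa/mainline/ and then:
dpkg -i linux-image-*.deb linux-modules-*.deb
update-grub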
 

Kerel

New Member
Oct 24, 2018
NL
I do, however, have NFS resources mounted from within my containers. I read somewhere this is not advised by the Proxmox team, so I should make use of bind mounts instead. Is there any evidence that NFS mounts are related to this issue?
 
Jan 29, 2017
I had the same problem. I fixed it by removing openvswitch and changing back to a Linux bridge.
Working fine so far.
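For comparison, a plain Linux bridge in /etc/network/interfaces looks roughly like this (interface names and addresses are placeholders):
Code:
auto vmbr0
iface vmbr0 inet static
        address 192.0.2.10
        netmask 255.255.255.0
        gateway 192.0.2.1
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0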
 

denos

Member
Jul 27, 2015
I do, however, have NFS resources mounted from within my containers. I read somewhere this is not advised by the Proxmox team, so I should make use of bind mounts instead. Is there any evidence that NFS mounts are related to this issue?
I have seen the issue on a Proxmox node without any (client or server) NFS.
 

Kerel

New Member
Oct 24, 2018
NL
I have seen the issue on a Proxmox node without any (client or server) NFS.
Alright, thanks for posting.

However, just to be sure, I have now moved all NFS mounts to the Proxmox host and bind-mounted them into the individual LXCs.
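For anyone doing the same, the bind mounts can be added roughly like this (CT ID and paths are placeholders):
Code:
pct set 101 -mp0 /mnt/pve/downloads,mp=/mnt/downloads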

I find it strange that Proxmox forces me to select one of the 'content types' (see screenshot below). An alternative is a systemd mount unit, but I'd prefer to do the NFS mounting via the GUI.
[screenshot: snip.png]
Now I've got this 'snippets' folder inside my downloads mount, but anyway, that's a different story.

I'll update here if I run into the same issue again; let's see if this fixes mine.
 

oguz

Proxmox Staff Member
Staff member
Nov 19, 2018
Now I've got this 'snippets' folder inside my downloads mount, but anyway, that's a different story.
that's just a recent feature addition, no need to worry
 

Kerel

New Member
Oct 24, 2018
NL
that's just a recent feature addition, no need to worry
Thanks for the info. I'm not actually worried about that folder at all; I'm just questioning why the PVE team decided that you must select a content type when defining an NFS mount.
 

Alwin

Proxmox Staff Member
Staff member
Aug 1, 2017
In the screenshot you are defining a storage, and the content type is used in other parts of Proxmox VE, e.g. for filter views or to decide whether CTs/VMs can be migrated there. If you want to use an NFS mount for other purposes, or as a directory storage, then you need to go through fstab.
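A minimal fstab entry for such an NFS mount might look like this (server address and paths are placeholders):
Code:
# /etc/fstab
192.0.2.20:/export/downloads  /mnt/downloads  nfs  defaults,_netdev  0  0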
 
