LXC container reboot fails - LXC becomes unusable

Discussion in 'Proxmox VE: Installation and configuration' started by denos, Feb 7, 2018.

  1. Stoiko Ivanov

    Stoiko Ivanov Proxmox Staff Member
    Staff Member

    Joined:
    May 2, 2018
    Messages:
    1,269
    Likes Received:
    117
    I just updated the bug report https://bugzilla.proxmox.com/show_bug.cgi?id=1943 - sadly, I still could not reproduce the issue locally (despite sending out quite a bit of (fragmented) IPv6 traffic and restarting containers).
    If possible, please provide the requested information in the bug report.
    Thanks!
     
  2. foobar73

    foobar73 New Member

    Joined:
    Jan 19, 2016
    Messages:
    8
    Likes Received:
    0
    I noted this on the bug as well.

    We (@seneca214 and I) were able to reproduce the bug with the ip6tables block in place, unfortunately. This time the spinlock was in a kernel trace for the IPv4 version of the same inet_frags_exit_net path.

    @seneca214 noted that there was a lot of mDNS broadcast traffic hitting this machine, so maybe that's what triggers it.

    I enabled the firewall at the cluster level, added the MDNS macro to the drop chain, and then made sure the default action was to allow, so that we didn't lose access to anything else while testing this.
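
    For reference, a cluster-level firewall configuration along these lines might look like the following (a minimal sketch with illustrative values, not necessarily the exact rules we used):
    Code:
    # /etc/pve/firewall/cluster.fw - minimal sketch, values are illustrative
    [OPTIONS]
    enable: 1
    policy_in: ACCEPT

    [RULES]
    IN MDNS(DROP) # drop multicast DNS (UDP 5353) at the cluster level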
     
  3. alexskysilk

    alexskysilk Active Member

    Joined:
    Oct 16, 2015
    Messages:
    559
    Likes Received:
    59
    +1 me too.

    I am not using the Proxmox firewall at all (it is disabled), and up to this point I had not seen this behavior. Some nodes work fine, some are seeing this issue. pveversion for all nodes:
    Code:
    proxmox-ve: 5.3-1 (running kernel: 4.15.18-11-pve)
    pve-manager: 5.3-9 (running version: 5.3-9/ba817b29)
    pve-kernel-4.15: 5.3-2
    pve-kernel-4.15.18-11-pve: 4.15.18-33
    pve-kernel-4.15.18-9-pve: 4.15.18-30
    pve-kernel-4.15.18-7-pve: 4.15.18-27
    ceph: 12.2.11-pve1
    corosync: 2.4.4-pve1
    criu: 2.11.1-1~bpo90
    glusterfs-client: 3.8.8-1
    ksm-control-daemon: 1.2-2
    libjs-extjs: 6.0.1-2
    libpve-access-control: 5.1-3
    libpve-apiclient-perl: 2.0-5
    libpve-common-perl: 5.0-46
    libpve-guest-common-perl: 2.0-20
    libpve-http-server-perl: 2.0-11
    libpve-storage-perl: 5.0-38
    libqb0: 1.0.3-1~bpo9
    lvm2: 2.02.168-pve6
    lxc-pve: 3.1.0-3
    lxcfs: 3.0.3-pve1
    novnc-pve: 1.0.0-2
    openvswitch-switch: 2.7.0-3
    proxmox-widget-toolkit: 1.0-22
    pve-cluster: 5.0-33
    pve-container: 2.0-34
    pve-docs: 5.3-2
    pve-edk2-firmware: 1.20181023-1
    pve-firewall: 3.0-17
    pve-firmware: 2.0-6
    pve-ha-manager: 2.0-6
    pve-i18n: 1.0-9
    pve-libspice-server1: 0.14.1-2
    pve-qemu-kvm: 2.12.1-1
    pve-xtermjs: 3.10.1-1
    qemu-server: 5.0-46
    smartmontools: 6.5+svn4324-1
    spiceterm: 3.0-5
    vncterm: 1.5-3
    zfsutils-linux: 0.7.12-pve1~bpo1
     
  4. Stoiko Ivanov

    Stoiko Ivanov Proxmox Staff Member
    Staff Member

    Joined:
    May 2, 2018
    Messages:
    1,269
    Likes Received:
    117
    @alexskysilk :
    * Please provide the perf-data, workqueue trace and other information as requested in:
    https://bugzilla.proxmox.com/show_bug.cgi?id=1943#c4

    I just updated the issue's summary and added a comment to clarify what the exact problem described in the issue is (a kworker spinning in inet_frags_exit_net), given that we have had quite a few reports of other issues with the same symptoms (a kworker using 100% CPU, only fixable by a node reset).
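
    The exact requests are listed in the linked comment; purely as an illustration of what collecting that kind of data looks like on an affected node (the kworker PID below is a placeholder):
    Code:
    # find the spinning kworker and note its PID
    top -b -n 1 | grep kworker

    # kernel stack of that thread (replace 1234 with the real PID)
    cat /proc/1234/stack

    # sample it with perf for ~30s and summarise the hot call chains
    perf record -g -p 1234 -- sleep 30
    perf report --stdio > /tmp/perf-kworker.txt

    # capture workqueue tracepoints via ftrace for a few seconds
    echo 1 > /sys/kernel/debug/tracing/events/workqueue/enable
    sleep 10
    cp /sys/kernel/debug/tracing/trace /tmp/workqueue-trace.txt
    echo 0 > /sys/kernel/debug/tracing/events/workqueue/enable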
     
  5. Stoiko Ivanov

    Stoiko Ivanov Proxmox Staff Member
    Staff Member

    Joined:
    May 2, 2018
    Messages:
    1,269
    Likes Received:
    117
    Does your workaround with iptables still work and prevent the issue from occurring (for those users who tried to mitigate the issue with it)?

    As written in the bug report (https://bugzilla.proxmox.com/show_bug.cgi?id=1943#c20), I still was not able to reproduce the issue locally, despite additionally introducing mDNS traffic into the test setup.
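
    (For anyone following along: the workaround in question uses ip6tables to drop fragmented traffic. The exact rules are in the bug report; rules of that general shape look roughly like this, purely as an illustration:)
    Code:
    # illustrative only - see the bug report for the actual workaround rules
    ip6tables -A INPUT -m frag --fragmore -j DROP   # first/middle fragments (MF set)
    ip6tables -A INPUT -m frag --fraglast -j DROP   # last fragment
    # IPv4 counterpart: drop second and further fragments
    iptables -A INPUT -f -j DROP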
     
  6. seneca214

    seneca214 New Member

    Joined:
    Dec 3, 2012
    Messages:
    23
    Likes Received:
    3
    So far, we've been unable to reproduce the issue with any server that's been rebooted with the firewall rules in place.

    If nothing else, this seems to greatly mitigate the issue.
     
  7. alexskysilk

    alexskysilk Active Member

    Joined:
    Oct 16, 2015
    Messages:
    559
    Likes Received:
    59
  8. sQuote.de Thorsten

    sQuote.de Thorsten New Member
    Proxmox Subscriber

    Joined:
    Dec 3, 2018
    Messages:
    29
    Likes Received:
    0
    Hey,

    is there any bug fix for this yet? After a reboot from inside an LXC container, the host goes offline: one CPU stays at 100% and the panel always shows question marks. Only a reboot of the host fixes it, and only until the next container reboot.

    I hope someone can help me!

    Regards, Thorsten
     
  9. fireon

    fireon Well-Known Member
    Proxmox Subscriber

    Joined:
    Oct 25, 2010
    Messages:
    2,976
    Likes Received:
    181
    Since the last update: in my latest tests with my CTs I have seen that if I reboot or shut down a CT from inside, everything hangs. You have to kill the LXC process manually and reboot the host. But if I shut down the CT from the PVE web interface, everything works fine. Tested it 20 times.
    Code:
    pve-manager/5.4-3/0a6eaa62 (running kernel: 4.15.18-12-pve)
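
    For reference, when it hangs, finding and killing the stuck container process looks roughly like this (CT ID 101 is just an example):
    Code:
    # PID of the container's lxc-start / init (101 is an example CT ID)
    lxc-info -n 101 -p
    ps -ef | grep 'lxc-start.*101'

    # force-kill the stuck process (replace <PID>), then stop the CT in PVE
    kill -9 <PID>
    pct stop 101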
     
  10. Stoiko Ivanov

    Stoiko Ivanov Proxmox Staff Member
    Staff Member

    Joined:
    May 2, 2018
    Messages:
    1,269
    Likes Received:
    117
  11. seneca214

    seneca214 New Member

    Joined:
    Dec 3, 2012
    Messages:
    23
    Likes Received:
    3
    When the kworker issue is present we do see the web console show grey icons on all containers. This does sound like the same issue.
     
  12. Kerel

    Kerel New Member

    Joined:
    Oct 24, 2018
    Messages:
    4
    Likes Received:
    0
    Same problem here:
    abc.png

    kworker 100%:
    kworker.png
    startup lxc failing:
    fail.png


    The topic title says "solved", where can I find the details how to resolve this issue on my system?

    pveversion -V:
    Code:
    root@server:~# uname -a
    Linux server 4.15.18-13-pve #1 SMP PVE 4.15.18-37 (Sat, 13 Apr 2019 21:09:15 +0200) x86_64 GNU/Linux
    root@server:~# pveversion -V
    proxmox-ve: 5.4-1 (running kernel: 4.15.18-13-pve)
    pve-manager: 5.4-5 (running version: 5.4-5/c6fdb264)
    pve-kernel-4.15: 5.4-1
    pve-kernel-4.15.18-13-pve: 4.15.18-37
    pve-kernel-4.15.18-9-pve: 4.15.18-30
    pve-kernel-4.15.18-7-pve: 4.15.18-27
    pve-kernel-4.15.17-1-pve: 4.15.17-9
    corosync: 2.4.4-pve1
    criu: 2.11.1-1~bpo90
    glusterfs-client: 3.8.8-1
    ksm-control-daemon: 1.2-2
    libjs-extjs: 6.0.1-2
    libpve-access-control: 5.1-8
    libpve-apiclient-perl: 2.0-5
    libpve-common-perl: 5.0-51
    libpve-guest-common-perl: 2.0-20
    libpve-http-server-perl: 2.0-13
    libpve-storage-perl: 5.0-41
    libqb0: 1.0.3-1~bpo9
    lvm2: 2.02.168-pve6
    lxc-pve: 3.1.0-3
    lxcfs: 3.0.3-pve1
    novnc-pve: 1.0.0-3
    proxmox-widget-toolkit: 1.0-26
    pve-cluster: 5.0-36
    pve-container: 2.0-37
    pve-docs: 5.4-2
    pve-edk2-firmware: 1.20190312-1
    pve-firewall: 3.0-20
    pve-firmware: 2.0-6
    pve-ha-manager: 2.0-9
    pve-i18n: 1.1-4
    pve-libspice-server1: 0.14.1-2
    pve-qemu-kvm: 2.12.1-3
    pve-xtermjs: 3.12.0-1
    qemu-server: 5.0-50
    smartmontools: 6.5+svn4324-1
    spiceterm: 3.0-5
    vncterm: 1.5-3
    zfsutils-linux: 0.7.13-pve1~bpo2
     
  13. denos

    denos Member

    Joined:
    Jul 27, 2015
    Messages:
    74
    Likes Received:
    34
    I have removed "Solved" from the title, as the only solution is to manually install and maintain a 4.18+ kernel, which isn't feasible or desirable for most users.
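
    For anyone who wants to try that route anyway: assuming a stretch-based PVE 5 node, one option is pulling a 4.19 kernel from Debian stretch-backports, roughly as sketched below. Note that out-of-tree modules bundled with the pve kernels (e.g. ZFS) are not part of the stock Debian kernel and would need separate handling.
    Code:
    # sketch only - install a newer stock kernel from stretch-backports
    echo "deb http://deb.debian.org/debian stretch-backports main contrib non-free" \
        > /etc/apt/sources.list.d/stretch-backports.list
    apt update
    apt -t stretch-backports install linux-image-amd64
    # reboot into the new kernel afterwards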
     
    fireon and Calmor like this.
  14. Kerel

    Kerel New Member

    Joined:
    Oct 24, 2018
    Messages:
    4
    Likes Received:
    0
    I do, however, have NFS resources mounted from within my containers. I read somewhere that this is not advised by the Proxmox team, so I should make use of bind mounts instead. Is there any evidence that NFS mounts are related to this issue?
     
  15. mac.linux.free

    Joined:
    Jan 29, 2017
    Messages:
    101
    Likes Received:
    5
    I had the same problem. I fixed it by removing Open vSwitch and switching back to a Linux bridge.
    It has been working so far.
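
    In case it helps anyone making the same switch, a plain Linux bridge in /etc/network/interfaces looks roughly like this (addresses and NIC name are placeholders):
    Code:
    # /etc/network/interfaces - example vmbr0 as a Linux bridge (placeholder values)
    auto vmbr0
    iface vmbr0 inet static
            address 192.0.2.10
            netmask 255.255.255.0
            gateway 192.0.2.1
            bridge_ports eno1
            bridge_stp off
            bridge_fd 0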
     
  16. denos

    denos Member

    Joined:
    Jul 27, 2015
    Messages:
    74
    Likes Received:
    34
    I have seen the issue on a Proxmox node without any (client or server) NFS.
     
  17. Kerel

    Kerel New Member

    Joined:
    Oct 24, 2018
    Messages:
    4
    Likes Received:
    0
    Alright, thanks for posting.

    However, just to be sure, I have moved all NFS mounts to the Proxmox host and bind-mounted them into the individual LXCs.

    I find it strange that Proxmox forces me to select one of the 'content types' (see screenshot below). An alternative is a systemd mount unit, but I'd prefer to do the NFS mounting via the GUI.
    snip.png
    Now I have this 'snippets' folder inside my downloads mount, but anyway, that's a different story.

    I will update here if I encounter the same issue again; let's see if this fixes it for me.
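
    In case anyone wants to do the same, the host-side NFS mount plus the bind mount into a container looks roughly like this (server, paths and CT ID are placeholders):
    Code:
    # on the host: mount the NFS export (placeholder server/path)
    mount -t nfs 192.0.2.50:/export/downloads /mnt/nfs/downloads

    # bind-mount it into container 101 as /mnt/downloads
    pct set 101 -mp0 /mnt/nfs/downloads,mp=/mnt/downloads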
     
  18. oguz

    oguz Proxmox Staff Member
    Staff Member

    Joined:
    Nov 19, 2018
    Messages:
    603
    Likes Received:
    63
    that's just a recent feature addition, no need to worry
     
  19. Kerel

    Kerel New Member

    Joined:
    Oct 24, 2018
    Messages:
    4
    Likes Received:
    0
    Thanks for the info. I'm actually not worried at all about that folder; I'm just questioning why the PVE team decided that you must select a content type when defining an NFS mount.
     
  20. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    2,345
    Likes Received:
    212
    In the screenshot you are defining a storage, and the content type is used in other parts of Proxmox VE, e.g. for filter views or to determine whether CTs/VMs can be migrated there. If you want to use an NFS mount for other purposes, or as a directory storage, then you need to go through fstab.
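
    For example, an fstab entry for such an NFS mount on the host might look like this (server and paths are placeholders):
    Code:
    # /etc/fstab - example NFS mount on the PVE host (placeholder values)
    192.0.2.50:/export/media  /mnt/media  nfs  defaults,_netdev  0  0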
     