LXC restart creates kworker CPU 100%

Discussion in 'Proxmox VE: Installation and configuration' started by Jean-Pierre, Jan 29, 2018.

  1. Jean-Pierre

    Jean-Pierre New Member
    Proxmox Subscriber

    Joined:
    Dec 22, 2016
    Messages:
    8
    Likes Received:
    2
    Hi

    We have a consistent issue: when rebooting LXC containers, a kworker process eventually locks up the system and we have to hard-reset the server.

    A line from `top` for the process that spawns:
    31043 root 20 0 0 0 0 R 100.0 0.0 94:20.90 kworker/u24:3
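
    In case it helps others debug this: a rough way to see what the spinning kworker is actually doing (a sketch, run as root; substitute the PID from your own top output) is to dump its kernel stack and, if needed, ask the kernel for a backtrace of all active CPUs via SysRq:
    Code:
    # dump the kernel stack of the spinning kworker (PID taken from top, here 31043)
    cat /proc/31043/stack
    # optionally dump backtraces of all active CPUs into the kernel log (needs sysrq enabled)
    echo 1 > /proc/sys/kernel/sysrq
    echo l > /proc/sysrq-trigger
    dmesg | tail -n 60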

    The spec of this server, which is a standalone server:
    Supermicro 1018R-WC0R with an X10SRW-F main board and an E5-1650 v4 CPU.
    It boots off 2x internal SSDs in RAID1, and storage is an internal RAID10 array of 6x 1TB SSDs, all software/mdadm RAID.
    There is NFS storage attached for backup images.

    The server was fully updated and rebooted about 9 days ago. It was initially installed with Proxmox 5.1 in November 2017. I do see there is another kernel available; however, rebooting this production server is not a simple process.
    pveversion -v
    proxmox-ve: 5.1-35 (running kernel: 4.13.13-4-pve)
    pve-manager: 5.1-42 (running version: 5.1-42/724a6cb3)
    pve-kernel-4.13.4-1-pve: 4.13.4-26
    pve-kernel-4.13.13-4-pve: 4.13.13-35
    pve-kernel-4.13.13-1-pve: 4.13.13-31
    libpve-http-server-perl: 2.0-8
    lvm2: 2.02.168-pve6
    corosync: 2.4.2-pve3
    libqb0: 1.0.1-1
    pve-cluster: 5.0-19
    qemu-server: 5.0-19
    pve-firmware: 2.0-3
    libpve-common-perl: 5.0-25
    libpve-guest-common-perl: 2.0-14
    libpve-access-control: 5.0-7
    libpve-storage-perl: 5.0-17
    pve-libspice-server1: 0.12.8-3
    vncterm: 1.5-3
    pve-docs: 5.1-16
    pve-qemu-kvm: 2.9.1-5
    pve-container: 2.0-18
    pve-firewall: 3.0-5
    pve-ha-manager: 2.0-4
    ksm-control-daemon: 1.2-2
    glusterfs-client: 3.8.8-1
    lxc-pve: 2.1.1-2
    lxcfs: 2.0.8-1
    criu: 2.11.1-1~bpo90
    novnc-pve: 0.6-4
    smartmontools: 6.5+svn4324-1
    zfsutils-linux: 0.7.3-pve1~bpo9

    We noticed this issue around 12 December 2017 and confirmed that it happens 99.9% of the time any LXC container is restarted/rebooted.
    The kworker process will appear (at 100% CPU) and, over a few hours, slowly grind the server to an almost complete standstill. I cannot kill this process or even get the server to reboot/halt gracefully; a hard reset is required.
    This happens for any LXC container on the server, even a brand new one.
    I did notice that 9 days ago, immediately after the last update and reboot, I could restart LXC containers; however, 10 hours later the next container restart caused the same issue.

    The posts below mention what could be the same issue, but it does not seem to have been addressed at all:
    https://forum.proxmox.com/threads/proxmox-ve-5-1-released.37650/page-3#post-187137
    https://forum.proxmox.com/threads/kworker-100-cpu.37795/


    I should also mention that we have a few servers with similar hardware running Proxmox 4.4 that do not have this issue.

    On a side note, I have a feeling it might have something to do with ACPI and friends, with LXC maybe even triggering the old kworker bug somehow:
    bugs.launchpad.net/ubuntu/+source/linux/+bug/887793

    Any help would be appreciated.
     
    Marius_B likes this.
  2. Marius_B

    Marius_B New Member
    Proxmox Subscriber

    Joined:
    Dec 19, 2017
    Messages:
    4
    Likes Received:
    0
    I can confirm we get the same issue ever since we upgraded to the new version of Proxmox. It is very annoying and renders a host almost useless, as we are unable to kill the kworker process. Instead of restarting the LXC we've tried shutting it down and then starting it up again, which appeared to work or at least be safer in most cases, but even then we've had an LXC cause the same situation during a shutdown. This needs attention, please.
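
    For reference, the shutdown-then-start sequence we use instead of a one-step restart looks roughly like this (a sketch; 101 is a placeholder container ID, adjust the timeout to taste):
    Code:
    # ask the container to shut down cleanly rather than restarting it in one step
    pct shutdown 101 --timeout 120
    # confirm it is stopped, then start it again
    pct status 101
    pct start 101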
     
  3. sapphiron

    sapphiron New Member

    Joined:
    Nov 2, 2012
    Messages:
    26
    Likes Received:
    0
    I have also been experiencing the same since my upgrade to Proxmox 5.1.

    I restarted a container, which resulted in a kworker thread using 100% CPU in iowait. However, none of the disk devices showed high utilization percentages; they were normal. VMs and containers that were already running also experienced no performance problems.

    The Proxmox interface stopped updating all VM and container data. Restarting some of the Proxmox services will briefly start updating KVM data again, but after about 30 seconds it stops again.

    I was able to safely shut down my KVMs on the same box using qm shutdown commands via SSH. Using pct commands to attempt the same with my containers simply hung, and I was also unable to do a pct list.
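
    A minimal sketch of what shutting the KVMs down over SSH looked like (the loop and the timeout value are illustrative additions, not the exact commands I ran):
    Code:
    # list the guests, then ask each running VM to shut down cleanly
    qm list
    for vmid in $(qm list | awk 'NR>1 && $3=="running" {print $1}'); do
        qm shutdown "$vmid" --timeout 180
    done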

    A hard reset via IPMI was my only option to restore the server, as it would not reboot via SSH.

    My server is using mdadm+LVM (not thin) for VM and container storage. My motherboard is an ASUS X99 board with IPMI and a Xeon E5-1650 v4. An NFS share is mounted for backups, ISOs and container templates.

    root@vm1:~# pveversion -v
    proxmox-ve: 5.1-38 (running kernel: 4.13.13-5-pve)
    pve-manager: 5.1-43 (running version: 5.1-43/bdb08029)
    pve-kernel-4.4.40-1-pve: 4.4.40-82
    pve-kernel-4.4.35-2-pve: 4.4.35-79
    pve-kernel-4.4.83-1-pve: 4.4.83-96
    pve-kernel-4.4.24-1-pve: 4.4.24-72
    pve-kernel-4.4.62-1-pve: 4.4.62-88
    pve-kernel-4.4.19-1-pve: 4.4.19-66
    pve-kernel-4.4.6-1-pve: 4.4.6-48
    pve-kernel-4.4.35-1-pve: 4.4.35-77
    pve-kernel-4.4.21-1-pve: 4.4.21-71
    pve-kernel-4.4.95-1-pve: 4.4.95-99
    pve-kernel-4.4.44-1-pve: 4.4.44-84
    pve-kernel-4.13.13-5-pve: 4.13.13-38
    pve-kernel-4.4.16-1-pve: 4.4.16-64
    pve-kernel-4.4.67-1-pve: 4.4.67-92
    pve-kernel-4.13.13-1-pve: 4.13.13-31
    pve-kernel-4.4.59-1-pve: 4.4.59-87
    libpve-http-server-perl: 2.0-8
    lvm2: 2.02.168-pve6
    corosync: 2.4.2-pve3
    libqb0: 1.0.1-1
    pve-cluster: 5.0-19
    qemu-server: 5.0-20
    pve-firmware: 2.0-3
    libpve-common-perl: 5.0-25
    libpve-guest-common-perl: 2.0-14
    libpve-access-control: 5.0-7
    libpve-storage-perl: 5.0-17
    pve-libspice-server1: 0.12.8-3
    vncterm: 1.5-3
    pve-docs: 5.1-16
    pve-qemu-kvm: 2.9.1-6
    pve-container: 2.0-18
    pve-firewall: 3.0-5
    pve-ha-manager: 2.0-4
    ksm-control-daemon: 1.2-2
    glusterfs-client: 3.8.8-1
    lxc-pve: 2.1.1-2
    lxcfs: 2.0.8-1
    criu: 2.11.1-1~bpo90
    novnc-pve: 0.6-4
    smartmontools: 6.5+svn4324-1
    zfsutils-linux: 0.7.4-pve2~bpo9
     
  4. uncia

    uncia New Member

    Joined:
    Feb 16, 2012
    Messages:
    1
    Likes Received:
    0
    We have the same problem :( and are waiting for a fix release.
     
  5. gusans

    gusans New Member

    Joined:
    Apr 21, 2016
    Messages:
    4
    Likes Received:
    0
    Hi! Same problem here :(

    root@pve1:~# pveversion -v
    proxmox-ve: 5.1-38 (running kernel: 4.13.13-5-pve)
    pve-manager: 5.1-43 (running version: 5.1-43/bdb08029)
    pve-kernel-4.13.13-2-pve: 4.13.13-33
    pve-kernel-4.13.13-5-pve: 4.13.13-38
    libpve-http-server-perl: 2.0-8
    lvm2: 2.02.168-pve6
    corosync: 2.4.2-pve3
    libqb0: 1.0.1-1
    pve-cluster: 5.0-19
    qemu-server: 5.0-20
    pve-firmware: 2.0-3
    libpve-common-perl: 5.0-25
    libpve-guest-common-perl: 2.0-14
    libpve-access-control: 5.0-7
    libpve-storage-perl: 5.0-17
    pve-libspice-server1: 0.12.8-3
    vncterm: 1.5-3
    pve-docs: 5.1-16
    pve-qemu-kvm: 2.9.1-6
    pve-container: 2.0-18
    pve-firewall: 3.0-5
    pve-ha-manager: 2.0-4
    ksm-control-daemon: 1.2-2
    glusterfs-client: 3.8.8-1
    lxc-pve: 2.1.1-2
    lxcfs: 2.0.8-1
    criu: 2.11.1-1~bpo90
    novnc-pve: 0.6-4
    smartmontools: 6.5+svn4324-1
    zfsutils-linux: 0.7.4-pve2~bpo9
     
  6. Vasu Sreekumar

    Vasu Sreekumar Active Member

    Joined:
    Mar 3, 2018
    Messages:
    123
    Likes Received:
    34
    I also see the same issue after I upgraded to 5.1.

    Waiting for an update.

    kworker/u48:1 at 100%

    top - 23:02:35 up 3 days, 23:57, 1 user, load average: 69.34, 64.66, 63.87
    Tasks: 720 total, 17 running, 703 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 1.4 us, 62.9 sy, 0.0 ni, 26.5 id, 9.0 wa, 0.0 hi, 0.2 si, 0.0 st
    KiB Mem : 49443632 total, 11305480 free, 36524072 used, 1614080 buff/cache
    KiB Swap: 8388604 total, 8388604 free, 0 used. 12136912 avail Mem

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    32333 root 20 0 0 0 0 R 100.0 0.0 102:07.86 kworker/u48:1
    405 root 20 0 0 0 0 R 99.7 0.0 3988:29 arc_reclaim
    8728 root 20 0 0 0 0 R 55.9 0.0 2:24.85 arc_prune
    8901 root 20 0 0 0 0 S 55.6 0.0 2:23.63 arc_prune
    1593 root 20 0 0 0 0 S 54.9 0.0 4:01.49 arc_prune
    17397 root 20 0 0 0 0 S 54.9 0.0 0:06.30 arc_prune
    4254 root 20 0 0 0 0 R 54.6 0.0 3:14.35 arc_prune
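
    The arc_reclaim/arc_prune threads in that output point at ZFS ARC memory pressure on this node. A commonly suggested mitigation (not confirmed to be the fix for this thread's kworker issue) is capping the ARC size, roughly like this:
    Code:
    # cap the ZFS ARC at e.g. 8 GiB (the value is an example, size it for your host)
    echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf
    # rebuild the initramfs so the option applies at boot, then reboot
    update-initramfs -u
    # the running value can also be changed on the fly
    echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max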
     
  7. Harrdy

    Harrdy New Member

    Joined:
    Mar 10, 2018
    Messages:
    2
    Likes Received:
    0
    You're not alone. I got the same issue after my upgrade from 4.x to 5.x on one of my hosts. The kworker process spawns from time to time; the problem occurs about every two weeks without starting or stopping any LXC or doing anything on the host system. After a reset via IPMI the system keeps running without any problems until the kworker process spawns again.

    Code:
    root@node003:~# pveversion -v
    proxmox-ve: 5.1-41 (running kernel: 4.13.13-6-pve)
    pve-manager: 5.1-46 (running version: 5.1-46/ae8241d4)
    pve-kernel-4.13.13-6-pve: 4.13.13-41
    pve-kernel-4.13.13-5-pve: 4.13.13-38
    pve-kernel-4.13.13-4-pve: 4.13.13-35
    pve-kernel-4.13.13-3-pve: 4.13.13-34
    pve-kernel-4.13.13-2-pve: 4.13.13-33
    pve-kernel-4.13.13-1-pve: 4.13.13-31
    pve-kernel-4.13.8-3-pve: 4.13.8-30
    pve-kernel-4.13.8-1-pve: 4.13.8-27
    pve-kernel-4.13.4-1-pve: 4.13.4-26
    pve-kernel-4.4.83-1-pve: 4.4.83-96
    pve-kernel-4.4.79-1-pve: 4.4.79-95
    pve-kernel-4.4.76-1-pve: 4.4.76-94
    pve-kernel-4.4.67-1-pve: 4.4.67-92
    pve-kernel-4.4.62-1-pve: 4.4.62-88
    pve-kernel-4.4.59-1-pve: 4.4.59-87
    pve-kernel-4.4.49-1-pve: 4.4.49-86
    corosync: 2.4.2-pve3
    criu: 2.11.1-1~bpo90
    glusterfs-client: 3.8.8-1
    ksm-control-daemon: 1.2-2
    libjs-extjs: 6.0.1-2
    libpve-access-control: 5.0-8
    libpve-common-perl: 5.0-28
    libpve-guest-common-perl: 2.0-14
    libpve-http-server-perl: 2.0-8
    libpve-storage-perl: 5.0-17
    libqb0: 1.0.1-1
    lvm2: 2.02.168-pve6
    lxc-pve: 2.1.1-3
    lxcfs: 2.0.8-2
    novnc-pve: 0.6-4
    proxmox-widget-toolkit: 1.0-11
    pve-cluster: 5.0-20
    pve-container: 2.0-19
    pve-docs: 5.1-16
    pve-firewall: 3.0-5
    pve-firmware: 2.0-3
    pve-ha-manager: 2.0-5
    pve-i18n: 1.0-4
    pve-libspice-server1: 0.12.8-3
    pve-qemu-kvm: 2.9.1-9
    pve-xtermjs: 1.0-2
    qemu-server: 5.0-22
    smartmontools: 6.5+svn4324-1
    spiceterm: 3.0-5
    vncterm: 1.5-3
    zfsutils-linux: 0.7.6-pve1~bpo9
    
     
  8. Vasu Sreekumar

    Vasu Sreekumar Active Member

    Joined:
    Mar 3, 2018
    Messages:
    123
    Likes Received:
    34
    When I restart the node, everything comes back to normal.

    But the issue happens again after a few hours.
     
  9. Harrdy

    Harrdy New Member

    Joined:
    Mar 10, 2018
    Messages:
    2
    Likes Received:
    0
    I read this in another topic.

    I tried it too, and so far it looks very good: after 10 days no kworker has been spawned, although I have done many actions with the containers like start/stop/restart/dump & restore.
     
  10. Vasu Sreekumar

    Vasu Sreekumar Active Member

    Joined:
    Mar 3, 2018
    Messages:
    123
    Likes Received:
    34
    Kernel 4.15 solved the issue.
     
  11. Shankar

    Shankar New Member

    Joined:
    Feb 21, 2018
    Messages:
    3
    Likes Received:
    0
    I've had the same issue for months, and then I came across this thread. I upgraded to the 4.15 kernel but am still seeing issues. I'm hoping I made a mistake somewhere and kernel 4.15 really did solve the issue, but I'm not sure what to fix.

    I'll be happy to provide any troubleshooting info, but this is what I have right now:
    Code:
    root@T30:~# pveversion -v
    proxmox-ve: 5.1-42 (running kernel: 4.15.15-1-pve)
    pve-manager: 5.1-51 (running version: 5.1-51/96be5354)
    pve-kernel-4.13: 5.1-44
    pve-kernel-4.15: 5.1-3
    pve-kernel-4.15.15-1-pve: 4.15.15-6
    pve-kernel-4.13.16-2-pve: 4.13.16-47
    pve-kernel-4.13.13-5-pve: 4.13.13-38
    pve-kernel-4.13.13-2-pve: 4.13.13-33
    corosync: 2.4.2-pve4
    criu: 2.11.1-1~bpo90
    glusterfs-client: 3.8.8-1
    ksm-control-daemon: 1.2-2
    libjs-extjs: 6.0.1-2
    libpve-access-control: 5.0-8
    libpve-apiclient-perl: 2.0-4
    libpve-common-perl: 5.0-30
    libpve-guest-common-perl: 2.0-14
    libpve-http-server-perl: 2.0-8
    libpve-storage-perl: 5.0-18
    libqb0: 1.0.1-1
    lvm2: 2.02.168-pve6
    lxc-pve: 3.0.0-2
    lxcfs: 3.0.0-1
    novnc-pve: 0.6-4
    proxmox-widget-toolkit: 1.0-15
    pve-cluster: 5.0-25
    pve-container: 2.0-21
    pve-docs: 5.1-17
    pve-firewall: 3.0-8
    pve-firmware: 2.0-4
    pve-ha-manager: 2.0-5
    pve-i18n: 1.0-4
    pve-libspice-server1: 0.12.8-3
    pve-qemu-kvm: 2.11.1-5
    pve-xtermjs: 1.0-2
    qemu-server: 5.0-25
    smartmontools: 6.5+svn4324-1
    spiceterm: 3.0-5
    vncterm: 1.5-3
    zfsutils-linux: 0.7.7-pve1~bpo9
    Code:
    root@T30:~# uname -a
    Linux T30 4.15.15-1-pve #1 SMP PVE 4.15.15-6 (Mon, 9 Apr 2018 12:24:42 +0200) x86_64 GNU/Linux
    Code:
    top - 09:56:35 up 1 day,  9:37,  2 users,  load average: 2.30, 2.54, 2.54
    Tasks: 575 total,   2 running, 445 sleeping,   0 stopped,   0 zombie
    %Cpu(s):  3.1 us, 27.6 sy,  0.0 ni, 67.4 id,  1.7 wa,  0.0 hi,  0.2 si,  0.0 st
    KiB Mem : 65834576 total, 17571348 free,  9703844 used, 38559384 buff/cache
    KiB Swap:  7340028 total,  4993276 free,  2346752 used. 55229064 avail Mem
    
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    18550 root      20   0       0      0      0 R 100.0  0.0   1474:07 kworker/u8:4
    Thank you in advance !
     
  12. Jean-Pierre

    Jean-Pierre New Member
    Proxmox Subscriber

    Joined:
    Dec 22, 2016
    Messages:
    8
    Likes Received:
    2
    I can confirm kernel 4.15 solved this issue for me; we have had zero issues for a few weeks now. I also had no luck with paid Proxmox support while this was happening, and I was actually surprised at how bad it was.
     
  13. Jean-Pierre

    Jean-Pierre New Member
    Proxmox Subscriber

    Joined:
    Dec 22, 2016
    Messages:
    8
    Likes Received:
    2

    Hi Shankar

    I do not have a solid answer for you; however, I am running slightly older versions, namely:
    kernel: 4.15.10-1-pve
    lxc-pve: 2.1.1-3
    lxcfs: 2.0.8-2

    Did you try the older kernel version I have, or only the very new one you currently have? There was a source change after 4.15.10-4.

    Lastly, if you would like to send me a copy of your pvereport privately, I can see if I have a node with the same versions as yours and test.
     
  14. Shankar

    Shankar New Member

    Joined:
    Feb 21, 2018
    Messages:
    3
    Likes Received:
    0
    I just updated the kernel to 4.15, based on instructions here. In short, I ran this:
    Code:
    apt update
    apt install pve-kernel-4.15
    And rebooted. It did pick up the latest kernel, I think, based on your versions. If I want to match your versions of the kernel, lxc-pve, and lxcfs, how do I do it?

    Thank you for your time on this! I'll send you the relevant portions of my pvereport via PM.
     
  15. lojasyst

    lojasyst New Member

    Joined:
    Apr 10, 2012
    Messages:
    12
    Likes Received:
    0
    Same problem here.

    I upgraded to the 4.15 kernel, but nothing changed.

    Ideas?

    #pveversion
    pve-manager/5.2-2/b1d1c7f4 (running kernel: 4.15.17-3-pve)
     
  16. Jean-Pierre

    Jean-Pierre New Member
    Proxmox Subscriber

    Joined:
    Dec 22, 2016
    Messages:
    8
    Likes Received:
    2
    Hi

    Sorry about my delayed responses; by now version 5.2 should not have this issue. If you wish to try the exact same kernel I am still running, you should be able to run:

    apt-get install pve-kernel-4.15.10-1-pve

    When you reboot, make sure it is booting off this kernel, as it may not be your default.

    If you run apt-cache search pve-kernel-4.15, it will list all available 4.15 kernels.
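
    For matching the lxc-pve and lxcfs versions I listed earlier, apt can also install specific package versions as long as they are still in the repository. A sketch (the exact versions below are the ones from my node and may have been superseded):
    Code:
    # install the specific kernel plus matching container packages
    apt-get install pve-kernel-4.15.10-1-pve lxc-pve=2.1.1-3 lxcfs=2.0.8-2
    # after rebooting, confirm the node actually booted the intended kernel
    uname -r
    pveversion -v | grep -E 'running kernel|lxc'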
     
  17. fireon

    fireon Well-Known Member
    Proxmox Subscriber

    Joined:
    Oct 25, 2010
    Messages:
    3,027
    Likes Received:
    186
    The problem also exists on version pve-manager/5.2-6/bcd5f008 (running kernel: 4.15.18-1-pve).
     
  18. Jean-Pierre

    Jean-Pierre New Member
    Proxmox Subscriber

    Joined:
    Dec 22, 2016
    Messages:
    8
    Likes Received:
    2
    I too can confirm that the latest version of Proxmox 5.2 still has this issue; I have had at least two separate nodes go down with the same kworker problem. I can also confirm that downgrading the kernel to pve-kernel-4.15.10-1-pve still fixes the issue. I will take this up with Proxmox support again and report back.
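
    To keep a node on the downgraded kernel across reboots, one option (a sketch; the exact menu entry string depends on your GRUB config, so check /boot/grub/grub.cfg first) is to pin the GRUB default:
    Code:
    # list the GRUB menu entries to find the exact name of the 4.15.10-1-pve entry
    grep "menuentry '" /boot/grub/grub.cfg | cut -d"'" -f2
    # set GRUB_DEFAULT in /etc/default/grub to that entry (or use grub-set-default), then apply
    update-grub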
     
  19. Shankar

    Shankar New Member

    Joined:
    Feb 21, 2018
    Messages:
    3
    Likes Received:
    0
    I last updated my kernel around Apr 22 and have not had a problem since. There were at least 3-4 instances where I was almost sure the entire node would go down because of the kworker issue, but it never did. For me, my current setup is pretty stable.
     