4.15 based test kernel for PVE 5.x available

Discussion in 'Proxmox VE: Installation and configuration' started by fabian, Mar 12, 2018.

  1. Jospeh Huber

    Jospeh Huber Member

    Joined:
    Apr 18, 2016
    Messages:
    76
    Likes Received:
    3
    After a week in a production environment with 4.15.3-1 ... again one node with a question mark.
    I will try 4.15.10-1-pve.
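
    A minimal sketch of what that test looks like, assuming the pvetest repository is already enabled:
    Code:
    # pull in the 4.15 test kernel and reboot into it
    apt update
    apt install pve-kernel-4.15.10-1-pve
    reboot
    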
     
  2. Vasu Sreekumar

    Vasu Sreekumar Active Member

    Joined:
    Mar 3, 2018
    Messages:
    123
    Likes Received:
    34
    I have 4.15.3 running without any issues.

    What is the error related to? SSL, KSM, or does the node just turn grey with one LXC guest not starting?

    I faced a few node restart issues due to KSM not starting.

    I had to manually set the KSM starting memory usage threshold to 75% to avoid the issue, and then ran systemctl restart ksmtuned.
     
  3. efeu

    efeu Member
    Proxmox Subscriber

    Joined:
    Nov 6, 2015
    Messages:
    74
    Likes Received:
    6
    Are the AMD bugs resolved?
     
  4. Vasu Sreekumar

    Vasu Sreekumar Active Member

    Joined:
    Mar 3, 2018
    Messages:
    123
    Likes Received:
    34
    I don't have AMD-based nodes, so I didn't test it.

    But I can confirm that for LXC there are still many bugs causing node restarts, which are very annoying.
     
  5. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,200
    Likes Received:
    496
    if you are referring to KSM not merging pages fast enough for your setup - that is not a bug. overcommitting resources is always a dangerous game to play.
     
  6. Vasu Sreekumar

    Vasu Sreekumar Active Member

    Joined:
    Mar 3, 2018
    Messages:
    123
    Likes Received:
    34
    No, that is not the issue.

    Suppose I have already started 3 guests and the node is at 75% memory usage; KSM sharing will not start.

    And when I start the 4th guest, the node crashes and restarts.

    I reproduced the same error multiple times. Every time the node crashed.

    Then I changed the KSM threshold to 50% (KSM_THRES_COEF=50), so KSM starts once I have three guests running.

    And I can start the 4th guest without any crash.
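
    For reference, a minimal sketch of that workaround, assuming the stock ksmtuned configuration at /etc/ksmtuned.conf as shipped with ksm-control-daemon:
    Code:
    # /etc/ksmtuned.conf
    # start KSM page merging once free memory drops below 50% of total
    # (i.e. at roughly 50% usage) instead of the lower default
    KSM_THRES_COEF=50
    
    Then restart the daemon with systemctl restart ksmtuned so the new threshold takes effect.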
     
  7. eXtremeSHOk

    eXtremeSHOk New Member

    Joined:
    Mar 15, 2016
    Messages:
    23
    Likes Received:
    12
    pve-kernel-4.15.10-1-pve is working perfectly on our various Intel-based servers.
     
  8. Vasu Sreekumar

    Vasu Sreekumar Active Member

    Joined:
    Mar 3, 2018
    Messages:
    123
    Likes Received:
    34
    pve-kernel-4.15.10-1-pve also has the above KSM sharing issue.

    If you have plenty of memory, you will not see it.

    I have 25+ nodes, and I don't have plenty of memory, so I see it often.

    But after setting the KSM threshold percentage, I didn't face any issues.
     
  9. efeu

    efeu Member
    Proxmox Subscriber

    Joined:
    Nov 6, 2015
    Messages:
    74
    Likes Received:
    6
    Oh, the question was more directed at the Proxmox staff ;)

    But thanks anyway.

    @proxmox Team
    Are the AMD issues solved with the newer 4.15 kernel?

    Can't test it myself right now.
     
  10. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,200
    Likes Received:
    496
    like I said - this is not a bug. when you overcommit resources, you need to plan carefully, otherwise you might run out of resources. KSM is always asynchronous. unless you have some details to share that you haven't included so far and that actually point to a bug, please stop posting this "issue" in this thread. thanks.
     
  11. Vasu Sreekumar

    Vasu Sreekumar Active Member

    Joined:
    Mar 3, 2018
    Messages:
    123
    Likes Received:
    34
    With the default settings, the node crashes when I start the 4th guest.

    With the changed settings the node does not crash when I start the 4th guest, since KSM starts early enough.

    I think it is more of an LXC-related issue than a bug.

    With KVM I didn't face any issues.
     
  12. Vasu Sreekumar

    Vasu Sreekumar Active Member

    Joined:
    Mar 3, 2018
    Messages:
    123
    Likes Received:
    34
    System crashed and restarted at 20:02:00.

    (I have 25 live nodes; 1 or 2 nodes crash like this every day.)

    Log file:

    Mar 29 19:39:42 Q172 pvedaemon[2841]: <root@pam> successful auth for user 'root@pam'
    Mar 29 19:39:56 Q172 pvedaemon[9167]: <root@pam> successful auth for user 'root@pam'
    Mar 29 19:40:04 Q172 pvedaemon[9167]: <root@pam> successful auth for user 'root@pam'
    Mar 29 19:40:35 Q172 pvedaemon[2841]: <root@pam> successful auth for user 'root@pam'
    Mar 29 19:40:57 Q172 pvedaemon[4385]: <root@pam> successful auth for user 'root@pam'
    Mar 29 19:41:55 Q172 pvedaemon[2841]: <root@pam> successful auth for user 'root@pam'
    Mar 29 19:42:13 Q172 pvedaemon[9167]: <root@pam> successful auth for user 'root@pam'
    Mar 29 19:42:56 Q172 pvedaemon[2841]: <root@pam> successful auth for user 'root@pam'
    Mar 29 19:44:40 Q172 pvedaemon[9167]: <root@pam> successful auth for user 'root@pam'
    Mar 29 19:45:03 Q172 pvedaemon[2841]: <root@pam> successful auth for user 'root@pam'
    Mar 29 19:45:03 Q172 pvedaemon[9167]: <root@pam> successful auth for user 'root@pam'
    Mar 29 19:49:19 Q172 pvedaemon[9167]: <root@pam> successful auth for user 'root@pam'
    Mar 29 19:56:44 Q172 pvedaemon[2841]: <root@pam> successful auth for user 'root@pam'
    Mar 29 19:59:09 Q172 pvedaemon[2841]: <root@pam> successful auth for user 'root@pam'
    Mar 29 20:02:19 Q172 kernel: [ 0.000000] Linux version 4.15.3-1-pve (root@nora) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)) #1 SMP PVE 4.15.3-1 (Fri, 9 Mar 2018 14:45:34 +0100) ()
    Mar 29 20:02:19 Q172 kernel: [ 0.000000] Command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-4.15.3-1-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet
    Mar 29 20:02:19 Q172 kernel: [ 0.000000] KERNEL supported cpus:
    Mar 29 20:02:19 Q172 kernel: [ 0.000000] Intel GenuineIntel
    Mar 29 20:02:19 Q172 kernel: [ 0.000000] AMD AuthenticAMD
    Mar 29 20:02:19 Q172 kernel: [ 0.000000] Centaur CentaurHauls
    Mar 29 20:02:19 Q172 kernel: [ 0.000000] x86/fpu: x87 FPU will use FXSAVE
    Mar 29 20:02:19 Q172 kernel: [ 0.000000] e820: BIOS-provided physical RAM map:
    Mar 29 20:02:19 Q172 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009e7ff] usable
    Mar 29 20:02:19 Q172 kernel: [ 0.000000] BIOS-e820: [mem 0x000000000009e800-0x000000000009ffff] reserved
    Mar 29 20:02:19 Q172 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
    Mar 29 20:02:19 Q172 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bf72ffff] usable
    Mar 29 20:02:19 Q172 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bf730000-0x00000000bf73dfff] ACPI data
    Mar 29 20:02:19 Q172 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bf73e000-0x00000000bf79ffff] ACPI NVS
    Mar 29 20:02:19 Q172 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bf7a0000-0x00000000bf7affff] reserved
    Mar 29 20:02:19 Q172 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bf7bc000-0x00000000bfffffff] reserved
     
  13. Whatever

    Whatever Member

    Joined:
    Nov 19, 2012
    Messages:
    199
    Likes Received:
    5
    Any chance of seeing ZFS 0.7.7 included in the test kernel?
     
  14. tom

    tom Proxmox Staff Member
    Staff Member

    Joined:
    Aug 29, 2006
    Messages:
    13,460
    Likes Received:
    395
    you will see this in the next week.
     
    Dmitry.S and fireon like this.
  15. joshlukas

    joshlukas New Member

    Joined:
    Apr 21, 2017
    Messages:
    10
    Likes Received:
    0
    @efeu: I have a customized mini home server with some VMs and Containers. It's a

    Threadripper 1900X
    Asrock X399 Taichi
    2 x 16 GB DDR4 2400 Kingston ECC unbuffered RAM
    nVidia G710 (host)
    nVidia GTX1080 (Win10 guest)
    250 GB Samsung EVO SSD (host only)
    512 GB Samsung EVO SSD ZFS pool (Win10 guest + 5 VMs)
    3 x 3 TB Samsung Eco green HDD in RAIDZ1 (Fileserver, Container, Templates)

    Running latest proxmox:
    root@pve:~# pveversion --verbose
    proxmox-ve: 5.1-42 (running kernel: 4.15.10-1-pve)
    pve-manager: 5.1-46 (running version: 5.1-46/ae8241d4)
    pve-kernel-4.13: 5.1-43
    pve-kernel-4.15: 5.1-2
    pve-kernel-4.15.10-1-pve: 4.15.10-2
    pve-kernel-4.13.16-1-pve: 4.13.16-43
    pve-kernel-4.13.13-6-pve: 4.13.13-42
    corosync: 2.4.2-pve3
    criu: 2.11.1-1~bpo90
    glusterfs-client: 3.8.8-1
    ksm-control-daemon: 1.2-2
    libjs-extjs: 6.0.1-2
    libpve-access-control: 5.0-8
    libpve-common-perl: 5.0-28
    libpve-guest-common-perl: 2.0-14
    libpve-http-server-perl: 2.0-8
    libpve-storage-perl: 5.0-17
    libqb0: 1.0.1-1
    lvm2: 2.02.168-pve6
    lxc-pve: 2.1.1-3
    lxcfs: 2.0.8-2
    novnc-pve: 0.6-4
    proxmox-widget-toolkit: 1.0-11
    pve-cluster: 5.0-20
    pve-container: 2.0-19
    pve-docs: 5.1-16
    pve-firewall: 3.0-5
    pve-firmware: 2.0-4
    pve-ha-manager: 2.0-5
    pve-i18n: 1.0-4
    pve-libspice-server1: 0.12.8-3
    pve-qemu-kvm: 2.9.1-9
    pve-xtermjs: 1.0-2
    qemu-server: 5.0-22
    smartmontools: 6.5+svn4324-1
    spiceterm: 3.0-5
    vncterm: 1.5-3
    zfsutils-linux: 0.7.6-pve1~bpo9

    Still in need of the java fix for PCI-e passthrough to my Win10 gaming system, as well as a fix for the GPU sleep problem, where only a reboot of the host system restores the lost GPU to the guest OS. Regardless of that, everything seems to be working stably and very fast.

    1 x Win10 VM KVM with PCI-e passthrough for gaming (4 cores, 12 GB RAM)
    1 x VM running ubuntu 16.10 with Squeezeboxserver (2 cores, 512 MB RAM)
    1 x VM running debian 9 with nginx reverse proxy (4 cores, 1 GB RAM)
    1 x VM running debian 9 with nextcloud (4 cores, 1 GB RAM)
    1 x VM running debian 9 with mailserver (4 cores, 4 GB RAM)
    1 x VM running debian 9 with monitoring (1 core, 1 GB RAM)
    1 x LXC running ubuntu 16.10 with motioneye (2 cores, 1 GB RAM)
    1 x LXC running ubuntu 16.10 with ampache music server (1 core, 512 MB RAM)

    SMB and NFS via ZFS.
     


  16. efeu

    efeu Member
    Proxmox Subscriber

    Joined:
    Nov 6, 2015
    Messages:
    74
    Likes Received:
    6
    So even with 4.15.10-1-pve, CPU type 'host' is not working for Zen. Windows starts booting, but after a while the VM eats 800-1400% CPU in top and nothing more happens. I also noticed that you can no longer pass through the CPU's internal USB controller to a VM, which was working absolutely fine with 4.13...

    I do not see any Ubuntu work on this issue, so maybe the Proxmox team could find out which changes are causing these problems and revert them for the Proxmox kernel. An AMD-compatible kernel should be something very important for a virtualization distribution, don't you agree?
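
    For context, a minimal sketch of the two guest configuration options being discussed; the VM ID and PCI address are placeholders, not values from this post:
    Code:
    # excerpt from a VM config such as /etc/pve/qemu-server/100.conf
    # "cpu: host" exposes the host CPU model (Zen) to the guest;
    # "hostpci0" passes a PCI device (here an onboard USB controller) through
    cpu: host
    hostpci0: 00:14.0
    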
     
  17. eXtremeSHOk

    eXtremeSHOk New Member

    Joined:
    Mar 15, 2016
    Messages:
    23
    Likes Received:
    12
    Tested on a dual 24-core (48 cores, 96 threads) AMD EPYC, working perfectly in production. 4.15 is a must-have on AMD EPYC.

    My post-install script (postinstall.sh), located at https://github.com/extremeshok/xshok-proxmox/, will automatically detect an AMD EPYC CPU and install the 4.15 kernel.
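
    A minimal sketch of that detection logic (not the actual xshok-proxmox script), assuming the pve-kernel-4.15 meta-package is available from the configured repositories:
    Code:
    # install the 4.15 kernel only on AMD EPYC hosts
    if grep -qi "EPYC" /proc/cpuinfo; then
        apt-get update
        apt-get install -y pve-kernel-4.15
    fi
    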
     
    #37 eXtremeSHOk, Apr 7, 2018
    Last edited: Apr 7, 2018
    fireon likes this.
  18. udo

    udo Well-Known Member
    Proxmox Subscriber

    Joined:
    Apr 22, 2009
    Messages:
    5,834
    Likes Received:
    158
    Hi,
    just tried kernel 4.15 on a Dell R620 with a Perc 710 Mini RAID volume (LVM).

    4.15.10 boots fine. 4.15.15 from pvetest gets stuck after:
    Code:
    [   1.104090] megaraid_sas 0000:03:00.0: Init cmd return status SUCCESS for SCSI host 0
    
    After a longer time (minutes), one more line appears:
    Code:
    Reading all physical volumes. This may take a while...
    
    Then three times (at 363s, 605s and 846s): INFO: task lvm:375 blocked for more than 120 seconds. (if I press the on/off switch).

    Udo
     
  19. tjh

    tjh New Member

    Joined:
    Feb 15, 2018
    Messages:
    19
    Likes Received:
    2
    Running
    Linux orbit 4.15.15-1-pve #1 SMP PVE 4.15.15-6

    on a QOTOM i5 box and it's working fine. I even noticed that my Intel NICs are now using MSI-X interrupts; with 4.13 they were only using MSI interrupts.
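
    One quick way to verify this, assuming pciutils is installed (the PCI address below is just an example, look yours up with lspci first):
    Code:
    # show the NIC's MSI/MSI-X capabilities and which are enabled ("Enable+")
    lspci -vv -s 01:00.0 | grep -i msi
    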
     
  20. CloudPlumber42

    CloudPlumber42 New Member

    Joined:
    Apr 20, 2018
    Messages:
    2
    Likes Received:
    0


    I am running identical hardware and, funnily enough, an identical VM config with Hyper-V at the moment, with the host partition being my gaming VM. I have been waiting for the same PCI-e java fix before making the move to this hypervisor/VM config; has that fix dropped by chance? If so, what has your experience been?

    Also a question for the dev team: once the 4.15 kernel is labeled stable, how easy will it be to switch an existing install to the new branch?
     