4.15 based test kernel for PVE 5.x available

Jospeh Huber · Mar 28, 2018

After a week in an production environment with 4.15.3-1 ... again one node with questionmark.
I will try 4.15.10-1-pve

Vasu Sreekumar · Mar 28, 2018

I have 4.15.3 running without any issues.

What is the error related to? SSL or KSM or just node becomes grey and one LXC guest not starting?

I faced few node restart issue due to KSM not starting.

I had to manually set KSM starting memory usage % to 75% to avoid issue and did systemctl restart ksmtuned

efeu · Mar 28, 2018

Are the AMD bugs resolved?

Vasu Sreekumar · Mar 28, 2018

I don't have AMD based nodes. So i didn't test it.

But I can confirm that for LXC, there are still many bugs causing node restart which are very annoying

fabian · Mar 28, 2018

Vasu Sreekumar said:
I don't have AMD based nodes. So i didn't test it.

But I can confirm that for LXC, there are still many bugs causing node restart which are very annoying

if you are referring to KSM not merging pages fast enough for your setup - that is not a bug. overcommitting resources is always a dangerous game to play.

Vasu Sreekumar · Mar 28, 2018

No that is not the issue.

Suppose I already started 3 guests and node is at 75% memory usage, it will not start KSM sharing.

And when i start 4th guest, the node crashes and restarts.

I reproduced the same error multiple times. Every time node crashed.

Then I changed the KSM threshhold to 50% (KSM_THRES_COEF=50) , then KSM starts when i have three guests started.

And I can start 4th guest without any crash.

eXtremeSHOk · Mar 28, 2018

pve-kernel-4.15.10-1-pve working perfectly on our various Intel based servers.

Vasu Sreekumar · Mar 29, 2018

pve-kernel-4.15.10-1-pve also has the above KSM sharing issue.

If you have plenty of memory, you will not see it.

I have 25+ nodes, and i don't have plenty of memory, so i see it often.

But after setting the % of KSM thresh hold., i d didn't face any issue.

efeu · Mar 29, 2018

Vasu Sreekumar said:
I don't have AMD based nodes. So i didn't test it.
...

Oh, the question was more related on proxmox staff

But ty anyway.

@proxmox Team
Are the AMD issues solved with the newer 4.15 kernel?

Cant test it by my own right now

fabian · Mar 29, 2018

Vasu Sreekumar said:
No that is not the issue.

Suppose I already started 3 guests and node is at 75% memory usage, it will not start KSM sharing.

And when i start 4th guest, the node crashes and restarts.

I reproduced the same error multiple times. Every time node crashed.

Then I changed the KSM threshhold to 50% (KSM_THRES_COEF=50) , then KSM starts when i have three guests started.

And I can start 4th guest without any crash.

like I said - this is not a bug. when you overcommit resources, you need to carefully plan otherwise you might run out of resources. KSM is always asynchronous. unless you have some details to share which you haven't included so far that actually point to a bug, please stop posting this "issue" in this thread. thanks.

Vasu Sreekumar · Mar 29, 2018

With default settings, node crashes when I start the 4th guest.

With changed settings node does not crash when i start 4th guest, since it starts KSM early enough.

I think it is more like LXC related issues than a bug.

In KVM I didn't face any issues.

Vasu Sreekumar · Mar 30, 2018

System crashed and restarted time 20:02:00

(I have 25 live nodes, 1 or 2 nodes crashes like this everyday. )

Log file.

Mar 29 19:39:42 Q172 pvedaemon[2841]: <root@pam> successful auth for user 'root@pam'
Mar 29 19:39:56 Q172 pvedaemon[9167]: <root@pam> successful auth for user 'root@pam'
Mar 29 19:40:04 Q172 pvedaemon[9167]: <root@pam> successful auth for user 'root@pam'
Mar 29 19:40:35 Q172 pvedaemon[2841]: <root@pam> successful auth for user 'root@pam'
Mar 29 19:40:57 Q172 pvedaemon[4385]: <root@pam> successful auth for user 'root@pam'
Mar 29 19:41:55 Q172 pvedaemon[2841]: <root@pam> successful auth for user 'root@pam'
Mar 29 19:42:13 Q172 pvedaemon[9167]: <root@pam> successful auth for user 'root@pam'
Mar 29 19:42:56 Q172 pvedaemon[2841]: <root@pam> successful auth for user 'root@pam'
Mar 29 19:44:40 Q172 pvedaemon[9167]: <root@pam> successful auth for user 'root@pam'
Mar 29 19:45:03 Q172 pvedaemon[2841]: <root@pam> successful auth for user 'root@pam'
Mar 29 19:45:03 Q172 pvedaemon[9167]: <root@pam> successful auth for user 'root@pam'
Mar 29 19:49:19 Q172 pvedaemon[9167]: <root@pam> successful auth for user 'root@pam'
Mar 29 19:56:44 Q172 pvedaemon[2841]: <root@pam> successful auth for user 'root@pam'
Mar 29 19:59:09 Q172 pvedaemon[2841]: <root@pam> successful auth for user 'root@pam'
Mar 29 20:02:19 Q172 kernel: [ 0.000000] Linux version 4.15.3-1-pve (root@nora) (gcc version 6.3.0 20170516 (Debian 6.3.0-18+deb9u1)) #1 SMP PVE 4.15.3-1 (Fri, 9 Mar 2018 14:45:34 +0100) ()
Mar 29 20:02:19 Q172 kernel: [ 0.000000] Command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-4.15.3-1-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet
Mar 29 20:02:19 Q172 kernel: [ 0.000000] KERNEL supported cpus:
Mar 29 20:02:19 Q172 kernel: [ 0.000000] Intel GenuineIntel
Mar 29 20:02:19 Q172 kernel: [ 0.000000] AMD AuthenticAMD
Mar 29 20:02:19 Q172 kernel: [ 0.000000] Centaur CentaurHauls
Mar 29 20:02:19 Q172 kernel: [ 0.000000] x86/fpu: x87 FPU will use FXSAVE
Mar 29 20:02:19 Q172 kernel: [ 0.000000] e820: BIOS-provided physical RAM map:
Mar 29 20:02:19 Q172 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009e7ff] usable
Mar 29 20:02:19 Q172 kernel: [ 0.000000] BIOS-e820: [mem 0x000000000009e800-0x000000000009ffff] reserved
Mar 29 20:02:19 Q172 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
Mar 29 20:02:19 Q172 kernel: [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bf72ffff] usable
Mar 29 20:02:19 Q172 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bf730000-0x00000000bf73dfff] ACPI data
Mar 29 20:02:19 Q172 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bf73e000-0x00000000bf79ffff] ACPI NVS
Mar 29 20:02:19 Q172 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bf7a0000-0x00000000bf7affff] reserved
Mar 29 20:02:19 Q172 kernel: [ 0.000000] BIOS-e820: [mem 0x00000000bf7bc000-0x00000000bfffffff] reserved

Whatever · Mar 30, 2018

any chance to see ZFS 0.7.7 included into test kernel?

tom · Mar 30, 2018

Whatever said:
any chance to see ZFS 0.7.7 included into test kernel?

you will see this in the next week.

joshlukas · Mar 30, 2018

@efeu: I have a customized mini home server with some VMs and Containers. It's a

Threadripper 1900X
Asrock X399 Taichi
2 x 16 GB DDR4 2400 Kingston ECC unbuffered RAM
nVidia G710 (host)
nVidia GTX1080 (Win10 guest)
250 GB Samsung EVO SSD (host only)
512 GB Samsung EVO SSD ZFS pool (Win10 guest + 5 VMs)
3 x 3 TB Samsung Eco green HDD in RAIDZ1 (Fileserver, Container, Templates)

Running latest proxmox:
root@pve:~# pveversion --verbose
proxmox-ve: 5.1-42 (running kernel: 4.15.10-1-pve)
pve-manager: 5.1-46 (running version: 5.1-46/ae8241d4)
pve-kernel-4.13: 5.1-43
pve-kernel-4.15: 5.1-2
pve-kernel-4.15.10-1-pve: 4.15.10-2
pve-kernel-4.13.16-1-pve: 4.13.16-43
pve-kernel-4.13.13-6-pve: 4.13.13-42
corosync: 2.4.2-pve3
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-common-perl: 5.0-28
libpve-guest-common-perl: 2.0-14
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-17
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 2.1.1-3
lxcfs: 2.0.8-2
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-11
pve-cluster: 5.0-20
pve-container: 2.0-19
pve-docs: 5.1-16
pve-firewall: 3.0-5
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.9.1-9
pve-xtermjs: 1.0-2
qemu-server: 5.0-22
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.6-pve1~bpo9

Still in need of the java fix for pci-e passthrough to my Win10 gaming system as well as problems with gpu sleep where only a reboot of the host system fixes the loss of gpu to the guest OS. Regardless of that, everything seems to be working stable and very fast.

1 x Win10 VM KVM with PCI-e passthrough for gaming (4 cores, 12 GB RAM)
1 x VM running ubuntu 16.10 with Squeezeboxserver (2 cores, 512 MB RAM)
1 x VM running debian 9 with nginx reverse proxy (4 cores, 1 GB RAM)
1 x VM running debian 9 with nextcloud (4 cores, 1 GB RAM)
1 x VM running debian 9 with mailserver (4 cores, 4 GB RAM)
1 x VM running debian 9 with monitoring (1 core, 1 GB RAM)
1 x LXC running ubuntu 16.10 with motioneye (2 cores, 1 GB RAM)
1 x LCC running ubuntu 16.10 with ampache music server (1 core, 512 MB RAM)

SMB and NFS via ZFS.

efeu · Apr 2, 2018

So even with 4.15.10-1-pve cpu type host is not working for Zen. Windows bootup is starting, but then after a while the VM eats 800-1400% CPU in top and nothing more is happening. Also I recognized that you can not passthrough the CPU internal USB controller to a VM anymore, which was working absolutly fine with 4.13....

I do not see any ubuntu work on this issue, so maybe the proxmox team could find out, which changes are causing this problems and revert them for the proxmox kernel, I mean a AMD compatible kernel should be something very important for a virtualization distribution, dont u agree?

eXtremeSHOk · Apr 7, 2018

Tested on a dual 24core (48core, 96thread) AMD EYPC, working perfectly in production. 4.15 is a must have on AMD EPYC.

My Post Install Script (postinstall.sh) located at https://github.com/extremeshok/xshok-proxmox/ will automatically Detect an AMD EPYC CPU and install the kernel 4.15.

udo · Apr 9, 2018

Hi,
just tried kernel 4.15 on an Dell R620 with Perc 710 mini Raid-Volume (lvm).

4.15.10 is booting fine. 4.15.15 from pvetest stuck after:

Code:

[   1.104090] megaraid_sas 0000:03:00.0: Inint cmd return status SUCCESS for SCSI host 0

after a longer time (minutes) one more line:

Code:

Reading sll physical volumes. This may take a while...

then tree times (363s, 605s + 846s) INFO: task lvm:375 blocked for more than 120 seconds. (if I press the on/off switch).

Udo

tjh · Apr 14, 2018

Running
Linux orbit 4.15.15-1-pve #1 SMP PVE 4.15.15-6

on a QOTOM i5 box and it's working fine. I even notice that my Intel NICs now are using MSI-X interrupts. With 4.13 it was only using MSI interrupts.

CloudPlumber42 · Apr 20, 2018

joshlukas said:
@efeu: I have a customized mini home server with some VMs and Containers. It's a

Threadripper 1900X
Asrock X399 Taichi
2 x 16 GB DDR4 2400 Kingston ECC unbuffered RAM
nVidia G710 (host)
nVidia GTX1080 (Win10 guest)
250 GB Samsung EVO SSD (host only)
512 GB Samsung EVO SSD ZFS pool (Win10 guest + 5 VMs)
3 x 3 TB Samsung Eco green HDD in RAIDZ1 (Fileserver, Container, Templates)

Running latest proxmox:
root@pve:~# pveversion --verbose
proxmox-ve: 5.1-42 (running kernel: 4.15.10-1-pve)
pve-manager: 5.1-46 (running version: 5.1-46/ae8241d4)
pve-kernel-4.13: 5.1-43
pve-kernel-4.15: 5.1-2
pve-kernel-4.15.10-1-pve: 4.15.10-2
pve-kernel-4.13.16-1-pve: 4.13.16-43
pve-kernel-4.13.13-6-pve: 4.13.13-42
corosync: 2.4.2-pve3
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-common-perl: 5.0-28
libpve-guest-common-perl: 2.0-14
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-17
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 2.1.1-3
lxcfs: 2.0.8-2
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-11
pve-cluster: 5.0-20
pve-container: 2.0-19
pve-docs: 5.1-16
pve-firewall: 3.0-5
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.9.1-9
pve-xtermjs: 1.0-2
qemu-server: 5.0-22
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.6-pve1~bpo9

Still in need of the java fix for pci-e passthrough to my Win10 gaming system as well as problems with gpu sleep where only a reboot of the host system fixes the loss of gpu to the guest OS. Regardless of that, everything seems to be working stable and very fast.

1 x Win10 VM KVM with PCI-e passthrough for gaming (4 cores, 12 GB RAM)
1 x VM running ubuntu 16.10 with Squeezeboxserver (2 cores, 512 MB RAM)
1 x VM running debian 9 with nginx reverse proxy (4 cores, 1 GB RAM)
1 x VM running debian 9 with nextcloud (4 cores, 1 GB RAM)
1 x VM running debian 9 with mailserver (4 cores, 4 GB RAM)
1 x VM running debian 9 with monitoring (1 core, 1 GB RAM)
1 x LXC running ubuntu 16.10 with motioneye (2 cores, 1 GB RAM)
1 x LCC running ubuntu 16.10 with ampache music server (1 core, 512 MB RAM)

SMB and NFS via ZFS.

I am running identical hardware and funny enough VM config with Hyper-V at the moment with the host partition being my gaming VM. I have been waiting for the same pcie java fix to make a move to this hypervisor/VM config, has this fix dropped by chance? If so what has your experience been?

Also a question for the dev team, once the 4.15 kernel is labeled stable, how easy will it be to switch an install to the new branch?

4.15 based test kernel for PVE 5.x available

Renowned Member

Active Member

Renowned Member

Active Member

Proxmox Staff Member

Active Member

Renowned Member

Active Member

Renowned Member

Proxmox Staff Member

Active Member

Active Member

Renowned Member

Proxmox Staff Member

Active Member

Attachments

Renowned Member

Renowned Member

Distinguished Member

Well-Known Member

New Member

We value your privacy