4.15 based test kernel for PVE 5.x available

udo · Jun 19, 2018

t.lamprecht said:
...
Hmm, no good...
Could you please post the output of:

Code:

ethtool -e eth0 length 256
ethtool -i eth0

(where 'eth0' should be replaced with a problematic e1000e NIC link name)

Hi Thomas,
I've looked at a cluster with the e1000e driver that is running an older kernel (4.13.13-34).

There is also trouble on two of the seven nodes, but not as often as I had on one node with the 4.15 kernel:

Code:

root@pve01:~# dmesg | grep e1000
...
[4815221.163083] NETDEV WATCHDOG: eth4 (e1000e): transmit queue 0 timed out
[4815221.271681]  joydev intel_cstate pcspkr ipmi_si dcdbas shpchp mei intel_rapl_perf ipmi_devintf lpc_ich wmi mac_hid ipmi_msghandler acpi_pad acpi_power_meter vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 hid_generic usbmouse usbkbd btrfs usbhid xor hid raid6_pq e1000e(O) tg3 ahci ptp libahci megaraid_sas pps_core
[4815221.628818] e1000e 0000:05:00.0 eth4: Reset adapter unexpectedly
[4815224.767822] e1000e: eth4 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[5506569.903374] e1000e 0000:05:00.1 eth5: Reset adapter unexpectedly
[5506573.100126] e1000e: eth5 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[6454490.510657] e1000e 0000:05:00.0 eth4: Reset adapter unexpectedly
[6454493.679389] e1000e: eth4 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[6885168.270113] e1000e 0000:05:00.1 eth5: Reset adapter unexpectedly
[6885171.474830] e1000e: eth5 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[11810161.032883] e1000e 0000:05:00.0 eth4: Reset adapter unexpectedly
[11810164.317643] e1000e: eth4 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[13020006.535342] e1000e 0000:05:00.1 eth5: Reset adapter unexpectedly
[13020009.772075] e1000e: eth5 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

root@pve03:~# dmesg | grep e1000
...
[8780811.944900] NETDEV WATCHDOG: eth4 (e1000e): transmit queue 0 timed out
[8780812.065959]  dcdbas joydev soundcore shpchp pcspkr intel_rapl_perf mei wmi lpc_ich ipmi_si acpi_power_meter mac_hid ipmi_devintf ipmi_msghandler acpi_pad vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 hid_generic usbmouse usbkbd usbhid hid btrfs xor raid6_pq e1000e(O) ahci tg3 ptp libahci megaraid_sas pps_core
[8780812.461362] e1000e 0000:05:00.0 eth4: Reset adapter unexpectedly
[8780815.565703] e1000e: eth4 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
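
The dmesg excerpts above can be condensed into a per-interface reset count, which makes it easier to compare nodes. This is only a minimal sketch: it assumes the log lines keep the "e1000e <bus> <iface>: Reset adapter unexpectedly" shape shown above, and the function name count_e1000e_resets is made up for illustration.

```shell
# Count "Reset adapter unexpectedly" events per interface.
# Reads kernel log text on stdin, prints "<iface> <count>" per line.
# count_e1000e_resets is a hypothetical helper name.
count_e1000e_resets() {
    # Matching lines look like:
    #   [ts] e1000e 0000:05:00.0 eth4: Reset adapter unexpectedly
    grep 'Reset adapter unexpectedly' \
        | sed -E 's/^.* ([^ ]+): Reset adapter unexpectedly.*$/\1/' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage on a node:
#   dmesg | count_e1000e_resets
```

Comparing the counts against the uptime of each node gives a rough resets-per-day figure without reading the full log.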

Both nodes have a NIC with older firmware:

Code:

ethtool -e eth4 length 256
Offset          Values
------          ------
0x0000:         00 15 17 84 cd 0c 20 05 98 11 62 50 ff ff ff ff
0x0010:         77 c5 05 21 2f a4 5e 13 86 80 5e 10 86 80 6f b1
0x0020:         08 00 5e 10 00 54 00 00 01 58 00 00 00 00 00 01
0x0030:         f6 6c b0 37 ae 07 03 84 83 07 00 00 03 c3 02 06
0x0040:         08 00 f0 0e 64 21 40 00 01 48 00 00 00 00 00 00
0x0050:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0060:         00 01 00 40 1e 12 07 40 00 01 00 40 ff ff ff ff
0x0070:         ff ff ff ff ff ff ff ff ff ff ff ff ff ff dd 41
0x0080:         ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0090:         ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00a0:         ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00b0:         ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00c0:         ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00d0:         ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00e0:         ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00f0:         ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

root@pve01:~# ethtool -i eth4
driver: e1000e
version: 3.3.6-NAPI
firmware-version: 5.6-2
expansion-rom-version:
bus-info: 0000:05:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

All nodes without trouble have newer firmware:

Code:

ethtool -i eth4
driver: e1000e
version: 3.3.6-NAPI
firmware-version: 5.11-2
expansion-rom-version:
bus-info: 0000:05:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
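
To compare driver and firmware versions across several NICs or nodes, the `ethtool -i` output can be reduced to one line per interface. A minimal sketch, assuming the "key: value" layout shown above; summarize_ethtool_info is a hypothetical helper name, not part of ethtool.

```shell
# Reduce `ethtool -i` output (read from stdin) to a single summary line.
# summarize_ethtool_info is a hypothetical helper name.
summarize_ethtool_info() {
    awk -F': *' '
        $1 == "driver"           { driver = $2 }
        $1 == "version"          { version = $2 }
        $1 == "firmware-version" { firmware = $2 }
        END { printf "driver=%s version=%s firmware=%s\n", driver, version, firmware }
    '
}

# Usage on a node (interface names are examples taken from this thread):
#   for nic in eth4 eth5; do
#       printf '%s: ' "$nic"; ethtool -i "$nic" | summarize_ethtool_info
#   done
```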

@EDIT: but I just saw that the NIC which causes trouble on the 4.15 kernel has the same (not so old) firmware:

Code:

root@pvetest:~# ethtool -i enp5s0f0
driver: e1000e
version: 3.2.6-k
firmware-version: 5.11-2
expansion-rom-version:
bus-info: 0000:05:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

root@pvetest:~# ethtool -e enp5s0f0 length 256
Offset          Values
------          ------
0x0000:         00 15 17 4a 4a aa 20 04 ff ff b2 50 ff ff ff ff
0x0010:         08 d5 03 68 2f a4 5e 11 86 80 5e 10 86 80 65 b1
0x0020:         08 00 5e 10 00 54 00 00 01 50 00 00 00 00 00 01
0x0030:         f6 6c b0 37 a6 07 03 84 83 07 00 00 03 c3 02 06
0x0040:         08 00 f0 0e 64 21 40 00 01 40 00 00 00 00 00 00
0x0050:         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0060:         ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0070:         ff ff ff ff ff ff ff ff ff ff 97 01 ff ff bf 7e
0x0080:         ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0090:         ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00a0:         ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00b0:         ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00c0:         ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00d0:         ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00e0:         ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x00f0:         ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

Udo

resoli · Jun 19, 2018

t.lamprecht said:
I found an issue, and fixing it resolves my reproducer here. I reported it upstream with a patch and built a new kernel with the fix included. It was just uploaded to pvetest; you can also download it manually from:

http://download.proxmox.com/debian/...pve-kernel-4.15.17-3-pve_4.15.17-13_amd64.deb

I applied it to my quorum node; drbd9 synchronized immediately. Tomorrow I will update the remaining nodes and report back.

Thanks,
rob

resoli · Jun 20, 2018

resoli said:
I applied it to my quorum node; drbd9 synchronized immediately. Tomorrow I will update the remaining nodes and report back.

All nodes are upgraded and now running 4.15.17-13. All is well.

Nice job, Thomas!
rob

t.lamprecht · Jun 20, 2018

resoli said:
All nodes are upgraded and now running 4.15.17-13. All is well.

good to hear!

udo said:
Both nodes have a NIC with older firmware:

Hmm, there were some problems with older firmware versions in the past... Could you update the older ones to the newer version?

Antonio Blanco said:
Sure, here:

Sorry, it seems that the

Code:

ethtool -i <DEV>

(note -i, not -e) output is missing. It would be nice to see whether the firmware versions are related, or at least also older than udo's working ones.

Stefan Radman · Jun 20, 2018

Hi Thomas,
With PVE 5.2 and kernel 4.15.17-3-pve I was not able to run jumbo frames (MTU 9000) on an Intel I350 card (igb).
The same configuration worked on two other nodes in the cluster with Broadcom NICs (tg3).
Installing pve kernel 4.15.17-13 solved the headache for me.

Thanks a lot
Stefan
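
Whether jumbo frames actually pass end to end can be checked with a ping that forbids fragmentation. For MTU 9000 the largest ICMP payload is 9000 - 20 (IPv4 header) - 8 (ICMP header) = 8972 bytes. A minimal sketch, assuming iputils ping and IPv4; the function names and the peer address are placeholders.

```shell
# Largest ICMP echo payload that fits into one unfragmented IPv4 packet:
# MTU minus 20 bytes IPv4 header minus 8 bytes ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

# check_jumbo_path PEER [MTU]
# Sends 3 pings with the "don't fragment" bit set (-M do). If any hop or
# NIC on the path cannot carry the MTU, the pings fail instead of being
# silently fragmented. Hypothetical helper name.
check_jumbo_path() {
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "${2:-9000}")" "$1"
}

# Usage (peer address is a placeholder):
#   check_jumbo_path 192.0.2.10 9000
```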

sergopotap · Jun 29, 2018

Hello,
We have five PVE 5.2 nodes running kernel 4.15.17-3-pve (servers: HP DL180 G6) and we are experiencing issues with it.
The servers randomly crash every day, without any kernel panic logs. Where can we enable kernel panic logs, or where can we find them?

Code:

proxmox-ve: 5.2-2 (running kernel: 4.15.17-3-pve)
pve-manager: 5.2-3 (running version: 5.2-3/785ba980)
pve-kernel-4.15: 5.2-3
pve-kernel-4.13: 5.1-45
pve-kernel-4.15.17-3-pve: 4.15.17-13
pve-kernel-4.15.17-2-pve: 4.15.17-10
pve-kernel-4.13.16-3-pve: 4.13.16-49
pve-kernel-4.4.128-1-pve: 4.4.128-111
pve-kernel-4.4.117-1-pve: 4.4.117-109
pve-kernel-4.4.98-6-pve: 4.4.98-107
pve-kernel-4.4.98-5-pve: 4.4.98-105
pve-kernel-4.4.95-1-pve: 4.4.95-99
pve-kernel-4.4.83-1-pve: 4.4.83-96
pve-kernel-4.4.49-1-pve: 4.4.49-86
pve-kernel-4.4.35-2-pve: 4.4.35-79
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.24-1-pve: 4.4.24-72
pve-kernel-4.4.19-1-pve: 4.4.19-66
pve-kernel-4.4.8-1-pve: 4.4.8-52
pve-kernel-4.2.8-1-pve: 4.2.8-41
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.2.3-2-pve: 4.2.3-22
pve-kernel-4.2.3-1-pve: 4.2.3-18
pve-kernel-4.2.2-1-pve: 4.2.2-16
ceph: 12.2.5-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-34
libpve-guest-common-perl: 2.0-17
libpve-http-server-perl: 2.0-9
libpve-storage-perl: 5.0-23
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-1
proxmox-widget-toolkit: 1.0-19
pve-cluster: 5.0-27
pve-container: 2.0-23
pve-docs: 5.2-4
pve-firewall: 3.0-12
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-5
qemu-server: 5.0-29
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9

Code:

auto lo
iface lo inet loopback

auto bond0
iface bond0 inet static
        primary eth2
        slaves eth2 eth3
        address 10.10.105.23
        netmask 255.255.255.0
        bond_miimon 100
        bond_mode 1
        pre-up ( ifconfig eth2 mtu 8900 && ifconfig eth3 mtu 8900 )
        mtu 8900
auto vmbr2
iface vmbr2 inet static
        address  172.16.4.127
        netmask  255.255.255.0
        gateway  172.16.4.1
        bridge_ports eth0.2
        bridge_stp off
        bridge_fd 0

Code:

root@pve4-node1:~# ethtool -i eth2
driver: ixgbe
version: 5.3.7
firmware-version: 0x2b2c0001
expansion-rom-version:
bus-info: 0000:04:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

root@pve4-node1:~# ethtool -i eth3
driver: ixgbe
version: 5.3.7
firmware-version: 0x2b2c0001
expansion-rom-version:
bus-info: 0000:04:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

root@pve4-node1:~# ethtool -i eth0
driver: igb
version: 5.3.5.18
firmware-version: 1.7.2
expansion-rom-version:
bus-info: 0000:06:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

t.lamprecht · Jun 29, 2018

sergopotap said:
We have five PVE 5.2 nodes running kernel 4.15.17-3-pve (servers: HP DL180 G6) and we are experiencing issues with it.
The servers randomly crash every day, without any kernel panic logs. Where can we enable kernel panic logs, or where can we find them?

crashes as in "the server just resets (reboots) suddenly"?

Where did you look? I assume /var/log/kern.log (and its rotated files), journalctl (or syslog if no persistent journal is enabled)?

Do you have a watchdog configured?
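
For the log locations mentioned above: journalctl can only show the boot before a crash if the journal is persistent, which with journald's default Storage=auto setting depends on /var/log/journal existing. A minimal sketch under that assumption; the function names are hypothetical, and a hard reset without a panic may leave nothing in either log.

```shell
# Success when the journal directory exists; with journald's default
# Storage=auto this is what makes the journal persistent across reboots.
journal_is_persistent() {
    [ -d "${1:-/var/log/journal}" ]
}

# Print kernel messages of the boot before the current one, if available.
show_previous_boot_kernel_log() {
    if journal_is_persistent; then
        journalctl -k -b -1 --no-pager
    else
        echo "no persistent journal; check /var/log/kern.log and its rotated files" >&2
        return 1
    fi
}

# Usage:
#   journalctl --list-boots          # which boots the journal knows about
#   show_previous_boot_kernel_log | tail -n 50
```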

sergopotap · Jul 14, 2018

t.lamprecht said:
crashes as in "the server just resets (reboots) suddenly"?

Where did you look? I assume /var/log/kern.log (and its rotated files), journalctl (or syslog if no persistent journal is enabled)?

Do you have a watchdog configured?

crashes as in "the server just resets (reboots) suddenly"? - Yes.

Do you have a watchdog configured? - No. My servers are HP G6 with ilo100.

Where did you look? I assume /var/log/kern.log (and its rotated files), journalctl (or syslog if no persistent journal is enabled)? - The server crashed on Jul 13 at 00:27:57.

Code:

root@pve4-node2:~# kdump-config show
DUMP_MODE:        kdump
USE_KDUMP:        1
KDUMP_SYSCTL:     kernel.panic_on_oops=1
KDUMP_COREDIR:    /var/crash
crashkernel addr: 0x26000000
   /var/lib/kdump/vmlinuz: symbolic link to /boot/vmlinuz-4.15.18-1-pve
kdump initrd:
   /var/lib/kdump/initrd.img: symbolic link to /var/lib/kdump/initrd.img-4.15.18-1-pve
current state:    ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_IMAGE=/boot/vmlinuz-4.15.18-1-pve root=/dev/mapper/pve-root ro quiet irqpoll nr_cpus=1 nousb systemd.unit=kdump-tools.service ata_piix.prefer_ms_hyperv=0" --initrd=/var/lib/kdump/initrd.img /var/lib/kdump/vmlinuz



But


root@pve4-node2:~# kdump-config savecore
running makedumpfile -c -d 31 /proc/vmcore /var/crash/201807140829/dump-incomplete.
open_dump_memory: Can't open the dump memory(/proc/vmcore). No such file or directory

makedumpfile Failed.
kdump-config: makedumpfile failed, falling back to 'cp' ... failed!
cp: cannot stat '/proc/vmcore': No such file or directory
kdump-config: failed to save vmcore in /var/crash/201807140829 ... failed!
running makedumpfile --dump-dmesg /proc/vmcore /var/crash/201807140829/dmesg.201807140829.
open_dump_memory: Can't open the dump memory(/proc/vmcore). No such file or directory

makedumpfile Failed.
kdump-config: makedumpfile --dump-dmesg failed. dmesg content will be unavailable ... failed!
kdump-config: failed to save dmesg content in /var/crash/201807140829 ... failed!
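
A note on the savecore failure above: /proc/vmcore is only provided inside the capture kernel that kexec boots after a crash, so its absence on a normally booted system is expected and does not by itself show that kdump is broken. What can be checked from the running system is whether memory is reserved and a crash kernel is loaded. A minimal sketch with hypothetical function names:

```shell
# has_crashkernel_param [CMDLINE_FILE]
# Success if a crashkernel= reservation is on the kernel command line.
has_crashkernel_param() {
    grep -Eq '(^| )crashkernel=[^ ]+' "${1:-/proc/cmdline}"
}

# Print what can be verified from the normally booted system.
kdump_readiness() {
    if has_crashkernel_param; then
        echo "crashkernel reservation: present"
    else
        echo "crashkernel reservation: missing"
    fi
    if [ "$(cat /sys/kernel/kexec_crash_loaded 2>/dev/null)" = "1" ]; then
        echo "crash kernel: loaded"
    else
        echo "crash kernel: not loaded"
    fi
    if [ -e /proc/vmcore ]; then
        echo "/proc/vmcore: present (running inside the capture kernel)"
    else
        echo "/proc/vmcore: absent (normal boot)"
    fi
}
```

If both checks pass, a dump should only appear under /var/crash after an actual panic. A sudden hardware reset does not go through the panic path, so it would not produce one.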

alsicorp · Jul 17, 2018

I have 2 HP servers (HP ProLiant DL360 G6, HP ProLiant DL160 G6)
that are having a kernel panic with the latest enterprise kernel 4.15.18-1.

I don't know if this matters or not... (I can't imagine an MTU setting causing a kernel panic)
Both have jumbo frames enabled (MTU 9000).
Both have been fine (and still are) with 4.15.17-3-pve.

One server has the bnx2 driver.
One server has the e1000e driver.

I checked both syslog and kernel logs - I guess it never booted far enough to write the logs...

I could see the kernel panic on the screen, but I was using a KVM switch and the text was VERY large, so I couldn't see the complete error.

Guillaume · Jul 19, 2018

Hi

Over the last month we looked at 4.15 and saw that many people have problems; we tested it on a G8 and had problems too.
We are using old servers, like R610 and HP G8.
Do you know whether the problem is now fixed and we can update to 4.15, or do we need to stay on our 4.13.16-4, which works perfectly?

Best regards
Guillaume

tsarya · Jul 20, 2018

Hi,

Today I upgraded my HP DL360p Gen8 to the latest 4.15.18-15 kernel and the system cannot boot; it is stuck at importing the ZFS pool.

Code:

proxmox-ve: 5.2-2 (running kernel: 4.15.17-3-pve)
pve-manager: 5.2-5 (running version: 5.2-5/eb24855a)
pve-kernel-4.15: 5.2-4
pve-kernel-4.15.18-1-pve: 4.15.18-15
pve-kernel-4.15.17-3-pve: 4.15.17-14
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-35
libpve-guest-common-perl: 2.0-17
libpve-http-server-perl: 2.0-9
libpve-storage-perl: 5.0-24
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-1
proxmox-widget-toolkit: 1.0-19
pve-cluster: 5.0-28
pve-container: 2.0-24
pve-docs: 5.2-4
pve-firewall: 3.0-13
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-29
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9

It boots fine with kernel 4.15.17-14

coppola_f · Jul 23, 2018

Forum admins!

Maybe this is related to:
https://forum.proxmox.com/threads/4-15-17-kernel-panic.44714/#post-214424

I'm going to cross-link these topics
so we can all look around!

regards,
Francesco

Menno · Aug 7, 2018

t.lamprecht said:
crashes as in "the server just resets (reboots) suddenly"?

Where did you look? I assume /var/log/kern.log (and its rotated files), journalctl (or syslog if no persistent journal is enabled)?

Do you have a watchdog configured?

I can confirm the panic, and it seems to be related to PTI (page-table isolation). Adding the nopti flag to the kernel command line makes the server boot again, although I have not yet tested the machine extensively.
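
For anyone wanting to try the same workaround, here is a minimal sketch of adding the flag persistently. It assumes the node boots via GRUB and that /etc/default/grub holds a quoted GRUB_CMDLINE_LINUX_DEFAULT line; the helper name add_kernel_flag is made up. Note that nopti switches off the Meltdown mitigation, so it is more of a diagnostic step than a fix.

```shell
# add_kernel_flag FLAG [FILE]
# Append FLAG inside the quotes of GRUB_CMDLINE_LINUX_DEFAULT unless it is
# already there. FILE defaults to /etc/default/grub. Hypothetical helper.
add_kernel_flag() {
    flag=$1
    file=${2:-/etc/default/grub}
    # Already present as a whole word inside the quoted value: nothing to do.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${flag}( [^\"]*)?\"" "$file"; then
        return 0
    fi
    sed -i -E "s/^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"/\1 ${flag}\"/" "$file"
}

# Usage (as root), followed by regenerating the GRUB config and a reboot:
#   add_kernel_flag nopti
#   update-grub
# After the reboot, check that the flag is active:
#   cat /proc/cmdline
#   cat /sys/devices/system/cpu/vulnerabilities/meltdown
```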

The previous 4.15.17 kernels all work fine; since kernel 4.15.18 my machines have become unstable and panic on boot with the back trace added below.

The hardware used is ProLiant DL380 G6 and G7. Please let me know if any other information is needed.

The full back trace is:

Code:

[    6.455328] general protection fault: 0000 [#1] SMP PTI
[    6.714257] Modules linked in: ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbmouse usbkbd usbhid hid psmouse bnx2 sfc mpt3sas mtd ptp raid_class pps_core mdio hpsa scsi_transport_sas
[    8.417644] CPU: 1 PID: 330 Comm: systemd-modules Tainted: G          I      4.15.18-1-pve #1
[    8.839981] Hardware name: HP ProLiant DL380 G7, BIOS P67 08/16/2015
[    9.153970] RIP: 0010:__kmalloc_node+0x1b0/0x2b0
[    9.382121] RSP: 0018:ffffbda146d37a20 EFLAGS: 00010286
[    9.641499] RAX: 0000000000000000 RBX: b9f63dd25eea69bd RCX: 00000000000009a7
[    9.994883] RDX: 00000000000009a6 RSI: 0000000000000000 RDI: 0000000000027040
[   10.347905] RBP: ffffbda146d37a58 R08: ffff981ce6e67040 R09: ffff981ce6807c00
[   10.701943] R10: ffff981cdfebb488 R11: ffffffffc057cd80 R12: 0000000001080020
[   11.055194] R13: 0000000000000008 R14: 00000000ffffffff R15: ffff981ce6807c00
[   11.409098] FS:  00007f60956158c0(0000) GS:ffff981ce6e40000(0000) knlGS:0000000000000000
[   11.810294] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   12.093672] CR2: 00007ff7b3a215c0 CR3: 00000008ef6fa005 CR4: 00000000000206e0
[   12.093673] Call Trace:
[   12.093680]  ? enqueue_task_fair+0xb5/0x800
[   12.093684]  ? alloc_cpumask_var_node+0x1f/0x30
[   12.093687]  ? x86_configure_nx+0x50/0x50
[   12.093689]  alloc_cpumask_var_node+0x1f/0x30
[   12.093691]  alloc_cpumask_var+0xe/0x10
[   12.093694]  native_send_call_func_ipi+0x2e/0x130
[   12.093696]  ? find_next_bit+0xb/0x10
[   12.093699]  smp_call_function_many+0x1bb/0x260
[   12.093701]  ? x86_configure_nx+0x50/0x50
[   12.093703]  on_each_cpu+0x2d/0x60
[   12.093704]  flush_tlb_kernel_range+0x79/0x80
[   12.093708]  ? purge_fragmented_blocks_allcpus+0x53/0x1f0
[   12.093711]  __purge_vmap_area_lazy+0x52/0xc0
[   12.093713]  vm_unmap_aliases+0xfa/0x130
[   12.093716]  change_page_attr_set_clr+0xea/0x370
[   12.093718]  ? 0xffffffffc0578000
[   12.093721]  set_memory_ro+0x29/0x30
[   12.093722]  ? 0xffffffffc0578000
[   12.093724]  frob_text.isra.33+0x23/0x30
[   12.093726]  module_enable_ro.part.54+0x35/0x90
[   12.093728]  do_init_module+0x119/0x219
[   12.093730]  load_module+0x28e6/0x2e00
[   12.093734]  ? ima_post_read_file+0x83/0xa0
[   12.093737]  SYSC_finit_module+0xe5/0x120
[   12.093738]  ? SYSC_finit_module+0xe5/0x120
[   12.093740]  SyS_finit_module+0xe/0x10
[   12.093743]  do_syscall_64+0x73/0x130
[   12.093746]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[   12.093747] RIP: 0033:0x7f6094b01229
[   12.093748] RSP: 002b:00007ffe1ac72988 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   12.093750] RAX: ffffffffffffffda RBX: 00005650c392dc40 RCX: 00007f6094b01229
[   12.093751] RDX: 0000000000000000 RSI: 00007f6094fea265 RDI: 0000000000000006
[   12.093752] RBP: 00007f6094fea265 R08: 0000000000000000 R09: 0000000000000000
[   12.093753] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000
[   12.093754] R13: 00005650c392d930 R14: 0000000000020000 R15: 00007ffe1ac72af0
[   12.093755] Code: 89 d0 4c 01 d3 48 33 1b 49 33 9f 40 01 00 00 65 48 0f c7 0f 0f 94 c0 84 c0 0f 84 ef fe ff ff 48 85 db 74 14 49 63 47 20 48 01 c3 <48> 33 1b 49 33 9f 40 01 00 00 0f 18 0b 41 f7 c4 00 80 00 00 4c
[   12.093781] RIP: __kmalloc_node+0x1b0/0x2b0 RSP: ffffbda146d37a20
[   12.093816] ---[ end trace 6a54e144d0e4034e ]---
[   12.118561] general protection fault: 0000 [#2] SMP PTI
[   12.118562] Modules linked in: ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbmouse usbkbd usbhid hid psmouse bnx2 sfc mpt3sas mtd ptp raid_class pps_core mdio hpsa scsi_transport_sas
[   12.118582] CPU: 1 PID: 335 Comm: mount Tainted: G      D   I      4.15.18-1-pve #1
[   12.118582] Hardware name: HP ProLiant DL380 G7, BIOS P67 08/16/2015
[   12.118586] RIP: 0010:__kmalloc_track_caller+0xc3/0x220
[   12.118587] RSP: 0018:ffffbda146d7be70 EFLAGS: 00010286
[   12.118588] RAX: b9f63dd25eea69bd RBX: b9f63dd25eea69bd RCX: 00000000000009a8
[   12.118589] RDX: 00000000000009a7 RSI: 0000000000000000 RDI: b9f63dd25eea69bd
[   12.118590] RBP: ffffbda146d7bea0 R08: 0000000000027040 R09: ffff981ce6807c00
[   12.118591] R10: 8080808080808080 R11: fefefefefefefeff R12: 00000000014000c0
[   12.118592] R13: 0000000000000005 R14: ffffffff9bdf5506 R15: ffff981ce6807c00
[   12.118594] FS:  00007ff7b3ba4480(0000) GS:ffff981ce6e40000(0000) knlGS:0000000000000000
[   12.118595] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   12.118596] CR2: 00007ffc7504bd68 CR3: 00000008f4e5a003 CR4: 00000000000206e0
[   12.118597] Call Trace:
[   12.118601]  memdup_user+0x2c/0x70
[   12.118603]  strndup_user+0x46/0x60
[   12.118607]  SyS_mount+0x34/0xd0
[   12.118609]  do_syscall_64+0x73/0x130
[   12.118611]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[   12.118612] RIP: 0033:0x7ff7b326c24a
[   12.118613] RSP: 002b:00007ffc7504cdb8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[   12.118615] RAX: ffffffffffffffda RBX: 0000558d7bde9030 RCX: 00007ff7b326c24a
[   12.118616] RDX: 0000558d7bde9210 RSI: 0000558d7bde9250 RDI: 0000558d7bde9230
[   12.118617] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000020
[   12.118618] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000558d7bde9230
[   12.118619] R13: 0000558d7bde9210 R14: 0000000000000000 R15: 00000000ffffffff
[   12.118620] Code: 5e 1c 64 49 83 78 10 00 49 8b 38 0f 84 ea 00 00 00 48 85 ff 0f 84 e1 00 00 00 49 63 5f 20 4d 8b 07 48 8d 4a 01 48 89 f8 48 01 fb <48> 33 1b 49 33 9f 40 01 00 00 65 49 0f c7 08 0f 94 c0 84 c0 74
[   12.118646] RIP: __kmalloc_track_caller+0xc3/0x220 RSP: ffffbda146d7be70
[   12.118648] ---[ end trace 6a54e144d0e4034f ]---
[   12.122166] general protection fault: 0000 [#3] SMP PTI
[   12.122166] Modules linked in: ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbmouse usbkbd usbhid hid psmouse bnx2 sfc mpt3sas mtd ptp raid_class pps_core mdio hpsa scsi_transport_sas
[   12.122187] CPU: 1 PID: 333 Comm: mount Tainted: G      D   I      4.15.18-1-pve #1
[   12.122187] Hardware name: HP ProLiant DL380 G7, BIOS P67 08/16/2015
[   12.122190] RIP: 0010:__kmalloc_track_caller+0xc3/0x220
[   12.122191] RSP: 0018:ffffbda146dc7e70 EFLAGS: 00010286
[   12.122193] RAX: b9f63dd25eea69bd RBX: b9f63dd25eea69bd RCX: 00000000000009a8
[   12.122194] RDX: 00000000000009a7 RSI: 0000000000000000 RDI: b9f63dd25eea69bd
[   12.122195] RBP: ffffbda146dc7ea0 R08: 0000000000027040 R09: ffff981ce6807c00
[   12.122196] R10: 8080808080808080 R11: fefefefefefefeff R12: 00000000014000c0
[   12.122197] R13: 0000000000000007 R14: ffffffff9bdf5506 R15: ffff981ce6807c00
[   12.122198] FS:  00007f5a2fccb480(0000) GS:ffff981ce6e40000(0000) knlGS:0000000000000000
[   12.122200] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   12.122201] CR2: 00007f5a2f2f0d30 CR3: 000000091e25c004 CR4: 00000000000206e0
[   12.122201] Call Trace:
[   12.122205]  memdup_user+0x2c/0x70
[   12.122207]  strndup_user+0x46/0x60
[   12.122209]  SyS_mount+0x51/0xd0
[   12.122211]  do_syscall_64+0x73/0x130
[   12.122213]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[   12.122215] RIP: 0033:0x7f5a2f39324a
[   12.122216] RSP: 002b:00007ffee4915338 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[   12.122217] RAX: ffffffffffffffda RBX: 0000560f461de030 RCX: 00007f5a2f39324a
[   12.122218] RDX: 0000560f461de210 RSI: 0000560f461de250 RDI: 0000560f461de230
[   12.122219] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000020
[   12.122220] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000560f461de230
[   12.122221] R13: 0000560f461de210 R14: 0000000000000000 R15: 00000000ffffffff
[   12.122222] Code: 5e 1c 64 49 83 78 10 00 49 8b 38 0f 84 ea 00 00 00 48 85 ff 0f 84 e1 00 00 00 49 63 5f 20 4d 8b 07 48 8d 4a 01 48 89 f8 48 01 fb <48> 33 1b 49 33 9f 40 01 00 00 65 49 0f c7 08 0f 94 c0 84 c0 74
[   12.122249] RIP: __kmalloc_track_caller+0xc3/0x220 RSP: ffffbda146dc7e70
[   12.122250] ---[ end trace 6a54e144d0e40350 ]---
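
Long traces like the one above are easier to compare between machines when reduced to the fault type and the faulting kernel function. A minimal sketch that assumes the 4.15-style "RIP: 0010:<symbol>" lines shown above; summarize_oops is a hypothetical helper name.

```shell
# summarize_oops: read kernel log text on stdin and print each fault header
# followed by the kernel function it faulted in.
# Only kernel-mode RIP lines ("RIP: 0010:<symbol>") are reported; the
# user-space RIP (0033:...) and the trailing "RIP: <symbol> RSP:" summary
# line are skipped. Hypothetical helper name.
summarize_oops() {
    sed -n -E \
        -e 's/^\[[ 0-9.]+\] (general protection fault|invalid opcode|kernel BUG at)(.*)$/\1\2/p' \
        -e 's/^\[[ 0-9.]+\] RIP: 0010:([^ ]+).*$/  at \1/p'
}

# Usage:
#   dmesg | summarize_oops
```

Fed the trace above, this lists the three general protection faults next to __kmalloc_node and __kmalloc_track_caller, that is, all three in the kernel's kmalloc paths.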

Edit: I might have spoken too soon. One of my machines still does not work with the new kernel; it boots further than without the nopti flag but still crashes.

First back trace:

Code:

[    6.192737] usercopy: kernel memory exposure attempt detected from 000000001790da28 (kmalloc-8) (15 bytes)
[    6.670605] kernel BUG at mm/usercopy.c:72!
[    6.877866] invalid opcode: 0000 [#1] SMP NOPTI
[    7.102128] Modules linked in: tap ib_iser rdma_cm iw_cm ib_cm ib_core sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq hid_generic usbkbd usbmouse usbhid hid psmouse mpt3sas raid_class bnx2 sfc mtd ptp pps_core mdio hpsa scsi_transport_sas
[    8.462061] CPU: 3 PID: 314 Comm: udevadm Tainted: G          I      4.15.18-1-pve #1
[    8.849616] Hardware name: HP ProLiant DL380 G7, BIOS P67 08/16/2015
[    9.163444] RIP: 0010:__check_object_size+0x167/0x190
[    9.413124] RSP: 0018:ffffac9dc6db3db8 EFLAGS: 00010286
[    9.671357] RAX: 000000000000005e RBX: 000000000000000f RCX: 0000000000000000
[   10.025387] RDX: 0000000000000000 RSI: ffff9300c2ed6498 RDI: ffff9300c2ed6498
[   10.378198] RBP: ffffac9dc6db3dd8 R08: 0000000000000003 R09: 00000000000003bc
[   10.731334] R10: 0000000000000008 R11: ffffffffb155680d R12: 0000000000000001
[   11.084980] R13: ffff9300b41d3857 R14: ffff9300b41d3848 R15: ffff9300b41d3848
[   11.437743] FS:  00007fd5189d98c0(0000) GS:ffff9300c2ec0000(0000) knlGS:0000000000000000
[   11.437744] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   11.437747] CR2: 000055971ec212b8 CR3: 00000008f7ecc004 CR4: 00000000000206e0
[   11.437748] Call Trace:
[   11.437754]  filldir+0xb0/0x140
[   11.437758]  kernfs_fop_readdir+0x103/0x270
[   11.437760]  iterate_dir+0xa8/0x1a0
[   11.437762]  SyS_getdents+0x9e/0x120
[   11.437763]  ? fillonedir+0x100/0x100
[   11.437767]  do_syscall_64+0x73/0x130
[   11.437768]  ? do_syscall_64+0x73/0x130
[   11.437772]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[   11.437773] RIP: 0033:0x7fd51782bf2b
[   11.437774] RSP: 002b:00007fff8a8eec40 EFLAGS: 00000202 ORIG_RAX: 000000000000004e
[   11.437776] RAX: ffffffffffffffda RBX: 000055971e81a170 RCX: 00007fd51782bf2b
[   11.437777] RDX: 0000000000008000 RSI: 000055971e81a170 RDI: 0000000000000004
[   11.437778] RBP: 000055971e81a170 R08: fffe000000000000 R09: 0000000000008040
[   11.437779] R10: 0000000000000090 R11: 0000000000000202 R12: fffffffffffffe58
[   11.437780] R13: 0000000000000000 R14: 000055971e81a140 R15: 00007fd5189d9718
[   11.437781] Code: 48 0f 45 d1 48 c7 c6 ef 11 ce b0 48 c7 c1 e9 11 cf b0 48 0f 45 f1 49 89 d9 49 89 c0 4c 89 f1 48 c7 c7 28 12 cf b0 e8 99 e8 e7 ff <0f> 0b 48 c7 c0 d2 11 cf b0 eb b9 48 c7 c0 e2 11 cf b0 eb b0 48
[   11.437807] RIP: __check_object_size+0x167/0x190 RSP: ffffac9dc6db3db8
[   11.437821] ---[ end trace fa877bd9a718e005 ]---

And a bit later:

Code:

[  121.361676] general protection fault: 0000 [#2] SMP NOPTI
[  121.632590] Modules linked in: ipt_REJECT nf_reject_ipv4 iptable_filter bonding 8021q garp mrp softdog nfnetlink_log nfnetlink vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq hid_generic usbkbd usbmouse usbhid hid psmouse mpt3sas raid_class bnx2 sfc mtd ptp pps_core mdio hpsa scsi_transport_sas
[  123.476354] CPU: 3 PID: 3860 Comm: (start.sh) Tainted: G      D   I      4.15.18-1-pve #1
[  123.881161] Hardware name: HP ProLiant DL380 G7, BIOS P67 08/16/2015
[  124.196485] RIP: 0010:__kmalloc_track_caller+0xe5/0x220
[  124.454684] RSP: 0018:ffffac9dccc47ce0 EFLAGS: 00010282
[  124.714902] RAX: 0000000000000000 RBX: 9a870d7bf26606d3 RCX: 000000000000166b
[  125.067714] RDX: 000000000000166a RSI: 0000000000000000 RDI: ffff9300b41d3838
[  125.420275] RBP: ffffac9dccc47d10 R08: 0000000000027040 R09: ffff9300c2807c00
[  125.775725] R10: 0000000000000155 R11: ffffac9dccc47cf0 R12: 00000000014000c0
[  126.129125] R13: 0000000000000006 R14: ffffffffafdf50a4 R15: ffff9300c2807c00
[  126.481481] FS:  00007f1590203940(0000) GS:ffff9300c2ec0000(0000) knlGS:0000000000000000
[  126.882090] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  127.165534] CR2: 00007f158e867e48 CR3: 00000008cb2f0003 CR4: 00000000000206e0
[  127.519583] Call Trace:
[  127.640589]  kstrdup+0x31/0x60
[  127.792890]  kstrdup_const+0x24/0x30
[  127.970359]  alloc_vfsmnt+0xb1/0x230
[  128.146541]  clone_mnt+0x36/0x330
[  128.309898]  copy_tree+0x17c/0x310
[  128.477589]  copy_mnt_ns+0x86/0x290
[  128.650410]  ? create_new_namespaces+0x36/0x1e0
[  128.874440]  create_new_namespaces+0x61/0x1e0
[  129.090364]  unshare_nsproxy_namespaces+0x5a/0xb0
[  129.322606]  SyS_unshare+0x201/0x3a0
[  129.499390]  do_syscall_64+0x73/0x130
[  129.680466]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  129.930403] RIP: 0033:0x7f158e7e7487
[  130.107402] RSP: 002b:00007ffe2bb5e248 EFLAGS: 00000a07 ORIG_RAX: 0000000000000110
[  130.481490] RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f158e7e7487
[  130.836166] RDX: 00007ffe2bb5e238 RSI: 00007ffe2bb5e370 RDI: 0000000000020000
[  131.189537] RBP: 00007ffe2bb5e4c0 R08: 0000556ddc80e442 R09: 00007ffe2bb5e260
[  131.541847] R10: 0000000000000000 R11: 0000000000000a07 R12: 00007ffe2bb5e250
[  131.894958] R13: 00007ffe2bb5e250 R14: 0000556ddc80e441 R15: 0000556ddc8072d9
[  132.246620] Code: 8d 4a 01 48 89 f8 48 01 fb 48 33 1b 49 33 9f 40 01 00 00 65 49 0f c7 08 0f 94 c0 84 c0 74 b2 48 85 db 74 14 49 63 47 20 48 01 c3 <48> 33 1b 49 33 9f 40 01 00 00 0f 18 0b 41 f7 c4 00 80 00 00 48
[  133.181775] RIP: __kmalloc_track_caller+0xe5/0x220 RSP: ffffac9dccc47ce0
[  133.514693] softdog: Initiating system reboot
[  133.515953] ------------[ cut here ]------------
[  133.515955] NETDEV WATCHDOG: ens3f1np1 (sfc): transmit queue 1 timed out
[  133.515976] WARNING: CPU: 1 PID: 3776 at net/sched/sch_generic.c:323 dev_watchdog+0x222/0x230
[  133.515977] Modules linked in: ipt_REJECT nf_reject_ipv4 iptable_filter bonding 8021q garp mrp softdog nfnetlink_log nfnetlink vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs xor zstd_compress
[  133.515996] sfc 0000:0e:00.0 ens3f0np0: TX stuck with port_enabled=1: resetting channels
[  133.515996]  raid6_pq hid_generic usbkbd usbmouse usbhid hid psmouse mpt3sas raid_class bnx2 sfc mtd ptp pps_core mdio hpsa scsi_transport_sas
[  133.516008] CPU: 1 PID: 3776 Comm: pmxcfs Tainted: G      D   I      4.15.18-1-pve #1
[  133.516009] Hardware name: HP ProLiant DL380 G7, BIOS P67 08/16/2015
[  133.516010] RIP: 0010:dev_watchdog+0x222/0x230
[  133.516011] RSP: 0018:ffff9300c2e43e58 EFLAGS: 00010286
[  133.516013] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 000000000000001f
[  133.516014] RDX: 0000000000000001 RSI: 0000000000000002 RDI: 0000000000000246
[  133.516015] RBP: ffff9300c2e43e88 R08: 0000000000000000 R09: 000000000000003c
[  133.516016] R10: ffff9300c2e5a770 R11: 0000000000028fd0 R12: 0000000000000040
[  133.516017] R13: ffff9300b1c22000 R14: ffff9300b1c22478 R15: ffff9300b1c2cf40
[  133.516018] FS:  00007f9cbeffd700(0000) GS:ffff9300c2e40000(0000) knlGS:0000000000000000
[  133.516019] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  133.516021] CR2: 00007f1b980441b8 CR3: 00000008be846003 CR4: 00000000000206e0
[  133.516021] Call Trace:
[  133.516022]  <IRQ>
[  133.516025]  ? dev_deactivate_queue.constprop.33+0x60/0x60
[  133.516029]  call_timer_fn+0x32/0x130
[  133.516032]  run_timer_softirq+0x1dd/0x430
[  133.516035]  ? timerqueue_add+0x59/0x90
[  133.516037]  ? ktime_get+0x43/0xa0
[  133.516040]  __do_softirq+0x109/0x29b
[  133.516043]  irq_exit+0xb6/0xc0
[  133.516045]  smp_apic_timer_interrupt+0x71/0x130
[  133.516046]  apic_timer_interrupt+0x84/0x90
[  133.516047]  </IRQ>
[  133.516051] RIP: 0010:finish_task_switch+0x78/0x200
[  133.516052] RSP: 0018:ffffac9dccad7c10 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff11
[  133.516053] RAX: ffff9300bc185600 RBX: ffff93007e955600 RCX: 0000000000000000
[  133.516054] RDX: 0000000000007f9c RSI: 00000000beffd700 RDI: ffff9300c2e628c0
[  133.516055] RBP: ffffac9dccad7c38 R08: 0000000000001534 R09: 0000000000000002
[  133.516056] R10: ffffac9dc62a3e08 R11: 0000000000000400 R12: ffff9300c2e628c0
[  133.516059] R13: ffff9300bc185600 R14: ffff93007e853180 R15: 0000000000000000
[  133.516063]  __schedule+0x3e8/0x870
[  133.516068]  ? fuse_copy_one+0x53/0x70
[  133.516070]  schedule+0x36/0x80
[  133.516073]  do_wait_intr+0x6f/0x80
[  133.516075]  fuse_dev_do_read.isra.25+0x47f/0x860
[  133.516077]  ? wait_woken+0x80/0x80
[  133.516079]  fuse_dev_read+0x65/0x90
[  133.516082]  new_sync_read+0xe4/0x130
[  133.516084]  __vfs_read+0x29/0x40
[  133.516086]  vfs_read+0x96/0x130
[  133.516088]  SyS_read+0x55/0xc0
[  133.516091]  do_syscall_64+0x73/0x130
[  133.516092]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  133.516094] RIP: 0033:0x7f9cccd4c20d
[  133.516095] RSP: 002b:00007f9cbeffcbf0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
[  133.516096] RAX: ffffffffffffffda RBX: 00007f9cbeffcd40 RCX: 00007f9cccd4c20d
[  133.516097] RDX: 0000000000021000 RSI: 00007f9cce38c010 RDI: 0000000000000007
[  133.516098] RBP: 00007f9cbeffcd38 R08: 0000000000000000 R09: 0000000000000000
[  133.516099] R10: 00007f9cb40008c0 R11: 0000000000000293 R12: 000055c7aaf50080
[  133.516100] R13: 000055c7aaf4f9c0 R14: 00007f9cbeffd698 R15: 0000000000021000
[  133.516101] Code: 37 00 49 63 4e e8 eb 92 4c 89 ef c6 05 26 29 d8 00 01 e8 a2 21 fd ff 89 d9 48 89 c2 4c 89 ee 48 c7 c7 60 56 d9 b0 e8 1e ce 7f ff <0f> 0b eb c0 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48
[  133.516128] ---[ end trace fa877bd9a718e006 ]---
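
To see how often the transmit watchdog fires per interface, the log can be summarised with a small filter. This is only a sketch: `watchdog_timeouts` is a hypothetical helper name, and it assumes the message format shown in the traces in this thread ("NETDEV WATCHDOG: <iface> (<driver>): transmit queue N timed out").

```shell
#!/bin/sh
# Summarise NETDEV WATCHDOG transmit timeouts from kernel log text on stdin.
# Prints one line per interface: "<iface> <driver> <count>".
# watchdog_timeouts is a hypothetical helper name, not an existing tool.
watchdog_timeouts() {
    # Keep only the watchdog lines and reduce them to "<iface> <driver>".
    sed -n 's/.*NETDEV WATCHDOG: \([^ ]*\) (\([^)]*\)): transmit queue [0-9]* timed out.*/\1 \2/p' |
        sort | uniq -c |
        # uniq -c puts the count first; move it to the end.
        awk '{ print $2, $3, $1 }'
}

# Typical use on a node:
#   dmesg | watchdog_timeouts
```

Running it on each node makes it easier to compare which NICs and drivers are affected before and after a kernel change.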

coppola_f · Aug 7, 2018

Menno,
we're running HP DL380 G6 servers too.
We solved it by rolling back to the 4.13.xx kernel.

many thanks again for your time,
regards,
Francesco

Alwin · Aug 7, 2018

Menno said:
[ 8.849616] Hardware name: HP ProLiant DL380 G7, BIOS P67 08/16/2015

@Menno, your BIOS is not up-to-date, there have been two updates past 08/16/2015 that ship microcode for the Intel CPUs. This might also be a reason for the kernel crash.
https://support.hpe.com/hpsc/swd/pu...a7d8a4990bcc245dfc3&swEnvOid=4184#tab-history

coppola_f · Aug 7, 2018

I'm currently unable to retrieve the BIOS version, as I really can't reboot any node.
I hope to give you feedback about this value on our 4x DL380 G6 ASAP!

regards,
Francesco

Alwin · Aug 8, 2018

@coppola_f, dmidecode should give you the information too.
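
No reboot is needed for this, since dmidecode reads the SMBIOS data from the running system. As a rough sketch, the two relevant fields can be pulled out of the BIOS section like this; `bios_summary` is a hypothetical helper name, and it assumes the usual "Version:" and "Release Date:" labels in the dmidecode output.

```shell
#!/bin/sh
# Print "<version>|<release date>" from `dmidecode -t bios`-style text on stdin.
# bios_summary is a hypothetical helper name, not part of dmidecode.
bios_summary() {
    awk -F': *' '
        /^[[:space:]]*Version:/      { version = $2 }
        /^[[:space:]]*Release Date:/ { date = $2 }
        END { print version "|" date }
    '
}

# Typical use on a running node (needs root):
#   dmidecode -t bios | bios_summary
# dmidecode can also print single fields directly:
#   dmidecode -s bios-version
#   dmidecode -s bios-release-date
```

The release date is the value to compare against the dates in the vendor's BIOS revision history.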

coppola_f · Aug 8, 2018

@Alwin

here are the results:
(BIOS release date is 02/22/2018!)

# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 2.7 present.
133 structures occupying 4124 bytes.
Table at 0xDF7FE000.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
Vendor: HP
Version: P62
Release Date: 02/22/2018
Address: 0xF0000
Runtime Size: 64 kB
ROM Size: 8192 kB
Characteristics:
PCI is supported
PNP is supported
BIOS is upgradeable
BIOS shadowing is allowed
ESCD support is available
Boot from CD is supported
Selectable boot is supported
EDD is supported
5.25"/360 kB floppy services are supported (int 13h)
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 kB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
8042 keyboard services are supported (int 9h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
CGA/mono video services are supported (int 10h)
ACPI is supported
USB legacy is supported
BIOS boot specification is supported
Function key-initiated network boot is supported
Targeted content distribution is supported
Firmware Revision: 2.33

Menno · Aug 9, 2018

Alwin said:
@Menno, your BIOS is not up-to-date, there have been two updates past 08/16/2015 that ship microcode for the Intel CPUs. This might also be a reason for the kernel crash.

Thanks Alwin, I was not aware there was an updated BIOS.

However, these machines are only for testing and out of warranty, so I'm unable to upgrade the BIOS. We do have newer hardware available to continue our testing, so I'm in the process of upgrading the machines as we speak. Hopefully they play nice with the latest kernel.

I still consider this issue a regression, though, as a kernel upgrade should never break things. The issue also occurs on multiple machines of different generations (Gen 6 and 7) and has been reported by multiple users. Perhaps someone else is able to upgrade their BIOS to see whether that resolves the issue, so it can either be marked as fixed that way or be debugged some more.

David Herselman · Aug 9, 2018

We have a cluster of 3 x HP ProLiant DL380 G7 nodes which is working perfectly:

kvm1:

Code:

HP ProLiant DL380 G7 (583914-B21)
BIOS: 05/05/2011

kvm2:

Code:

HP ProLiant DL380 G7 (583914-B21)
BIOS: 12/01/2010

kvm3:

Code:

HP ProLiant DL380 G7 (583914-B21)
BIOS: 12/01/2010

Running Ceph 12.2.7, two NICs in a LACP bond for VM traffic and another two NICs in a LACP bond for Ceph traffic.

We're running OVS, perhaps that's different to your environment?

Code:

[root@kvm1 ~]# pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.18-1-pve)
pve-manager: 5.2-5 (running version: 5.2-5/eb24855a)
pve-kernel-4.15: 5.2-4
pve-kernel-4.15.18-1-pve: 4.15.18-15
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-35
libpve-guest-common-perl: 2.0-17
libpve-http-server-perl: 2.0-9
libpve-storage-perl: 5.0-24
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-1
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-19
pve-cluster: 5.0-28
pve-container: 2.0-24
pve-docs: 5.2-4
pve-firewall: 3.0-13
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-29
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9

Code:

[root@kvm2 ~]# pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.18-1-pve)
pve-manager: 5.2-5 (running version: 5.2-5/eb24855a)
pve-kernel-4.15: 5.2-4
pve-kernel-4.15.18-1-pve: 4.15.18-15
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-35
libpve-guest-common-perl: 2.0-17
libpve-http-server-perl: 2.0-9
libpve-storage-perl: 5.0-24
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-1
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-19
pve-cluster: 5.0-28
pve-container: 2.0-24
pve-docs: 5.2-4
pve-firewall: 3.0-13
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-29
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9

Code:

[root@kvm3 ~]# pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.18-1-pve)
pve-manager: 5.2-5 (running version: 5.2-5/eb24855a)
pve-kernel-4.15: 5.2-4
pve-kernel-4.15.18-1-pve: 4.15.18-15
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-35
libpve-guest-common-perl: 2.0-17
libpve-http-server-perl: 2.0-9
libpve-storage-perl: 5.0-24
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-1
openvswitch-switch: 2.7.0-3
proxmox-widget-toolkit: 1.0-19
pve-cluster: 5.0-28
pve-container: 2.0-24
pve-docs: 5.2-4
pve-firewall: 3.0-13
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-29
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9

Hrm... I don't see Ceph in the pveversion -v output. We are using the Proxmox apt sources, though:

Code:

[root@kvm1 ~]# cat /etc/apt/sources.list.d/ceph.list
deb http://download.proxmox.com/debian/ceph-luminous stretch main

[root@kvm1 sources.list.d]# dpkg -l | grep ceph
ii  ceph-base                            12.2.7-pve1                    amd64        common ceph daemon libraries and management tools
ii  ceph-common                          12.2.7-pve1                    amd64        common utilities to mount and interact with a ceph storage cluster
ii  ceph-fuse                            12.2.7-pve1                    amd64        FUSE-based client for the Ceph distributed file system
ii  ceph-mds                             12.2.7-pve1                    amd64        metadata server for the ceph distributed file system
ii  ceph-mgr                             12.2.7-pve1                    amd64        manager for the ceph distributed storage system
ii  ceph-mon                             12.2.7-pve1                    amd64        monitor server for the ceph storage system
ii  ceph-osd                             12.2.7-pve1                    amd64        OSD server for the ceph storage system
ii  libcephfs1                           10.2.10-1~bpo80+1              amd64        Ceph distributed file system client library
ii  libcephfs2                           12.2.7-pve1                    amd64        Ceph distributed file system client library
ii  python-ceph                          12.2.7-pve1                    amd64        Meta-package for python libraries for the Ceph libraries
ii  python-cephfs                        12.2.7-pve1                    amd64        Python 2 libraries for the Ceph libcephfs library
