PVE 5.1: KVM broken on old CPUs

profpolymath · Oct 24, 2017

VMs fail to launch after upgrading from PVE 5.0 to 5.1 on an old server:

Code:

Could not access KVM kernel module: No such file or directory
failed to initialize KVM: No such file or directory

The kvm_intel module refuses to load:

Code:

# lsmod | grep kv
kvm                   581632  0
irqbypass              16384  1 kvm
# modprobe kvm-intel
modprobe: ERROR: could not insert 'kvm_intel': Input/output error
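
The two symptoms above can be checked together. QEMU reports "Could not access KVM kernel module" when it cannot open /dev/kvm, and that device only appears once a vendor module such as kvm_intel is loaded. A minimal sketch, assuming a Linux host; `kvm_ready` is a hypothetical helper name, and its two optional arguments exist only so the check can be pointed at other files:

```shell
# kvm_ready: succeed only if a vendor KVM module is loaded AND the KVM device node exists.
# $1 = module list file (defaults to /proc/modules), $2 = device path (defaults to /dev/kvm).
kvm_ready() {
    modules_file="${1:-/proc/modules}"
    dev="${2:-/dev/kvm}"
    # Each /proc/modules line starts with the module name followed by a space.
    grep -qE '^kvm_(intel|amd) ' "$modules_file" && [ -e "$dev" ]
}

if kvm_ready; then
    echo "KVM is usable"
else
    echo "KVM is not usable: vendor module not loaded or /dev/kvm missing"
fi
```

On the machine in this report, the check would fail because only the generic kvm module is loaded.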

The following thread suggests this is due to upstream changes in KVM deprecating older platforms:
https://bbs.archlinux.org/viewtopic.php?pid=1727757#p1727757

The machine is a PowerEdge 2950 III with two dual-core Xeon 'Dempsey' 5050 CPUs (8 logical CPUs with Hyper-Threading), about ten years old. If we want to continue running PVE on this hardware, it sounds as though it will have to be pegged at 5.0.

Code:

# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    2
Socket(s):             2
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            15
Model:                 6
Model name:            Intel(R) Xeon(TM) CPU 3.00GHz
Stepping:              4
CPU MHz:               2992.595
CPU max MHz:           3000.0000
CPU min MHz:           2000.0000
BogoMIPS:              5985.19
Virtualization:        VT-x
L1d cache:             16K
L2 cache:              2048K
NUMA node0 CPU(s):     0-7
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc pebs bts nopl cpuid pni dtes64 monitor ds_cpl vmx est cid cx16 xtpr pdcm lahf_lm tpr_shadow

Code:

# uname -a
Linux pve2950 4.13.4-1-pve #1 SMP PVE 4.13.4-25 (Fri, 13 Oct 2017 08:59:53 +0200) x86_64 GNU/Linux
# pveversion -v
proxmox-ve: 5.1-25 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-35 (running version: 5.1-35/722cc488)
pve-kernel-4.13.4-1-pve: 4.13.4-25
pve-kernel-4.10.17-3-pve: 4.10.17-23
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-2
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.2-pve1~bpo90

fabian · Oct 25, 2017

thanks for reporting this, we'll see about speeding up the (already planned) revert..

macleod · Oct 25, 2017

Same problem on an HP ProLiant DL380 G5 server. Reverting to the 4.10 kernel solved the problem.

Code:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             2
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 15
Model name:            Intel(R) Xeon(R) CPU            5130  @ 2.00GHz
Stepping:              6
CPU MHz:               2000.002
BogoMIPS:              4000.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
NUMA node0 CPU(s):     0-3
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow dtherm

dendi · Oct 25, 2017

It seems the AMD Opteron 6xxx doesn't support vnmi either.

fabian · Oct 25, 2017

test kernel is available on http://download.proxmox.com/temp/pve-kernel-4.13.4-1-pve_4.13.4-26~vmxtest1_amd64.deb , hash sums are:

Code:

SHA512: bf1abdaef81afbd3e06340bc9e01594acc175d6dde225e9311e7fbae53b9256aa62397c74ccb384560404b0de5240b3bbbe152782a77eeb00c6fb07dbc84874b
SHA256: b07b79306318341ae359752fecf1baa9faac09202286634e03b0e11caffd759c 
MD5: 109945d8e7929678df61f927e59b904b

please provide feedback, we don't have any affected machines (anymore) in our test lab, and neither do the upstream KVM kernel developers..
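
Before installing a kernel from a temporary download location, it is worth comparing the file against the published hash. A minimal sketch, assuming the .deb has been downloaded into the current directory and that `sha256sum` (coreutils) is available; `verify_sha256` is a hypothetical helper name:

```shell
# verify_sha256: succeed only if the file's SHA-256 digest equals the expected hex string.
verify_sha256() {
    file="$1"
    expected="$2"
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

deb=pve-kernel-4.13.4-1-pve_4.13.4-26~vmxtest1_amd64.deb
# Only run the comparison if the package is actually present.
if [ -f "$deb" ]; then
    if verify_sha256 "$deb" b07b79306318341ae359752fecf1baa9faac09202286634e03b0e11caffd759c; then
        echo "checksum OK"
    else
        echo "checksum MISMATCH, do not install" >&2
    fi
fi
```

If the checksum matches, the package can be installed with `dpkg -i` and the host rebooted into the new kernel.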

profpolymath · Oct 25, 2017

fabian said:
please provide feedback, we don't have any affected machines (anymore) in our test lab, and neither do the upstream KVM kernel developers..

Thanks Fabian. After installing the test kernel VMs are again able to start.

sandqst · Oct 25, 2017

After rolling back the 5.1 update and staying on PVE 5.0, testing the kernel posted above seems to work. However, I will be staying with the 4.10.17 kernel for now (and PVE 5.0).

This is running on an HP ProLiant MicroServer Gen8 with an Intel Xeon E3-1256L V2 processor. Well, I guess it is beginning to show its age. :-/

cybermcm · Oct 25, 2017

HP DL380 G5 (really old ;-)) with 5.1 and new kernel -> working!!
Thanks to the Proxmox team for this quick solution

apoc · Oct 25, 2017

dendi said:
It seems the AMD Opteron 6xxx doesn't support vnmi either.

My Opteron 6276 on a Supermicro H8SGL-F seems to just run fine if that helps.

dendi · Oct 26, 2017

apoc said:
My Opteron 6276 on a Supermicro H8SGL-F seems to just run fine if that helps.

With or without the patch above?
Thank you!

apoc · Oct 26, 2017

I am just using the standard (free) repo. Patch level is as of yesterday, 18:00 German time.
Haven't installed the test kernel from this thread.
Would you like me to check something in particular in the logs?
Edit: Something just came to mind. I would not expect this to make a difference, but I chose a custom partition layout, so I installed Debian first and then applied the Proxmox kernel.

chOzcA75vE0poCY0F6XC · Oct 26, 2017

I installed Proxmox on a Dell D620 yesterday and brought it fully up to date with dist-upgrade.
Containers worked just fine, but VMs couldn't start.
After installing the test kernel and rebooting, the VMs work, but the containers won't start anymore.

The D620 has got a T5600 CPU.

Edit: For some reason it didn't mount the CIFS share on boot, that was the reason why the containers wouldn't start. Fixed now.

Here are some logs:

systemctl status pve-container@607.service

● pve-container@607.service - PVE LXC Container: 607
Loaded: loaded (/lib/systemd/system/pve-container@.service; static; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2017-10-26 13:07:59 CEST; 5min ago
Docs: man:lxc-start
man:lxc
man:pct
Process: 2149 ExecStart=/usr/bin/lxc-start -n 607 (code=exited, status=1/FAILURE)

Oct 26 13:07:58 hypervisor3 systemd[1]: Starting PVE LXC Container: 607...
Oct 26 13:07:59 hypervisor3 lxc-start[2149]: lxc-start: 607: lxccontainer.c: wait_on_daemonized_start: 751 No such file or directory - Failed to receive the container state
Oct 26 13:07:59 hypervisor3 lxc-start[2149]: lxc-start: 607: tools/lxc_start.c: main: 368 The container failed to start.
Oct 26 13:07:59 hypervisor3 lxc-start[2149]: lxc-start: 607: tools/lxc_start.c: main: 370 To get more details, run the container in foreground mode.
Oct 26 13:07:59 hypervisor3 lxc-start[2149]: lxc-start: 607: tools/lxc_start.c: main: 372 Additional information can be obtained by setting the --logfile and --logpriority optio
Oct 26 13:07:59 hypervisor3 systemd[1]: pve-container@607.service: Control process exited, code=exited status=1
Oct 26 13:07:59 hypervisor3 systemd[1]: Failed to start PVE LXC Container: 607.
Oct 26 13:07:59 hypervisor3 systemd[1]: pve-container@607.service: Unit entered failed state.
Oct 26 13:07:59 hypervisor3 systemd[1]: pve-container@607.service: Failed with result 'exit-code'.
~

journalctl -xe

-- Unit user@0.service has finished starting up.
--
-- The start-up result is done.
Oct 26 13:07:27 hypervisor3 kernel: perf: interrupt took too long (3145 > 3131), lowering kernel.perf_event_max_sample
Oct 26 13:07:58 hypervisor3 pvedaemon[2147]: starting CT 607: UPID:hypervisor3:00000863:00008392:59F1C20E:vzstart:607:
Oct 26 13:07:58 hypervisor3 pvedaemon[1194]: <root@pam> starting task UPID:hypervisor3:00000863:00008392:59F1C20E:vzst
Oct 26 13:07:58 hypervisor3 systemd[1]: Starting PVE LXC Container: 607...
-- Subject: Unit pve-container@607.service has begun start-up
-- Defined-By: systemd
--
-- Unit pve-container@607.service has begun starting up.
Oct 26 13:07:59 hypervisor3 lxc-start[2149]: lxc-start: 607: lxccontainer.c: wait_on_daemonized_start: 751 No such fil
Oct 26 13:07:59 hypervisor3 lxc-start[2149]: lxc-start: 607: tools/lxc_start.c: main: 368 The container failed to star
Oct 26 13:07:59 hypervisor3 lxc-start[2149]: lxc-start: 607: tools/lxc_start.c: main: 370 To get more details, run the
Oct 26 13:07:59 hypervisor3 lxc-start[2149]: lxc-start: 607: tools/lxc_start.c: main: 372 Additional information can b
Oct 26 13:07:59 hypervisor3 pvedaemon[1192]: unable to get PID for CT 607 (not running?)
Oct 26 13:07:59 hypervisor3 systemd[1]: pve-container@607.service: Control process exited, code=exited status=1
Oct 26 13:07:59 hypervisor3 systemd[1]: Failed to start PVE LXC Container: 607.
-- Subject: Unit pve-container@607.service has failed
-- Defined-By: systemd
--
-- Unit pve-container@607.service has failed.
--
-- The result is failed.
Oct 26 13:07:59 hypervisor3 systemd[1]: pve-container@607.service: Unit entered failed state.
Oct 26 13:07:59 hypervisor3 systemd[1]: pve-container@607.service: Failed with result 'exit-code'.
Oct 26 13:07:59 hypervisor3 pvedaemon[2147]: command 'systemctl start pve-container@607' failed: exit code 1
Oct 26 13:07:59 hypervisor3 pvedaemon[1194]: <root@pam> end task UPID:hypervisor3:00000863:00008392:59F1C20E:vzstart:6
Oct 26 13:08:00 hypervisor3 systemd[1]: Starting Proxmox VE replication runner...
-- Subject: Unit pvesr.service has begun start-up
-- Defined-By: systemd
--
-- Unit pvesr.service has begun starting up.
Oct 26 13:08:01 hypervisor3 systemd[1]: Started Proxmox VE replication runner.
-- Subject: Unit pvesr.service has finished start-up
-- Defined-By: systemd
--
-- Unit pvesr.service has finished starting up.

dendi · Oct 26, 2017

apoc said:
I am just using the standard (free) repo. Patch level is as of yesterday, 18:00 German time.
Haven't installed the test kernel from this thread.
Would you like me to check something in particular in the logs?

Yes. If I understood correctly, the problem was on CPUs with no virtual NMI support...
Can you check with this command:
cat /proc/cpuinfo | grep nmi

You should get no output if your CPU doesn't support virtual NMI.

Just out of curiosity.

Thank you
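
The check above can be made a little stricter by looking only at the flags lines and matching the flag as a whole word. A minimal sketch, assuming an x86 Linux host where /proc/cpuinfo contains lines beginning with "flags"; `has_cpu_flag` is a hypothetical helper name, and the optional second argument exists only so the check can be run against a saved copy of cpuinfo:

```shell
# has_cpu_flag: succeed if the given flag appears on a "flags" line of a cpuinfo file.
# $1 = flag name, $2 = cpuinfo file (defaults to /proc/cpuinfo).
has_cpu_flag() {
    flag="$1"
    file="${2:-/proc/cpuinfo}"
    # -w matches whole words only, so searching for "nmi" would not match "vnmi".
    grep -E '^flags' "$file" | grep -qw -- "$flag"
}

if has_cpu_flag vnmi; then
    echo "vnmi present"
else
    echo "vnmi absent"
fi
```

Neither of the Intel flag lists posted earlier in this thread contains vnmi, so this would print "vnmi absent" on those hosts.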

chOzcA75vE0poCY0F6XC · Oct 26, 2017

I don't get any output with cat /proc/cpuinfo | grep nmi.

superbit · Oct 26, 2017

I had problems too with an HP ProLiant ML110 with an Intel Xeon 3040 (no NMI support).
I installed the test kernel and everything is OK with VMs.

My server is still showing this error at boot:
/sbin/modprobe failed: 1
Can't process LV pve/data: thin-pool target support missing from kernel?
Can't process LV pve/vm-901-disk-1: thin-pool target support missing from kernel?

I ran lvm and lvscan, and LVM seems to be fine:

Code:

  ACTIVE            '/dev/pve/swap' [4.00 GiB] inherit
  ACTIVE            '/dev/pve/root' [116.25 GiB] inherit
  ACTIVE            '/dev/pve/data' [329.26 GiB] inherit
  ACTIVE            '/dev/pve/vm-901-disk-1' [100.00 GiB] inherit

chOzcA75vE0poCY0F6XC · Oct 27, 2017

I've got a similar error message on boot; it says that LVM-thin pool support is apparently missing from the kernel, just like superbit mentioned.
I haven't tested LVM-thin yet, but that old D620 machine is still capable of running KVM VMs and containers.

apoc · Oct 27, 2017

dendi said:
...you should get no output if your CPU doesn't support virtual nmi...

As chOzcA75vE0poCY0F6XC states, I don't get anything back from that either (Opteron 6276).
I don't get the LVM message, but that makes sense as I am not using LVM...

fabian · Oct 27, 2017

thanks for the feedback, forwarded upstream. the LVM-thin message on boot can be safely ignored, it is not relevant at that stage of the boot process.

superbit · Oct 30, 2017

Thanks Fabian. I'd prefer not to see an error on boot, but I'm reassured if it can be safely ignored. Thanks again. One question out of curiosity: what is the reason for this error?

fabian said:
thanks for the feedback, forwarded upstream. the LVM-thin message on boot can be safely ignored, it is not relevant at that stage of the boot process.

neiion · Oct 31, 2017

chOzcA75vE0poCY0F6XC said:
I installed Proxmox on a Dell D620 yesterday and brought it fully up to date with dist-upgrade.
Containers worked just fine, but VMs couldn't start.
After installing the test kernel and rebooting, the VMs work, but the containers won't start anymore.

The D620 has got a T5600 CPU.

Edit: For some reason it didn't mount the CIFS share on boot, that was the reason why the containers wouldn't start. Fixed now.

I also had an issue mounting shares; this is how I fixed it: https://forum.proxmox.com/threads/pve-5-1-cifs-share-issue-mount-error-112-host-is-down.37788/

Hope it helps.
