[SOLVED] KVM Hardware Virtualization Stopped Working

xcj

Member
Jul 6, 2021
Hi there,

I have two Lenovo M900 Tiny hosts, each with an Intel i7-6700T and VT-d enabled in the BIOS. Both are running Proxmox Virtual Environment 7.0-14+1.

After a reboot, one of the hosts started showing "internal error" on its VMs, and the syslog showed "KVM internal error. Suberror: 1 / emulation failure".

I found that the VMs would only boot after disabling "KVM Hardware Virtualization" in their options, except for a Windows Server VM that refuses to work without it.
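For reference, the same option can be toggled from the CLI with qm (a hedged example; VM ID 100 is a placeholder):

Code:
# disable KVM hardware virtualization for VM 100 (placeholder ID)
qm set 100 --kvm 0
# re-enable it later
qm set 100 --kvm 1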

I think something went wrong in that host's installation. I had the dark mode mod installed, and when this problem happened the GUI reverted back to default. The other server still has the dark GUI.

Do you have any idea what could be happening, or where to look?

Thanks!
 
After a reboot, one of the hosts started showing "internal error" on its VMs, and the syslog showed "KVM internal error. Suberror: 1 / emulation failure".
Maybe a kernel regression; you could try booting an older kernel by selecting it in the boot loader menu on startup?
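A quick way to see which older kernels are still installed and therefore selectable in the boot menu (a minimal sketch using standard Debian tooling; the version numbers are just the ones from this thread):

Code:
# list installed Proxmox kernel packages, e.g. pve-kernel-5.11.22-5-pve
dpkg -l | grep pve-kernel
# then reboot and pick an older entry under "Advanced options" in the GRUB menu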
 
They were running the same kernel:

Code:
root@node1:~# uname -r
5.11.22-7-pve
root@node2:~# uname -r
5.11.22-7-pve

I rebooted node1 on a previous kernel version:

Code:
root@node1:~# uname -r
5.11.22-5-pve

Node1 still can't run VMs with KVM Hardware Virtualization, while Node2 (running the newer kernel) can.

Thanks for the suggestion, any other ideas?

Thanks in advance!
 
Can you please post the whole log output, ideally from journalctl (it contains both kernel and syslog messages), either directly or as an attachment? Maybe we can get more details about this error from those logs.

E.g., getting the log of the current boot: journalctl -b
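For example, to capture the logs into files that can be attached here (a minimal sketch; the file names are just suggestions, and the previous-boot variant needs persistent journaling):

Code:
# log of the current boot
journalctl -b > node1-current-boot.txt
# log of the previous boot, if the journal is persistent
journalctl -b -1 > node1-previous-boot.txt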
 
Sorry about it being in PDF; the attach function wouldn't let me attach RTF documents.

About the corosync issues in the logs: I use a Raspberry Pi as a QDevice and it malfunctioned. I have already reinstalled everything, but the instructions I used before no longer work; I get a permission error when integrating the Pi into corosync. I am still looking at other threads about that issue, but the KVM acceleration is the priority and focus of this thread.

I attached both hosts' logs in the way you suggested. Node1 is the one where KVM Hardware Virtualization is not working; on node2 it's working. I thought it would be useful to have the boot log of both.

Once again, thanks for the help!
 

Attachments

  • Node1Boot.pdf (225.8 KB)
  • Node2Boot.pdf (206.2 KB)
The relevant part from Node 1 would be the corosync start and the failure of the qdevice at the end:

Code:
Nov 17 21:02:26 node1 systemd[1]: Starting Corosync Cluster Engine...
Nov 17 21:02:26 node1 corosync[2261]:   [MAIN  ] Corosync Cluster Engine 3.1.5 starting up
Nov 17 21:02:26 node1 corosync[2261]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog systemd xmlconf vqsim nozzle snmp pie relro bindnow
Nov 17 21:02:26 node1 corosync[2261]:   [TOTEM ] Initializing transport (Kronosnet).
Nov 17 21:02:26 node1 kernel: sctp: Hash tables configured (bind 512/512)
Nov 17 21:02:26 node1 corosync[2261]:   [TOTEM ] totemknet initialized
Nov 17 21:02:26 node1 corosync[2261]:   [KNET  ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Nov 17 21:02:26 node1 corosync[2261]:   [SERV  ] Service engine loaded: corosync configuration map access [0]
Nov 17 21:02:26 node1 corosync[2261]:   [QB    ] server name: cmap
Nov 17 21:02:26 node1 corosync[2261]:   [SERV  ] Service engine loaded: corosync configuration service [1]
Nov 17 21:02:26 node1 corosync[2261]:   [QB    ] server name: cfg
Nov 17 21:02:26 node1 corosync[2261]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Nov 17 21:02:26 node1 corosync[2261]:   [QB    ] server name: cpg
Nov 17 21:02:26 node1 corosync[2261]:   [SERV  ] Service engine loaded: corosync profile loading service [4]
Nov 17 21:02:26 node1 corosync[2261]:   [SERV  ] Service engine loaded: corosync resource monitoring service [6]
Nov 17 21:02:26 node1 corosync[2261]:   [WD    ] Watchdog not enabled by configuration
Nov 17 21:02:26 node1 corosync[2261]:   [WD    ] resource load_15min missing a recovery key.
Nov 17 21:02:26 node1 corosync[2261]:   [WD    ] resource memory_used missing a recovery key.
Nov 17 21:02:26 node1 corosync[2261]:   [WD    ] no resources configured.
Nov 17 21:02:26 node1 corosync[2261]:   [SERV  ] Service engine loaded: corosync watchdog service [7]
Nov 17 21:02:26 node1 corosync[2261]:   [QUORUM] Using quorum provider corosync_votequorum
Nov 17 21:02:26 node1 corosync[2261]:   [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Nov 17 21:02:26 node1 corosync[2261]:   [QB    ] server name: votequorum
Nov 17 21:02:26 node1 corosync[2261]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Nov 17 21:02:26 node1 corosync[2261]:   [QB    ] server name: quorum
Nov 17 21:02:26 node1 corosync[2261]:   [TOTEM ] Configuring link 0
Nov 17 21:02:26 node1 corosync[2261]:   [TOTEM ] Configured link number 0: local addr: 192.168.1.22, port=5405
Nov 17 21:02:26 node1 corosync[2261]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Nov 17 21:02:26 node1 corosync[2261]:   [KNET  ] host: host: 2 has no active links
Nov 17 21:02:26 node1 corosync[2261]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Nov 17 21:02:26 node1 corosync[2261]:   [KNET  ] host: host: 2 has no active links
Nov 17 21:02:26 node1 corosync[2261]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Nov 17 21:02:26 node1 corosync[2261]:   [KNET  ] host: host: 2 has no active links
Nov 17 21:02:26 node1 corosync[2261]:   [QUORUM] Sync members[1]: 1
Nov 17 21:02:26 node1 corosync[2261]:   [QUORUM] Sync joined[1]: 1
Nov 17 21:02:26 node1 corosync[2261]:   [TOTEM ] A new membership (1.a12) was formed. Members joined: 1
Nov 17 21:02:26 node1 corosync[2261]:   [QUORUM] Members[1]: 1
Nov 17 21:02:26 node1 systemd[1]: Started Corosync Cluster Engine.
Nov 17 21:02:26 node1 corosync[2261]:   [MAIN  ] Completed service synchronization, ready to provide service.
Nov 17 21:02:26 node1 systemd[1]: Starting Corosync Qdevice daemon...
Nov 17 21:02:26 node1 systemd[1]: Starting PVE API Daemon...
Nov 17 21:02:26 node1 corosync-qdevice[2279]: Can't init nss (-8174): security library: bad database.
Nov 17 21:02:26 node1 systemd[1]: corosync-qdevice.service: Main process exited, code=exited, status=1/FAILURE
Nov 17 21:02:26 node1 systemd[1]: corosync-qdevice.service: Failed with result 'exit-code'.
Nov 17 21:02:26 node1 systemd[1]: Failed to start Corosync Qdevice daemon.

So corosync does not get quorate, and thus pmxcfs cannot start correctly later on; the real issue with that setup is rather cluster (network) related.

Can you please also post the output of pveversion -v and cat /etc/corosync/corosync.conf?
Possibly it's just an unfinished upgrade.
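As a side note on the qdevice failure: "Can't init nss (-8174): security library: bad database" usually means the QDevice TLS certificate database is broken. A hedged sketch of regenerating it once the Raspberry Pi is reachable again (assuming corosync-qnetd is installed there):

Code:
# run on one cluster node; 192.168.1.24 is the QDevice from this corosync.conf
pvecm qdevice remove
pvecm qdevice setup 192.168.1.24 --force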
 
Hi, I see. The Raspberry Pi has been out of service for a while now, and I think I had already rebooted both nodes after the Pi went down. But that is the most verbose error node1 gives on boot, and when it finishes booting, node2 reboots spontaneously.

Node1:
Code:
proxmox-ve: 7.1-1 (running kernel: 5.13.19-1-pve)
pve-manager: 7.1-5 (running version: 7.1-5/6fe299a0)
pve-kernel-5.13: 7.1-4
pve-kernel-helper: 7.1-4
pve-kernel-5.11: 7.0-10
pve-kernel-5.4: 6.4-6
pve-kernel-5.13.19-1-pve: 5.13.19-2
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-4-pve: 5.11.22-9
pve-kernel-5.4.140-1-pve: 5.4.140-1
pve-kernel-5.4.128-1-pve: 5.4.128-2
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.14-1
proxmox-backup-file-restore: 2.0.14-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.4-2
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-1
pve-qemu-kvm: 6.1.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-3
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
 
Node2:
Code:
proxmox-ve: 7.1-1 (running kernel: 5.13.19-1-pve)
pve-manager: 7.1-5 (running version: 7.1-5/6fe299a0)
pve-kernel-5.13: 7.1-4
pve-kernel-helper: 7.1-4
pve-kernel-5.11: 7.0-10
pve-kernel-5.4: 6.4-6
pve-kernel-5.13.19-1-pve: 5.13.19-2
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-4-pve: 5.11.22-9
pve-kernel-5.4.140-1-pve: 5.4.140-1
pve-kernel-5.4.128-1-pve: 5.4.128-2
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.14-1
proxmox-backup-file-restore: 2.0.14-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.4-2
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-1
pve-qemu-kvm: 6.1.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-3
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
 
Node1:
Code:
root@node1:~# cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}
nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.1.22
  }
  node {
    name: node2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.23
  }
}

quorum {
  device {
    model: net
    net {
      algorithm: ffsplit
      host: 192.168.1.24
      tls: on
    }
    votes: 1
  }
  provider: corosync_votequorum
}

totem {
  cluster_name: Network21
  config_version: 3
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
 
Node2:

Code:
root@node2:~# cat /etc/corosync/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: node1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.1.22
  }
  node {
    name: node2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.23
  }
}

quorum {
  device {
    model: net
    net {
      algorithm: ffsplit
      host: 192.168.1.24
      tls: on
    }
    votes: 1
  }
  provider: corosync_votequorum
}

totem {
  cluster_name: Network21
  config_version: 3
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

Thanks for all the help!
 
Hi there, I reinstalled both servers, rebuilt the cluster, re-integrated the QDevice (a Raspberry Pi 1) and was hoping the problem would go away with a fresh install.

It didn't.

Both machines are identical: both have the same BIOS configuration and both are running the latest BIOS Lenovo released for them.

One can run VMs with "KVM Hardware Virtualization" on or off; the other can only run VMs with "KVM Hardware Virtualization" off (unchecked in the options).

I really have no clue where the source of the problem is...

I attached the error a machine gives when I boot it with "KVM Hardware Virtualization" set to ON.
 

Attachments

  • kvm.txt (1.7 KB)
I gathered more debug output that was requested in older threads from users having this issue; in those threads the problem went away and the root cause was never found.


lsmod | grep kvm and uname -r for both nodes:

Code:
root@node1:~# lsmod | grep kvm
kvm_intel             286720  0
kvm                   872448  1 kvm_intel
irqbypass              16384  1 kvm

root@node2:~# lsmod | grep kvm
kvm_intel             286720  32
kvm                   872448  1 kvm_intel
irqbypass              16384  17 kvm

root@node1:/etc# uname -r
5.13.19-2-pve

root@node2:~# uname -r
5.13.19-2-pve
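
Two further checks that may help narrow this down (standard Linux diagnostics, a hedged suggestion rather than something requested above): verify the CPU still advertises VT-x and that KVM initialized cleanly:

Code:
# should print one "vmx" per logical CPU; 0 means VT-x is gone or disabled
grep -c vmx /proc/cpuinfo
# look for messages like "kvm: disabled by bios" or VMX init errors
dmesg | grep -i kvm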

Node1 pveversion -v:

Code:
root@node1:~# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-2-pve)
pve-manager: 7.1-8 (running version: 7.1-8/5b267f33)
pve-kernel-helper: 7.1-6
pve-kernel-5.13: 7.1-5
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-1
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-3
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-4
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3

Node2 pveversion -v:

Code:
root@node2:~# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-2-pve)
pve-manager: 7.1-8 (running version: 7.1-8/5b267f33)
pve-kernel-helper: 7.1-6
pve-kernel-5.13: 7.1-5
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-1
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-3
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-4
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3

I also attached the boot sequence from the node1 syslog.

Thanks in advance for any help or hint in the right direction.
 

Attachments

  • Node1BootJan1st.txt (122.2 KB)
Hi, this can be marked as solved; it turns out it's a problem with the CPU, which from one moment to the next stopped supporting this feature.
I tested this CPU in another machine (I am migrating to another chassis) and the problem appeared there too.
I tested another CPU in this node and the problem went away.
 
Hi, this can be marked as solved; it turns out it's a problem with the CPU, which from one moment to the next stopped supporting this feature.
I tested this CPU in another machine (I am migrating to another chassis) and the problem appeared there too.
I tested another CPU in this node and the problem went away.
Thanks for your feedback. As the original poster you can mark the thread as solved yourself by using the Edit Thread button at the top right and selecting the Solved prefix.

It's possibly still worth a shot to install the microcode package for your CPU vendor: https://wiki.debian.org/Microcode
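On Intel hosts like these, that amounts to roughly the following (a sketch assuming the Debian Bullseye base of PVE 7; the non-free component has to be enabled in the apt sources first):

Code:
# with "non-free" added to /etc/apt/sources.list
apt update
apt install intel-microcode
# reboot afterwards so the updated microcode is loaded at early boot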
 
I tried that, but the problem persisted. I'm marking the thread as solved; weirdest hardware malfunction I ever saw.

Thanks!
 
