[SOLVED] PVE 6.2 - Unable to start nested virtualisation guest

I have a nested virtualisation PVE guest that stopped working after upgrading to PVE 6.2. Nested virtualisation is enabled on the host:

Code:
[admin@kvm1d ~]# cat /sys/module/kvm_intel/parameters/nested
Y

I temporarily remove the 'args' line from the VM configuration file, start the guest to record the 'cpu' parameters passed to the VM, shut it down and then append '+vmx'. The resulting 'args' line in the VM's configuration file is subsequently:
Code:
args: -cpu 'Westmere,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+pcid,vendor=GenuineIntel,+vmx'
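
The manual step of appending '+vmx' to the recorded CPU string could be sketched roughly as below. The `add_cpu_flag` helper is hypothetical (it is not part of PVE or QEMU); it only does string handling and leaves the VM configuration file untouched.

```shell
# Hypothetical helper: append a flag to a QEMU -cpu model string,
# unless that exact flag is already present in the comma-separated list.
add_cpu_flag() {
    cpu_string=$1
    flag=$2
    case ",$cpu_string," in
        *",$flag,"*) printf '%s\n' "$cpu_string" ;;            # already present
        *)           printf '%s,%s\n' "$cpu_string" "$flag" ;; # append it
    esac
}

# CPU string as recorded from the running guest (taken from this post).
cpu='Westmere,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+pcid,vendor=GenuineIntel'

# Print the line that would go into the VM configuration file.
printf "args: -cpu '%s'\n" "$(add_cpu_flag "$cpu" +vmx)"
```

The printed line matches the 'args' entry shown above; it still has to be copied into the VM's configuration file by hand.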

I receive the following error when attempting to start the nested virtual guest:
Code:
kvm: warning: host doesn't support requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12]
kvm: warning: host doesn't support requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13]
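
As far as I understand, the 'enforce' flag in the CPU string makes QEMU refuse to start when a requested feature is unavailable, which would explain why these warnings stop the guest from starting. To get a plain list of the feature names that are being rejected, something like the following might help. The `list_missing_features` function is a hypothetical helper, and it assumes the warning format shown above.

```shell
# Hypothetical helper: read QEMU warnings on stdin and print only the
# feature names from "host doesn't support requested feature" lines.
# Assumes the format "<register>.<feature> [bit N]" seen in this thread.
list_missing_features() {
    sed -n "s/.*host doesn't support requested feature: [^ ]*\.\([^ .]*\) .*/\1/p"
}

# Example with the two warnings from this post.
list_missing_features <<'EOF'
kvm: warning: host doesn't support requested feature: MSR(48FH).vmx-exit-load-perf-global-ctrl [bit 12]
kvm: warning: host doesn't support requested feature: MSR(490H).vmx-entry-load-perf-global-ctrl [bit 13]
EOF
```

Both names are VMX sub-features, so the problem appears tied to adding '+vmx' to the Westmere model rather than to the other flags.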


Code:
[admin@kvm1d ~]# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.34-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-1
pve-kernel-helper: 6.2-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 14.2.9-pve1
ceph-fuse: 14.2.9-pve1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.3
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-5
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
 
Hello,

I assume that several settings are overlapping here.

To say more about it, the KVM start command would be helpful.
Please include the args, though.

Code:
qm showcmd <VMID> --pretty
 
Hi Wolfgang,

Herewith the output of the command:
Code:
[admin@kvm1d ~]# qm showcmd 105 --pretty
/usr/bin/kvm \
  -id 105 \
  -name pve-test \
  -chardev 'socket,id=qmp,path=/var/run/qemu-server/105.qmp,server,nowait' \
  -mon 'chardev=qmp,mode=control' \
  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
  -mon 'chardev=qmp-event,mode=control' \
  -pidfile /var/run/qemu-server/105.pid \
  -daemonize \
  -smbios 'type=1,uuid=f16fe8de-4d2e-4b7f-8fed-4dce217f9903' \
  -smp '2,sockets=2,cores=1,maxcpus=2' \
  -nodefaults \
  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
  -vnc unix:/var/run/qemu-server/105.vnc,password \
  -cpu 'Westmere,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+pcid,vendor=GenuineIntel' \
  -m 4096 \
  -object 'memory-backend-ram,id=ram-node0,size=2048M' \
  -numa 'node,nodeid=0,cpus=0,memdev=ram-node0' \
  -object 'memory-backend-ram,id=ram-node1,size=2048M' \
  -numa 'node,nodeid=1,cpus=1,memdev=ram-node1' \
  -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' \
  -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' \
  -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' \
  -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' \
  -device 'cirrus-vga,id=vga,bus=pci.0,addr=0x2' \
  -chardev 'socket,path=/var/run/qemu-server/105.qga,server,nowait,id=qga0' \
  -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' \
  -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' \
  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:b0faf3ba49c7' \
  -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' \
  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' \
  -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' \
  -drive 'file=/dev/rbd/rbd_hdd/vm-105-disk-0,if=none,id=drive-scsi0,cache=writeback,discard=on,format=raw,aio=threads,detect-zeroes=unmap' \
  -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,rotation_rate=1,bootindex=100' \
  -netdev 'type=tap,id=net0,ifname=tap105i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
  -device 'virtio-net-pci,mac=E2:CF:D8:35:E3:11,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300' \
  -rtc 'base=localtime' \
  -machine 'type=pc+pve0' \
  -cpu 'Westmere,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+pcid,vendor=GenuineIntel,+vmx'


I have a feeling this is due to changes in QEMU 5 and/or kernel 5.4 with regard to Intel CPU vulnerability mitigation, presumably 'l1tf' or 'mds'. I tried booting the kernel with parameters that disable the mitigations (mds=off l1tf=off tsx_async_abort=off), but this makes no difference.

Dell R710 with the following CPU:
Code:
[admin@kvm1d ~]# head -n 27 /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
stepping        : 2
microcode       : 0x1f
cpu MHz         : 1595.961
cache size      : 12288 KB
physical id     : 1
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 32
initial apicid  : 32
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat flush_l1d
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips        : 4787.79
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

Code:
[admin@kvm1d ~]# for f in /sys/devices/system/cpu/vulnerabilities/*; do echo -e "${f##*/}\t-" $(cat "$f"); done
itlb_multihit     - KVM: Mitigation: Split huge pages
l1tf              - Mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable
mds               - Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
meltdown          - Mitigation: PTI
spec_store_bypass - Mitigation: Speculative Store Bypass disabled via prctl and seccomp
spectre_v1        - Mitigation: usercopy/swapgs barriers and __user pointer sanitization
spectre_v2        - Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling
tsx_async_abort   - Not affected
 
I don't see any obvious mistakes.

But can you try to use "host" without vmx instead of Westmere?
 
Hi Wolfgang,

Setting the CPU type to host and removing the 'args' line works perfectly. Nested virtual guests can't be migrated anyway, so the hoops I was jumping through were unnecessary. We restrict the CPU type for other guests to ensure they can migrate during maintenance.
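
For anyone who prefers the command line over editing the configuration file, the same change could probably be made with `qm set`. This is a minimal sketch: the `use_host_cpu` wrapper and the `QM` variable are my own additions for illustration, and VMID 105 is simply the guest from this thread.

```shell
# Hypothetical wrapper: switch a guest to the 'host' CPU type and drop
# its custom 'args' line. Set QM="echo qm" to print the commands
# instead of running them (dry run).
use_host_cpu() {
    vmid=$1
    ${QM:-qm} set "$vmid" --cpu host &&
    ${QM:-qm} set "$vmid" --delete args
}

# Dry run for the guest from this thread; prints the two qm commands.
QM='echo qm' use_host_cpu 105
```

Calling `use_host_cpu 105` without `QM` set would run the commands for real on the PVE host. The guest needs a full stop and start afterwards for the new CPU type to apply.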


Many thanks for your prompt assistance!