EFI and TPM removed from VM config when stopped, not when shut down

We have had good success with the Secure Boot-capable EFI disks and TPM v2.0 emulation, tested on the latest no-subscription packages with Ceph Pacific 16.2.6. Live migration works with Windows 11 using full disk encryption (BitLocker), and everything works perfectly as long as one uses the start/shutdown/migrate options. Issuing a stop instruction, however, results in the EFI and TPM references being removed from the VM configuration file.

Nice work, looking forward to this landing in the enterprise repo soon!

Code:
[admin@kvm1d ~]# cat /etc/pve/nodes/kvm1d/qemu-server/122.conf > /root/122.conf.backup; cat /etc/pve/nodes/kvm1d/qemu-server/122.conf
agent: 1
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 1
cpu: Westmere,flags=+pcid
efidisk0: rbd_hdd:vm-122-disk-1,efitype=4m,pre-enrolled-keys=1,size=1M
ide2: none,media=cdrom
localtime: 1
machine: pc-q35-6.0
memory: 4096
name: lair-temp
net0: virtio=00:16:3e:00:01:12,bridge=vmbr0,tag=1
numa: 1
ostype: win10
protection: 1
scsi0: rbd_hdd:vm-122-disk-0,cache=writeback,discard=on,size=80G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=f45692f6-0e09-48d2-ae74-7ce85f3f3267
sockets: 2
tpmstate0: rbd_hdd:vm-122-disk-2,size=4M,version=v2.0

[admin@kvm1d ~]# rbd showmapped | grep -e namespace -e 122
id  pool     namespace  image          snap  device
10  rbd_hdd             vm-122-disk-1  -     /dev/rbd10

[admin@kvm1d ~]# rbd ls rbd_hdd -l | grep -e NAME -e 122
NAME                                   SIZE     PARENT                            FMT  PROT  LOCK
vm-122-disk-0                           80 GiB                                      2
vm-122-disk-1                            1 MiB                                      2        excl
vm-122-disk-2                            4 MiB                                      2

[admin@kvm1d ~]# qm start 122; sleep 45; qm stop 122; sleep 20;
Requesting HA start for VM 122
Requesting HA stop for VM 122

[admin@kvm1d ~]# rbd showmapped | grep -e namespace -e 122
id  pool     namespace  image          snap  device
10  rbd_hdd             vm-122-disk-1  -     /dev/rbd10

[admin@kvm1d ~]# rbd ls rbd_hdd -l | grep -e NAME -e 122
NAME                                   SIZE     PARENT                            FMT  PROT  LOCK
vm-122-disk-0                           80 GiB                                      2
vm-122-disk-1                            1 MiB                                      2        excl
vm-122-disk-2                            4 MiB                                      2

[admin@kvm1d ~]# diff -uNr /root/122.conf.backup /etc/pve/nodes/kvm1d/qemu-server/122.conf
--- /root/122.conf.backup       2021-10-12 21:52:49.922585883 +0200
+++ /etc/pve/nodes/kvm1d/qemu-server/122.conf   2021-10-12 21:55:49.000000000 +0200
@@ -3,7 +3,6 @@
 boot: order=scsi0;ide2;net0
 cores: 1
 cpu: Westmere,flags=+pcid
-efidisk0: rbd_hdd:vm-122-disk-1,efitype=4m,pre-enrolled-keys=1,size=1M
 ide2: none,media=cdrom
 localtime: 1
 machine: pc-q35-6.0
@@ -17,4 +16,3 @@
 scsihw: virtio-scsi-pci
 smbios1: uuid=f45692f6-0e09-48d2-ae74-7ce85f3f3267
 sockets: 2
-tpmstate0: rbd_hdd:vm-122-disk-2,size=4M,version=v2.0


Works perfectly when you use 'shutdown' instead of 'stop':

Code:
[admin@kvm1d ~]# cat /root/122.conf.backup > /etc/pve/nodes/kvm1d/qemu-server/122.conf
[admin@kvm1d ~]# qm start 122; sleep 45; qm shutdown 122; sleep 20;
[admin@kvm1d ~]# kill 1069131
[admin@kvm1d ~]# rbd showmapped | grep -e namespace -e 122;
id  pool     namespace  image          snap  device
10  rbd_hdd             vm-122-disk-1  -     /dev/rbd10
11  rbd_hdd             vm-122-disk-0  -     /dev/rbd11
[admin@kvm1d ~]# rbd ls rbd_hdd -l | grep -e NAME -e 122;
NAME                                   SIZE     PARENT                            FMT  PROT  LOCK
vm-122-disk-0                           80 GiB                                      2        excl
vm-122-disk-1                            1 MiB                                      2        excl
vm-122-disk-2                            4 MiB                                      2
[admin@kvm1d ~]# diff -uNr /root/122.conf.backup /etc/pve/nodes/kvm1d/qemu-server/122.conf
(no output: the configuration is unchanged)


Windows 11 with Secure Boot enabled:
[screenshot attachment]

Destroyed the test Windows 11 system where BitLocker was working; unfortunately I didn't take a screenshot of it, but it worked flawlessly.
 
I cannot reproduce the issue you are running into here. Usually we never remove anything from the config file, unless it's an unknown option... could it potentially be that you migrated to a slightly outdated node and then did the "stop" action there? Would be weird, since the TPM worked, but that's the only thing I can think of right now. Is this always reproducible? Any more info on your setup/anything in syslog?
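
If it helps, something along these lines should pull the relevant journal entries around the stop task (a sketch only; the unit names match a default PVE 7 / HA setup and the time window is arbitrary):

Code:
# pull recent pvedaemon/pve-ha-lrm entries and filter for the VM and its start/stop tasks
journalctl -b -u pvedaemon -u pve-ha-lrm --since "1 hour ago" | grep -e 'VM 122' -e qmstart -e qmstop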
 
I can confirm that this is reproducible at will on a cluster of PVE 7 nodes subscribed to the enterprise repositories. We temporarily added the no-subscription repository there to prepare for vTPM and EFI state disks becoming available on our main production clusters, which exclusively use the enterprise repositories.

The problem doesn't occur if we shut down the guest and the guest responds to the request. If we select 'shutdown' while the OS isn't booted (and the guest agent consequently isn't running), the two lines are also removed, presumably because a stop instruction is issued after a timeout. We can, however, immediately recreate the scenario whenever we issue a stop after starting a VM with blank discs, as shown in the example above.
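
For reference, when we want a clean shutdown with a bounded wait we would use something like the following rather than a plain stop (assuming qm shutdown on this version still accepts --timeout and --forceStop):

Code:
# request a guest shutdown, wait up to 60 seconds, then fall back to a hard stop
# (the 60 second value is arbitrary)
qm shutdown 122 --timeout 60 --forceStop 1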


Checked for updates in the early hours of this morning; herewith the version information:
Code:
[admin@kvm1d ~]# pveversion -v
proxmox-ve: 7.0-2 (running kernel: 5.11.22-3-pve)
pve-manager: 7.0-13 (running version: 7.0-13/7aa7e488)
pve-kernel-helper: 7.1-2
pve-kernel-5.11: 7.0-8
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-3-pve: 5.11.22-7
ceph: 16.2.6-pve2
ceph-fuse: 16.2.6-pve2
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.3.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-10
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-12
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
openvswitch-switch: 2.15.0+ds1-2
proxmox-backup-client: 2.0.11-1
proxmox-backup-file-restore: 2.0.11-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-10
pve-docs: 7.0-5
pve-edk2-firmware: 3.20210831-1
pve-firewall: 4.2-4
pve-firmware: 3.3-2
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-16
smartmontools: 7.2-pve2
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1
 
Have the VMs been migrated to another host and back, or the nodes all restarted? I had the same problem when I updated the packages but the VM was still running under the prior QEMU version; restarting or shutting down the VM within the same node didn't help.
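
A quick way to check for that situation might be to compare the QEMU binary the guest is still running under with the installed package, for example (a sketch; the running-qemu/running-machine fields are assumed to be available in this qemu-server version):

Code:
# compare the QEMU version the guest is still running under with the installed package
qm status 122 --verbose | grep -e running-qemu -e running-machine
pveversion -v | grep pve-qemu-kvm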
 
That was indeed the problem: restarting the node resolved the issue, so I presume there is a service that should be restarted as part of the package upgrade process... I had thought that all PVE 7.0-13 cluster nodes had fenced and reset after network interfaces suddenly changed MTU (logs below); it turns out I need to update legacy systems by replacing 'WATCHDOG_MODULE=ipmi_watchdog' with 'WATCHDOG_MODULE=iTCO_wdt' in /etc/default/pve-ha-manager, as shown below.
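
For anyone else still on the legacy module, the change is just this one line (the new module only takes effect after the node has been rebooted):

Code:
# /etc/default/pve-ha-manager (only the watchdog module line changes)
WATCHDOG_MODULE=iTCO_wdt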

We're running OvS with jumbo frames; PVE occasionally resets the MTU, which then causes Corosync to stop communicating and leads to the cluster fencing itself:

Code:
Oct 15 12:26:20 kvm1a pvedaemon[2132829]: <davidh@pam> starting task UPID:kvm1a:0020D364:015185F5:6169574C:qmsnapshot:124:davidh@pam:
Oct 15 12:26:20 kvm1a pvedaemon[2151268]: <davidh@pam> snapshot VM 124: mbr2gpt
Oct 15 12:26:22 kvm1a pvedaemon[2132829]: <davidh@pam> end task UPID:kvm1a:0020D364:015185F5:6169574C:qmsnapshot:124:davidh@pam: OK
Oct 15 12:26:40 kvm1a pvedaemon[2132829]: <davidh@pam> update VM 124: -ide2 shared:iso/win10re-1511-x64-syrex.iso,media=cdrom,size=543390K
Oct 15 12:26:46 kvm1a pvedaemon[2132829]: <davidh@pam> update VM 124: -boot order=ide2;scsi0;net0
Oct 15 12:26:48 kvm1a pvedaemon[2144873]: <davidh@pam> starting task UPID:kvm1a:0020D43C:015190A6:61695768:hastart:124:davidh@pam:
Oct 15 12:26:50 kvm1a pvedaemon[2144873]: <davidh@pam> end task UPID:kvm1a:0020D43C:015190A6:61695768:hastart:124:davidh@pam: OK
Oct 15 12:27:00 kvm1a systemd[1]: Starting Proxmox VE replication runner...
Oct 15 12:27:02 kvm1a systemd[1]: pvesr.service: Succeeded.
Oct 15 12:27:02 kvm1a systemd[1]: Finished Proxmox VE replication runner.
Oct 15 12:27:02 kvm1a systemd[1]: pvesr.service: Consumed 1.302s CPU time.
Oct 15 12:27:03 kvm1a pve-ha-lrm[2151648]: starting service vm:124
Oct 15 12:27:03 kvm1a pve-ha-lrm[2151652]: start VM 124: UPID:kvm1a:0020D4E4:015196CF:61695777:qmstart:124:root@pam:
Oct 15 12:27:03 kvm1a pve-ha-lrm[2151648]: <root@pam> starting task UPID:kvm1a:0020D4E4:015196CF:61695777:qmstart:124:root@pam:
Oct 15 12:27:04 kvm1a kernel: [221237.021281]  rbd7: p1 p2
Oct 15 12:27:04 kvm1a kernel: [221237.059391] rbd: rbd7: capacity 85899345920 features 0x1d
Oct 15 12:27:04 kvm1a systemd[1]: Started 124.scope.
Oct 15 12:27:04 kvm1a systemd-udevd[2151699]: Using default interface naming scheme 'v247'.
Oct 15 12:27:04 kvm1a kernel: [221237.366302] device tap124i0 entered promiscuous mode
Oct 15 12:27:04 kvm1a kernel: [221237.367103] vlan100: dropped over-mtu packet: 2175 > 1500
Oct 15 12:27:04 kvm1a systemd-udevd[2151699]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Oct 15 12:27:04 kvm1a kernel: [221237.367365] vlan100: dropped over-mtu packet: 2175 > 1500
Oct 15 12:27:04 kvm1a kernel: [221237.386087] vlan100: dropped over-mtu packet: 4490 > 1500
Oct 15 12:27:04 kvm1a kernel: [221237.419133] vlan100: dropped over-mtu packet: 2175 > 1500
Oct 15 12:27:04 kvm1a kernel: [221237.419144] vlan100: dropped over-mtu packet: 2175 > 1500
Oct 15 12:27:04 kvm1a kernel: [221237.419329] vlan100: dropped over-mtu packet: 2175 > 1500
Oct 15 12:27:04 kvm1a kernel: [221237.419348] vlan100: dropped over-mtu packet: 2175 > 1500
Oct 15 12:27:04 kvm1a kernel: [221237.419368] vlan100: dropped over-mtu packet: 2175 > 1500
Oct 15 12:27:04 kvm1a kernel: [221237.419475] vlan100: dropped over-mtu packet: 2175 > 1500
Oct 15 12:27:04 kvm1a kernel: [221237.419619] vlan100: dropped over-mtu packet: 2175 > 1500
Oct 15 12:27:06 kvm1a ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap124i0
Oct 15 12:27:06 kvm1a ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln124i0
Oct 15 12:27:06 kvm1a ovs-vsctl: ovs|00002|db_ctl_base|ERR|no port named fwln124i0
Oct 15 12:27:06 kvm1a ovs-vsctl: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl -- add-port vmbr0 tap124i0 tag=1 vlan_mode=dot1q-tunnel other-config:qinq-ethtype=802.1q
Oct 15 12:27:06 kvm1a pve-ha-lrm[2151648]: <root@pam> end task UPID:kvm1a:0020D4E4:015196CF:61695777:qmstart:124:root@pam: OK
Oct 15 12:27:06 kvm1a pve-ha-lrm[2151648]: service status vm:124 started
Oct 15 12:27:06 kvm1a corosync[1998]:   [TOTEM ] Retransmit List: 1e73d5
Oct 15 12:27:06 kvm1a corosync[1998]:   [TOTEM ] Retransmit List: 1e73d5
Oct 15 12:27:06 kvm1a corosync[1998]:   [TOTEM ] Retransmit List: 1e73d5
Oct 15 12:27:06 kvm1a corosync[1998]:   [TOTEM ] Retransmit List: 1e73d5
Oct 15 12:27:06 kvm1a corosync[1998]:   [TOTEM ] Retransmit List: 1e73d5
Oct 15 12:27:06 kvm1a corosync[1998]:   [TOTEM ] Retransmit List: 1e73d5
Oct 15 12:27:06 kvm1a corosync[1998]:   [TOTEM ] Retransmit List: 1e73d5
Oct 15 12:27:06 kvm1a corosync[1998]:   [TOTEM ] Retransmit List: 1e73d5
Oct 15 12:27:06 kvm1a corosync[1998]:   [TOTEM ] Retransmit List: 1e73d5
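
To keep jumbo frames pinned across those resets we are looking at declaring the MTU explicitly in /etc/network/interfaces, roughly like this (a sketch only: bond0, the address and the ovs_mtu keyword are assumptions based on our setup and the OvS ifupdown integration; vmbr0 and vlan100 are taken from the logs above):

Code:
auto vmbr0
iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports bond0 vlan100
        ovs_mtu 9000

auto vlan100
iface vlan100 inet static
        # placeholder address for the Corosync/cluster network
        address 198.51.100.11/24
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=100
        ovs_mtu 9000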
 
