Unable to migrate & start VM due to problematic TPM-disks

Jan 17, 2025
Hello,

Recently we've run into problematic behavior with TPM disks on Proxmox 8.4.16: in our 30-node cluster, ONE single node is unable to do the following:
- Start new VMs which have a TPM-disk configured, error [1]
- Live-migrate VMs with a TPM-disk configured, error [2]

Once we shut down the VM, we can 'offline' migrate it and start it on another node without any problems.
We've verified that all packages and active kernels are identical across nodes, as is the running QEMU process version, checked with the `info version` command in the QEMU monitor.
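The workaround above can be sketched as a few `qm` commands; a minimal sketch, where the VMID and target node name are placeholders for your own environment:

```shell
# Hypothetical VMID and target node -- adjust to your cluster.
VMID=100
TARGET="pve-node02"

if command -v qm >/dev/null 2>&1; then
    qm shutdown "$VMID" --timeout 120   # clean guest shutdown
    qm migrate "$VMID" "$TARGET"        # offline migration succeeds despite the TPM disk
    qm start "$VMID"                    # VM starts fine on the target node
else
    echo "qm not found; run this on a Proxmox node"
fi
```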

Using the `rbd status` and `rbd info` commands, the only difference we can find between healthy and problematic TPM disks is that the problematic disk has no RBD watchers, while the 'healthy' disk has an active RBD watcher.
Since the VM is online, we were unable to verify whether manually unmapping and remapping the RBD volume resolves this.
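For reference, the watcher check and the (untested) unmap/remap step look roughly like this; the pool and image names are placeholders, not our actual names:

```shell
# Hypothetical pool and image names -- adjust to your storage.
POOL="ceph-vmstore"
DISK="vm-100-disk-3"

if command -v rbd >/dev/null 2>&1; then
    # A mapped/opened image should list at least one client under "Watchers:".
    rbd status "${POOL}/${DISK}"
    rbd info "${POOL}/${DISK}"

    # With the VM stopped, a stale kernel mapping on the affected node
    # could in theory be cleared by unmapping and remapping the image:
    rbd unmap "${POOL}/${DISK}"
    rbd map "${POOL}/${DISK}"
else
    echo "rbd not found; run this on a node with Ceph client tools"
fi
```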

[1] can't map rbd volume: vm-$vmid-disk-3: rbd: sysfs write failed
[2] migration status error: failed - pre-save failed: tpm-emulator

We are running the following package versions:
- pve-manager/8.4.16 (running kernel: 6.8.12-15-pve)
- pve-qemu-kvm/9.2.0-7
- qemu-server/8.4.5
- ceph/18.2.7-pve2
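A version list like the one above can be gathered on each node and diffed across the cluster; a minimal sketch (the grep pattern is just a convenience filter):

```shell
# Collect the relevant package versions for cross-node comparison.
PATTERN='pve-manager|pve-qemu-kvm|qemu-server|ceph'

if command -v pveversion >/dev/null 2>&1; then
    pveversion -v | grep -E "$PATTERN"
else
    echo "pveversion not found; run this on a Proxmox node"
fi
```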


Has anyone else experienced this before?
 
Hi all,

@wbk All hosts are identical, all actual 'servers' with EPYC Milan chips that support the TPM functionality. We're running TPM v2.0 due to the Windows 11 requirements.
I do not think we're running into bug #7066; the snapshot-as-a-volume chain, afaik, only applies to iSCSI storage.

@SteveITS All disks on the VMs (both problematic and healthy ones) are on Ceph storage, and RBD can list all details about the disks.