Hello everyone,
We are running a 4-node PVE cluster: 3 nodes in a hyper-converged setup with Ceph, and the 4th node just for virtualization without its own OSDs. After creating a VM with a TPM state device on a Ceph pool, it fails to start with the error message:
rbd: sysfs write failed
TASK ERROR: start failed: can't map rbd volume vm-103-disk-1: rbd: sysfs write failed
Other VMs without a TPM work fine on the same pool. It is even possible to remove and recreate the TPM state and to move the disk from one pool to another. Whenever the disk is on Ceph, the VM fails to start for the same reason, but when it is located on local storage the VM starts without trouble.
Judging from the error message, the TPM state disk is mapped locally on the hypervisor (via the kernel RBD client), whereas the other drives are handled by QEMU directly, which makes the TPM quite unique. Creating a VM with the ceph-common tools installed and trying to map a random RBD image (not just TPM states) from there fails as well.
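For reference, the failing map can be reproduced by hand on the hypervisor roughly like this (a minimal sketch only; the pool name ceph-vm is an assumption, the image name is taken from the error above):

# try the same mapping PVE attempts for the TPM state (kernel RBD client)
rbd map ceph-vm/vm-103-disk-1
# the kernel log usually contains the real reason behind "sysfs write failed"
dmesg | tail
# check which image features are enabled; the kernel client only supports a subset
rbd info ceph-vm/vm-103-disk-1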
After having no success with troubleshooting, we set up a test cluster on 3 old desktop PCs and ran into the same problem again.
The issue first occurred during the cluster's initial setup on PVE 7.4 with Ceph Quincy and still persists after upgrading to PVE 8. During installation of the VMs we put the TPM state on a local pool, but this way we are missing failover once we enter the productive phase, and creating a snapshot with a local disk attached prevents easy live migration.
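For completeness, this is roughly what the current workaround looks like on the CLI (a sketch only; VM ID 103 is taken from the error above, the storage names local-lvm and ceph-vm are assumptions, and the exact option syntax may differ slightly):

# create the TPM state on local storage instead of the Ceph pool
qm set 103 --tpmstate0 local-lvm:1,version=v2.0
# moving it back onto the Ceph pool later reproduces the start failure
qm disk move 103 tpmstate0 ceph-vm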
We are running out of ideas. Thanks a lot,
Marcus