Hi,
I am not sure I fully understand the question, but Proxmox VE uses self-signed certificates (signed by a cluster-wide certificate authority) to provide TLS-encrypted communication with the API server, as described in the wiki article you linked to. This is used by default, as it can be set up without further user interaction.
Using an external certificate authority to issue certificates requires additional setup of the corresponding ACME account and plugins, but this can easily be done via the WebUI. Certificates managed by ACME are then renewed automatically by the pve-daily-update.service systemd service.
You can also list a node's certificate information via pvenode cert info.
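For reference, a minimal sketch of the ACME workflow on the command line (the same steps the WebUI performs); the account name, e-mail address, and domain below are placeholders, not values from this thread:

```shell
# Register an ACME account (defaults to Let's Encrypt; the
# account name "default" and contact address are examples)
pvenode acme account register default admin@example.com

# Tell this node which domain its certificate should cover
# (replace with the node's actual FQDN)
pvenode config set --acme domains=pve1.example.com

# Order the certificate; pveproxy is restarted to pick it up
pvenode acme cert order

# Inspect the certificates currently in use on this node
pvenode cert info
```

After the initial order, no further interaction is needed: pve-daily-update.service checks the certificate's remaining lifetime and renews it automatically.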
What issues did you encounter exactly? Were these related to the certificates, or rather to the corosync setup?
Hi Chris
I have read the documentation on multiple occasions trying to resolve problems with what appears to me to be:
1. A patchy treatment of the internal certificate process, which can break and can be difficult to fully repair.
2. A less than well documented practice for the use and deployment of ACME certs.
3. A lack of guidance on how, or whether, ACME certs and self-generated certs should be mixed, or only one type deployed.
Let me state that I have spent much of the last two years researching cluster platforms for edge cloud construction. There is a lot to like in what has been achieved with Proxmox. However, for the record, as the digital systems landscape is built going forward, secure, stable, reliable and extensible certification will be a critical feature, not only for the internal operational reliability of the clusters but also for cross-site communication and for integrating communications with users accessing cluster-hosted applications.
I started my evaluation of Proxmox on an early version of PVE 7 and had a cluster under way when PVE 8 arrived, adding servers to the cluster and upgrading as I went. The transition from 7 to 8 was not painless, and between adding and removing servers I managed to break the certification system. This was not a complete failure, but a failure of reliability in some functions. Case in point: I have 8 nodes, and on several of the newer ones, built on PVE 8 with Ubuntu 22.04.3 VMs on them, I was unable to connect to the VNC console because the host node failed to recognize the VM certs. Rebuilt nodes also displayed cert failures, though not always, and often for reasons that were not obvious.
In my opinion, if a node has been certified and is being "managed" by the cluster host, then it should be up to the cluster host to resolve any and all cert issues transparently at the OS level, by way of the web API, for all of its attached VMs.
This also needs to be addressed systemically in a distributed cluster environment. The next project in my Proxmox evaluation is a set of physically separated but fiber-linked host sites. Each site is a single 42U rack with power support and rack servers running the latest PVE 8 code. The physical sites are connected on a site LAN to a Layer 3 fiber switch, with a distributed DHCP relay at the router level maintaining the internal cluster LANs across two test sites. External internet IPs are assigned by the network operators, with reverse-proxy routing of HTTPS services and static IP routing to VMs hosting NAS, VPN access, and SMTP services.
High reliability is a key issue: certificate failures must be resolved in real time and reported via SNMP to a systemic control system.
Where I sit at the moment is that the certs issue is the last critical weak point that I need to resolve to my satisfaction before I commit to this platform. There's lots to like, but there are several points of failure that are not obvious until they fail.