[SOLVED] Unable to run host backups anymore: TLS certificate verify failed

getcom

Hello all,

after setting up a new PBS, integrating it into the PVE cluster, and running automated VM backups, I also created a separate user for host backups of the PVE cluster nodes.
I wrote a backup script for the cluster nodes which was working without any problems yesterday.
After that I updated PBS to the latest version 1.1-5 and PVE to 6.4-6.

Today, when I run a host backup, I get the following error on the PVE side:

Error: error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:../ssl/statem/statem_clnt.c:1915:

Something has broken the TLS environment.
As a workaround I added the environment variable PBS_FINGERPRINT to the profile.
With this it is working again.
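For reference, the workaround boils down to one line in the shell profile on the client (a sketch; the value shown is a placeholder for the SHA-256 fingerprint of the PBS certificate):

Code:
# ~/.profile on the PVE node -- placeholder fingerprint
export PBS_FINGERPRINT="aa:bb:cc:dd:ee:ff:00:11:22:33:44:55:66:77:88:99:aa:bb:cc:dd:ee:ff:00:11:22:33:44:55:66:77:88:99"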
Is PBS_FINGERPRINT now necessary, or is there a bug in the latest versions?

Thank you in advance.
Ralf
 
the fingerprint is necessary if the certificate of the PBS server is not a trusted one.
 
Yes, this is what I read in the documentation and the reason why I added the variable.
Before the upgrade to the latest version, the server certificate was trusted.

So the question is: what is the root cause of this?
The key file on PBS is still the same. I assume that the PVE upgrade from 6.4-4 to 6.4-6 is the root cause.
 
do you have a trusted certificate (e.g., one from Let's Encrypt or a similar CA) set up on the PBS host? does curl -v https://IP_OR_HOSTNAME_OF_PBS:8007 work or display an error?
 
Yes, it is a Let's Encrypt R3 wildcard certificate.
curl works with https://IP_OR_HOSTNAME_OF_PBS:8007
and reports "SSL certificate verify ok."

The ugly thing with the PBS_FINGERPRINT variable is that I have to run a script on the PBS after the automated Let's Encrypt deployment to catch the new fingerprint and then deploy it to the hosts which I want to back up.
The fingerprint is more or less a temporary workaround.

I checked the certificate with 'openssl s_client -connect IP_OR_HOSTNAME_OF_PBS:8007'.
Result: Verify return code: 0 (ok)
It is using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384.
The same result with -CAfile /etc/ssl/certs/ca-certificates.crt.
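For completeness, the check against the system CA bundle can be reproduced like this (a sketch; host and port as above):

Code:
# verify the PBS certificate against the system trust store; </dev/null closes the connection immediately
openssl s_client -connect IP_OR_HOSTNAME_OF_PBS:8007 -CAfile /etc/ssl/certs/ca-certificates.crt </dev/null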
 
Ok, I got it.
I ran
Code:
strace proxmox-backup-client snapshots --output-format json --repository USERNAME@HOSTNAME@HOSTNAME:DATASTORENAME
and saw that it reads ~/.config/proxmox-backup/fingerprints if it exists and, if so, uses it.
With PBS_FINGERPRINT you just override it.

After renaming the folder ~/.config/proxmox-backup to ~/.config/proxmox-backup.back, it printed the fingerprint and asked me whether I want to continue connecting => y.
After that it printed the JSON.

This means that after a Let's Encrypt certificate renewal the host backups are broken until either ~/.config/proxmox-backup is deleted and the fingerprint is accepted manually, or the new fingerprint is automatically deployed to ~/.config/proxmox-backup/fingerprints by a script after an ACME renewal run (this is actually done by a pfSense box for all hosts using SSL/TLS and could be expanded; see the file format sketched below). Support for something like StrictHostKeyChecking=accept-new would be nice...
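Judging from the strace output and the deployment script further down, the fingerprints cache appears to be a plain text file with one "host fingerprint" pair per line (a sketch; hostnames and fingerprints are placeholders):

Code:
# ~/.config/proxmox-backup/fingerprints -- placeholder values
pbs.example.com aa:bb:cc:dd:ee:ff:00:11:22:33:44:55:66:77:88:99:aa:bb:cc:dd:ee:ff:00:11:22:33:44:55:66:77:88:99
pbs aa:bb:cc:dd:ee:ff:00:11:22:33:44:55:66:77:88:99:aa:bb:cc:dd:ee:ff:00:11:22:33:44:55:66:77:88:99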

Is this how it is supposed to work, or what is/was the idea behind it?
 
My workaround for this issue is now an extension of the action script for the ACME service on the FreeBSD/pfSense side, with the following lines deploying the certificates plus fingerprint after a Let's Encrypt renewal in a multi-domain environment:

Code:
# Proxmox Backup Server cluster nodes (pbscnN)
printf "Deploying PBS servers: $PBSClusterNodes ...\n"
test -e /tmp/acme/fingerprints && rm -f /tmp/acme/fingerprints
for h in $PBSClusterNodes
do
        # Domain part of the node's FQDN
        d="$(echo $h | cut -d. -f2-)"
        printf "\tDeploying $d certs to $h...\n"
        # Combine full chain and key into proxy.pem (the wildcard cert lives in a literal '*.domain' directory)
        cat /tmp/acme/wildcard_$d/\*.$d/fullchain.cer /tmp/acme/wildcard_$d/\*.$d/\*.$d.key > /tmp/acme/wildcard_$d/\*.$d/proxy.pem
        # SHA-256 fingerprint of the new certificate, lower-cased
        openssl x509 -in /tmp/acme/wildcard_$d/\*.$d/proxy.pem -noout -fingerprint -sha256 | cut -d= -f2 | tr '[:upper:]' '[:lower:]' >/tmp/acme/wildcard_$d/\*.$d/proxy.pem.fingerprint
        FINGERPRINT="$(cat /tmp/acme/wildcard_$d/\*.$d/proxy.pem.fingerprint)"
        # Record the fingerprint for both the FQDN and the short hostname
        echo "$h $FINGERPRINT" >>/tmp/acme/fingerprints
        echo "$(echo $h | cut -d. -f1) $FINGERPRINT" >>/tmp/acme/fingerprints
        scp /tmp/acme/wildcard_$d/\*.$d/proxy.pem root@$h:/etc/proxmox-backup/proxy.pem 2>/dev/null 1>/dev/null
        scp /tmp/acme/wildcard_$d/\*.$d/\*.$d.key root@$h:/etc/proxmox-backup/proxy.key 2>/dev/null 1>/dev/null
        printf "\tRestarting PBS proxy...\n"
        /usr/bin/ssh -o 'BatchMode=yes' root@$h service proxmox-backup-proxy restart
done

# Fingerprint deployment
printf "Deploying TLS fingerprint to PBS host backup clients: $PBSClientNodes ...\n"
for h in $PBSClientNodes
do
        # Proxmox backup clients for host backups; quote the command so the test runs remotely, not locally
        /usr/bin/ssh -o 'BatchMode=yes' root@$h 'test -d ~/.config/proxmox-backup || mkdir -p ~/.config/proxmox-backup' 2>/dev/null 1>/dev/null
        scp /tmp/acme/fingerprints root@$h:~/.config/proxmox-backup 2>/dev/null 1>/dev/null
done

# Update the fingerprint in the cluster-wide PVE storage config (via the first cluster node)
FIRSTPVECN="$(echo $PVEClusterNodes | awk '{print $1}')"
FPCURRENT="$(/usr/bin/ssh -o 'BatchMode=yes' root@$FIRSTPVECN grep -A4 \": $PBSDatastore\" /etc/pve/storage.cfg | grep fingerprint | awk '{print $2}')"
printf "Changing previous fingerprint:\n\t$FPCURRENT\n\tto\n\t$FINGERPRINT\n"
/usr/bin/ssh -o 'BatchMode=yes' root@$FIRSTPVECN sed -i "s/$FPCURRENT/$FINGERPRINT/" /etc/pve/storage.cfg 2>/dev/null
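A quick check from one of the backup clients (reusing the command from earlier in the thread) should then connect without a manual fingerprint prompt:

Code:
proxmox-backup-client snapshots --output-format json --repository USERNAME@HOSTNAME@HOSTNAME:DATASTORENAME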
 
the fingerprint cache should only ever be used for certificates that fail verification via the system trust store.. maybe there is an issue there with wildcard certificates..
 
one more question - you're talking about host backups here - of the PVE host? inside a VM? if the latter, which distro? are you using packages compiled by us or by yourself/some third party? I vaguely remember an issue where it is possible to compile the dependencies without enabling the system trust store at all, in which case it would always fall back to the self-signed logic of verifying the fingerprint manually and caching the result..
 
the fingerprint cache should only ever be used for certificates that fail verification via the system trust store.. maybe there is an issue there with wildcard certificates..
I already checked the certificates against the system CAs with openssl, successfully.
It is unclear to me why curl and openssl trust the certificate but the backup client does not, if it is using the same CA certs.
 
one more question - you're talking about host backups here - of the PVE host? inside a VM? if the latter, which distro? are you using packages compiled by us or by yourself/some third party? I vaguely remember an issue where it is possible to compile the dependencies without enabling the system trust store at all, in which case it would always fall back to the self-signed logic of verifying the fingerprint manually and caching the result..

Maybe I described this inaccurately.
It is a PVE cluster with five or seven nodes: one is the iSCSI/iSER target, three/five are the nodes running the VMs, and one is the backup storage/NFS server.
The PBS is a VM which has its datastore mounted from the NFS server. A second ZFS storage server for zpool sync and additional PBS servers are not considered here.

The host backups are at the moment only for physical Proxmox servers, i.e. the PVE cluster nodes; the exceptions are the Proxmox Mail Gateway and the PBS servers, which are VMs at the moment.
The proxmox-backup-client is the included one; it was already installed with PVE. The PVE version is 6.4-6, PBS is 1.1-5, PMG is 6.4-4.
At the moment there are no host backups running for non-Proxmox distributions.
 
thanks for the detailed response - we'll see if we can reproduce this issue.
 
Ok, I found the root cause.
proxmox-backup-client does not use the FQDN from DNS if only a hostname is given in the --repository parameter:

Code:
strace proxmox-backup-client snapshots --output-format json --repository USERNAME@HOSTNAME@HOSTNAME:DATASTORENAME

It has to be changed to

Code:
strace proxmox-backup-client snapshots --output-format json --repository USERNAME@HOSTNAME@FQDN:DATASTORENAME

or better, if supported:

Code:
strace proxmox-backup-client snapshots --output-format json --repository USERNAME@FQDN@FQDN:DATASTORENAME

The FQDN is not mentioned anywhere in the documentation; it talks about hostnames only.
Ok, my fault, sometimes I can't see the wood for the trees. It should be clear that the FQDN has to be configured if a wildcard certificate is used.
The same applies to the PVE integration: in the storage config the FQDN is necessary.
Maybe this could be added to the documentation, or maybe only the FQDN should be used here.
 
ah, that makes sense, yes. it is actually USERNAME@REALM@HOST:DATASTORE, where the realm is usually either pbs for users defined just in PBS' user config, or pam for system users that are authorized to log in to PBS. so my guess is your hostname is 'pbs' as well, which makes this even more confusing ;)

I'll see about adding a note to the docs!
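For illustration, the three parts then look like this (user, host, and datastore names below are placeholders):

Code:
# user 'backup' defined in PBS' own user config (realm 'pbs')
proxmox-backup-client snapshots --repository backup@pbs@pbs.example.com:store1
# system user 'root' authenticating via PAM (realm 'pam')
proxmox-backup-client snapshots --repository root@pam@pbs.example.com:store1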
 
