[TUTORIAL] SSH Host Key Certificates - How to bypass SSH known_hosts bug(s)

pve1:

02:09 PM [pve]~ root # journalctl -u ssh --since "2024-03-03 13:50"
Mar 03 13:50:01 pve.ifire.net sshd[819164]: Accepted publickey for root from 192.168.1.252 port 39698 ssh2: RSA SHA256:>
Mar 03 13:50:01 pve.ifire.net sshd[819164]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Mar 03 13:50:01 pve.ifire.net sshd[819164]: pam_env(sshd:session): deprecated reading of user environment enabled
Mar 03 13:50:02 pve.ifire.net sshd[819164]: Received disconnect from 192.168.1.252 port 39698:11: disconnected by user
Mar 03 13:50:02 pve.ifire.net sshd[819164]: Disconnected from user root 192.168.1.252 port 39698
Mar 03 13:50:02 pve.ifire.net sshd[819164]: pam_unix(sshd:session): session closed for user root
Mar 03 13:50:02 pve.ifire.net sshd[819176]: Accepted publickey for root from 192.168.1.252 port 39708 ssh2: RSA SHA256:>
Mar 03 13:50:02 pve.ifire.net sshd[819176]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Mar 03 13:50:02 pve.ifire.net sshd[819176]: pam_env(sshd:session): deprecated reading of user environment enabled
Mar 03 13:50:02 pve.ifire.net sshd[819176]: Received disconnect from 192.168.1.252 port 39708:11: disconnected by user
Mar 03 13:50:02 pve.ifire.net sshd[819176]: Disconnected from user root 192.168.1.252 port 39708
Mar 03 13:50:02 pve.ifire.net sshd[819176]: pam_unix(sshd:session): session closed for user root
Mar 03 13:50:02 pve.ifire.net sshd[819188]: Connection closed by 192.168.1.251 port 60410 [preauth]
Mar 03 13:50:02 pve.ifire.net sshd[819190]: Accepted publickey for root from 192.168.1.252 port 39718 ssh2: RSA SHA256:>
Mar 03 13:50:02 pve.ifire.net sshd[819190]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Mar 03 13:50:02 pve.ifire.net sshd[819195]: Connection closed by 192.168.1.251 port 60412 [preauth]
Mar 03 13:50:02 pve.ifire.net sshd[819190]: pam_env(sshd:session): deprecated reading of user environment enabled
Mar 03 13:50:03 pve.ifire.net sshd[819190]: Received disconnect from 192.168.1.252 port 39718:11: disconnected by user
Mar 03 13:50:03 pve.ifire.net sshd[819190]: Disconnected from user root 192.168.1.252 port 39718
Mar 03 13:50:03 pve.ifire.net sshd[819190]: pam_unix(sshd:session): session closed for user root
Mar 03 13:50:03 pve.ifire.net sshd[819216]: Accepted publickey for root from 192.168.1.252 port 39730 ssh2: RSA SHA256:>
Mar 03 13:50:03 pve.ifire.net sshd[819216]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Mar 03 13:50:03 pve.ifire.net sshd[819216]: pam_env(sshd:session): deprecated reading of user environment enabled
Mar 03 13:50:03 pve.ifire.net sshd[819216]: Received disconnect from 192.168.1.252 port 39730:11: disconnected by user
Mar 03 13:50:03 pve.ifire.net sshd[819216]: Disconnected from user root 192.168.1.252 port 39730
Mar 03 13:50:03 pve.ifire.net sshd[819216]: pam_unix(sshd:session): session closed for user root
Mar 03 13:50:04 pve.ifire.net sshd[819229]: Accepted publickey for root from 192.168.1.252 port 39732 ssh2: RSA SHA256:>
Mar 03 13:50:04 pve.ifire.net sshd[819229]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Mar 03 13:50:04 pve.ifire.net sshd[819229]: pam_env(sshd:session): deprecated reading of user environment enabled
02:26 PM [pve]~ root #

pve2:

02:17 PM [pve2]~ root # journalctl -u ssh --since "2024-03-03 13:50"
-- No entries --
02:26 PM [pve2]~ root #
 

Something is very wrong here, as if there was no connection whatsoever going to pve2 at all. Just in case, can you check if your times are synced on both machines? I put in the --since "2024-03-03 13:50" to see whatever happened at 13:51, which was the time showing on your pve node when you launched the command.
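For a quick check (assuming systemd-timesyncd or chrony is in use), something like this on both nodes should do:

Code:
# timedatectl   # the "System clock synchronized" line should say yes on both nodes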

Can you also check pve# traceroute 192.168.1.252 ?
 
Times are synced. Cluster replication broke this morning sometime after I added the qdevice for node 3.

The 192.168.x.x network is on its own interface and is a private connection between pve, pve2 and the qdevice; its own VLAN or whatever at the datacenter.

02:40 PM [pve]~ root # traceroute 192.168.1.252
traceroute to 192.168.1.252 (192.168.1.252), 30 hops max, 60 byte packets
1 pve2 (192.168.1.252) 0.361 ms 0.578 ms 0.560 ms
02:40 PM [pve]~ root #

--

02:41 PM [pve2]~ root # traceroute 192.168.1.251
traceroute to 192.168.1.251 (192.168.1.251), 30 hops max, 60 byte packets
1 pve (192.168.1.251) 0.636 ms 0.603 ms 0.589 ms
02:41 PM [pve2]~ root #
 
Let me also add I can ssh from pve to pve2 without error. It's only the replication job that seems to be complaining. pvecm says we have quorum.

I'll also add that I use Tailscale, but that pre-dated things breaking. The fault started today after adding the qdevice on the third node (PBS install w/qdevice).
 
Can you do the following:

1. Close all web GUI access; this will get rid of the annoying "Accepted publickey for root" ... spam.
2. Connect in separate sessions from a neutral machine to both pve and pve2 by regular ssh connection.
3. On pve2, let this run: journalctl -efu ssh
4. Simultaneously on pve, run as before: /usr/bin/ssh -vvv -o 'HostKeyAlias=pve2' root@192.168.1.252 -- /bin/true

5. What will show up in the journal of pve2 after you run that on pve?

6. How exactly did you add the qdevice, and where is it?
7. What did you mean by "node 3"?
 
1. done
2. done
3-5 below
6. The qdevice is 'pbs-ifire'; it's local on 192.168.1.253, it is working, and it is a PBS host.
7. By "node 3" I should more accurately have said "qdevice".

3. My workstation is the 181 IP. I aborted after running the command on pve, which failed:
Mar 03 14:56:07 pve2.ifire.net sshd[2301368]: Accepted publickey for root from 181.199.63.171 port 8498 ssh2: RSA SHA256:ZRgHJSjs9COoTp8AP1SSCua5bjeULMFfDYvdi0+0OUo
Mar 03 14:56:07 pve2.ifire.net sshd[2301368]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Mar 03 14:56:07 pve2.ifire.net sshd[2301368]: pam_env(sshd:session): deprecated reading of user environment enabled
^C
02:58 PM [pve2]~ root #

4.

02:56 PM [pve]~ root # /usr/bin/ssh -vvv -o 'HostKeyAlias=pve2' root@192.168.1.252 -- /bin/true
OpenSSH_9.2p1 Debian-2+deb12u2, OpenSSL 3.0.11 19 Sep 2023
debug1: Reading configuration data /root/.ssh/config
debug1: /root/.ssh/config line 2: Applying options for *
debug1: /root/.ssh/config line 12: Applying options for 192.168.1.252
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
debug1: /etc/ssh/ssh_config line 21: Applying options for *
debug2: resolve_canonicalize: hostname 192.168.1.251 is address
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts' -> '/root/.ssh/known_hosts'
debug3: expanded UserKnownHostsFile '~/.ssh/known_hosts2' -> '/root/.ssh/known_hosts2'
debug3: ssh_connect_direct: entering
debug1: Connecting to 192.168.1.251 [192.168.1.251] port 212.
debug3: set_sock_tos: set socket 3 IP_TOS 0x10
debug1: Connection established.
debug1: identity file /root/.ssh/id_rsa type 0
debug1: identity file /root/.ssh/id_rsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa type -1
debug1: identity file /root/.ssh/id_ecdsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa_sk type -1
debug1: identity file /root/.ssh/id_ecdsa_sk-cert type -1
debug1: identity file /root/.ssh/id_ed25519 type -1
debug1: identity file /root/.ssh/id_ed25519-cert type -1
debug1: identity file /root/.ssh/id_ed25519_sk type -1
debug1: identity file /root/.ssh/id_ed25519_sk-cert type -1
debug1: identity file /root/.ssh/id_xmss type -1
debug1: identity file /root/.ssh/id_xmss-cert type -1
debug1: identity file /root/.ssh/id_dsa type -1
debug1: identity file /root/.ssh/id_dsa-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_9.2p1 Debian-2+deb12u2
debug1: Remote protocol version 2.0, remote software version OpenSSH_9.2p1 Debian-2+deb12u2
debug1: compat_banner: match: OpenSSH_9.2p1 Debian-2+deb12u2 pat OpenSSH* compat 0x04000000
debug2: fd 3 setting O_NONBLOCK
debug1: Authenticating to 192.168.1.251:212 as 'root'
debug1: using hostkeyalias: pve2
debug1: load_hostkeys: fopen /root/.ssh/known_hosts2: No such file or directory
debug3: record_hostkey: found key type RSA in file /etc/ssh/ssh_known_hosts:5
debug3: record_hostkey: found ca key type RSA in file /etc/ssh/ssh_known_hosts:17
debug3: load_hostkeys_file: loaded 2 keys from pve2
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
debug3: order_hostkeyalgs: prefer hostkeyalgs: ssh-ed25519-cert-v01@openssh.com,ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,sk-ssh-ed25519-cert-v01@openssh.com,sk-ecdsa-sha2-nistp256-cert-v01@openssh.com,rsa-sha2-512-cert-v01@openssh.com,rsa-sha2-256-cert-v01@openssh.com,rsa-sha2-512,rsa-sha2-256
debug3: send packet: type 20
debug1: SSH2_MSG_KEXINIT sent
debug3: receive packet: type 20
debug1: SSH2_MSG_KEXINIT received
debug2: local client KEXINIT proposal
debug2: KEX algorithms: sntrup761x25519-sha512@openssh.com,curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,ext-info-c,kex-strict-c-v00@openssh.com
debug2: host key algorithms: ssh-ed25519-cert-v01@openssh.com,ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,sk-ssh-ed25519-cert-v01@openssh.com,sk-ecdsa-sha2-nistp256-cert-v01@openssh.com,rsa-sha2-512-cert-v01@openssh.com,rsa-sha2-256-cert-v01@openssh.com,rsa-sha2-512,rsa-sha2-256,ssh-ed25519,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,sk-ssh-ed25519@openssh.com,sk-ecdsa-sha2-nistp256@openssh.com
debug2: ciphers ctos: aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com
debug2: ciphers stoc: aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com
debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,zlib@openssh.com,zlib
debug2: compression stoc: none,zlib@openssh.com,zlib
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug2: peer server KEXINIT proposal
debug2: KEX algorithms: sntrup761x25519-sha512@openssh.com,curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,kex-strict-s-v00@openssh.com
debug2: host key algorithms: rsa-sha2-512,rsa-sha2-256,ecdsa-sha2-nistp256,ssh-ed25519,ssh-ed25519-cert-v01@openssh.com
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,zlib@openssh.com
debug2: compression stoc: none,zlib@openssh.com
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug3: kex_choose_conf: will use strict KEX ordering
debug1: kex: algorithm: sntrup761x25519-sha512@openssh.com
debug1: kex: host key algorithm: ssh-ed25519-cert-v01@openssh.com
debug1: kex: server->client cipher: aes128-ctr MAC: umac-64-etm@openssh.com compression: none
debug1: kex: client->server cipher: aes128-ctr MAC: umac-64-etm@openssh.com compression: none
debug3: send packet: type 30
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug3: receive packet: type 31
debug1: SSH2_MSG_KEX_ECDH_REPLY received
debug1: Server host certificate: ssh-ed25519-cert-v01@openssh.com SHA256:BcZpML7q+U6nUn4YoHj9Qr+uzqT1ZAnFfFxBSaOmj+s, serial 0 ID "pve.ifire.net" CA ssh-rsa SHA256:YlbjnKqsNV4sNWRmiS5DQ+p3ZpnFw9RBrXVQXl8CM/c valid forever
debug2: Server host certificate hostname: pve
debug2: Server host certificate hostname: pve
debug2: Server host certificate hostname: 216.18.207.194
debug2: Server host certificate hostname: 192.168.1.251
debug2: Server host certificate hostname: 100.109.207.73
debug3: put_host_port: [192.168.1.251]:212
debug1: using hostkeyalias: pve2
debug1: load_hostkeys: fopen /root/.ssh/known_hosts2: No such file or directory
debug3: record_hostkey: found key type RSA in file /etc/ssh/ssh_known_hosts:5
debug3: record_hostkey: found ca key type RSA in file /etc/ssh/ssh_known_hosts:17
debug3: load_hostkeys_file: loaded 2 keys from pve2
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
debug1: Host 'pve2' is known and matches the ED25519-CERT host certificate.
debug1: Found CA key in /etc/ssh/ssh_known_hosts:17
Certificate invalid: name is not a listed principal
debug1: No matching CA found. Retry with plain key
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ED25519 key sent by the remote host is
SHA256:BcZpML7q+U6nUn4YoHj9Qr+uzqT1ZAnFfFxBSaOmj+s.
Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending RSA key in /etc/ssh/ssh_known_hosts:5
remove with:
ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "pve2"
Host key for pve2 has changed and you have requested strict checking.
Host key verification failed.
02:57 PM [pve]~ root #

5. Nothing, it seems?
 

No worries, all clear.


This is like some Sherlock Holmes episode. You are showing output on pve2 with no trace of pve trying to connect to it with the command from step 4. Instead you are seeing your very own machine connect to pve2, but that makes no sense, because you ran that command while already connected to pve2 and you were not connecting from your machine a second time. You are, however, connecting to pve from your machine.

Did you use password or public key login from your workstation?

EDIT: Sorry, I mixed it up.

I have a silly proposal. Close existing sessions, then open 2 separate SSH sessions from your machine, both to pve.

Run the journal command in one session. And run the ssh -vvv command in the other session, the command that is meant to connect to pve2.

Will you see it hit your own journal on the same node?
 
Last edited:
I use public key, never password.

I will do the test:
a. close connections
b. open workstation > pve2
c. run on pve2 > journalctl -efu ssh
d. open workstation tab 2 > pve2
e.

Mar 03 15:15:36 pve2.ifire.net sshd[2330521]: Accepted publickey for root from 181.199.63.171 port 9594 ssh2: RSA SHA256:ZRgHJSjs9COoTp8AP1SSCua5bjeULMFfDYvdi0+0OUo
Mar 03 15:15:36 pve2.ifire.net sshd[2330521]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Mar 03 15:15:36 pve2.ifire.net sshd[2330521]: pam_env(sshd:session): deprecated reading of user environment enabled
Mar 03 15:15:48 pve2.ifire.net sshd[2330898]: Accepted publickey for root from 181.199.63.171 port 8476 ssh2: RSA SHA256:ZRgHJSjs9COoTp8AP1SSCua5bjeULMFfDYvdi0+0OUo
Mar 03 15:15:48 pve2.ifire.net sshd[2330898]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Mar 03 15:15:48 pve2.ifire.net sshd[2330898]: pam_env(sshd:session): deprecated reading of user environment enabled
^C
03:16 PM [pve2]~ root #
 
ok, doing same but ssh target pve:

Mar 03 15:18:25 pve.ifire.net sshd[912557]: Accepted publickey for root from 181.199.63.171 port 8488 ssh2: RSA SHA256:ZRgHJSjs9COoTp8AP1SSCua5bjeULMFfDYvdi0+0OUo
Mar 03 15:18:25 pve.ifire.net sshd[912557]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Mar 03 15:18:26 pve.ifire.net sshd[912557]: pam_env(sshd:session): deprecated reading of user environment enabled
Mar 03 15:18:37 pve.ifire.net sshd[912866]: Accepted publickey for root from 181.199.63.171 port 8469 ssh2: RSA SHA256:ZRgHJSjs9COoTp8AP1SSCua5bjeULMFfDYvdi0+0OUo
Mar 03 15:18:37 pve.ifire.net sshd[912866]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Mar 03 15:18:37 pve.ifire.net sshd[912866]: pam_env(sshd:session): deprecated reading of user environment enabled
^C
03:18 PM [pve]~ root #
 

And this was when you ran the
 
I think it's confusing, because what I am getting at is: I want to know if it's hitting the wrong machine.

So I would prefer you to:

FIRST open two sessions (to pve)
THEN ONLY run journal in one session
THEN ONLY run ssh -vvv (with pve2 alias) in the other session

and watch if you see anything in the journal output - you should see nothing, since you are connecting away from that node.
 

Ok. give me a min.
 
a. workstation > pve
b. session (a) pve # journalctl -efu ssh
c. workstation tab 2 > pve
d. session (c) pve # ssh -vvv pve2

Mar 03 15:26:16 pve.ifire.net sshd[921248]: Accepted publickey for root from 181.199.63.171 port 59206 ssh2: RSA SHA256:ZRgHJSjs9COoTp8AP1SSCua5bjeULMFfDYvdi0+0OUo
Mar 03 15:26:16 pve.ifire.net sshd[921248]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Mar 03 15:26:16 pve.ifire.net sshd[921248]: pam_env(sshd:session): deprecated reading of user environment enabled
Mar 03 15:26:32 pve.ifire.net sshd[921693]: Accepted publickey for root from 181.199.63.171 port 8535 ssh2: RSA SHA256:ZRgHJSjs9COoTp8AP1SSCua5bjeULMFfDYvdi0+0OUo
Mar 03 15:26:32 pve.ifire.net sshd[921693]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Mar 03 15:26:32 pve.ifire.net sshd[921693]: pam_env(sshd:session): deprecated reading of user environment enabled
Mar 03 15:26:41 pve.ifire.net sshd[921982]: Accepted publickey for root from 192.168.1.251 port 59658 ssh2: RSA SHA256:y91Elr0gUxv5NAuwZ4znmYf/pRpD9n0lrGSONzX33i4
Mar 03 15:26:41 pve.ifire.net sshd[921982]: pam_unix(sshd:session): session opened for user root(uid=0) by (uid=0)
Mar 03 15:26:41 pve.ifire.net sshd[921982]: pam_env(sshd:session): deprecated reading of user environment enabled
^C
03:26 PM [pve]~ root #

then simultaneously in the other session:

https://pastebin.com/hfcWarBA

had to pastebin due to size limit here on forum.
--

Something is wrong here. I asked to connect to pve2, but I got connected to pve instead. That's new; it worked before the problems started.
 
Yeah that was my hunch.


So here is the issue... it's not about whether you "can connect by SSH"; the certs actually work as intended, they help you avoid connecting to the wrong machine even if you came in by e.g. the wrong IP address. :)

Can you open a separate thread, link this one from there, and we can check it there?
 
As of PVE 8.1, there's still a bug where running pvecm updatecerts deletes all but the oldest (instead of the newest) SSH keys from the shared cluster-wide known_hosts file. This then causes issues manifesting themselves through WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!, Offending RSA key in /etc/ssh/ssh_known_hosts:$lineno and remove with: ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "$alias". Following that suggestion then breaks the symlink into pmxcfs and makes one dig even deeper into the troubleshooting rabbit hole.
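To see whether a node is already in that state, it may help to first check that /etc/ssh/ssh_known_hosts is still the symlink into pmxcfs that PVE normally sets up (a quick check, assuming a standard clustered install):

Code:
# ls -l /etc/ssh/ssh_known_hosts   # on an untouched cluster node this should point at /etc/pve/priv/known_hosts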

This is a simple, streamlined process to either prevent this issue or get out of it without having to do potentially risky Perl file patching or deleting keys one might have otherwise wished to retain. It bypasses the known_hosts corruption issue by using SSH certificates for remote host authentication; it does NOT change the behaviour in relation to user authorisation (i.e. the authorized_keys file).

This assumes the cluster is otherwise healthy, has quorum and has no connectivity issues, with the only disruption being the SSH connections, which affect e.g. proxying console/shell sessions, secure local-storage migration and replication, but also QDevice setup. See also [1].
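A quick way to confirm the quorum part (standard PVE tooling, nothing specific to this tutorial):

Code:
# pvecm status   # the Quorate: line under the quorum information should read Yes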

There's an existing Certificate Authority (CA) used in PVE - see also [2] - currently only for SSL connections, but as SSH certificates are nothing more than CA-signed SSH keys with associated IDs (principals), it is easiest to reuse said CA (see note (i)):

In any single node's root shell perform once (the location is shared for all nodes in the cluster):

Code:
# openssl x509 -in /etc/pve/pve-root-ca.pem -inform pem -pubkey -noout | ssh-keygen -f /dev/stdin -i -m PKCS8 > /etc/pve/pve-root-ca.pub
# echo "@cert-authority * `cat /etc/pve/pve-root-ca.pub`" >> /etc/ssh/ssh_known_hosts

This converts the CA certificate into the public key format SSH expects and makes any current or future host key signed by that CA be recognised as valid by every node of the cluster, even in case other conflicting entries are present.
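As a quick sanity check (plain OpenSSH tooling, nothing PVE-specific), you can print the converted key's fingerprint and confirm the trust anchor actually landed in the shared file:

Code:
# ssh-keygen -l -f /etc/pve/pve-root-ca.pub   # prints the CA key fingerprint
# grep '^@cert-authority' /etc/ssh/ssh_known_hosts   # the line appended above should show up here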

On each individual node (you may want to automate this in the case of a large cluster), the respective host key then needs to be signed and configured for the node:

Code:
# ssh-keygen -I `hostname` -s /etc/pve/priv/pve-root-ca.key -h -n `(hostname -s; hostname -f; hostname -I | xargs -n1) | paste -sd,` /etc/ssh/ssh_host_ed25519_key.pub
# echo "HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub" >> /etc/ssh/sshd_config.d/PVEHostCertificate.conf

This makes use of the Ed25519 host key; it did, however, use the RSA (albeit 4096-bit) key of the CA to sign it. If you have any specific reason, you may of course opt for any other SSH key in /etc/ssh/ to be used here, not necessarily Ed25519 - a sketch for the RSA variant follows below. See also note (ii).
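For instance, the same two steps with the node's default RSA host key instead of Ed25519 (a sketch only; the file names are the stock Debian/PVE ones):

Code:
# ssh-keygen -I `hostname` -s /etc/pve/priv/pve-root-ca.key -h -n `(hostname -s; hostname -f; hostname -I | xargs -n1) | paste -sd,` /etc/ssh/ssh_host_rsa_key.pub
# echo "HostCertificate /etc/ssh/ssh_host_rsa_key-cert.pub" >> /etc/ssh/sshd_config.d/PVEHostCertificate.conf

The signed certificate ends up in /etc/ssh/ssh_host_rsa_key-cert.pub, which is exactly what the HostCertificate line then points sshd at.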

Note: The sshd service needs to be restarted for the changes to take effect.
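For example (on the Debian base PVE runs on, the OpenSSH server unit is named ssh, with sshd kept as an alias):

Code:
# systemctl restart ssh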

And that's it! From now on, your nodes will always be able to SSH into each other. The only annoyance is that all future nodes need to have the two-liner executed once as well; again, this would be best automated, as it does not interfere with the rest of PVE's internals (a rough sketch follows below). There are no caveats to this: if you do not sign a future node's keys and PVE manages to find an individually recognised key on record, it will still work. But if you then encounter the bug in pvecm updatecerts, it will not disrupt connections to those nodes whose host keys were signed, as the buggy tool safely ignores @cert-authority entries in the known_hosts file.
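A minimal automation sketch, assuming root SSH from one node to the others still works and that pve1, pve2 and pve3 stand in for your actual node names:

Code:
# sketch only: the node names below are placeholders, adjust to your cluster
# ">" instead of ">>" keeps the HostCertificate drop-in idempotent on re-runs
for n in pve1 pve2 pve3; do
  ssh "$n" 'ssh-keygen -I "$(hostname)" -s /etc/pve/priv/pve-root-ca.key -h \
      -n "$( (hostname -s; hostname -f; hostname -I | xargs -n1) | paste -sd, )" \
      /etc/ssh/ssh_host_ed25519_key.pub \
    && echo "HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub" > /etc/ssh/sshd_config.d/PVEHostCertificate.conf \
    && systemctl restart ssh'
done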

One final note on how PVE makes use of the HostKeyAlias option for SSH connections. This option is always used for e.g. migrations/replications and makes SSH look up that specific ID in the known_hosts file irrespective of the hostname or IP address of the node being connected to. If the IDs (principals) listed in the signed keys (see note (ii)) include this alias, everything will keep working as expected, i.e. it will even work if this is your x-th time introducing a cluster node by the same name (as some dead nodes used to have), as long as its host key is signed. The leftover keys on record are safely ignored then, as they should have been to begin with.

If you end up with multiple records present under the same name that is also the ID listed in the CA-signed key, the signed key will take precedence, as can be checked:
Code:
# ssh -vvv -o HostKeyAlias=$alias $ipaddress
...
debug1: Found CA key in /etc/ssh/ssh_known_hosts:$lineno
debug3: check_host_key: certificate host key in use; disabling UpdateHostkeys

If you, however, failed to list the ID under which your node is recognised by PVE, you will get a failure (but only in cases where it would have failed anyway due to the bug):
Code:
# ssh -vvv -o HostKeyAlias=$alias $ipaddress
...
debug1: Host '$alias' is known and matches the ED25519-CERT host certificate.
debug1: Found CA key in /etc/ssh/ssh_known_hosts:$lineno
Certificate invalid: name is not a listed principal
debug1: No matching CA found. Retry with plain key

TESTED ON: pve-manager/8.1.3/b46aac3b42da5d15 (running kernel: 6.5.11-6-pve)

Related bug reports: #4252, #4886

References:
[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_role_of_ssh_in_proxmox_ve_clusters
[2] https://pve.proxmox.com/wiki/Certificate_Management

Notes:
(i) If you want to know how much validity the CA has left, feel free to check with openssl x509 -in /etc/pve/pve-root-ca.pem -text -noout; it is 10 years as nominally generated by PVE, and therefore rotation is not in scope of this tutorial either.
(ii) If you wish to double check that all the correct IDs (principals) were included in the signed key, you can do so with ssh-keygen -L -f /etc/ssh/ssh_host_ed25519_key-cert.pub. The hostname, the FQDN as well as all IP addresses should be listed there. You can, of course, change this list by editing the -n option of ssh-keygen and re-signing, as sketched below. Please also note there's absolutely no expiry defined for these certificates, which mimics the default behaviour of PVE regarding SSH key handling.
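A minimal re-signing sketch, where my-extra-alias is a hypothetical additional ID (principal) you want the certificate to also cover, appended to the original list:

Code:
# ssh-keygen -I `hostname` -s /etc/pve/priv/pve-root-ca.key -h -n `(hostname -s; hostname -f; hostname -I | xargs -n1; echo my-extra-alias) | paste -sd,` /etc/ssh/ssh_host_ed25519_key.pub
# systemctl restart ssh   # sshd only serves the rewritten certificate after a restart

This overwrites the existing /etc/ssh/ssh_host_ed25519_key-cert.pub, so re-run ssh-keygen -L -f on it afterwards to confirm the principal list is what you expect.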

esi_y:

Hobbyist/Homelab user with a 3 node cluster. Recently tried to change the migration network via GUI Datacenter>Options>Migration Settings.
After the change migrations fail with the SSH/MITM errors. Immediately changed the setting back to original network and all is OK again.

Is it correct to assume that if I use your tutorial commands, I will then be able to make this same change via the GUI and have migrations succeed?

Thanks,
Jim
 

That's an interesting question. As far as I know, the migration network just causes a different IP address to be used in the ssh command; it's still the same alias for SSH to pick a key by. Unless I am missing something, changing the migration network should not cause broken SSH connections in and of itself.

Can you post the migration command log when successful and when ending up in an error? It should refer to exactly the same HostKeyAlias.
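As a side note, you can pre-test any candidate migration address with essentially the same command PVE itself runs, where $alias is the target node's name and $ipaddress the address on the prospective migration network:

Code:
# /usr/bin/ssh -e none -o BatchMode=yes -o HostKeyAlias=$alias root@$ipaddress /bin/true   # exit status 0 means the host key checks out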
 
Here is the log with success:
2024-03-04 15:07:22 use dedicated network address for sending migration traffic (10.10.33.70)
2024-03-04 15:07:23 starting migration of VM 101 to node 'pve01' (10.10.33.70)
2024-03-04 15:07:23 found local disk 'local-zfs:vm-101-disk-0' (attached)
2024-03-04 15:07:23 copying local disk images
2024-03-04 15:07:23 full send of rpool/data/vm-101-disk-0@__migration__ estimated size is 28.8K
2024-03-04 15:07:23 total estimated size is 28.8K
2024-03-04 15:07:23 TIME SENT SNAPSHOT rpool/data/vm-101-disk-0@__migration__
2024-03-04 15:07:24 successfully imported 'local-zfs:vm-101-disk-0'
2024-03-04 15:07:24 volume 'local-zfs:vm-101-disk-0' is 'local-zfs:vm-101-disk-0' on the target
2024-03-04 15:07:24 migration finished successfully (duration 00:00:02)
TASK OK

And with error:
2024-03-04 15:09:52 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve03' root@10.10.10.0 /bin/true
2024-03-04 15:09:52 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
2024-03-04 15:09:52 @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
2024-03-04 15:09:52 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
2024-03-04 15:09:52 IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
2024-03-04 15:09:52 Someone could be eavesdropping on you right now (man-in-the-middle attack)!
2024-03-04 15:09:52 It is also possible that a host key has just been changed.
2024-03-04 15:09:52 The fingerprint for the RSA key sent by the remote host is
2024-03-04 15:09:52 SHA256:hKjSLXqdWwDIZ/zXDcr8Z1QenvI3+wMjwSrm8iMKePk.
2024-03-04 15:09:52 Please contact your system administrator.
2024-03-04 15:09:52 Add correct host key in /root/.ssh/known_hosts to get rid of this message.
2024-03-04 15:09:52 Offending RSA key in /etc/ssh/ssh_known_hosts:6
2024-03-04 15:09:52 remove with:
2024-03-04 15:09:52 ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "pve03"
2024-03-04 15:09:52 Host key for pve03 has changed and you have requested strict checking.
2024-03-04 15:09:52 Host key verification failed.
2024-03-04 15:09:52 ERROR: migration aborted (duration 00:00:00): Can't connect to destination address using public key
TASK ERROR: migration aborted

Hope this is the information you need...

Thanks again for your interest in this issue,
Jim
 
