[SOLVED] Host verification errors between nodes on cluster

CycloneB

I need some help recovering and cleaning up my cluster. I recently added another node, giving me three nodes. As part of this, I removed my Raspberry Pi as a QDevice.

While I can issue some commands to remove entries from /root/.ssh/known_hosts, I still end up with problems. So far, I have done multiple variations of the following:

  • ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "192.168.111.140" (plus .141 and .142) on each of my three nodes
  • pvecm updatecerts -F on each of my three nodes
  • systemctl restart pvedaemon pveproxy
  • rm /root/.ssh/known_hosts
If I do a combination of these, I can get to the shell for each node from any of the nodes. However, I still end up with host verification issues if I try to migrate CTs and VMs between the nodes.

I'm at a loss on how to get things cleaned up and corrected. If folks could help me not only with the commands to run, but the order to run them and from which node, I would much appreciate it.

Code:
proxmox-ve: 7.4-1 (running kernel: 6.2.11-2-pve)
pve-manager: 7.4-13 (running version: 7.4-13/46c37d9c)
pve-kernel-6.2: 7.4-3
pve-kernel-5.15: 7.4-3
pve-kernel-6.2.11-2-pve: 6.2.11-2
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-1
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.2-1
proxmox-backup-file-restore: 2.4.2-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.7.2
pve-cluster: 7.3-3
pve-container: 4.4-4
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-4
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1

Code:
proxmox-ve: 7.4-1 (running kernel: 5.15.107-2-pve)
pve-manager: 7.4-13 (running version: 7.4-13/46c37d9c)
pve-kernel-5.15: 7.4-3
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.35-3-pve: 5.15.35-6
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.4.162-1-pve: 5.4.162-2
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.2-1
proxmox-backup-file-restore: 2.4.2-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-4
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-4
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1

Code:
proxmox-ve: 7.4-1 (running kernel: 5.15.107-2-pve)
pve-manager: 7.4-13 (running version: 7.4-13/46c37d9c)
pve-kernel-5.15: 7.4-3
pve-kernel-5.11: 7.0-10
pve-kernel-5.0: 6.0-11
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.107-1-pve: 5.15.107-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.0.21-5-pve: 5.0.21-10
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.2-1
proxmox-backup-file-restore: 2.4.2-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-4
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-4
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1
 
My current status:

From web-ui on Node 141:
  • can get to shell on Nodes 140, 141, and 142
  • CT from 140 -> 141 successful
  • CT from 140 -> 142 successful
  • CT from 141 -> 142 successful
  • same CT originally on 140 from 141 -> 140 fails with "Host key verification failed."
  • same CT originally on 140 from 142 -> 140 fails with "@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @"
From web-ui on Node 142:
  • can get to shell on Nodes 140, 141, and 142
  • same CT originally on 140 from 141 -> 140 fails with "Host key verification failed."
  • same CT originally on 140 from 142 -> 140 fails with "@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @"
From web-ui on Node 140:
  • can get to shell on Nodes 140, 141, and 142
  • same CT originally on 140 from 141 -> 140 fails with "Host key verification failed."
  • same CT originally on 140 from 142 -> 140 fails with "@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @"
 
A few things to keep in mind:
In a cluster, a shared configuration filesystem is mounted on /etc/pve.
Some files located outside of /etc/pve are symlinks that point into it, e.g.:
Code:
ls -al /root/.ssh/authorized_keys
lrwxrwxrwx 1 root root 29 Aug 19  2021 /root/.ssh/authorized_keys -> /etc/pve/priv/authorized_keys
and:
Code:
ls -al /etc/ssh/ssh_known_hosts
lrwxrwxrwx 1 root root 25 Jun 21 22:26 /etc/ssh/ssh_known_hosts -> /etc/pve/priv/known_hosts

This means a change made on one host is immediately reflected on all hosts. The point: be careful with direct modifications, and consider concurrent access from the other nodes.
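To compare nodes quickly, something like the following could be run on each one. It is only a sketch: `check_link` is a hypothetical helper name, and the expected targets are the ones from the `ls` output above.

```shell
# Report whether a path is a symlink to the expected target.
# Prints "ok", "wrong-target" or "not-a-symlink", followed by the path.
# check_link is a hypothetical helper, not part of Proxmox VE.
check_link() {
    path=$1
    expected=$2
    if [ ! -L "$path" ]; then
        echo "not-a-symlink $path"
        return 1
    fi
    target=$(readlink "$path")
    if [ "$target" = "$expected" ]; then
        echo "ok $path -> $target"
    else
        echo "wrong-target $path -> $target"
        return 1
    fi
}
```

For example, `check_link /etc/ssh/ssh_known_hosts /etc/pve/priv/known_hosts` and `check_link /root/.ssh/authorized_keys /etc/pve/priv/authorized_keys` on every node would show which nodes still hold a regular file instead of the symlink.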

Additionally, the "force" option for "pvecm updatecerts" is "-f", not "-F". This may not make a difference, but it is worth pointing out.

Finally, have you examined the task log and the "journalctl" output after a failure? The failing command may be listed there, and you can try to execute it manually with all of its options. That might give you more debug information.
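A rough way to locate the failed task logs is sketched below. It assumes the task logs live under /var/log/pve/tasks (an assumption about the default layout, so adjust the directory if yours differs), and `find_task_logs` is a hypothetical helper name.

```shell
# List the files under a log directory that contain a given message.
# The default directory is an assumption about where task logs are kept.
# find_task_logs is a hypothetical helper, not part of Proxmox VE.
find_task_logs() {
    pattern=$1
    dir=${2:-/var/log/pve/tasks}
    # -r: recurse, -l: print matching file names only, --: end of options
    grep -rl -- "$pattern" "$dir" 2>/dev/null
}
```

Calling `find_task_logs 'Host key verification failed'` should print the matching task log files, which can then be read for the exact command that failed. For the system journal, `journalctl --since "10 minutes ago"` right after a failed migration narrows the output to the relevant window.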

Beyond that, make sure your cluster is operating properly and that /etc/pve is mounted read/write on all nodes. Then try re-running "pvecm updatecerts -f"; running it on a single node should be sufficient.
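One way to check the mount on each node is sketched here. It only reads the mount table (by default /proc/mounts), and `mount_state` is a hypothetical helper name. Note that it reports the mount flags only; it is my understanding that /etc/pve can refuse writes when the node has no quorum even though the flags say rw, so `pvecm status` is worth checking as well.

```shell
# Print "mounted-rw", "mounted-ro" or "not-mounted" for a directory,
# based on a mount table in /proc/mounts format.
# mount_state is a hypothetical helper, not part of Proxmox VE.
mount_state() {
    dir=$1
    table=${2:-/proc/mounts}
    awk -v d="$dir" '
        $2 == d {
            found = 1
            state = "mounted-ro"
            # Field 4 holds the comma-separated mount options.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "rw") state = "mounted-rw"
        }
        END { print (found ? state : "not-mounted") }
    ' "$table"
}
```

Running `mount_state /etc/pve` on every node should print `mounted-rw` if the cluster filesystem is mounted as expected.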


Hmmm. Interestingly enough, ls -al /etc/ssh/ssh_known_hosts does not return a symlink for me.

Node 140:
Code:
ls -al /etc/ssh/ssh_known_hosts
-rw------- 1 root root 3532 Jun 21 22:52 /etc/ssh/ssh_known_hosts

Node 141:
Code:
ls -al /etc/ssh/ssh_known_hosts
-rw------- 1 root root 3188 Jun 21 18:26 /etc/ssh/ssh_known_hosts

Node 142:
Code:
ls -al /etc/ssh/ssh_known_hosts
-rw------- 1 root root 3752 Jun 21 18:24 /etc/ssh/ssh_known_hosts

However, authorized_keys is a symlink, as in your first example. I copied /etc/pve/priv/known_hosts to .old and made an edit to the file. On all three nodes, /etc/ssh/ssh_known_hosts is now a symlink as you outlined.

Tried the migration and it didn't work, but then redid pvecm updatecerts -f on a single node. This time the migration worked. I'll play around and make sure things are all moving between the nodes as expected.
 
After you run
Code:
pvecm updatecerts -f
systemctl restart pvedaemon pveproxy
on each node, ssh from each host to the others using the hostname to verify that the host keys are accepted.
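The verification step could be scripted roughly like this. It is a sketch: `verify_hosts` is a hypothetical helper name, and the `SSH` variable exists only so the client can be swapped out. `BatchMode=yes` makes ssh fail instead of prompting, so a host key problem shows up as FAIL rather than as a question waiting for input.

```shell
# Try a non-interactive SSH login to each listed host and report the result.
# verify_hosts is a hypothetical helper, not part of Proxmox VE.
SSH=${SSH:-ssh}

verify_hosts() {
    rc=0
    for host in "$@"; do
        # "true" is the remote command: we only care whether the login works.
        if "$SSH" -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true </dev/null; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            rc=1
        fi
    done
    return $rc
}
```

Run it on every node with the hostnames of the other nodes as arguments, for example `verify_hosts nodeA nodeB` (placeholder names; use your own). Any FAIL line points at the node pair that still has a stale or missing host key.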
I moved a CT back and forth between the various nodes with success. Manually doing ssh from each node to the other nodes worked a-ok. I think this is now resolved for me. Thank you both!