Host names and Cluster

justanton

New Member
Feb 25, 2022
Hey all, I previously deployed two servers in a cluster and have now finished deploying a 3rd one for HA, but I'm having issues getting replication working on all three. It seemed to be related to SSH keys and hosts; after digging deeper, it is likely to do with my host names being wrong.

Currently the 3 servers are:

1. x.x.x.x bob bob
2. x.x.x.x bob-midi bob-midi
3. x.x.x.x bob-mini bob-mini


Can someone give me advice on what they should be, and some tips on how to fix them in an active cluster with a bunch of VMs?
 
Please provide the output of pveversion -v.

What are the exact errors you're getting?

Please also provide your /etc/hosts file for all 3 nodes.
 
Code:
~# pveversion -v
proxmox-ve: 7.1-2 (running kernel: 5.13.19-6-pve)
pve-manager: 7.1-12 (running version: 7.1-12/b3c09de3)
pve-kernel-helper: 7.2-2
pve-kernel-5.13: 7.1-9
pve-kernel-5.11: 7.0-10
pve-kernel-5.4: 6.4-5
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-5-pve: 5.13.19-13
pve-kernel-5.13.19-4-pve: 5.13.19-9
pve-kernel-5.13.19-3-pve: 5.13.19-7
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.13.19-1-pve: 5.13.19-3
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-4-pve: 5.11.22-9
pve-kernel-5.11.22-3-pve: 5.11.22-7
pve-kernel-5.11.22-2-pve: 5.11.22-4
pve-kernel-5.4.128-1-pve: 5.4.128-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-7
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-5
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-2
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.1.6-1
proxmox-backup-file-restore: 2.1.6-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-10
pve-cluster: 7.1-3
pve-container: 4.1-5
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-1
pve-ha-manager: 3.3-4
pve-i18n: 2.6-3
pve-qemu-kvm: 6.2.0-5
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-5
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1
Code:
Host 1

127.0.0.1 localhost.localdomain localhost
10.0.X.X bob bob

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

Host 2

127.0.0.1 localhost.localdomain localhost
10.0.x.x bob-midi bob-midi

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

Host 3

127.0.0.1 localhost.localdomain localhost
10.0.x.x bob-mini bob-mini

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

Errors:
2022-05-03 18:16:01 102-0: start replication job
2022-05-03 18:16:01 102-0: guest => VM 102, running => 13474
2022-05-03 18:16:01 102-0: volumes => bob-Local-VMS:vm-102-disk-0
2022-05-03 18:16:01 102-0: (remote_prepare_local_job) @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
2022-05-03 18:16:01 102-0: (remote_prepare_local_job) @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
2022-05-03 18:16:01 102-0: (remote_prepare_local_job) @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
2022-05-03 18:16:01 102-0: (remote_prepare_local_job) IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
2022-05-03 18:16:01 102-0: (remote_prepare_local_job) Someone could be eavesdropping on you right now (man-in-the-middle attack)!
2022-05-03 18:16:01 102-0: (remote_prepare_local_job) It is also possible that a host key has just been changed.
2022-05-03 18:16:01 102-0: (remote_prepare_local_job) The fingerprint for the RSA key sent by the remote host is
2022-05-03 18:16:01 102-0: (remote_prepare_local_job) *****.
2022-05-03 18:16:01 102-0: (remote_prepare_local_job) Please contact your system administrator.
2022-05-03 18:16:01 102-0: (remote_prepare_local_job) Add correct host key in /root/.ssh/known_hosts to get rid of this message.
2022-05-03 18:16:01 102-0: (remote_prepare_local_job) Offending RSA key in /etc/ssh/ssh_known_hosts:10
2022-05-03 18:16:01 102-0: (remote_prepare_local_job) remove with:
2022-05-03 18:16:01 102-0: (remote_prepare_local_job) ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "bob-midi"
2022-05-03 18:16:01 102-0: (remote_prepare_local_job) RSA host key for bob-midi has changed and you have requested strict checking.
2022-05-03 18:16:01 102-0: (remote_prepare_local_job) Host key verification failed.


After running

ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "bob-midi"

the next error is:

2022-05-03 18:47:01 102-0: start replication job
2022-05-03 18:47:01 102-0: guest => VM 102, running => 13474
2022-05-03 18:47:01 102-0: volumes => bob-Local-VMS:vm-102-disk-0
2022-05-03 18:47:01 102-0: (remote_prepare_local_job) Host key verification failed.


What is weird: replication from host 2 to host 3 does not work, but from 3 to 2 it works if you refresh the keys and also run pvecm updatecerts.
 
Is something running that changes your host keys?

Did you try pvecm updatecerts --force (with the --force parameter) as well?
This should clean up the known_hosts file and add the current node keys to it.
 
Not that I know of. I'm just using those commands on each node.

Do I run this on all 3 nodes?
 
Hey, sorry to trouble you with this. Do you know if there is anything else I can try?
 
Please provide the complete /etc/hosts file without masking any private IPs (there's absolutely no reason to).

Usually a pvecm updatecerts --force should fix these issues.
If it doesn't work, there might be something else wrong.
Please also provide the output of ls -l /root/.ssh

One additional thing, you don't need the hostname twice in the /etc/hosts file.
From the manpage (man hosts):
Code:
IP_address canonical_hostname [aliases...]
Remove it if you don't plan on adding an FQDN or a different alias.
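To illustrate the `IP_address canonical_hostname [aliases...]` format from the manpage, here is a minimal Python sketch that parses /etc/hosts-style text and flags aliases that merely repeat the canonical hostname, as in `10.0.10.13 skynet-midi skynet-midi`. It is not a Proxmox tool; `parse_hosts` and `redundant_aliases` are hypothetical helper names.

```python
from dataclasses import dataclass, field


@dataclass
class HostsEntry:
    ip: str
    canonical: str
    aliases: list[str] = field(default_factory=list)


def parse_hosts(text: str) -> list[HostsEntry]:
    """Parse lines of the form 'IP_address canonical_hostname [aliases...]'."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # text after '#' is a comment
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # malformed: an address with no hostname
        entries.append(HostsEntry(fields[0], fields[1], fields[2:]))
    return entries


def redundant_aliases(entries: list[HostsEntry]) -> list[tuple[str, str]]:
    """Return (ip, name) pairs where an alias just repeats the canonical hostname."""
    return [(e.ip, a) for e in entries for a in e.aliases if a == e.canonical]


sample = (
    "127.0.0.1 localhost.localdomain localhost\n"
    "10.0.10.13 skynet-midi skynet-midi\n"
)
print(redundant_aliases(parse_hosts(sample)))  # [('10.0.10.13', 'skynet-midi')]
```

An alias is only useful when it differs from the canonical name, for example a short name next to an FQDN.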
 

No problem, here are the hosts files:

1.

127.0.0.1 localhost.localdomain localhost
10.0.10.12 skynet skynet

# The following lines are desirable for IPv6 capable hosts

::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

2.

127.0.0.1 localhost.localdomain localhost
10.0.10.13 skynet-midi skynet-midi

# The following lines are desirable for IPv6 capable hosts

::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

3.

127.0.0.1 localhost.localdomain localhost
10.0.10.17 skynet-mini skynet-mini

# The following lines are desirable for IPv6 capable hosts

::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts


Output of ls -l /root/.ssh:

total 10
lrwxrwxrwx 1 root root 29 Jul 24 2021 authorized_keys -> /etc/pve/priv/authorized_keys
-rw-r----- 1 root root 117 Jul 24 2021 config
-rw------- 1 root root 1811 Jul 24 2021 id_rsa
-rw-r--r-- 1 root root 390 Jul 24 2021 id_rsa.pub

Hope this helps.

I removed the second host name - still same issue.
 
Did you run pvecm updatecerts --force afterwards?
Please also provide the output of ls -l /etc/ssh/.
 
I did, no luck.


Output:

root@skynet:~# ls -l /etc/ssh/
total 83
-rw-r--r-- 1 root root 577771 Mar 13 2021 moduli
-rw-r--r-- 1 root root 1650 Mar 13 2021 ssh_config
drwxr-xr-x 2 root root 2 Mar 13 2021 ssh_config.d
-rw-r--r-- 1 root root 3235 Jul 24 2021 sshd_config
drwxr-xr-x 2 root root 2 Mar 13 2021 sshd_config.d
-rw-r--r-- 1 root root 3274 Jul 24 2021 sshd_config.ucf-dist
-rw------- 1 root root 505 Jul 24 2021 ssh_host_ecdsa_key
-rw-r--r-- 1 root root 170 Jul 24 2021 ssh_host_ecdsa_key.pub
-rw------- 1 root root 399 Jul 24 2021 ssh_host_ed25519_key
-rw-r--r-- 1 root root 90 Jul 24 2021 ssh_host_ed25519_key.pub
-rw------- 1 root root 1811 Jul 24 2021 ssh_host_rsa_key
-rw-r--r-- 1 root root 390 Jul 24 2021 ssh_host_rsa_key.pub
lrwxrwxrwx 1 root root 25 May 6 20:08 ssh_known_hosts -> /etc/pve/priv/known_hosts
lrwxrwxrwx 1 root root 25 May 6 19:44 ssh_known_hosts.old -> /etc/pve/priv/known_hosts


What is weird is that it half works. Right now:

Host 1 replicates to host 3 only.

Host 2 replicates to hosts 1 and 3.

Host 3 replicates to host 1 only.
 
Check the entries for host 2 in `/etc/pve/priv/known_hosts`. Does it match its SSH keys?
Please provide that file here. Feel free to mask any entries not matching your 3 hosts.
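One way to answer "does it match its SSH keys?" is to compare the key column of each known_hosts entry naming the node with that node's own public key file, such as /etc/ssh/ssh_host_rsa_key.pub from the `ls -l /etc/ssh/` listing. Below is a minimal Python sketch of that comparison. It assumes plain comma-separated host names, with no hashed names, wildcards, or `[host]:port` forms. `check_host_key` is a hypothetical helper, not an OpenSSH or Proxmox command.

```python
def check_host_key(known_hosts: str, host: str, pubkey_line: str) -> list[tuple[int, bool]]:
    """Compare known_hosts entries for `host` against the node's actual public key.

    `pubkey_line` is the content of e.g. /etc/ssh/ssh_host_rsa_key.pub
    ("ssh-rsa AAAA... root@node"). Only entries with the same key type are
    checked; each result is (line_number, key_matches).
    """
    key_type, key_b64 = pubkey_line.split()[:2]
    results = []
    for lineno, raw in enumerate(known_hosts.splitlines(), start=1):
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank line or comment
        if fields[0].startswith("@"):
            fields = fields[1:]  # drop a marker such as @revoked
        if len(fields) < 3 or fields[1] != key_type:
            continue  # malformed line or a different key type
        if host in fields[0].split(","):  # plain names only, no [host]:port
            results.append((lineno, fields[2] == key_b64))
    return results


known_hosts = (
    "skynet-midi ssh-rsa AAAAold\n"
    "skynet-midi,10.0.10.13 ssh-rsa AAAAnew\n"
)
print(check_host_key(known_hosts, "skynet-midi", "ssh-rsa AAAAnew root@skynet-midi"))
# [(1, False), (2, True)]
```

A `False` flags an entry whose stored key differs from the node's current key.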
 
I just went in and cleared /etc/pve/priv/known_hosts, then ran pvecm updatecerts --force on all 3 hosts. Then I triggered replication everywhere, and all jobs completed. I think that fixed the problems. I will monitor it and report back if any other problems come up.
 
