VM migration issue on a newly built cluster

ferret

Hi,

I have built a new 3-node PVE cluster (v7.2.7) with iSCSI shared storage.

I can migrate from node 1 to node 2 with no issues, but I am unable to migrate from node 1 or node 2 to node 3. I am receiving the following error:

2022-07-22 11:20:22 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve3' root@172.16.60.103 /bin/true
2022-07-22 11:20:22 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
2022-07-22 11:20:22 @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
2022-07-22 11:20:22 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
2022-07-22 11:20:22 IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
2022-07-22 11:20:22 Someone could be eavesdropping on you right now (man-in-the-middle attack)!
2022-07-22 11:20:22 It is also possible that a host key has just been changed.
2022-07-22 11:20:22 The fingerprint for the RSA key sent by the remote host is
2022-07-22 11:20:22 SHA256:jbeo+3cJxCdtiBcd1tUGS2q0xsiKEcm0SLQRB5AR/MY.
2022-07-22 11:20:22 Please contact your system administrator.
2022-07-22 11:20:22 Add correct host key in /root/.ssh/known_hosts to get rid of this message.
2022-07-22 11:20:22 Offending RSA key in /etc/ssh/ssh_known_hosts:1
2022-07-22 11:20:22 remove with:
2022-07-22 11:20:22 ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "pve3"
2022-07-22 11:20:22 RSA host key for pve3 has changed and you have requested strict checking.
2022-07-22 11:20:22 Host key verification failed.
2022-07-22 11:20:22 ERROR: migration aborted (duration 00:00:00): Can't connect to destination address using public key
TASK ERROR: migration aborted

Any assistance would be very much appreciated. I followed the step suggested in the output above to rectify the issue and still no luck.

Cheers
 
Hi,

I originally had 5 nodes and only 2 of them had the reported issue, so I removed those 2 nodes, and now node 3 is having an issue which it wasn't previously.

The above issue is being reported on node 3, so which node do I run the command on: the master or node 3?

Cheers
 
The above issue is being reported on node 3, so which node do I run the command on: the master or node 3?
In PVE there is no master; every node is equal. The error you see always comes from the source side of the migration. I recommend logging into each node and trying to SSH into each of the other nodes. On the nodes where it doesn't work, do what is written in the output (ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R ...) until you can log in from every node to every other node, and then live migration should work.
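
As a rough sketch, assuming the node names pve1, pve2 and pve3 resolve from each host (adjust to your actual names), something like this run once on every node covers all peers:

# remove any stale entry for each peer, as the error output suggests,
# then connect once interactively so the current host key can be accepted
for n in pve1 pve2 pve3; do
    ssh-keygen -f /etc/ssh/ssh_known_hosts -R "$n"
    ssh -o HostKeyAlias="$n" root@"$n" /bin/true
done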
 
Hi,

Many thanks for the advice.

I followed your recommendation and still no luck; the reported error is now:

2022-07-22 21:16:53 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve3' root@172.16.60.103 /bin/true
2022-07-22 21:16:53 Host key verification failed.
2022-07-22 21:16:53 ERROR: migration aborted (duration 00:00:00): Can't connect to destination address using public key
TASK ERROR: migration aborted

I am able to SSH to node 3 without entering a password. I seem to be in a never-ending cycle.
 
try pvecm updatecerts --force to generate new certificates
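
If root SSH between the nodes already works, one way to run it on all of them in one go could be something like the following (the node names are just assumptions, adjust to yours):

# regenerate certificates and refresh the cluster known_hosts on each node,
# then restart the web proxy so it picks up the new certificate
for n in pve1 pve2 pve3; do
    ssh root@"$n" 'pvecm updatecerts --force && systemctl restart pveproxy'
done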
 
Hi,

Many thanks. That sort of worked for a few minutes.

I was able to migrate VMs, but now nodes 2 & 3 are unusable.

Now I have 2 live VMs on node 3 and cannot console into them or migrate them to node 1; only node 1 is now usable.
 

Attachments

  • Screen Shot 2022-07-22 at 10.23.45 pm.png
  • Screen Shot 2022-07-22 at 10.23.24 pm.png
  • Screen Shot 2022-07-22 at 10.23.15 pm.png
  • Screen Shot 2022-07-22 at 10.35.01 pm.png
  • Screen Shot 2022-07-22 at 10.34.36 pm.png
cannot console into nor migrate to node 1
So ssh fails? What's the error message?

I believe you need to execute pvecm updatecerts --force once on every node.

Is there anything unusual in the syslog?
What's the output of pvecm status?
 
2022-07-22 21:16:53 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=pve3' root@172.16.60.103 /bin/true
2022-07-22 21:16:53 Host key verification failed.
2022-07-22 21:16:53 ERROR: migration aborted (duration 00:00:00): Can't connect to destination address using public key
TASK ERROR: migration aborted
Of course you need to connect to each node with SSH without using BatchMode in order to accept the key.
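
For example, taking the target straight from the task log above, a one-off interactive connection would be:

# same connection as the migration task, but without BatchMode,
# so the host key prompt can actually be answered
ssh -o HostKeyAlias=pve3 root@172.16.60.103 /bin/true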

try pvecm updatecerts --force to generate new certificates
Nice! I wasn't aware of this tool.
 
So ssh fails? What's the error message?

I believe you need to execute pvecm updatecerts --force once on every node.

Is there anything unusual in the syslog?
What's the output of pvecm status?
SSH works perfectly without a password to each node and vice versa.

I was able to migrate the VMs via the CLI back to node 1.

I removed nodes 2 & 3 from the cluster, and now nothing is working on nodes 1, 2, or 3. I can still SSH to all nodes as root, but the web GUI simply doesn't work anymore...
 
Hi All,

Resolved my issue by installing signed certificates and replacing the self-signed certs.
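
For anyone else who ends up here: installing a custom certificate on a PVE node generally comes down to dropping the signed cert and key into the node's directory under /etc/pve and restarting the web proxy, roughly like this (the file names below are just placeholders for your own files):

# copy the signed certificate (including any chain) and its key into place for this node
cp your-cert-fullchain.pem /etc/pve/nodes/$(hostname)/pveproxy-ssl.pem
cp your-cert.key /etc/pve/nodes/$(hostname)/pveproxy-ssl.key
# restart the web interface so the new certificate is used
systemctl restart pveproxy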

Many thanks for everyone's input.

Cheers
 
