[SOLVED] Cannot Migrate From One Node To Another

emilhozan

Member
Aug 27, 2019
Hey all,

I get the following error message when trying to migrate a VM from one node to another:

Code:
2019-11-19 21:43:09 # /usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=k1n5' root@<IP> /bin/true
2019-11-19 21:43:09 key_load_public: invalid format
2019-11-19 21:43:09 Permission denied (publickey,password).
2019-11-19 21:43:09 ERROR: migration aborted (duration 00:00:00): Can't connect to destination address using public key
TASK ERROR: migration aborted


After researching this issue online, I am still unable to resolve it.
Some details:
- I have a 5-node cluster, all on the same PVE version, and migration was previously working.
- I USED to be able to migrate between nodes just fine, but now I cannot migrate from any node.
- To be clear, I cannot migrate any VMs at all now.
- The PVE versions are the same across all nodes.
- I tried "pvecm updatecerts" to no avail.
- "ssh -o "HostKeyAlias=NODENAME" root@NODE" was a no-go either (the exact test is sketched below).


If I SSH into the other nodes, sometimes I get prompted for a password, other times not. But oddly I see this first:
Code:
key_load_public: invalid format
Enter passphrase for key '/root/.ssh/id_rsa':

If I just hit enter, I then get prompted for the root password and can log in that way.
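In case it helps anyone hitting the same symptom: "key_load_public: invalid format" usually means ssh could not parse the public key file sitting next to the private key. A minimal check, assuming the default key paths:
Code:
# print the fingerprint; this errors out if the .pub file is malformed
ssh-keygen -l -f /root/.ssh/id_rsa.pub
# derive the public key from the private key and compare it with the .pub on disk
ssh-keygen -y -f /root/.ssh/id_rsa
cat /root/.ssh/id_rsa.pub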


I don't get what's going on with the cluster. When I first set this up, I did some rigorous testing (or at least I thought I did). This consisted of setting up VMs and testing migration / HA capabilities, and I didn't run into any issues. Now, however, I feel like things are going backwards.

Any help would be much appreciated.
 
Hi,
please try the following on one of the nodes from which you cannot migrate:
Code:
cd /root/.ssh
# move the existing root SSH key pair and per-user ssh config out of the way
mv id_rsa id_rsa.old
mv id_rsa.pub id_rsa.pub.old
mv config config.old
# let PVE regenerate the missing files and update the cluster-wide SSH/certificate data
pvecm updatecerts
Then try to migrate to a different host.
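(Not part of the suggestion above, but as a quick sanity check afterwards: on a stock PVE node these SSH files are typically symlinked into /etc/pve, so something like this should show the regenerated pieces. Paths are the defaults and may differ on customized setups.)
Code:
# the root authorized_keys normally points at the cluster-wide file
ls -l /root/.ssh/authorized_keys /etc/pve/priv/authorized_keys
# the cluster-wide known_hosts used for the HostKeyAlias lookups
ls -l /etc/ssh/ssh_known_hosts /etc/pve/priv/known_hosts
# a fresh key pair should now exist again
ls -l /root/.ssh/id_rsa /root/.ssh/id_rsa.pub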
 
@Chris
Thanks for the update. I attempted this and at first it was working, but then it stopped. I did these steps on one of the nodes; three others followed suit (meaning the *.old files were created), but the fourth did not. I repeated the steps on the fifth as well to test it out.

I am able to migrate some VMs but not all, and I can't migrate them back. So I can go from node 3 to node 5, for example, but can't go back from 5 to 3. I tested a few other VMs as well and got mixed results.

Any other suggestions?
 
Did the error message change?
Is the cluster quorate (pvecm status)?
What is your PVE version (pveversion -v)?
Please try to perform the procedure on all nodes. Are there any other hints in the syslog?
You can also try to SSH between the nodes with the verbose flag (-v or even -vvv) to get debug output and see if there is more information there on what's going wrong.
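Put together, the checks above might look roughly like this on each node (NODENAME and <IP> are placeholders):
Code:
pvecm status        # cluster membership and quorum state
pveversion -v       # full version list of the PVE packages
# verbose SSH test, mirroring what the migration task does
ssh -vvv -e none -o 'BatchMode=yes' -o 'HostKeyAlias=NODENAME' root@<IP> /bin/true
tail -f /var/log/syslog   # watch the log while retrying the migration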
 
Is the cluster quorate (pvecm status)?

[screenshot: pvecm status output]


What is your PVE version (pveversion -v)?

[screenshot: pveversion -v output]


Please try to perform the procedure on all nodes.
I did do this, since the issues were still present. One thing, though: for some reason, one of the nodes is having serious issues hosting VMs. It's currently offline and we're troubleshooting that node - would that make this drastic a difference?


You can also try to SSH between the nodes with the verbose flag (-v or even -vvv) to get debug output and see if there is more information there on what's going wrong.
Attached are the outputs of two "-vvv" runs, one with "no complications" and another "with complications". What I mean by "complications" here is that when I "ssh <node>", I get the "Enter passphrase for key '/root/.ssh/id_rsa':" prompt. I redacted the IPs and some SHA256 keys; hopefully that doesn't mess the data up, but I don't notice any "errors" per se.


Are there any other hints in the syslog?
I attached the "tail -f /var/log/syslog" output from attempts to migrate to two different nodes. I can't tell what the issue is from these. Any ideas here?
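(For anyone digging through similar logs, a quick filter that keeps just the migration- and SSH-related lines, assuming the default Debian syslog path:)
Code:
# show the last 50 migration/ssh related entries from syslog
grep -iE 'migrat|ssh' /var/log/syslog | tail -n 50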
 

Attachments

  • 1574457067137.png (86 KB)
  • ssh_verbose_no-ssh-connection-complications.txt (9.4 KB)
  • ssh_verbose_with-ssh-connection-complications.txt (10.8 KB)
  • migration_syslog_logs_tail-f.txt (2.2 KB)
Well, you still get an invalid format error for the id_rsa key on node k1n4. Please make sure this one is correct as well.
See the output of diff -y:
Code:
debug1: identity file /root/.ssh/id_rsa type 1                | key_load_public: invalid format
                                                              > debug1: identity file /root/.ssh/id_rsa type -1
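In practice that means repeating the earlier procedure on k1n4 as well; a minimal sketch, assuming the default key location:
Code:
# on k1n4: move the suspect key material aside and let PVE regenerate it
cd /root/.ssh
mv id_rsa id_rsa.old
mv id_rsa.pub id_rsa.pub.old
pvecm updatecerts
# confirm the regenerated public key parses cleanly
ssh-keygen -l -f /root/.ssh/id_rsa.pub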
 
@Chris and all

Just to close the loop: everything ended up working once the problem node was resolved as well - sorry for the delay in this update.

I didn't expect a faulty node to prevent the whole cluster from permitting migrations. After troubleshooting the problem node - which ended with completely removing it from the cluster, reinstalling PVE on it, and rejoining it to the cluster - I was able to redo all the steps and can now migrate without issue. I did need to manually remove the old SSH keys and SSH from each node to all the others to make sure there were no issues.
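For anyone retracing this, the remove/rejoin and key cleanup roughly correspond to something like the following - I didn't note the exact commands at the time, and NODENAME plus the IP placeholder are stand-ins, so treat this as a sketch rather than the exact procedure:
Code:
# on a remaining cluster node, once the faulty node is permanently shut down:
pvecm delnode NODENAME
# drop the stale host key entries for that node from the cluster-wide known_hosts
ssh-keygen -R NODENAME -f /etc/pve/priv/known_hosts
# on the freshly reinstalled node, join it back into the cluster
pvecm add <IP-of-an-existing-cluster-node>
# then SSH from every node to every other node once to confirm key-based login works
ssh -o 'BatchMode=yes' root@NODENAME /bin/true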

I really appreciate all your guys' help!

Consider this ticket closed!
 
Glad to hear! Please mark the thread as solved so others know what to expect.
 
