! Problems with offline migration after fixing a master/node sync error

  • Thread starter: nordviks

nordviks

Guest
Hi!


I encountered a sync error between my master/node servers. I was able to fix the problem, and everything works well again with the exception of offline migration. I had to delete and recreate the ssh keys in the process. Since I was able to reconnect to the admin GUI of the master and see the status of both servers, and there was no error, I thought everything was OK.

I get this error when I try the offline migration:

command finished Abort
/usr/bin/ssh -t -t -n -o BatchMode=yes 172.18.140.253 /usr/sbin/qmigrate 172.18.140.254 1001
Permission denied (publickey,password).
VM 1001 migration failed -

I get the same error on newly created VMs. Can anyone please help me solve this?


Knut
 
Hi Knut,
are your keys in authorized_keys? That is, can you do this without a password:
on master:
ssh ip.of.sla.ve ls -l

on slave:
ssh ip.of.mast.er ls -l

Udo
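The check above can also be run the way the migration itself runs ssh, with BatchMode=yes, so that a rejected key fails at once instead of falling back to a password prompt. A minimal sketch; the function name check_key_login and the 5-second timeout are made up for illustration:

```shell
#!/bin/sh
# check_key_login HOST (hypothetical helper): succeeds only if root can log in
# to HOST using a key alone. BatchMode=yes disables password prompts, which is
# the same option visible in the failing qmigrate command.
check_key_login() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$1" true
}

# Example (replace the address with the other node's IP), run on each node:
# check_key_login 172.18.140.253 && echo "key login OK" || echo "key login FAILED"
```

If this fails with "Permission denied (publickey,password)", the key is not being accepted; if it times out, the problem is more likely on the network side.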
 
Thank you for the answer Udo!

I have to use passwords, so there is something wrong, and that could be the solution to my problem.
There were some keys in the authorized_keys file. They end with the servers' IP addresses. When I cat id_rsa.pub, the output ends with the DNS name of the server. Otherwise they were the same. I copied the keys into the two servers' authorized_keys files, but it made no difference. I am still asked for a password.
 
Hi,
I assume you have copied line breaks into the key? Widen your terminal window and check that each key is all on one line.
authorized_keys must contain the /root/.ssh/id_rsa.pub of the other host.

Udo
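Checking for wrapped keys by eye is error-prone, so here is a rough sketch of an automated check. It assumes that every valid entry has a key-type field (ssh-rsa, ssh-dss, ecdsa-...) followed by the key data on the same line; the helper name check_authorized_keys is made up, and the test is only a heuristic, not a full parser of the authorized_keys format:

```shell
#!/bin/sh
# check_authorized_keys FILE (hypothetical helper): report every non-empty,
# non-comment line that has no "key-type key-data" pair. A key that was pasted
# with line breaks leaves continuation lines without a key type, so those
# lines get flagged. Exit status is 1 if anything was flagged.
check_authorized_keys() {
    awk '
        /^[ \t]*(#|$)/ { next }              # skip comments and blank lines
        {
            ok = 0
            for (i = 1; i < NF; i++)         # key type must be followed by a field
                if ($i ~ /^(ssh-|ecdsa-)/) ok = 1
            if (!ok) { printf "line %d looks broken: %.40s\n", NR, $0; bad = 1 }
        }
        END { exit bad + 0 }
    ' "$1"
}

# Example: check_authorized_keys /root/.ssh/authorized_keys
```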
 
Yes, it is all on one line. The strange thing was that there were two keys on each server. One was for the other server and the other one was from localhost. Is there a way to fix this problem? Should I just delete it and run some kind of command?
 
The node has now changed its state from A to S. The web GUI says "nosync"; it also says that the last sync was yesterday, while I was struggling to get it to work. The status says delay 1100 minutes.

I can't open the VNC windows on the node server any more :(

I get this message when I run pveca -s:

syncing master configuration from '172.18.140.254'
syncing master configuration from '172.18.140.254' failed (rsync --rsh=ssh -l root -o BatchMode=yes -lpgoq 172.18.140.254:/etc/pve/* /etc/cron.d/vzdump /etc/pve/master/ --exclude *~) : command 'rsync --rsh=ssh -l root -o BatchMode=yes -lpgoq 172.18.140.254:/etc/pve/* /etc/cron.d/vzdump /etc/pve/master/ --exclude *~' failed with exit code 255:
Permission denied (publickey,password).
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: unexplained error (code 255) at io.c(635) [receiver=3.0.3]
 
It's no problem that you have both keys in the file, but something must be wrong, because if the keys are right you don't need a password!

Try this (as root):
Code:
cat ~/.ssh/id_rsa.pub | ssh ip.of.other.node dd of=/root/.ssh/authorized_keys
Run this on both nodes; then ssh should work without a password.

One more thing: is the time in sync on both nodes?

Udo
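Note that dd of=... replaces the whole authorized_keys file on the other node, so any other keys in it are lost. If you would rather keep them, an appending variant can be sketched like this; the helper name append_key is made up for illustration:

```shell
#!/bin/sh
# append_key PUBKEY_FILE AUTH_FILE (hypothetical helper): add the public key to
# AUTH_FILE only if that exact line is not already present, leaving existing
# entries untouched, then tighten the file permissions.
append_key() {
    pub=$(cat "$1") || return 1
    touch "$2"
    grep -qxF -e "$pub" "$2" || printf '%s\n' "$pub" >> "$2"
    chmod 600 "$2"
}

# Local example: append_key /tmp/other_node.pub /root/.ssh/authorized_keys
# Remote one-liner in the same spirit (appends instead of overwriting):
# cat ~/.ssh/id_rsa.pub | ssh ip.of.other.node 'cat >> /root/.ssh/authorized_keys'
```

Where it is installed, ssh-copy-id -i ~/.ssh/id_rsa.pub root@ip.of.other.node does much the same thing, but it needs a working password login to the other node.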
 
I found part of the problem. I had recently taken some measures to secure the Proxmox servers: I had limited ssh login to one specific user. After allowing root logins I was able to connect from master to node without a password. But strangely enough, I was not able to do this the other way around, from node to master. I'm trying to figure this out now.

I get this message when I run ssh 172.18.140.254:
ssh: connect to host 172.18.140.254 port 22: Connection timed out

The time is the same on both servers, but not the language. I forgot to mention that the node server suddenly complained about the charset/language settings and said that it was falling back to the default. I tried to fix it, but the Google results did not do the trick. It has stopped complaining, but there was something wrong in the output.

Master: ty. 08. nov. 23:31:57 +0100 2011
Node: ti. 08. nov. 23:32:02 +0100 2011 (In Norwegian it should say ti. like this one)
 
I solved the problem.

I learned that Proxmox needs root to be able to log in over ssh between the nodes, so ssh cannot be locked down to a single non-root user. I will try changing the port and limiting the login attempts though.
My mistake was limiting ssh logins to one user.

The second part was simply a missing firewall rule: I lacked a rule to ACCEPT fw:node to fw.

Thank you so much for the help :)
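For anyone hitting the same pair of problems, the two causes can be checked separately. This is only a sketch: it assumes OpenSSH's sshd -T (which prints the effective server configuration and must be run as root) and a netcat that supports -z. The helper name root_login_allowed is made up, and it only understands PermitRootLogin and literal AllowUsers entries, not wildcards, AllowGroups, DenyUsers or Match blocks:

```shell
#!/bin/sh
# root_login_allowed (hypothetical helper): reads `sshd -T` output on stdin.
# Fails if PermitRootLogin is "no", or if AllowUsers is set and has no entry
# that starts with "root" (either "root" or "root@host").
root_login_allowed() {
    awk '
        $1 == "permitrootlogin" && $2 == "no" { deny = 1 }
        $1 == "allowusers" {
            restricted = 1
            for (i = 2; i <= NF; i++)
                if ($i ~ /^root(@|$)/) listed = 1
        }
        END { exit ((deny || (restricted && !listed)) ? 1 : 0) }
    '
}

# 1) On the node you cannot log in to, as root:
# sshd -T | root_login_allowed && echo "sshd accepts root" || echo "sshd rejects root"
# 2) From the other node, check that the firewall lets port 22 through:
# nc -z -w 5 172.18.140.254 22 && echo "port 22 reachable" || echo "port 22 blocked"
```

A "Permission denied" error points at the first check, while a "Connection timed out" points at the second.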
 
...
The time is the same on both servers, but not the language. I forgot to mention that the node server suddenly complained about the charset/language settings and said that it was falling back to the default. I tried to fix it, but the Google results did not do the trick. It has stopped complaining, but there was something wrong in the output.

Master: ty. 08. nov. 23:31:57 +0100 2011
Node: ti. 08. nov. 23:32:02 +0100 2011 (In Norwegian it should say ti. like this one)
Code:
dpkg-reconfigure locales