[SOLVED] Node in Cluster keeps going offline

LunarMagic

Member
Mar 14, 2024
I have been experiencing an issue with one of the nodes I just joined to my cluster. It was in an old cluster and worked perfectly fine there, but in this new cluster I keep getting this issue. The error only explains itself when I migrate a VM, which is why the log below starts with a failing migration.


Code:
drive-scsi0: transferred 13.3 GiB of 80.0 GiB (16.64%) in 1m 23s
client_loop: send disconnect: Broken pipe

drive-scsi0: Cancelling block job
drive-scsi0: Done.
2024-11-03 21:34:13 ERROR: online migrate failure - block job (mirror) error: drive-scsi0: Input/output error (io-status: ok)
2024-11-03 21:34:13 aborting phase 2 - cleanup resources
2024-11-03 21:34:13 migrate_cancel
2024-11-03 21:34:27 ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=R830-4' -o 'UserKnownHostsFile=/etc/pve/nodes/R830-4/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@192.168.50.4 qm stop 103 --skiplock --migratedfrom R830-2' failed: exit code 255
2024-11-03 21:34:28 ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=R830-4' -o 'UserKnownHostsFile=/etc/pve/nodes/R830-4/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@192.168.50.4 pvesm free Virtual-Machines:vm-103-disk-0' failed: exit code 255
2024-11-03 21:34:28 ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=R830-4' -o 'UserKnownHostsFile=/etc/pve/nodes/R830-4/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@192.168.50.4 pvesm free Virtual-Machines:vm-103-disk-1' failed: exit code 255
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
SHA256:hLTfEMqqm7y8Z/NIsXR2tXHQq8AHG5XMTacxMGMG3vY.
Please contact your system administrator.
Add correct host key in /etc/pve/nodes/R830-4/ssh_known_hosts to get rid of this message.
Offending RSA key in /etc/pve/nodes/R830-4/ssh_known_hosts:1
  remove with:
  ssh-keygen -f "/etc/pve/nodes/R830-4/ssh_known_hosts" -R "r830-4"
Host key for r830-4 has changed and you have requested strict checking.
Host key verification failed.

2024-11-03 21:34:29 ERROR: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=R830-4' -o 'UserKnownHostsFile=/etc/pve/nodes/R830-4/ssh_known_hosts' -o 'GlobalKnownHostsFile=none' root@192.168.50.4 rm -f /run/qemu-server/103_nbd.migrate /run/qemu-server/103.migrate' failed: exit code 255
2024-11-03 21:34:29 ERROR: migration finished with problems (duration 00:01:50)
TASK ERROR: migration problems
 
I think that the key in the known_hosts file (line 1) is no longer valid and needs to be removed.

I would do this:
Code:
ssh-keygen -f "/etc/pve/nodes/R830-4/ssh_known_hosts" -R "r830-4"
 
So do I replace the key there with hLTfEMqqm7y8Z/NIsXR2tXHQq8AHG5XMTacxMGMG3vY?

The command gives me this error, but I can edit the file with nano.

1730729075842.png
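
One possible workaround, sketched here under assumptions: the screenshot is not reproduced in this thread, so this assumes the error is ssh-keygen being unable to create its ".old" backup next to the file under /etc/pve. If that is the case, the same removal can be done on a temporary copy and the result written back in place, which is the kind of write nano already manages. The function name remove_stale_host_key is made up for this example. Note that the SHA256 string in the warning is a fingerprint (a hash of the key), not the key itself, so it would not be pasted into the file; the stale line is removed instead.

```shell
#!/bin/sh
# Sketch: remove a stale host key from a known_hosts file by editing a copy.
# Assumption: ssh-keygen -R fails when run directly on files under /etc/pve
# because it cannot create its ".old" backup there; /tmp has no such limit.
# remove_stale_host_key is a hypothetical helper name, not a Proxmox tool.
remove_stale_host_key() {
    known_hosts="$1"   # e.g. /etc/pve/nodes/R830-4/ssh_known_hosts
    host="$2"          # e.g. r830-4

    tmp="$(mktemp)" || return 1
    cp "$known_hosts" "$tmp" || { rm -f "$tmp"; return 1; }

    # Same command the SSH warning suggests, but pointed at the copy.
    if ! ssh-keygen -f "$tmp" -R "$host" >/dev/null 2>&1; then
        rm -f "$tmp" "$tmp.old"
        return 1
    fi

    # Write the cleaned content back in place (no rename, no hard link).
    cat "$tmp" > "$known_hosts"
    rm -f "$tmp" "$tmp.old"
}

# Usage (uncomment on the node that reports the changed host key):
# remove_stale_host_key /etc/pve/nodes/R830-4/ssh_known_hosts r830-4
```

After the stale entry is gone, the correct key still has to be added back before migrations will pass host key checking. On Proxmox VE, running pvecm updatecerts is commonly suggested for that, though whether it is sufficient here is not confirmed by this thread.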