[SOLVED] Broken Pipe Lessons Learned (node naming and ssh_known_hosts fumble)

nuvious

New Member
Nov 28, 2020
Just sharing an issue some people may run into in the future: a broken pipe error when migrating a VM, during a replication operation, etc.

BACKGROUND

I recently upgraded the CPU and motherboard for my alternate node. There were some issues during the upgrade, and since I had already elected to move all my VMs to the primary, I decided to remove the node from the cluster and re-install it from scratch. I gave it the same name as the original alternate node (pve-alt specifically) and re-added it to the cluster. Everything was working fine until I tried migrating VMs back to pve-alt. For very small VMs it would SOMETIMES work. I was able to move a small VM over, set up replication and HA for it, and even transition it back and forth by changing the HA group. For any VM with a disk larger than 16 GB, though, migration/replication would fail with a broken pipe error and a failed task.

ISSUE

The root of the issue was that re-adding the node added new SSH keys to /etc/ssh/ssh_known_hosts, but the original node's key was never removed. I resolved it by deleting the keys left over from the old installation from /etc/ssh/ssh_known_hosts and rebooting both nodes. I then had a second problem: the installation SSD for the alternate node was failing (I kept getting broken pipes and other errors, and a fresh install under a different node name also failed). With a new drive, a different node name, and a re-add to the cluster, everything is working like clockwork again.
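The key-removal step can be sketched like this. It's a minimal demonstration on a scratch copy of a known-hosts file, so nothing system-wide is touched; the node name pve-alt is from the post, while the second hostname and the key material are made up for illustration:

```shell
# Work on a scratch file instead of the real /etc/ssh/ssh_known_hosts
KH=$(mktemp)
printf 'pve-alt ssh-rsa AAAAB3FAKESTALEKEY\n' >> "$KH"    # stale key from the old install
printf 'pve-main ssh-rsa AAAAB3FAKEGOODKEY\n' >> "$KH"    # key that should survive

# ssh-keygen -R removes every entry for the given host from the file
# passed with -f (a backup copy is kept as "$KH.old")
ssh-keygen -R pve-alt -f "$KH"

cat "$KH"   # only the pve-main line remains
```

On a real cluster you would run `ssh-keygen -R <old-node-name> -f /etc/ssh/ssh_known_hosts` on each remaining node (and repeat for the old node's IP address if it was reused). Proxmox VE also ships `pvecm updatecerts`, which refreshes the cluster's SSH/SSL key state and may be worth running afterwards.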

TLDR

When removing a node from the cluster, you need to clean up its keys from /etc/ssh/ssh_known_hosts (comment if you know of other places such keys need to be cleaned up that I didn't mention), and/or when adding a new node to a cluster, consider giving it a unique name that doesn't overlap with a previously removed node. Also... don't use SSDs that are 8 years old.

Hope this helps someone in the future!
 

Stoiko Ivanov

Proxmox Staff Member
May 2, 2018
Thanks for taking the time to describe your solution!
 
