Some notes and questions about Proxmox Cluster networking

trent--

Mar 19, 2021
Hello,

I am trying to find more information about Proxmox Cluster networking, and especially the use of ports 22, 5404 and 5405 for intra-cluster communication. I feel like the PVE admin guide could be updated with more accurate information (some of which I am contributing in this thread).
I hope this is useful to other folks, and I also hope more experienced users or Proxmox developers can answer some things I don't quite understand.

  • Ports 5404 and 5405 :
In pve-admin-guide for PVE 7, Chapter 5.1 Requirements, p.86, one can read :
« All nodes must be able to connect to each other via UDP ports 5404 and 5405 for corosync to work. »
This does not seem to be the case. I filtered these ports on a cluster node, and the cluster seems to be running just fine.
I successfully migrated VMs to and from this node. I also edited the corosync config file on another node using the instructions from chapter 5.11 Corosync Configuration, and the edits were successfully replicated to every node.

I see the following note in chapter 5.7.1 Network Requirements :
« Corosync used Multicast before version 3.0 (introduced in Proxmox VE 6.0). Modern versions rely on
Kronosnet for cluster communication, which, for now, only supports regular UDP unicast. »


So my guess is ports 5404 and 5405 were used for Corosync traffic before using Kronosnet, but Kronosnet does not use them.

The Kronosnet documentation is supposed to be found at https://kronosnet.org/ but is seriously lacking. The home page and the Github page both point to a Google Drive including presentations, but I could not make sense of these, and I have no idea which ports it uses.

So I am wondering, are ports 5404 and 5405 still used ? How does Kronosnet communication work ?

  • SSH / port 22 (or others) :
Also in chapter 5.1 Requirements :
« SSH tunnel on TCP port 22 between nodes is used. »
I would like to know exactly which processes use SSH. I did a little experiment, and it seems that with SSH access disabled, corosync and Proxmox cluster status are still OK, but commands cannot be sent to other nodes (for example, migrating a vm is not possible).

Chapter 5.9 « Role of SSH in Proxmox VE Clusters » lists some uses of SSH, but I do not feel that this list is complete. When trying to do an offline vm migration to a node with SSH disabled, the migration is aborted (it first tries to establish an SSH connection) even though this does not involve any of the things listed in chapter 5.9.
So maybe I am nitpicking here, but it seems that Proxmox uses SSH as soon as you launch a migration.

By the way, SSH does not have to run on port 22. For fellow sysadmins who want to hide their SSH server on another port, you have to set it in both /etc/ssh/ssh_config and /etc/ssh/sshd_config, on all nodes.

You also do not have to set "PermitRootLogin yes" in your sshd_config; you can set it to "without-password" (named "prohibit-password" in current OpenSSH releases) and use /root/.ssh/authorized_keys to allow only keys from other nodes, thus improving your cluster security.

Maybe these pieces of information could be added in the admin guide.

  • Port 8006 :
I also feel like the guide misses an important piece of information about the need to open port 8006 between nodes.
When you are logged in the web GUI through one node, you need this as soon as you click another node to view or execute actions on that other node.
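A quick way to check whether port 8006 is reachable from one node to another is a plain TCP connection test. The following is only a minimal sketch: the function name tcp_port_open and the peer addresses in the usage note are illustrative assumptions, not part of Proxmox.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


# Harmless local check; on a PVE node this reports whether the web GUI/API
# port is listening locally.
print(tcp_port_open("127.0.0.1", 8006))
```

Run from each node against every other node, for example tcp_port_open("10.59.100.231", 8006) with your own node addresses, it shows whether a firewall between nodes blocks the GUI/API port. The same function can be pointed at the SSH port.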
 
I don't see any traffic on port 5404 anymore, but 5405 is definitely in use (with the default corosync config).

tcpdump -i eth0 -n port 5405

Code:
18:26:07.918216 IP 10.59.100.232.5405 > 10.59.100.233.5405: UDP, length 128
18:26:07.918505 IP 10.59.100.233.5405 > 10.59.100.231.5405: UDP, length 128
18:26:07.919299 IP 10.59.100.232.5405 > 10.59.100.233.5405: UDP, length 128
18:26:07.919558 IP 10.59.100.233.5405 > 10.59.100.231.5405: UDP, length 128
18:26:07.920352 IP 10.59.100.232.5405 > 10.59.100.233.5405: UDP, length 128
...


SSH needs to be on port 22. It is not used by corosync, but by the Proxmox daemons to send some commands across nodes (with a hardcoded port 22), for example the live migration command; the migration tunnel uses SSH too.
 
<snip>

The Kronosnet documentation is supposed to be found at https://kronosnet.org/ but is seriously lacking. The home page and the Github page both point to a Google Drive including presentations, but I could not make sense of these, and I have no idea which ports it uses.

So I am wondering, are ports 5404 and 5405 still used ? How does Kronosnet communication work ?

Port UDP/5404 was used with Corosync2 in PVE < 6.x. This is a cluster running PVE 5.4 with Corosync2 (with RRP):
Code:
# netstat -planu | grep corosync
udp        0      0 10.3.10.205:5404        0.0.0.0:*                           968125/corosync     
udp        0      0 172.27.0.15:5404        0.0.0.0:*                           968125/corosync     
udp        0      0 239.192.96.53:5405      0.0.0.0:*                           968125/corosync     
udp        0      0 10.3.10.205:5405        0.0.0.0:*                           968125/corosync     
udp        0      0 239.192.96.52:5405      0.0.0.0:*                           968125/corosync     
udp        0      0 172.27.0.15:5405        0.0.0.0:*                           968125/corosync

With PVE 6.x+ and the introduction of Corosync3/KNET, the default port is UDP/5405+linknumber [0-7]. This is a cluster with Corosync3/KNET (also with RRP):
Code:
# netstat -planu | grep corosync
udp        0      0 172.25.10.74:5405       0.0.0.0:*                           1852/corosync       
udp        0      0 10.3.64.14:5406         0.0.0.0:*                           1852/corosync

That is documented here [0] which states in part:
mcastport: <n>

tells knet to use that port number <n> for communication. The default remains the old one of 5405 +linknumber, but you can override it per link here. Even though knet doesn't do actual multicasting the name remains for old time's sake.
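To make the rule concrete, here is a small sketch of the default port arithmetic, assuming the "5405 + link number" rule quoted above, link numbers 0 to 7, and no per-link mcastport override. The names knet_default_port and ports_for_links are made up for illustration and are not corosync APIs.

```python
BASE_PORT = 5405   # default port of link 0
MAX_LINKS = 8      # knet link numbers run from 0 to 7


def knet_default_port(link_number: int) -> int:
    """Default UDP port for a link, assuming no mcastport override."""
    if not 0 <= link_number < MAX_LINKS:
        raise ValueError("link number must be between 0 and 7")
    return BASE_PORT + link_number


def ports_for_links(link_count: int) -> list:
    """Default UDP ports for links 0 .. link_count - 1."""
    return [knet_default_port(n) for n in range(link_count)]


# A cluster with two links, as in the netstat output above.
print(ports_for_links(2))  # [5405, 5406]
```

With all eight links configured, the defaults would span UDP 5405 to 5412, which is the range a firewall between nodes has to allow under these assumptions.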

As for the rest, all these projects are open source and you are welcome as anyone to submit documentation updates/changes, e.g., https://bugzilla.proxmox.com/ for PVE.

[0] http://people.redhat.com/ccaulfie/docs/KnetCorosync.pdf
 
Thank you both for the explanation regarding corosync / kronosnet networking, it is clearer now.
I botched my iptables rules so I didn't think it used these ports, but I can now confirm that corosync uses ports 5405 and 5406 / UDP on my cluster, as I use 2 links between my nodes.
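For anyone who wants to double-check which ports corosync is bound to before writing firewall rules, the netstat output can be parsed with a few lines of Python. This is only a sketch: the function name corosync_udp_ports is an assumption for illustration, and the sample text is copied from the Corosync3 output posted earlier in this thread.

```python
SAMPLE = """\
udp        0      0 172.25.10.74:5405       0.0.0.0:*                           1852/corosync
udp        0      0 10.3.64.14:5406         0.0.0.0:*                           1852/corosync
"""


def corosync_udp_ports(netstat_output: str) -> list:
    """Return the sorted local UDP ports bound by corosync.

    Expects lines in the format printed by `netstat -planu`.
    """
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected fields: proto, recv-q, send-q, local, foreign, pid/program
        if len(fields) < 6 or fields[0] != "udp":
            continue
        if not fields[-1].endswith("/corosync"):
            continue
        # The local address looks like 10.3.64.14:5406; keep the port part.
        ports.add(int(fields[3].rsplit(":", 1)[1]))
    return sorted(ports)


print(corosync_udp_ports(SAMPLE))  # [5405, 5406]
```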

SSH needs to be on port 22. It is not used by corosync, but by the Proxmox daemons to send some commands across nodes (with a hardcoded port 22), for example the live migration command; the migration tunnel uses SSH too.
Are you sure about this ? As far as I can see, everything including live migration works fine with SSH running on another port, after setting the port in both /etc/ssh/ssh_config and /etc/ssh/sshd_config, on all nodes. Moreover, I don't see any reference to port 22 in the pve-cluster source code.

As for the rest, all these projects are open source and you are welcome as anyone to submit documentation updates/changes, e.g., https://bugzilla.proxmox.com/ for PVE.
Sure, I will submit documentation updates.
As far as I can see, reading https://pve.proxmox.com/wiki/Developer_Documentation, patches should be sent to the pve-devel mailing lists and there is no public repository allowing merge requests. Is that right ?
 
By the way, SSH does not have to run on port 22. For fellow sysadmins who want to hide their SSH server on another port, you have to set it in both /etc/ssh/ssh_config and /etc/ssh/sshd_config, on all nodes.

You also do not have to set "PermitRootLogin yes" in your sshd_config; you can set it to "without-password" (named "prohibit-password" in current OpenSSH releases) and use /root/.ssh/authorized_keys to allow only keys from other nodes, thus improving your cluster security.
Moving the SSH server from the default port to a different one is security by obscurity.

I think we need the root password for a cluster join, if I remember correctly (it has been a long time since I created a new cluster).
 
Cluster joining happens over the API nowadays (but the fallback via SSH is still available/included for the time being).
 
Moving the SSH server from the default port to a different one is security by obscurity.

Well, kind of, but it is useful. It hides the SSH server from many scanners. You should have best practices in place anyway, like blocking attackers (for example with fail2ban), disabling root login (which you cannot do on a Proxmox cluster) and disabling password authentication.

I believe changing the SSH port adds an “additional” layer of defense, as proposed in this blog : https://utkusen.com/blog/security-by-obscurity-is-underrated.html
 
