Remote cluster setup

bullerwins

New Member
Nov 24, 2022
Hi!

I would like to set up a cluster with 3 nodes: 2 local (same network) and 1 remote. The remote one is not that far away, about 5 km, with latency around 3-4 ms. The local nodes have 0.5 ms latency. All nodes have 10 Gbit connections, both local and WAN (more like 7 Gbit in real-life usage).

I'm trying to cluster the remote node with the 2 local ones, but I haven't had much success. I thought it might be the latency, since old threads I found said it needed to be 2 ms or less. But according to the latest wiki, it works with 5 ms or less: https://pve.proxmox.com/wiki/Cluster_Manager#pvecm_cluster_network_requirements
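
For anyone who wants to compare their own link against that figure, here is a minimal sketch. It assumes Linux iputils ping, whose summary line looks like "rtt min/avg/max/mdev = ...". The function names are illustrative, and the 5 ms limit passed in is just the wiki figure quoted above.

```shell
#!/bin/sh
# Extract the average RTT (in ms) from the summary line of iputils ping, e.g.
#   rtt min/avg/max/mdev = 3.012/3.456/4.001/0.321 ms
# Splitting that line on "/" puts the average in field 5.
avg_rtt_ms() {
    awk -F'/' '/^rtt/ { print $5 }'
}

# Succeed (exit status 0) if the average RTT is at or below the limit.
# $1: average RTT in ms, $2: limit in ms
latency_ok() {
    awk -v avg="$1" -v limit="$2" 'BEGIN { exit !(avg + 0 <= limit + 0) }'
}
```

Typical use would be to pipe the output of "ping -c 20 -q NODE_ADDRESS" into avg_rtt_ms and pass the result to latency_ok together with 5. A single average says nothing about jitter or packet loss, so a passing result is a sanity check rather than proof that the link is suitable.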

What I've tried:
Node 1 (local) Network 192.168.10.0/24
Node 2 (local, this would be just a raspberry pi with proxmox installed for quorum)
Node 3 (remote) Network 192.168.1.0/24

Node 1: Set up an LXC Ubuntu 22.04 container and install PiVPN with WireGuard as a server (I followed this tutorial, but used 22.04), as it seems to be the easiest way to set up WireGuard. Generate a new "client" .conf file; let's call it wg0.conf. Port forward, etc. The VPN works: I added it to my phone and can connect to the Proxmox GUI just fine over LTE.

Node 3: Install WireGuard on the Proxmox host (as root):
apt install wireguard
apt install resolvconf
cd /etc/wireguard
nano wg0.conf (paste the client config generated on the server)
*For some reason, installing wireguard and resolvconf removes the DNS configuration from Proxmox, so I have to add the domain and DNS server again. I just used 1.1.1.1 or 9.9.9.9.
systemctl start wg-quick@wg0 (to connect)

**I think I have to install the client on the Proxmox host itself, since a node must not have any VMs or containers in order to join a cluster.
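
A possible explanation for the DNS change, offered as an assumption rather than a diagnosis: client configs generated for full-tunnel use typically carry a "DNS = ..." line, which wg-quick applies through resolvconf, and "AllowedIPs = 0.0.0.0/0", which sends all traffic through the tunnel. The sketch below trims such a config so that only the VPN and LAN networks are routed and the host's DNS is left alone. The function name and the 10.6.0.0/24 VPN network are hypothetical; 192.168.10.0/24 is the Node 1 network listed above.

```shell
#!/bin/sh
# Read a wg-quick client config on stdin and write a trimmed copy to stdout:
#   - drop the "DNS = ..." line, so wg-quick does not touch the resolver
#   - replace AllowedIPs with the given comma-separated list of networks
# $1: new AllowedIPs value
trim_wg_conf() {
    allowed=$1
    sed -e '/^[[:space:]]*DNS[[:space:]]*=/d' \
        -e "s|^[[:space:]]*AllowedIPs[[:space:]]*=.*|AllowedIPs = $allowed|"
}
```

It could be used as a filter, reading the generated client file and writing the result to /etc/wireguard/wg0.conf with '10.6.0.0/24, 192.168.10.0/24' as the argument. Keys, Address and Endpoint lines pass through unchanged.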

This works, I can now ping from Node 3 to Node 1.

But when joining the cluster, I get this:

Code:
Establishing API connection with host '192.168.1.210'
Login succeeded.
check cluster join API version
No cluster network links passed explicitly, fallback to local node IP '192.168.10.2'
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service

And this pop-up:
1669297103367.png
If I refresh now, I can't log in; it says the password is wrong (it's not).

The node appears on the cluster, but with a red sign:
1669297140429.png
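
The log line "No cluster network links passed explicitly, fallback to local node IP" indicates that the join used the node's LAN address for the cluster link rather than its VPN address. pvecm add accepts a --link0 option that names the local link address explicitly. A hedged sketch follows: both addresses in the usage note are hypothetical VPN addresses, not values from this setup, and the helper only prints the command unless DRY_RUN=0 is set. Whether this resolves the failure described here is untested.

```shell
#!/bin/sh
# Print the command by default; set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Join an existing cluster with an explicit Corosync link address.
# $1: address of an existing cluster member
# $2: this node's own address on the network Corosync should use
join_cluster() {
    run pvecm add "$1" --link0 "$2"
}
```

For example, "join_cluster 10.6.0.1 10.6.0.3" prints "pvecm add 10.6.0.1 --link0 10.6.0.3" for review before anything is changed on the node.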

And now, after restarting the node, the Proxmox install is borked. No GUI. I have to enter this manually in the CLI to recover the Proxmox GUI:
Code:
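systemctl stop pve-cluster      # stop the cluster filesystem service
systemctl stop corosync         # stop cluster communication
pmxcfs -l                       # start pmxcfs again in local mode
rm /etc/pve/corosync.conf       # remove the cluster configuration
rm -r /etc/corosync/*           # remove the local Corosync files
killall pmxcfs                  # stop the local-mode pmxcfs
systemctl start pve-cluster     # start the service normally again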

Then I can enter the GUI again. But the "local" and "local-zfs" storages have also merged, and I only have "local" now, with the options of "local-zfs" merged into it (this is really weird, to be honest).

What could be wrong? Is this a latency problem? I can add shared storage into Node 3 from the Node1 LAN with no problem.
Even if the performance is not really good I would like to try it.

Thanks a lot!
These are my first steps with Proxmox, and my idea is to have redundant VMs and/or containers so that if Node 1 fails, Node 3 can take over.
 
All nodes need to be in the same layer 2 network.
Hmmm, so essentially behind the same router, without going out to the internet, right? (Sorry for my network skills.)
Is this a Corosync limitation? Any way around it? The latency is not that high, I think.
 
You can bridge Layer 2 networks through a VPN between two routers.
My routers do not allow installing VPNs, as they are the ISP's default routers and are very limited. I can basically forward ports and that's it. And the ISP does not allow installing another router (not even using theirs in bridge mode).
So I'm limited to installing and using VPNs on the devices inside the networks. I don't know if that would work.
 
If that's a meshed VPN setup where all nodes see each other in the same IP network it may work. Corosync is using unicast UDP packets for the cluster communication.
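
"All nodes see each other" can be checked from each node before attempting a join. A minimal sketch, assuming Linux ping; the function name and the PROBE variable are illustrative. ICMP replies are only a necessary condition, since Corosync itself talks UDP, so a clean result does not prove the cluster traffic will pass.

```shell
#!/bin/sh
# Probe command, overridable for testing.
# Default: one ICMP echo with a 2 second timeout (iputils ping).
PROBE=${PROBE:-"ping -c 1 -W 2"}

# Print every peer that does not answer; succeed only if all answered.
# Arguments: the VPN addresses of the other nodes.
check_peers() {
    failed=0
    for peer in "$@"; do
        if ! $PROBE "$peer" >/dev/null 2>&1; then
            echo "unreachable: $peer"
            failed=1
        fi
    done
    return "$failed"
}
```

Running check_peers on every node, each time with the addresses of the other two, covers all directions of the mesh.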
I'm trying to set up that "meshed" setup, but it seems to need static routing on the router. I'm following this https://forums.unraid.net/topic/88906-lan-to-lan-wireguard/ as it seems like "LAN to LAN access" is what's needed.

My router does not allow static routing :(

I can only do "Remote to LAN" or "Remote tunneled", but not LAN to LAN, as it requires static routing on the routers:
1669368842244.png
 
Hello!
Have a look at the tinc VPN. It's a mesh VPN that you can install on your Proxmox machine; then you will only need to NAT the right port on your router.
I haven't tried this yet, but it's what I'm going to try soon.

There are some tutorials; here is an example:
https://silicon.blog/2022/06/06/create-proxmox-cluster-over-a-private-network-using-tinc/
This sounds awesome. Weird that I didn't find this article, as I've searched so much for how to do this, and by the title it seems to be exactly what I need. A mesh VPN is what I need, if it means what I think it means.
I'll check it out and report. Thanks!
 
Hi,

Be aware of what you want, and be sure that you truly understand the implications of any setup you want to use (tinc or whatever).

Some ideas!
- always keep in mind what you need to solve (by the way, I do not understand what you are trying to solve, I can only guess, but maybe a 3-node cluster is not the right solution for your case)
- find at least 2 different solutions, and test each of them

Let's imagine that you need to create a 3-node cluster ..... OK!

A)
- for that, as other posts in this thread say, you only need a VPN between site A (2 nodes) and site B (1 node)
- this can be done with a single routed VPN (one VPN endpoint at A, on any device you have there, and one VPN endpoint at B; each of these endpoints must be capable of routing traffic between the A side and the B side in both directions)
- so routing must be done on these endpoints and not on the border routers at A or B (this is not ideal, but it is possible)
- it does not matter what kind of VPN you use (but it must be secure, AND the implementation must also be secure, keep this in mind)
B)
- another option is a "switch" VPN (aka tinc)

Even if B) seems very nice to have, it is NOT! Because your goal is to get the lowest latency you can!
It is simple to understand that latency WILL increase as your link has MORE traffic to transmit.
Both nodes in A will send Corosync traffic into the tinc VPN (which is not desired) => MORE traffic. Because tinc is a "switch" (aka layer 2), even more traffic will travel over this VPN (on both A and B), so latency will not be as good as in case A (think of broadcast and multicast, which are not present in a routed VPN).
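
Option A can be sketched roughly as follows, for the case where the VPN endpoint is a separate device in the LAN: the endpoint must forward packets, and every other host needs a route to the remote subnet via that endpoint. The helper only prints the commands for review. The function name is illustrative, and the endpoint address in the usage note is hypothetical; the subnet is the Node 3 network from the first post.

```shell
#!/bin/sh
# Print the commands for a routed VPN setup; nothing is executed.
# $1: subnet on the far side of the VPN
# $2: LAN address of the local VPN endpoint
routed_vpn_cmds() {
    echo "# on the VPN endpoint: allow forwarding between its interfaces"
    echo "sysctl -w net.ipv4.ip_forward=1"
    echo "# on every other host in this LAN: reach the remote subnet via the endpoint"
    echo "ip route add $1 via $2"
}
```

For site A this would be called as "routed_vpn_cmds 192.168.1.0/24 192.168.10.50". With WireGuard, the remote subnet also has to appear in the peer's AllowedIPs, otherwise the tunnel drops those packets. When the VPN runs directly on each Proxmox host, as in the first post, the host is its own endpoint and the extra route is not needed there.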

Good luck / Bafta !
 
I tried it with the Tailscale method and got the same results as with WireGuard. Same behavior: the node seems to be added, but shows as offline. I can use the shell of the remote node, but I get an SSL error when viewing the summary, for example, and I can't log in to the remote node. Maybe the problem is that the nodes are in different subnets? Proxmox doesn't seem to like it; I think it expects every node to be on the same subnet.
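
The two addresses from the join log in the first post can be checked against each other. A small sketch in POSIX shell arithmetic: the function names are illustrative, and the /24 prefix is taken from the networks listed in the first post. It only compares addresses; it does not show whether Corosync actually requires a shared subnet.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address into a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if both addresses fall into the same network.
# $1, $2: IPv4 addresses, $3: prefix length (e.g. 24)
same_subnet() {
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    [ $(( a & mask )) -eq $(( b & mask )) ]
}
```

With the log values, "same_subnet 192.168.10.2 192.168.1.210 24" fails, which matches the observation that the two nodes sit in different /24 networks.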
 
You're totally right!

What I'm trying to solve is having a remote node that can take over the services I have running if the local node has a problem, or during maintenance, for example, if I don't want downtime. Kind of like replicating what a production environment would do, to have as little downtime as possible.
I thought about clustering and HA, as they're baked into Proxmox, but that seems not to be the best solution, and it's not really prepared to handle remote nodes. Even with janky setups like installing a VPN on the Proxmox Debian host, it doesn't really work (so far, with what I've tried).

I may keep trying the VPN route, but I haven't yet found anyone who has achieved this flawlessly, or instructions on how to do it. The silicon.blog link that @Wiwi posted seems to use addresses in the same subnet on the example hosts.

I might go "production" mode and find other HA solutions, like setting up a load balancer on a VPS that points to the running services (let's use a website as an example). The "local" node is the big one, and the remote one is a little one, so I could weight the load balancer toward the local one and use the remote one just as a backup. Then set up the databases on both nodes with some sort of replication from the local to the remote, agnostic to Proxmox (the VPN would be useful here for communication between the 2 database instances, and it shouldn't be much of a problem, as I believe the replication methods just need an endpoint and a login, which I think would work with the VPN setup). The static content of the websites doesn't seem like a problem, as I could just copy it to each node.
One thing that has worked over the VPN is setting up remote storage. For example, I have a TrueNAS server locally too, and I could add its NFS share on the remote node (without clustering or anything). So I can back up to the local TrueNAS NFS share using the native Proxmox backup solution, and the backup appears on the remote node, where I can restore it.
What also works is pve-zsync. I have successfully set up a ZFS task to replicate (I can't use the native replication tool, as it requires clustering), and with pve-zsync I can send the VM disk to the remote node's local storage. Then I can just copy the VM .conf file and create a new VM with that disk, keep it turned off, and maybe set up some sort of cron job that checks the status of the local service and brings it up...
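
The cron job idea could look roughly like this. It is a sketch under stated assumptions, not a tested failover solution: the probed address, the VM ID 100, the state file path and the failure limit are all hypothetical placeholders, and a ping is only a crude stand-in for a real service check.

```shell
#!/bin/sh
# All four values are hypothetical; adjust them to the real setup.
PROBE=${PROBE:-"ping -c 1 -W 2 192.168.10.2"}   # health check for the primary
STATE=${STATE:-/var/tmp/primary-failcount}      # consecutive-failure counter
LIMIT=${LIMIT:-3}                               # failures before acting
START=${START:-"qm start 100"}                  # brings up the standby VM

# One watchdog pass: reset the counter on success, otherwise count the
# failure and start the standby once, when the limit is first reached.
watchdog() {
    if $PROBE >/dev/null 2>&1; then
        echo 0 > "$STATE"
        return 0
    fi
    count=$(cat "$STATE" 2>/dev/null || echo 0)
    count=$(( ${count:-0} + 1 ))
    echo "$count" > "$STATE"
    if [ "$count" -eq "$LIMIT" ]; then
        $START
    fi
}
```

Saved as a script that calls watchdog at the end and run from cron once a minute on the remote node, this would start the standby after three missed checks. A broken VPN link looks the same as a dead primary from the remote side, so both copies of the VM could end up running at once; that risk is the reason the counter requires several consecutive failures, and it is not eliminated by it.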
 
