Multi-site federation - Would this be interesting for Proxmox?

Hakan_k

New Member
Dec 23, 2025
Hey everyone,

I've been thinking about multi-site architecture for Proxmox and wanted to run this idea by the community before spending time on it.

The Problem

Right now Proxmox doesn't have native multi-datacenter support. If you have clusters in multiple locations:
- You manage each one separately
- No unified view of all sites
- Manual VPN configs between sites
- No automatic failover if a whole site goes down

This makes Proxmox hard to use for:
- MSPs managing 10+ customer sites
- Companies with multiple datacenters
- Anyone wanting to build regional infrastructure

VMware has vCenter multi-site, OpenStack has regions, but both are expensive or complex.

My Idea (High-Level)

Add hierarchical levels on top of Proxmox clusters:
Global (optional)
|--- Region (like "EU-West")
     |--- Site (NEW - multiple clusters in one datacenter)
          |--- Cluster (Proxmox as-is, no changes)
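To make the hierarchy concrete, here is a rough Python sketch of how the four levels could nest. This is only an illustration of the idea, not an existing Proxmox data model: the class names and the example names ("dc-paris", "pve-a", "pve-b") are made up.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Cluster:
    # An existing Proxmox cluster, left unchanged by the layers above it.
    name: str


@dataclass
class Site:
    # Proposed new level: several clusters in one datacenter.
    name: str
    clusters: List[Cluster] = field(default_factory=list)


@dataclass
class Region:
    # A geographic grouping of sites, for example "EU-West".
    name: str
    sites: List[Site] = field(default_factory=list)


@dataclass
class Global:
    # Optional top level that groups regions.
    regions: List[Region] = field(default_factory=list)

    def cluster_paths(self) -> Iterator[str]:
        """Yield one 'region/site/cluster' path per cluster."""
        for region in self.regions:
            for site in region.sites:
                for cluster in site.clusters:
                    yield f"{region.name}/{site.name}/{cluster.name}"


# Hypothetical topology: one region, one site, two untouched clusters.
topology = Global(regions=[
    Region("EU-West", sites=[
        Site("dc-paris", clusters=[Cluster("pve-a"), Cluster("pve-b")]),
    ]),
])
paths = list(topology.cluster_paths())
```

The clusters themselves carry no extra state here, which matches the "no changes" point: everything new lives in the levels above them.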


Key points:
- Existing Proxmox clusters stay unchanged
- New orchestration layer sits on top
- Opt-in (you enable it if you need it)

Network Approach

Support 2 modes so it works for everyone:

Mode 1: Mesh VPN (smaller setups)
- Auto-configure with Netmaker or ZeroTier
- Good for MSPs without dedicated WAN

Mode 2: Enterprise WAN (bigger setups)
- Use existing MPLS/EVPN/VXLAN
- Good for enterprises with network teams

Multi-Mesh Isolation

Separate network meshes for different purposes:
- Production (isolated per customer)
- Dev/Test (separate from prod)
- Management (admin access)

With relay nodes controlling what can talk between meshes (useful for GDPR - keep EU/US data separate).
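A relay node's "what can talk between meshes" rule could be as simple as a default-deny allow-list. The sketch below is just that idea in Python; the mesh names and the allowed pairs are assumptions for illustration, not a Netmaker or ZeroTier feature.

```python
from typing import FrozenSet, Set, Tuple

# Hypothetical mesh names, one per purpose listed above.
MESHES: Set[str] = {"prod-customer-a", "prod-customer-b", "dev-test", "management"}

# Allow-list of (source, destination) pairs a relay node would forward.
# Anything not listed is dropped (default deny).
ALLOWED: FrozenSet[Tuple[str, str]] = frozenset({
    ("management", "prod-customer-a"),
    ("management", "prod-customer-b"),
    ("management", "dev-test"),
})


def relay_permits(src: str, dst: str) -> bool:
    """Return True if a relay node should forward traffic from src to dst."""
    if src not in MESHES or dst not in MESHES:
        raise ValueError(f"unknown mesh: {src!r} or {dst!r}")
    if src == dst:
        return True  # traffic inside one mesh never crosses a relay
    return (src, dst) in ALLOWED
```

With this table, management can reach every mesh, but customer meshes cannot reach each other or initiate connections back into management.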

Disaster Recovery (Optional)

Make DR per-VM instead of all-or-nothing:
- Default: Local storage (1x cost) - most VMs
- Critical VMs: Enable cross-site replication (3x cost) - only what needs it

Saves money vs replicating everything.
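As a back-of-the-envelope check of the saving, here is the arithmetic in Python. It assumes every VM uses the same amount of storage and reuses the 1x and 3x factors from the list above; the 100-VM fleet with 10 critical VMs is a made-up example.

```python
def storage_cost_units(total_vms: int, critical_vms: int,
                       local_factor: float = 1.0,
                       replicated_factor: float = 3.0) -> float:
    """Storage cost in 'one VM stored once' units, assuming equal-sized VMs."""
    if not 0 <= critical_vms <= total_vms:
        raise ValueError("critical_vms must be between 0 and total_vms")
    local_vms = total_vms - critical_vms
    return local_vms * local_factor + critical_vms * replicated_factor


# Hypothetical fleet: 100 VMs, 10 of them critical.
per_vm_dr = storage_cost_units(100, 10)        # 90 * 1 + 10 * 3 = 120
replicate_all = storage_cost_units(100, 100)   # 100 * 3 = 300
```

In this example per-VM DR costs 120 units against 300 for replicating everything, so the saving depends entirely on how small the critical share is.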

Why Not Just Scripts?

Yes, you can do some of this with Ansible/Terraform today. But:
- Everyone reinvents the wheel
- No standardization
- Not integrated in Proxmox GUI
- Breaks when Proxmox updates
- No official support

Native integration would be way better - like how Ceph got integrated instead of everyone scripting storage.

What I'm Asking

Before I invest time developing this:

1. Is this interesting to the Proxmox team? Would you consider it for a future version?
2. Is the approach sound? Any major problems you see?
3. Should it be:
- Core contribution (if you want it native)
- External add-on (if not aligned with your vision)
- Just documentation/architecture (for others to build)

Why This Matters Now

VMware prices went up 300-500% under Broadcom. Companies are looking for alternatives. Proxmox is great, but the multi-site limitation is a dealbreaker for many.

This could open up the MSP and multi-datacenter enterprise market for Proxmox.

Technical Sketch (If Interested)

Would use:
- Network: Netmaker/ZeroTier or MPLS/EVPN integration
- Storage: Ceph multi-site (already exists)
- Orchestration: etcd + Consul
- Estimated: ~3000 lines Perl + ~1000 lines JS
- API: New /api2/json/site/* endpoints

I have more detailed architecture docs if anyone wants to see them.

What Do You Think?

Is this something Proxmox would want? Does the community need this?

I'm happy to work on it either way, just want to make sure I'm going in a useful direction.

Thanks for reading!
 
You should look into Proxmox Datacenter Manager (PDM) for your unified view of multiple clusters. You can tie multiple sites together and at least manually start moving things between clusters. With a bit of setup, you can actually do what you want with minimal effort.

I think the largest problem in orchestration is that these are going to be very specific setups, each geared towards one business model. The way you define multi-site and the features you want may not be the way I view it.

For example, you want Ceph multi-site. There are at least 3 ways of doing that, and which one to choose depends on both your sensitivity to downtime and the latency between datacenters. In most cases you still need manual intervention when a datacenter site goes down: maybe it’s just a temporary blip, or maybe your application isn’t built to handle moving between datacenters easily, unless you have BGP to handle floating your IP addresses, which most customers can’t afford.

vCenter doesn’t have an extensive multi-site product. The closest you get is Linked Mode vCenter, but you have to arrange your own networking, security and storage. It’s an ambitious project, but certainly feasible to implement today with existing Proxmox. I would suggest making a blueprint in something like Ansible using the PVE, PBS and PDM products and APIs.

That’s how I ‘kind of’ have multi-site capabilities. I have 4 Proxmox Datacenters in 3 physically separate datacenters, one of which is pure DR/Backup. PBS backs up every VM at least once per hour, and PBS itself is replicated back to another datacenter. If one of the sites ever burns down to the ground, we recover from one of the backups. If we know in advance (right now, we are moving out of one datacenter), we can use PDM to (live?) migrate to another datacenter. We can’t really live-migrate because IP addresses would change, but at least I can move VMs without forklifting the rack.
 
You're right that PDM exists and covers the basics. What I'm looking at is adding automatic networking setup and orchestration on top of it. The flexibility part is important: supporting existing WAN infrastructure for those who have it and mesh VPN for those who don't. The same goes for DR, which would be optional per-VM instead of all-or-nothing.

I agree on the failover point. Manual should be default, auto only for workloads that can handle it.
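The "manual by default, auto only on opt-in" rule could look something like this in Python. It is only a sketch of the decision logic: the policy fields and the action names are hypothetical and do not correspond to any existing PVE or PDM setting.

```python
from dataclasses import dataclass
from enum import Enum


class FailoverMode(Enum):
    MANUAL = "manual"
    AUTO = "auto"


@dataclass
class VmPolicy:
    vmid: int
    # Cross-site replication is opt-in per VM.
    replicated: bool = False
    # Failover stays manual unless a VM is explicitly opted in to auto.
    failover: FailoverMode = FailoverMode.MANUAL


def action_on_site_down(policy: VmPolicy) -> str:
    """Pick what to do with one VM when its whole site is unreachable."""
    if not policy.replicated:
        # No replica on another site, so only a backup restore is possible.
        return "restore-from-backup"
    if policy.failover is FailoverMode.AUTO:
        return "auto-failover"
    # Replicated but manual: it may be a temporary blip, let a human decide.
    return "wait-for-operator"


default_vm = VmPolicy(vmid=100)
critical_vm = VmPolicy(vmid=101, replicated=True)
opted_in_vm = VmPolicy(vmid=102, replicated=True, failover=FailoverMode.AUTO)
```

Only the third VM would ever move without a human in the loop, which keeps the risky path limited to workloads that were deliberately marked as able to handle it.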

I don't think Proxmox will work on this anytime soon, but maybe it's worth building as a community project? If it gets traction and proves useful, Proxmox might consider it later.

Would there be interest in something like this? Is it worth building, or are people happy with their current scripts?
 
Regarding PDM: disaster recovery is on the roadmap. Central EVPN configuration is already available in 1.0. If you need other VPN meshes, they need to be added to PVE SDN first.