Dedicated Migration Network vs. High Speed Storage Network: Do I need two separate VLANs when Clustering?

Sep 1, 2022
Hello,

My lack of experience is showing again, but I'm looking for some clarification on the recommended network setup when clustering two or more nodes together.

Right now, with a single Proxmox VE node and Proxmox Backup Server, I have the following isolated subnets:
  • a dedicated Proxmox Management VLAN (the Proxmox VE and PBS management interfaces live there); and
  • a 10 Gbps Storage VLAN (MTU 9000)--all my 10 Gbps devices live there, including my NAS and the Proxmox node's uplink to the NAS.
Currently, I'm not using any network shared storage for VM or LXC at all. I just use the NAS to store ISO images and CT templates on the Proxmox side.
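
Simplified, the relevant parts of my /etc/network/interfaces look roughly like the sketch below (interface names and addresses are placeholders, not my actual values):

  auto eno1
  iface eno1 inet manual

  # Proxmox Management VLAN bridge (MTU 1500)
  auto vmbr0
  iface vmbr0 inet static
      address 192.168.10.11/24
      gateway 192.168.10.1
      bridge-ports eno1
      bridge-stp off
      bridge-fd 0

  # 10 Gbps NIC on the Storage VLAN, jumbo frames
  auto enp65s0f0
  iface enp65s0f0 inet manual
      mtu 9000

  auto vmbr1
  iface vmbr1 inet static
      address 192.168.20.11/24
      bridge-ports enp65s0f0
      bridge-stp off
      bridge-fd 0
      mtu 9000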

I'm going to be bringing up a second node soon (I needed a different hardware configuration for certain workloads). This will be my first time creating a cluster, and I've been reading about the need for a "dedicated migration network" or "cluster network." That made me realize I'm a bit confused about how people typically set up their Proxmox network(s) in small 2-3 node configurations.
  • In any case, the Proxmox Management VLAN will remain. It's currently set up as an MTU 1500 VLAN and just used for admin via HTTPS/SSH.
  • The Storage VLAN will remain in place, as I intend for various VMs to access resources on the NAS (e.g., database server files, etc.) even if I'm not using any sort of shared storage at the PVE level.
  • My confusion:
    • Is it recommended to have a separate "migration network" that just handles migration and cluster admin tasks (so, an additional Proxmox Migration VLAN)?
    • Or, is it safe to adjust the Proxmox Management VLAN for 10 Gbps traffic (MTU 9000) and use it as the migration network?
    • Or, maybe I'm misunderstanding and the migration network should carry all data traffic to and from the NAS, as well, so it's not intermingled with general Storage VLAN traffic? I'm not sure how that's helpful--the non-management VLANs would all be going over the same physical wiring/NICs.

It makes intuitive sense to me to keep the management interface separate and isolated (and I have physical hardware that lets me do that; it's already configured this way), but at the same time it feels a bit overcomplicated to have three VLANs (Proxmox Management, Storage VLAN, and Migration Network) in play. What's the recommended way to do this for a 3-node production/lab small office network?
 
Technically, you do not need them if this is a home lab, which I am guessing it is.

Now, it is considered best production practice to separate the various networks into their own VLANs, with Corosync in particular getting its own isolated network switches. Notice, I said best practice. However, lots of people do NOT follow these best practices. For example, me.

In production, I run various Ceph clusters on isolated switches. All Ceph public, Ceph private, and Corosync traffic uses these isolated, redundant switches (two per cluster). Is this considered best practice? No. Does it work? Yes. To make sure this traffic never gets routed, I use addresses from the IPv4 link-local range 169.254.1.0/24, point the datacenter migration setting at that network, and enable the insecure transfer option (because the switches are isolated anyway).
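
For reference, a minimal sketch of what that can look like in /etc/pve/datacenter.cfg (the subnet is the link-local range mentioned above; adjust to your own setup):

  # /etc/pve/datacenter.cfg
  # send migration traffic over the isolated network and skip TLS,
  # since the switches are physically isolated anyway
  migration: insecure,network=169.254.1.0/24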

So, for a home lab, VLANs can be considered optional. Nice to have, sure. But then again, you have to deal with the administrative and management overhead.

At work, I do have standalone machines, but these servers use ZFS. One can use Proxmox Datacenter Manager (PDM) to migrate between ZFS instances.

Both Ceph and ZFS nodes are all backed up to bare-metal Proxmox Backup Servers using ZFS pools.
 
Thanks! I'll have to decide how complicated I want to get. I certainly don't have more than one set of actual physical switches, and each of my PVE nodes and my PBS server is on a different switch due to physical layout, so no isolated switches for Corosync, I'm afraid. At least not anytime soon.

Just to make sure I understand, you use the migration VLAN only for migration, right? No other data traffic? Why is that recommended, even when all the data VLANs are on the same physical wire? I'm assuming it makes things more stable/easier to monitor somehow.

I really should look at the current state of PDM before I go down the rabbit hole with any of this, but I really wanted to be able to start learning HA at some point, which, as I understand it, isn't really something that PDM does.
 
When moving from standalone nodes to a cluster, the network you have to be most concerned about is the cluster network. The cluster network carries the Corosync traffic, and best practice is to have a dedicated 1 Gbps network for just the cluster network (i.e., Corosync traffic). Note that a VLAN on a shared switch is not sufficient; you want a completely isolated, dedicated network. If you are unable to do that, you need to protect the Corosync traffic from any latency (e.g., use network QoS settings to prioritize it).
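
If you do manage to give Corosync its own NIC/subnet, you tell the cluster about it when creating and joining; a rough sketch with made-up addresses (link1 acts as a fallback ring over the management network):

  # on the first node: link0 = dedicated Corosync subnet, link1 = fallback
  pvecm create mycluster --link0 10.10.10.1 --link1 192.168.10.11

  # on the joining node: point at the first node's address, then give this node's own link addresses
  pvecm add 192.168.10.11 --link0 10.10.10.2 --link1 192.168.10.12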

Is it recommended to have a separate "migration network" that just handles migration and cluster admin tasks (so, an additional Proxmox Migration VLAN)?

The migration and replication networks handle the traffic between nodes when moving VMs and LXCs. There is no specific recommendation to put these on separate VLANs. Since migration and replication have the potential to consume a significant amount of bandwidth, you can assign them to dedicated network interfaces to prevent them from negatively impacting other traffic, such as Corosync traffic.
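
For a one-off move you can also force an individual migration onto a specific subnet from the CLI; a sketch with a made-up VM ID, node name, and subnet:

  # live-migrate VM 100 to node pve2, sending the transfer over the 10G subnet
  qm migrate 100 pve2 --online --migration_network 10.0.40.0/24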

Or, is it safe to adjust the Proxmox Management VLAN for 10 Gbps traffic (MTU 9000) and use it as the migration network?

By default, migration and replication traffic happens over the management interface. Whether that is "safe" depends on how much migration and replication traffic you will have.
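
If you do leave migration on the management interface, one mitigation is to cap its bandwidth in /etc/pve/datacenter.cfg so it cannot saturate the link; the value is in KiB/s, and the number below is only an example:

  # limit migration traffic to 512 MiB/s (~4.3 Gbit/s)
  bwlimit: migration=524288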

Or, maybe I'm misunderstanding and the migration network should carry all data traffic to and from the NAS, as well, so it's not intermingled with general Storage VLAN traffic?

Migration traffic occurs between nodes within the cluster. If you migrate a VM, the RAM gets transferred over the migration network. If the VM has local storage, that will get transferred to the other node over the migration network. Your NAS will not be involved.
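
As a concrete example (the VM ID and node name are made up), an online migration of a VM with local disks could look like this; both the RAM and the local disks travel node-to-node:

  # RAM and local disks are copied directly to pve2 over the migration network;
  # the NAS is not involved
  qm migrate 100 pve2 --online --with-local-disks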

Just to make sure I understand, you use the migration VLAN only for migration, right? No other data traffic? Why is that recommended, even when all the data VLANs are on the same physical wire? I'm assuming it makes things more stable/easier to monitor somehow.

Again, VLAN or not, logical separation of the migration traffic is not the objective. You are right to wonder why you would use a different network if it runs over the same network interface (i.e., "on the same physical wire"). Moving the migration traffic off the default management network is only beneficial if that means it ends up on a different physical interface.

You might want to check out a post I did a little while ago with some general network guidelines. It might be helpful.

 