ProxMox networks [Design]

Axl_fzetafs

Hi,

I will need 4 types of networks in my environment (I guess anybody who runs Proxmox at an enterprise level needs them):

1) one for corosync (ideally a dedicated one);
2) one for the web-GUI/SSH;
3) one for the VMs' data traffic (a lot of VLANs);
4) one for the VMs' disks hosted on the NFS share (let's call it the "NFS disk network").

We have 3x HP DL380 servers with 4 built-in copper interfaces and a card with two SFP+ slots (2 of the servers will get optical SFPs, the third just copper ones).
We have a Cisco stack with two units. I'm going for redundancy and trying to reduce the impact in case one of the switches dies.
Hence I try to use LACP bonds wherever possible. For the layout above I'd use 4 LACP bonds, but that would need 8 interfaces, so I have to redesign how I use the interfaces.
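To give an idea, each of those LACP bonds would look more or less like this in /etc/network/interfaces (interface names and the VLAN range are just placeholders; the switch side would need a matching LACP port-channel spanning the two stack members):

```
auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3

# VLAN-aware bridge on top of the bond, e.g. for the VM traffic network
auto vmbr0
iface vmbr0 inet manual
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
```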

Considerations

The datacentre is run in an environment where we cannot guarantee a rapid intervention in case of issues, so everything must keep working even if one of the stack units fails, and all services must stay up for the week or two it may take before someone can come and fix it.

In terms of criticality, I identified 1) and 4) as the most critical (you don't want to miss a heartbeat, nor do you want your VMs to have sluggish access to their disks). It's true that I can distribute the heartbeats over two L3 segments (so redundancy is done at L3 rather than by coupling L2 links).

So maybe I can drop the dedicated LACP bond for corosync, assign those interfaces to the VMs' communications, and spread the corosync heartbeats over the other three networks (the NFS disk network, the VMs' comms and the web-GUI/SSH).

Would anyone like to share their approach on this topic?

Alex
 
I assume you plan on using HA. This means a separate physical network for Corosync alone would be recommended.
Do I understand it correctly, you will have 6 interfaces available?

In this case I'd suggest adding 2 Corosync links using 2 1G interfaces. One of those over each of the switches should make them redundant.
Using a bond for Corosync is not recommended, since it has its own failover implementation with lower latency than a bond will provide.
You can still add more links on any of the other interfaces for failover with even lower priority in case the NIC has issues.
Corosync doesn't need lots of bandwidth, but low latency instead.
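
For example, both links could be passed directly when creating the cluster; the addresses below are only placeholders for two dedicated corosync subnets, one per switch:

```
# on the first node
pvecm create mycluster --link0 10.10.1.1 --link1 10.10.2.1

# on each joining node, pointing at the first node and using
# that node's own addresses in the two corosync subnets
pvecm add 10.10.1.1 --link0 10.10.1.2 --link1 10.10.2.2
```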

You can make the GUI/SSH (IP the hostname resolves to) available via any of the interfaces. I'd recommend either the VM one or the storage one.

What are the requirements for the amount of VMs and the storage/disk speed?
Are those NICs all 1G or some of them 10G or more?
 
I assume you plan on using HA. This means a separate physical network for Corosync alone would be recommended.
Do I understand it correctly, you will have 6 interfaces available?
Yes, 4 integrated + the 2 SFP+ ports on the card.
In this case I'd suggest adding 2 Corosync links using 2 1G interfaces. One of those over each of the switches should make them redundant.
Using a bond for Corosync is not recommended, since it has its own failover implementation with lower latency than a bond will provide.
Wouldn't that add up to 8 interfaces? 3 LACP bonds plus 2 links on separate L3 subnets for corosync?
You can still add more links on any of the other interfaces for failover with even lower priority in case the NIC has issues.
Corosync doesn't need lots of bandwidth, but low latency instead.
That's why I'd love to have corosync over fiber rather than copper. I'm told that fiber should have lower latency than copper.
You can make the GUI/SSH (IP the hostname resolves to) available via any of the interfaces. I'd recommend either the VM one or the storage one.
The storage and corosync networks won't have any gateway. Maybe I could use 2 interfaces for the web-GUI and the VMs' traffic VLANs (that traffic actually won't be much). We are speaking of 3x Windows Domain Controllers, 2x RADIUS and 4 application machines, plus a syslog server and 2x service VMs: a total of 12 machines.
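Something like this sketch is what I'm picturing for that shared bridge, with the web-GUI/SSH address untagged on a VLAN-aware bridge and the VMs on tagged VLANs (interface names, addresses and VLAN range are made up):

```
auto bond1
iface bond1 inet manual
    bond-slaves eno3 eno4
    bond-miimon 100
    bond-mode 802.3ad

auto vmbr1
iface vmbr1 inet static
    address 192.168.10.11/24
    gateway 192.168.10.1
    bridge-ports bond1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
```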
What are the requirements for the amount of VMs and the storage/disk speed?
Are those NICs all 1G or some of them 10G or more?
At the moment, because the switch has a limited number of optical ports (we use an uplink module), we only have 1 Gbps optics for two of the future nodes and 1 Gbps copper for the third node. Because the cards have 2x SFP+ slots, in the future we can easily move to 10 Gbps by buying an appropriate optical switch and swapping the SFP+ modules.
 
Are the integrated ones one single NIC or multiple NICs?

2 links should be more than enough, one for each switch. To be safe, you can put one on each physical NIC.
For example, one integrated NIC per switch and one additional failover link with lower priority on the fiber one.
You can put the additional failover link in a VLAN to separate it from the rest of the traffic. The other 2 links are probably better off not being shared with any other kind of traffic at all.
If you do have to share it, QoS settings may help keep latency low for Corosync still.
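
As a rough sketch, the priorities could then look like this in /etc/pve/corosync.conf, with link 2 being the additional one in the VLAN on the fiber NIC (in the default passive link_mode, as far as I understand knet, the connected link with the highest priority value carries the traffic):

```
totem {
  # ... cluster_name, config_version, etc. stay as generated ...
  interface {
    linknumber: 0
    knet_link_priority: 30
  }
  interface {
    linknumber: 1
    knet_link_priority: 20
  }
  interface {
    linknumber: 2
    knet_link_priority: 10
  }
}
```

Keep in mind to increment config_version whenever you edit that file.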

Fiber or copper doesn't make a difference for Corosync. By low latency we mean ~2 ms, which all your NICs should be capable of.

If all your storage is made available via network (NFS), it may not provide enough bandwidth and IOPS over a 1G link (2G bond). It depends on your storage as well.
You may want to run benchmarks with `fio` once you set everything up.
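Something along these lines, run against a directory on the NFS-backed storage, would give a first impression (the path and the parameters are placeholders, adjust them to your workload):

```
# 4k random read/write mix, direct I/O, 30 seconds, on the mounted NFS storage
fio --name=nfs-test --directory=/mnt/pve/<your-nfs-storage> \
    --rw=randrw --rwmixread=70 --bs=4k --size=4G \
    --ioengine=libaio --direct=1 --iodepth=32 --numjobs=4 \
    --time_based --runtime=30 --group_reporting
```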
 
Are the integrated ones one single NIC or multiple NICs?
There are 4 integrated network cards
2 links should be more than enough, one for each switch. To be safe, you can put one on each physical NIC.
Indeed, for each network I may be able to have a 2-link bond, where each link goes to a different switch.
I now agree with you that corosync does not take advantage of LACP; on the contrary, it might even be worse with it.
For example, one integrated NIC per switch and one additional failover link with lower priority on the fiber one.
You can put the additional failover link in a VLAN to separate it from the rest of the traffic. The other 2 links are probably better off not being shared with any other kind of traffic at all.
By "failover link", do you mean the corosync network? And what do you mean by the other 2 links?
If you do have to share it, QoS settings may help keep latency low for Corosync still.
I know, but I may not feel comfortable setting up QoS on the switch :)
Fiber or copper doesn't make a difference for Corosync. By low latency we mean ~2 ms, which all your NICs should be capable of.
Thanks for sharing this info.
If all your storage is made available via network (NFS), it may not provide enough bandwidth and IOPS over a 1G link (2G bond). It depends on your storage as well.
Other people, more expert with storage than me, suggested having a shared volume (for faster migration) and came up with the NFS proposal. I know iSCSI was designed for such a purpose, but AFAIK a single iSCSI volume cannot be shared among several initiators: if it were, it would mean two OSes accessing the same hard disk. I admit my knowledge here is poor; maybe the iSCSI protocol does provide a mechanism that allows two OSes to access the same block device.
You may want to run benchmarks with `fio` once you set everything up.
Thanks, I didn't know about fio :)
 