Hello,
I am planning to take advantage of the slow summer months and set up a lab environment using PVE with either GlusterFS or Ceph, to evaluate them as an alternative to what we currently run at the company where I work (services on top of native Linux and OpenIO). I have done a fair bit of homework studying the different options and best practices; however, the optimal network design/segmentation/configuration for a cluster like this is something I haven't quite managed to wrap my head around yet.
The cluster will consist of the following hardware:
ToR:
Nexus 3064-X (supports MLAG)
Storage servers:
4x 10Gb NICs connected directly to the ToR with DAC cables
6-8 drives + cache SSD for the distributed storage
Compute servers:
HP BL460 blades in a C7000 chassis
2x 10Gb NICs on each blade
2x Virtual Connect modules per chassis, each with 2x 10Gb uplinks to the ToR, for a total of 4 uplinks per chassis
No local storage for VMs
Both the compute and storage servers will run PVE, and the latter will also run the distributed storage platform for the VMs in the cluster. Most of the VMs will run on the compute servers unless there is a specific reason to move a workload to a storage server, for example a DB VM benefiting from local disk access, or a VM whose high CPU or memory requirements call for a larger form-factor server.
I am fully aware that having only two NICs on the compute servers, and four uplinks from each chassis to the ToR, will severely limit my options when it comes to network design. Unfortunately it's not possible to change this at this stage, but I will most definitely fix the situation if this is put into production at some point.
As my previous type 1 hypervisor experience is limited to vSphere/vSAN, I am quite unsure about the following:
- How should I segment the traffic into different networks?
- How should I spread the different networks across the 2 NICs on the compute servers and the 4 NICs on the storage servers?
- What type of interface configuration (access port, trunk, MLAG) should I use for the NICs?
- Are there any additional steps (QoS?) I should take to prevent possible issues with the compute blades? All of the servers are next to each other and the ToR switches are the low-latency X model, so bandwidth should be the only concern.
- Given the high number of compute servers, I was planning on using OVS. Any thoughts on this vs. the Linux bridge alternative? (A minimal sketch of both is included right after this list.)
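To make the last question more concrete, here is a minimal sketch of the two options as I currently understand them, written as /etc/network/interfaces stanzas. The NIC name eno1 and the bridge name vmbr0 are just placeholders for whatever the blades actually expose:

# Option A: VLAN-aware Linux bridge
auto eno1
iface eno1 inet manual

auto vmbr0
iface vmbr0 inet manual
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

# Option B: Open vSwitch bridge (needs the openvswitch-switch package)
auto eno1
iface eno1 inet manual
    ovs_type OVSPort
    ovs_bridge vmbr0

auto vmbr0
iface vmbr0 inet manual
    ovs_type OVSBridge
    ovs_ports eno1

In both cases the idea is that the VM VLANs are tagged per virtual NIC on the bridge, so the uplink behaves like a trunk.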
Below is the list of networks I plan to distribute the different types of traffic across, and how I plan to assign them to the different NICs. Any feedback/real-world experience on this would be highly appreciated!
Networks:
VM private
VM DMZ
Storage front
Storage back
PVE management
PVE cluster/corosync network
Configuration on the compute nodes (see the sketch after this list):
NIC1: Trunk
Storage front
VM private
VM DMZ
PVE management
NIC2: Access
PVE cluster/corosync network
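A minimal sketch of what I have in mind for a compute blade, using OVS since that is what I am leaning towards. The NIC names (eno1/eno2), VLAN IDs and addresses below are placeholders only:

# NIC1: OVS trunk carrying storage front, VM private, VM DMZ and PVE management
auto eno1
iface eno1 inet manual
    ovs_type OVSPort
    ovs_bridge vmbr0

auto vmbr0
iface vmbr0 inet manual
    ovs_type OVSBridge
    ovs_ports eno1 mgmt stor

# PVE management (example VLAN 40)
auto mgmt
iface mgmt inet static
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_options tag=40
    address 10.40.0.11/24
    gateway 10.40.0.1

# Storage front (example VLAN 30)
auto stor
iface stor inet static
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_options tag=30
    address 10.30.0.11/24

# NIC2: dedicated access interface for the PVE cluster/corosync network
auto eno2
iface eno2 inet static
    address 10.50.0.11/24

The VM private and VM DMZ networks would simply be tagged per VM on vmbr0.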
Configuration on the storage nodes (see the sketch after this list):
NIC1:
Storage front
NIC2:
PVE cluster/corosync network
NIC3:
Storage back
PVE management
NIC4:
Storage back
VM private
VM DMZ
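And one possible way I could wire up a storage node. Since storage back is listed on both NIC3 and NIC4, I have assumed here that those two ports would be bonded (LACP towards the MLAG pair) and carry storage back, PVE management and the VM networks as tagged VLANs; NIC names, VLAN IDs and addresses are again placeholders:

# NIC1: storage front (plain access interface)
auto eno1
iface eno1 inet static
    address 10.30.0.21/24

# NIC2: PVE cluster/corosync network (plain access interface)
auto eno2
iface eno2 inet static
    address 10.50.0.21/24

# NIC3 + NIC4: LACP bond towards the MLAG pair
auto bond0
iface bond0 inet manual
    bond-slaves eno3 eno4
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4

# VLAN-aware bridge on top of the bond
auto vmbr1
iface vmbr1 inet manual
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 10,20,40,60

# Storage back (example VLAN 60)
auto vmbr1.60
iface vmbr1.60 inet static
    address 10.60.0.21/24

# PVE management (example VLAN 40)
auto vmbr1.40
iface vmbr1.40 inet static
    address 10.40.0.21/24
    gateway 10.40.0.1

VM private and VM DMZ (example VLANs 10 and 20) would again be tagged per VM on vmbr1.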