10GbE cluster without a switch

I have bought 3 identical servers; each has a dual-port 10GbE network adapter.
I want the cluster to communicate over the 10GbE adapters, but each individual server should reach the router via a 1GbE port.

If one of the nodes fails, I want its VMs to fail over to the next server.

So far, I have connected all 3 nodes via RJ45 to the router (through a 1GbE switch).

The 3 servers are connected to each other via 10GbE (SFP+ adapters).

So:
server1 (port 2) -> server2 (port 1)
server2 (port 2) -> server3 (port 1)
server3 (port 2) -> server1 (port 1)

Is this correct, and how do I need to set this up?
I searched here (https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server) but I don't know exactly where and how to do what on which node.
So far I have installed Proxmox on the 3 servers; they are reachable via the 1GbE switch on IPs 192.168.3.21 through 192.168.3.23.

I want to use Ceph as shared storage, but I don't know if that changes anything for this.
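
From the wiki, the "Routed Setup (Simple)" variant seems to boil down to something like this per node, if I read it correctly (the interface names and the 10.15.15.x addresses are just wiki-style placeholders, not my actual config):

Code:
# /etc/network/interfaces snippet on server1 (placeholder names/addresses)
auto ens1f0
iface ens1f0 inet static
        address 10.15.15.50/24
        # point-to-point link to server2, reached via a host route
        up ip route add 10.15.15.51/32 dev ens1f0
        down ip route del 10.15.15.51/32

auto ens1f1
iface ens1f1 inet static
        address 10.15.15.50/24
        # point-to-point link to server3
        up ip route add 10.15.15.52/32 dev ens1f1
        down ip route del 10.15.15.52/32

Each node would then get its own address (.50/.51/.52) plus the two host routes pointing at its direct neighbours.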
 
In order to fail over you need external storage or replication (with ZFS drives). I had a 3-node cluster with replication on ZFS and it worked great.

The cluster can really run on the 1 Gbps interfaces; there is not much traffic there, depending on how much traffic your VMs generate internally and over the internet.

The 10 Gbps interfaces I would use for syncing (replicating) the storage across the nodes.

Two separate networks: a) the PVE cluster network that carries cluster messages and may also carry your virtual machines' "public" traffic (the vmbrX interface that you give to the virtual machines), and b) the replication network that synchronizes your storage.

If one node goes offline, the VMs will be started on another node and keep working. It works very well; I tested and implemented it several years ago but have not touched it since, so perhaps there are also some new features.
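
As a rough sketch of the commands involved (the node names, 1 GbE addresses and VM ID 100 below are placeholders):

Code:
# on the first node: create the cluster, bound to the 1 GbE network
pvecm create mycluster --link0 192.168.3.21

# on each additional node: join, again over the 1 GbE addresses
pvecm add 192.168.3.21 --link0 192.168.3.22

# replicate VM 100's ZFS disks to another node every 15 minutes
pvesr create-local-job 100-0 server2 --schedule "*/15"

# let the HA stack restart VM 100 on a surviving node if its host dies
ha-manager add vm:100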

Thx
 
Okay, sounds good to me, but how do I set it up? Do I need to create the cluster on the 1GbE port?
And I read that Ceph is preferred over ZFS because the storage is shared?

Proxmox is installed on a ZFS RAID1 of 2x 240GB SSDs (per server).
I will install an additional 4x 1.92TB SSDs (per server) for the storage (virtual machines), I think in RAID5?
 
Sorry, yes, you can also use Ceph. Ceph is harder to configure, whereas replication is easy. I am not sure what the minimum number of storage drives (OSDs) is for Ceph, but I had replication working on just 4 drives with 2 nodes + a 3rd node to keep quorum; it is easy to set up.

I would move your cluster to the 1 Gbps interfaces for simplicity. You could also VLAN it on the 10 Gbps interfaces, but the overall consensus is that you should not mix the cluster subnet with the storage subnet.

If you cannot move it, and you have just installed PVE and it is not in production, you can simply reinstall and use the 1 Gbps interfaces for the PVE cluster.
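
If you want to double-check which addresses the existing cluster is currently using before deciding, something like this shows it (standard Proxmox/corosync tools, nothing specific to your nodes):

Code:
# link status and the local addresses corosync is bound to
corosync-cfgtool -s

# the per-node ring0_addr entries of the cluster
cat /etc/pve/corosync.conf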
 
Okay, I can still use the 1 Gbps interface; so far everything is configured on the 1 Gbps port. I connected the 3 servers to each other via the 10GbE adapters but didn't do anything with that yet.

If I understand you correctly, I need to create the cluster on the default 1 Gbps interface and run Ceph on the 10GbE network?
Can you give me some information on how to do that?
 
As for Ceph, it is more challenging to install. You should at least go over this wiki document:

https://pve.proxmox.com/wiki/Deploy_Hyper-Converged_Ceph_Cluster

Because you are installing Ceph under Proxmox, you can do that from the web admin portal and add the storage during the deployment. You need to enable the Ceph repositories and install it via the Ceph section in Proxmox, and during this process you can deploy it as well.
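
For what it's worth, the CLI equivalent of that GUI flow is roughly the following (the 10GbE subnet and the disk names are placeholders; adjust them to your setup):

Code:
# on every node: install the Ceph packages
pveceph install

# once, on the first node: point Ceph at the 10GbE network
pveceph init --network 10.15.15.0/24

# on every node: a monitor and a manager
pveceph mon create
pveceph mgr create

# on every node: one OSD per data SSD
pveceph osd create /dev/sdb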
 
Thank you, I followed some videos but for some reason it doesn't install (on all 3 servers).
Here is where it gets stuck:
 

Attachments

  • Schermafbeelding 2024-10-30 om 16.14.26.png (screenshot of where the installation stops)
My bad, I had a wrong router configuration that prevented the servers from connecting to the internet. I can now install Ceph, but first I need to figure out how to make the 10GbE work.
 
I have followed this tutorial and am now able to get 10GbE connections between all 3 servers (with fallback to 1GbE if the 10GbE links are down).
This works so far (the tests behave like at the end of the video).

https://www.apalrd.net/posts/2023/cluster_routes/


Now I want to configure Ceph, but I can only select my vmbr0 as Public Network IP/CIDR and Cluster Network IP/CIDR.
I think I need to configure it via ensf0 and ensf1?

How can I achieve this? Do I need to make a bond? And if so, how do I need to set it up?
 
We have it set up with bridges, each on its own respective VLAN, for every network involved in an HA setup. We have at minimum the following bridges for:

Management
Ceph Private
Ceph Public
Corosync
Migration (for VM migration)

Each of the above gets a separate private CIDR/subnet (e.g. 10.0.10.0/24 for management, 10.0.11.0/24 for Ceph private, etc.), or whatever IP scheme works for you.

We bond the relevant physical adapters and then attach bridges to those bonds. We then use SDN to define bridges/VLANs for our guests.
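
As a sketch of what one such bonded, VLAN-backed bridge looks like in /etc/network/interfaces (interface names, VLAN IDs and addresses below are examples only, not our exact config):

Code:
auto bond0
iface bond0 inet manual
        bond-slaves enp1s0f0 enp1s0f1
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        bond-miimon 100

# e.g. the Ceph Public bridge on VLAN 12, subnet 10.0.12.0/24
auto vmbr12
iface vmbr12 inet static
        address 10.0.12.21/24
        bridge-ports bond0.12
        bridge-stp off
        bridge-fd 0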
 
How can I achieve this? Do I need to make a bond? And if so, how do I need to set it up?
The loopback (lo) address is what you would use for both Ceph and corosync; there is no bonding or any other option to configure.

A word of caution about this configuration: it is possible to create a denial-of-service condition under heavy load, since corosync and Ceph will be sharing the available bandwidth.
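
To illustrate, with the routed setup you simply hand Ceph the routed prefix instead of the vmbr0 subnet; a minimal sketch, assuming your prefix is fd69:beef:cafe::/64 (use whatever you actually configured):

Code:
# when initialising Ceph, give it the routed prefix rather than the vmbr0 subnet
pveceph init --network fd69:beef:cafe::/64

# or, if Ceph is already initialised, adjust /etc/pve/ceph.conf:
#   [global]
#   public_network  = fd69:beef:cafe::/64
#   cluster_network = fd69:beef:cafe::/64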
 
I think I got it working, but now I don't know if I did it correctly. I created a bond and used the IPv6 address I generated earlier. I created the Ceph storage and added all the SSDs from all servers. It seems to work: I can disconnect or turn off a server and after a few minutes the VM switches to another node.

Can I check if I set it up correctly?
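
For reference, these are the read-only commands I was planning to use to sanity-check it (as far as I can tell they don't change anything):

Code:
# overall Ceph health, OSD layout and pool replication settings
ceph -s
ceph osd tree
ceph osd pool ls detail

# confirm which networks Ceph and corosync are actually using
grep -E 'public_network|cluster_network' /etc/pve/ceph.conf
pvecm status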
 
Things changed because my switch didn't have enough ports.

I could buy 2 HPE 2920-48G switches, both with 2x J9731A dual 10GbE modules and a stacking module.

My thought is to use the switches instead of point-to-point.

Every server would connect with 1x 10GbE to each switch, which makes it redundant, and also connect all 4x 1GbE ports to both switches (2x to switch 1, 2x to switch 2).

Are my thoughts correct? And how do I need to configure it for the best speed and most reliability?

Both switches have stacking modules as well; I just ordered the stacking cable, which will arrive in 1 week.
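
My rough idea for the 10GbE side is a bond across both switches, something like this (placeholder interface names; 802.3ad assumes the stack presents a single LACP trunk, otherwise active-backup would be the safe choice):

Code:
auto bond0
iface bond0 inet manual
        bond-slaves ens1f0 ens1f1   # one 10GbE port to each switch
        bond-mode 802.3ad           # or active-backup if LACP across the stack is not possible
        bond-xmit-hash-policy layer3+4
        bond-miimon 100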
 
I rebuilt my server rack and needed to disconnect my LAN cables (1GbE and 10GbE). I restarted my cluster to test a few things without the cables attached, or with only 1 cable (1GbE), and have now reconnected the 4 LAN cables (1GbE) and the 2x 10GbE cables in the point-to-point configuration.

When testing with iperf3, I only get 1 Gbit/s:

Code:
root@proxmox-2:~# iperf3 -c fd69:beef:cafe::521
Connecting to host fd69:beef:cafe::521, port 5201
[  5] local fd69:beef:cafe::522 port 37678 connected to fd69:beef:cafe::521 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   113 MBytes   946 Mbits/sec    0    441 KBytes       
[  5]   1.00-2.00   sec   110 MBytes   925 Mbits/sec    0    441 KBytes       
[  5]   2.00-3.00   sec   111 MBytes   933 Mbits/sec    0    488 KBytes       
[  5]   3.00-4.00   sec   110 MBytes   922 Mbits/sec    0    488 KBytes       
[  5]   4.00-5.00   sec   111 MBytes   932 Mbits/sec    0    488 KBytes       
[  5]   5.00-6.00   sec   111 MBytes   927 Mbits/sec    0    488 KBytes       
[  5]   6.00-7.00   sec   110 MBytes   924 Mbits/sec    0    488 KBytes       
[  5]   7.00-8.00   sec   111 MBytes   931 Mbits/sec    0    512 KBytes       
[  5]   8.00-9.00   sec   111 MBytes   930 Mbits/sec    0    536 KBytes       
[  5]   9.00-10.00  sec   110 MBytes   925 Mbits/sec    0    536 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.08 GBytes   930 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.08 GBytes   927 Mbits/sec                  receiver


iperf Done.

I have restarted the cluster multiple times, but it stays at 1 Gbit/s instead of 10 (which worked before).
What can cause this issue?
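
In case it helps with troubleshooting, this is what I plan to check next (assuming the routed FRR/OpenFabric setup from the tutorial; the interface name is a placeholder):

Code:
# does traffic to the other node still leave via the 10GbE link?
ip -6 route get fd69:beef:cafe::521

# are the 10GbE ports up and negotiated at 10000Mb/s?
ip -br link
ethtool ens1f0 | grep -i speed

# if FRR/OpenFabric handles the mesh routing, check that the adjacencies are up
vtysh -c "show openfabric topology"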
 
