So I run LACP from just one switch to the other switch, which has LACP turned off on those ports? Right now I have LACP turned on on both sides and the connection shows up. STP is turned off on that LAG.
I'm not sure what you are asking for. You have two separate switches that are connected to each other, but you can't do LACP with one port on switch1 and one port on switch2. So you can either put both Ceph ports of a node on one of those switches and use LACP (2x 10 Gbit), but that also means that if that switch fails, your Ceph node is down (no network). That's why an active-backup bond is recommended: you don't need a switch stack for it. It means one Ceph port on switch1 and one Ceph port on switch2. You only get 10 Gbit (less performance) because active-backup keeps one link passive --> but you can lose a switch without losing the Ceph network.
For the interconnect between the two switches (the crosslink) you can use LACP.
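The active-backup variant in /etc/network/interfaces would look roughly like this (the interface names and the 10.10.10.0/24 subnet are just placeholders for your Ceph network, adjust to your hardware):

    # one Ceph port cabled to switch1, the other to switch2
    auto bond1
    iface bond1 inet static
        address 10.10.10.11/24
        bond-slaves enp65s0f0 enp65s0f1
        bond-miimon 100
        bond-mode active-backup
        bond-primary enp65s0f0

With active-backup nothing special is needed on the switch ports (no LACP/LAG configuration on either switch).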
I would assume the UI port needs an IP. How about corosync? Do I just assign IPs to these adapters? Same for bond1 for Ceph? On the Ceph side, is there anything wrong with two separate switches and networks (since I have the equipment), like this:
In best-practice deployments corosync usually gets its own physical switch port for the primary corosync link0, which is used for Proxmox cluster communication. The corosync link1 fallback can be set to the Ceph storage IP (it will only be used if link0 is down). Putting corosync on the same interface as the UI can lead to latency problems, which cause a node to fence itself (reboot) if the Proxmox VE cluster heartbeats don't arrive in time. If you don't have enough ports, use Linux VLANs inside the hypervisor to separate UI traffic and corosync traffic.
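For reference, you can pin corosync to those networks when creating/joining the cluster via link0/link1 (the IPs below are placeholders; link1 points at the Ceph storage IP as the fallback described above):

    # on the first node
    pvecm create mycluster --link0 10.10.20.11 --link1 10.10.10.11
    # on each further node, joining via the first node's address
    pvecm add 10.10.20.11 --link0 10.10.20.12 --link1 10.10.10.12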
1) ceph bond1: bond the 2 ports of one card, using LACP, connected to one TP-Link switch
2) ceph bond2: bond the other card's ports, using LACP, connected to the other TP-Link switch
3) ceph bond3: bond both bond1 and bond2, either as active/backup or LACP if that would work across two switches.
1 and 2 are OK if you use the Ceph public network for one bond and the Ceph cluster network for the other bond. BUT as I said, when you use LACP without a switch stack or MLAG feature on your switches (which you don't have), a switch failure leads to complete node downtime (storage-wise). In your case each bond (public or cluster, doesn't matter which) depends entirely on one switch, so losing a switch means complete downtime (everything is offline!). With active-backup you could lose a switch without Ceph going down.
Edit: option 3 makes NO SENSE at all. I don't think bonding two bonds together works or is a valid configuration.
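For 1 and 2, splitting public and cluster network over the two bonds ends up in /etc/pve/ceph.conf roughly like this (the subnets are placeholders):

    [global]
        public_network = 10.10.10.0/24    # bond1: client/monitor traffic
        cluster_network = 10.10.11.0/24   # bond2: OSD replication traffic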
Anything wrong with putting the management UI on vmbr0 with 4 10G ports bonded? This would free up a second 1G port that could be bonded for corosync. How do I point corosync to this port or bond?
You can put the UI IP on vmbr0. LACP with 4 ports is unusual; typically you have 2 ports in LACP (that works fine). As I already said, corosync should not be put on an interface that carries a lot of traffic, as this can increase latency, which causes node fencing (-> reboot of the node -> downtime). So you are technically able to do what you're asking for, but it is not best practice.
If it's not possible to add a 1 Gbit card, you should use a Linux VLAN adapter to separate the vmbr0 traffic and the UI traffic via VLAN.
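A minimal sketch of that VLAN separation in /etc/network/interfaces (VLAN ID 10 and the addresses are placeholders):

    auto vmbr0
    iface vmbr0 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

    # UI/management IP on its own VLAN
    auto vmbr0.10
    iface vmbr0.10 inet static
        address 192.168.10.11/24
        gateway 192.168.10.1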
Not sure what you mean about Proxmox Backup Server. I have a standalone machine for that. It has 4 10G ports and 2 1G ports. I was going to use a 1G port for management and the 10G ports bonded for backup traffic as vmbr0 on that machine, OR are you saying "put another card in the nodes for backup"?
Yes, 1G for mgmt is fine, and 10G for backup as well; just make sure you are able to ping the PVE nodes from your PBS. Whichever network you use for "backup", you need to make sure you can reach the PVE nodes over it.
Noted as well. My RAID cards are running in HBA mode. They can handle 2 U.2 drives each or 8 SAS drives each. Would that be preferable to 8 data SSDs? In either case, are refurbished enterprise drives acceptable? I need to check what my backplanes can handle. They are handling the consumer SSDs fine and did well when set up as a 7-drive RAID 5 with hot spare. I'm thinking refurb should be OK; I'm not generating that much data.
Depending on how much data you have, I would go for 8 drives, as you can tolerate more disk failures. With 2 disks you can only lose one disk per node; usually you start Ceph with 4 disks per node.
I currently have two two-port 10G NICs I had planned for this. Would it be better to bond two ports for VM traffic on vmbr0 and bond the other card for backup traffic? Backups will be nightly, in off hours. In that case, is it OK to use all four on vmbr0? (I've installed but not configured Proxmox Backup Server yet.)
If backups don't interrupt your VM performance (off hours), you can use the same adapter, but you might need separation via a Linux VLAN, because the backup subnet is different from the management subnet (you said you also want to put mgmt on that bridge).
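Once the PBS IP is reachable from the nodes over that backup subnet, attaching it is one command per cluster (storage ID, address, datastore name, user and fingerprint below are placeholders):

    pvesm add pbs pbs-backup --server 192.168.30.5 --datastore store1 \
        --username backup@pbs --password <PBS user password> \
        --fingerprint <PBS certificate fingerprint>

The same can be done in the UI under Datacenter -> Storage -> Add -> Proxmox Backup Server.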
OK, very confused here. These would be a Ceph pool and backed up nightly, one drive per server. As I understand it, these Optanes are enterprise grade, plugged into riser cards. Can ZFS run on top of Ceph? Or will my backup suffice?
You can run ZFS alongside Ceph; ZFS usually has better performance than Ceph, so ZFS with a single disk will perform better than Ceph with a single disk per node. You can set up ZFS async (!) replication via the UI to sync the VM between two nodes and also use HA for that VM. BUT it is async, meaning that if the node where the VM is running fails, the VM will be started on the node you replicated the data to, but with the data from the last sync! Depending on how much you write, you can set the ZFS replication window down to 1-5 minutes, so the data gets synchronized every 5 minutes. If a node fails, you will have X minutes of data loss, depending on when the last sync completed successfully.
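The replication job you set up in the UI corresponds to something like this on the CLI (VMID 100, job number 0, the target node name pve2 and the 5-minute schedule are just examples):

    # replicate VM 100 to node pve2 every 5 minutes
    pvesr create-local-job 100-0 pve2 --schedule "*/5"
    # check when each job last synced
    pvesr status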
I'm doing clean installs on each node today and will then see what you say before I proceed. Mainly I am unsure how to separate corosync from management. In the initial setup everything just runs on one port over vmbr0, so I really need guidance on utilizing my network hardware in the most correct manner.
As far as I understand, you have 2 ports that you want to use for:
- vmbr0 (VM traffic); you can put the UI IP on that device
- corosync as well (my advice: use a Linux VLAN to separate it from the vmbr0 traffic)
So your total number of NICs per NODE is:
- 2x 10Gbit (VM/UI/corosync/backup) ports
- 2x 10Gbit (Ceph) ports
Is this correct? Or do you have 6 10Gbit ports? Please make a list of available ports per node.
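If that 2x + 2x layout is correct, the pieces above combine into roughly this /etc/network/interfaces per node (all interface names, VLAN IDs and subnets are placeholders):

    # 2x 10G LACP to one switch for VM/UI/corosync/backup
    auto bond0
    iface bond0 inet manual
        bond-slaves enp1s0f0 enp1s0f1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

    auto vmbr0
    iface vmbr0 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

    # management/UI VLAN
    auto vmbr0.10
    iface vmbr0.10 inet static
        address 192.168.10.11/24
        gateway 192.168.10.1

    # corosync link0 VLAN
    auto vmbr0.20
    iface vmbr0.20 inet static
        address 192.168.20.11/24

    # 2x 10G for Ceph, one port to each switch, active-backup
    # (this IP can also serve as the corosync link1 fallback)
    auto bond1
    iface bond1 inet static
        address 10.10.10.11/24
        bond-slaves enp65s0f0 enp65s0f1
        bond-miimon 100
        bond-mode active-backup
        bond-primary enp65s0f0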
Idk if you saw but I did list all my hardware a few comments up for you.
I do think my servers would have to run SAS enterprise SSDs. I don't think I have the PCIe lanes to run very many U.2 drives. What I'm finding online are refurbished drives. I'm pretty sure the backplanes are only 6 Gbit; the RAID/HBA cards are 12 Gbit (or two U.2), but I do not know if those will run in my backplanes.
Yeah, I would go for SAS SSDs (8x per node) instead of 2x U.2 per node.