First time PVE setup

coreyclamp

Member
Jul 2, 2020
I've gotten a lot of mileage out of my lab/home compute environment, but it's running on 10th gen Dell hardware and it's finally time to put it out to pasture. I caught a great deal on some used 12th gen Dell servers, so I decided to refresh everything and move from vSphere to Proxmox in the process. After a few weeks of setting up and reinstalling several times, I think I've got it mostly figured out. I still have some points of confusion, though, and would like to know what best practice would be for this setup before I start migrating VMs over from vSphere.

Server Hardware:
  • 3 x PowerEdge R620 (Proxmox nodes)
    • 2 x 146 GB SAS (Proxmox OS)
    • 6 x 600 GB SAS (Ceph OSDs)
    • 4 x 1GbE
    • 2 x 10GbE
  • 1 x PowerEdge R720xd
    • FreeNAS - serves NFS, SMB, & iSCSI to both hosts and guests; primarily used for IP cam footage, media streaming, & backups
    • 4 x 1GbE
    • 4 x 10GbE
Network Config:
  • VLANs/Subnets:
    • vlan 10 - Proxmox management traffic (default gateway)
    • vlan 11 - Proxmox/corosync cluster traffic (non-routed subnet)
    • vlan 12 - ceph public (non-routed subnet)
    • vlan 13 - ceph cluster (non-routed subnet)
    • vlan 21 - NFS/SMB services
    • vlan 22 - iSCSI services
    • vlan xx - various VM guest subnets; placement may be on either bond/bridge depending on requirements
  • OVS Bonds (see the interfaces sketch after this list):
    • All LACP with layer 2+3 hashing
    • bond0 - 4 x 1GbE
    • bond1 - 2 x 10GbE
  • OVS Bridges
    • vmbr0 - bond0
    • vmbr1 - bond1
  • OVS IntPorts
    • prox_mgmt (vlan 10, vmbr0)
    • prox_cluster (vlan 11, vmbr0)
    • ceph_public (vlan 12, vmbr0)
    • ceph_cluster (vlan 13, vmbr1)
    • stor_nfs (vlan 21, vmbr1)
    • stor_iscsi (vlan 22, vmbr1)
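
For reference, here's a trimmed sketch of what the 1G side looks like in /etc/network/interfaces on one node. NIC names and addresses are placeholders, the bond_mode shown is only an example, and the auto/allow-* keywords differ a bit between ifupdown versions; the 10G side follows the same pattern on bond1/vmbr1:

# /etc/network/interfaces (excerpt, one node)
auto bond0
iface bond0 inet manual
    ovs_type OVSBond
    ovs_bridge vmbr0
    ovs_bonds eno1 eno2 eno3 eno4
    # bond_mode here is just a placeholder - see the hashing question below
    ovs_options bond_mode=balance-slb lacp=active other_config:lacp-time=fast

auto vmbr0
iface vmbr0 inet manual
    ovs_type OVSBridge
    ovs_ports bond0 prox_mgmt prox_cluster ceph_public

# management IntPort on vlan 10 (carries the node IP and default gateway)
auto prox_mgmt
iface prox_mgmt inet static
    address 10.0.10.11/24
    gateway 10.0.10.1
    ovs_type OVSIntPort
    ovs_bridge vmbr0
    ovs_options tag=10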

All 1G connections go to a Cisco Catalyst; the 10G links go to a MikroTik CRS317. The two switches are connected to each other with dual 10G links.

Ceph MON and metadata (MDS) daemons run on each node.
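
Roughly how I stood Ceph up, for context. The subnets are just examples lining up with vlans 12/13 above, and the command/flag names are from memory, so double-check against pveceph's help on your PVE version:

pveceph install                                                      # Ceph packages, on every node
pveceph init --network 10.0.12.0/24 --cluster-network 10.0.13.0/24   # once, from one node
pveceph mon create                                                    # on each of the three nodes
pveceph mgr create                                                    # on each node
pveceph mds create                                                    # metadata server on each node (for CephFS)
pveceph osd create /dev/sdb                                           # repeat per 600 GB SAS disk, per node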

Is there something else I should be doing with regard to the network setup, specifically with Ceph? Will sharing the dual 10G bond with iSCSI/NFS and some VM traffic have much impact on OSD replication? I don't plan on really taxing this setup; the most I have is a home lab (AD domain, DB clusters, a few web/app servers, etc.). There is also a security cam DVR server that currently runs 16 cameras, which I plan on doubling - but with H.265 each camera only produces about 0.75-1 GB/hr.
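
Back-of-the-envelope on the camera traffic, assuming all 32 cameras at the full 1 GB/hr: 32 GB/hr works out to roughly 9 MB/s, or about 70 Mbit/s aggregate - well under 1% of a 10G link - so my real concern is just the Ceph replication/recovery traffic.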


Is there anything I missed, or anything I should be doing differently?
 
That should work, but IMHO I suggest you consider:

1. Take one of the 1G links out of the bond and create a separate, dedicated network for the PVE cluster traffic (Corosync/Kronosnet) [0]; see the corosync.conf sketch at the end.
2. Six 10K SAS drives x 3 nodes can easily saturate a 10G network during recovery (and probably even a plain rebalance). During these events you'll likely have issues with the iSCSI and NFS sessions as well, which creates even more traffic. Also, layer 2+3 hashing will likely not balance Ceph, NFS, or iSCSI well -- these sessions are all layer 3+4 (see the bond snippet below). As long as you are aware of that, you can deal with it. If you can't, break the 10G bond and run 1 x 10G for Ceph (public/cluster) and 1 x 10G for external storage (iSCSI, NFS, etc.)
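
If you do keep the single 10G bond, the OVS-side knob for per-flow distribution is balance-tcp, which hashes on L2/L3/L4 fields; the switch end then needs a matching layer-3+4 transmit hash policy (on RouterOS that's transmit-hash-policy=layer-3-and-4). Rough sketch, NIC names are examples:

auto bond1
iface bond1 inet manual
    ovs_type OVSBond
    ovs_bridge vmbr1
    ovs_bonds enp4s0f0 enp4s0f1
    # balance-tcp hashes each flow on L2/L3/L4, so separate Ceph/NFS/iSCSI
    # sessions can land on different bond members
    ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast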

[0] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_cluster_network
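
For (1), once each node has an address on the dedicated subnet, it's just another link in /etc/pve/corosync.conf. Sketch below - node names, IPs, and priorities are only an example, and config_version must be bumped on every edit:

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.99.11   # new dedicated corosync NIC
    ring1_addr: 10.0.11.11      # an existing IntPort (e.g. prox_cluster) as a fallback link
  }
  # ... pve2 / pve3 follow the same pattern ...
}

totem {
  cluster_name: homelab
  config_version: 2             # increment with every change
  ip_version: ipv4
  version: 2
  interface {
    linknumber: 0
    knet_link_priority: 20      # higher value = preferred link
  }
  interface {
    linknumber: 1
    knet_link_priority: 10
  }
}
# (quorum and logging sections unchanged, omitted here)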