Proxmox VE & HA Storage Recommendation

HE_Cole

Oct 25, 2018
Hello everyone!

I'm new here and to Proxmox VE. I am getting ready to deploy a rather large HA cluster and I need some guidance on the correct route for storage, given the information provided below.

I plan on purchasing a Proxmox VE subscription once all my equipment arrives and I go to deploy, in case I need help.

I'll start with my planned equipment specs.

I have 10x Dell R710s, each with 92GB RAM, dual 6-core Xeons, and 2x 10Gbit fiber cards. I plan to bond them as one interface and use VLANs for traffic segregation. Each server has a nice H700 RAID card and 6x 3.5" HDD slots.

So the kinda crappy part is I have a ton of brand new enterprise 2TB SATA drives, and by a lot I mean 60+ brand new drives I am trying to make use of. Long story. Anyway, there are a lot of storage options to go by with Proxmox, and I wanted some advice on the route to take.

I have a 37-server OpenStack environment running with Ceph and I am happy with it, so it's nice to see Proxmox support Ceph for block storage.

To give an overview of the data on the virtual machines: they will all be Linux (CentOS, Red Hat and Ubuntu), but quite a few of them.

I plan to run hardware RAID 10 because my drive failure rate is very low.

HA is a must, with backup and snapshot support required.

Now, I was planning on setting my Proxmox environment up like this.
Remember, I have 10 servers to work with and mainly 2TB SATA drives.
--------------------------------------------------------------------------------------------------------------

I was going to set up 2 servers as my shared storage nodes. I could double up the dual 10Gbit cards, making 40Gbit.

Then the other 8 servers with dual 240GB SSDs in RAID 1, I guess.

I am not sure if I should use Ceph or ZFS over iSCSI, or what would be optimal for speed and redundancy given my specs above. But if I use Ceph, I need storage for backups and ISO images too.

I am leaning towards ZFS, which seems to perform better than Ceph, but I have never used it; the configs do look very straightforward, though.

I really appreciate the help.
 
Sounds good, but a few recommendations:

In an ideal world you would:

-> Segregate Ceph traffic physically
-> Segregate corosync traffic physically (1 Gbit is OK)
-> If possible, also segregate traffic for other storage protocols (NFS/CIFS) from other traffic physically
-> If possible (especially with big-memory VMs), segregate migration traffic

For a starting point with the hardware you have available:

-> 1x 10 Gbit interface for frontend traffic (with VLANs for logical segregation)
- 1 or more VLANs for VM network traffic
- 1 VLAN for Proxmox traffic (management + migration)
-> 1x 10 Gbit interface for Ceph
- 1 VLAN for Ceph public network

-> 1x 1 Gbit for the first corosync ring

So ideally (this is for maximum redundancy, scale it down to your needs):

-> 2x bonded 10 Gbit interfaces for the Proxmox frontend (VM traffic), probably with VLANs for logical traffic segregation
-> 2x bonded 10 Gbit interfaces for the Ceph public net
-> 2x bonded 10 Gbit interfaces for the Ceph private net
-> 1x 1 Gbit for the first corosync ring

-> the second corosync ring can go on the Proxmox frontend network

For a starting point to keep down costs:
-> segregate public/private Ceph traffic via VLANs on the same physical interfaces, so you need 4x 10 Gbit interfaces + 1x 1 Gbit

Or, if the redundancy needs are not so high and you cannot afford the additional 10 Gbit interfaces + switch ports:

-> 1x 10 Gbit interface for frontend traffic (with VLANs for logical segregation)
-> 1x 10 Gbit interface for Ceph (with VLANs for public/private segregation)
-> 1x 1 Gbit for the first corosync ring
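To make the minimal layout a bit more concrete, a rough /etc/network/interfaces sketch for one node could look like this (interface names, VLAN IDs and addresses are only placeholders, adapt them to your hardware and subnets):

auto lo
iface lo inet loopback

# 10 Gbit #1: frontend, VLAN-aware bridge for VM traffic and Proxmox management
auto ens1f0
iface ens1f0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.10.11/24
        gateway 192.168.10.1
        bridge-ports ens1f0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes

# 10 Gbit #2: Ceph, split into public/private VLANs
auto ens1f1
iface ens1f1 inet manual

auto ens1f1.20
iface ens1f1.20 inet static
        address 10.20.20.11/24    # Ceph public

auto ens1f1.30
iface ens1f1.30 inet static
        address 10.30.30.11/24    # Ceph private

# 1 Gbit: first corosync ring
auto eno1
iface eno1 inet static
        address 10.1.1.11/24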

Ceph is very latency dependent, so 10 Gbit and up is good.

Spinning disks for Ceph give decent streaming performance, but bad random and especially sync write performance, so back them with enterprise-grade SSDs for the WAL/DB. They don't need to be large; you can put the WAL/DBs for multiple OSDs (on the same host, of course) on one SSD, so a good 128 GB SSD per host will be fine. Make a partition per OSD on the SSD, then you can use pveceph, or alternatively the newer ceph-volume tool, for creating the OSDs.
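As a rough example of what that could look like on one host (device names are placeholders and the DB partition size is just an illustration):

# carve one DB partition per OSD out of the SSD (/dev/sdg here), e.g. ~25 GB each
sgdisk -n 0:0:+25G /dev/sdg        # repeat once per OSD

# create each OSD with its block.db on one of the SSD partitions (ceph-volume variant)
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/sdg1
ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/sdg2
# ... and so on for the remaining spinners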

The storage for backups and ISOs does not need to be redundant, so you can use one host for ISOs and one for backups, or use 9 hosts for Proxmox.

One more piece of advice: do not use NFS for backup; large streaming writes on NFS tend to bog down resources and render nodes unresponsive until they get fenced. I have had much better experience using CIFS for backup; a nice Samba setup on the storage nodes should do it.
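Roughly like this; share name, path, user, address and password are only placeholders:

# on the storage node: /etc/samba/smb.conf
[pvebackup]
    path = /srv/pvebackup
    writable = yes
    valid users = pvebackup

# on a Proxmox node: add the share as CIFS storage for backups
pvesm add cifs backup-cifs --server 192.168.10.50 --share pvebackup \
    --username pvebackup --password 'secret' --content backup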
 
Sorry, hit the post button before editing was complete:

-> in the minimum configuration, use 2 VLANs for Ceph (public + private), so you can easily segregate the traffic physically later without a hard reconfiguration

The part with the minimum config is doubled, as I wanted to move it up; sorry for the confusion. Editing on a laptop with a trackpad is sometimes a pain in the ass.

I hope this helps; don't hesitate to ask back.
 
Hey there!

Thanks for the very good and in-depth reply, to say the least. I should say I am a network engineer here in Miami and I like to stick to layers 1-4, hehe; normally a systems engineer would deal with this, but I don't have anyone who has dealt with Proxmox, and I definitely don't want another OpenStack cluster to deal with. Proxmox seems easier to run and maintain than OpenStack.

Now, I may have overstated or understated my goal with this Proxmox deployment.

My main goal is to be able to lose at most 1 server from a 10-node cluster (not counting storage). So I will say now that I have 13 servers to work with, all the same specs (probably going to lower the RAM in the storage-only nodes), and my drive failures are normally handled by RAID 10, although I know Ceph does well with its own replication and JBOD.

To give better input into the VMs that I will be running: I don't plan on a single VM disk being larger than 600GB. Most will be small, 30GB to 240GB. All the VMs will be public facing; we don't need local networking inside the VMs. So I don't see the need for 10Gbit for public WAN traffic, since for this cluster I am only giving a 100Mbit WAN uplink that will become 1Gbit when I fill the 100Mbit up. I do not plan on doing anything with Ceph publicly, so I don't see any need for a dedicated link for Ceph public. I could double the dual 10Gbit fiber cards to 2x per server, but that seems excessive to me since they are not SSDs and there are only 6x drives in a single chassis. I have 2x 1Gbit Ethernet ports on each server I can use too, but I would like to stick with one dual 10Gbit card per server plus the 2x onboard 1Gbit Ethernet.

So given all that, it's 13x servers total.

As an overall view:

I will start with 3x servers for shared storage nodes.
Each storage node will have (Ceph):
-----------------------------------------
2x 10Gbit SFP+
2x 1Gbit Ethernet
32GB RAM
6x 2TB 3.5" SATA6
Probably RAID 10 (unless someone recommends otherwise)
-----------------------------------------

I will start with 10x servers for compute.
Each compute node will have:
------------------------------------------
2x 10Gbit SFP+
2x 1Gbit Ethernet
92GB RAM
2x 120GB SSD (probably just RAID 1)
-----------------------------------------

Does that sound optimal?


"As a note i ran across a few threads that seam to say ZFS is better then cpeh for promox but i have never used it."

Also, if I go the Ceph route, should I use Ceph/RBD or Ceph/CephFS? Block or file?

I like raw, but with raw can I still have snapshots on Ceph/RBD?

Also, is it stable and safe to run Ceph and Proxmox on the same server if I use an SSD just for the Proxmox OS? I like this approach since it looks very easy to deploy.

Thanks a bunch all!
 
Hey again!

I wanted to add that I am very much considering changing the specs around a bit.

I think I will start with 5 servers and run Ceph and PVE on each.

------------------------------------------------------------------------
Dell R510 series. Specs are per server, with a total of 5 to start.
Dual 6C Xeons
128GB RAM
Dual 10Gbit SFP+
2x 1Gbit Ethernet for management, corosync, or other.
2x 120GB SSDs for OS
12x 2TB SATA6 Enterprise drives.
-----------------------------------------------------------------------

What do you think? I can get my HA and Ceph all in one cluster this way, and the setup is simpler for me as I am a network guy, not a systems guy.
 
> All the VMs will be public facing; we don't need local networking inside the VMs. So I don't see the need for 10Gbit for public WAN traffic, since for this cluster I am only giving a 100Mbit WAN uplink that will become 1Gbit when I fill the 100Mbit up.

OK, so 1 Gbit for the outside net is fine.

> I do not plan on doing anything with Ceph publicly, so I don't see any need for a dedicated link for Ceph public.

Maybe there is some misunderstanding with the terms public/private for Ceph:

-> the public network is the network on which the clients (RBD/RADOS etc.) talk to the OSDs
-> the private network is the network on which the OSDs talk to each other (replication traffic)

You can run them both on the same network, but if you run into performance trouble later, it is easier to segregate them physically if you have already segregated them logically with VLANs.
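In the Ceph config that is just the public/cluster network setting, for example (subnets are placeholders matching whatever VLANs you choose):

# /etc/pve/ceph.conf
[global]
    public network  = 10.20.20.0/24    # clients (RBD etc.) <-> MONs/OSDs
    cluster network = 10.30.30.0/24    # OSD <-> OSD replication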

> I could double the dual 10Gbit fiber cards to 2x per server, but that seems excessive to me since they are not SSDs and there are only 6x drives in a single chassis. I have 2x 1Gbit Ethernet ports on each server I can use too, but I would like to stick with one dual 10Gbit card per server plus the 2x onboard 1Gbit Ethernet.

That is fine for a starting point, but do the setup in such a manner that you can change it easily later.


> "As a note i ran across a few threads that seam to say ZFS is better then cpeh for promox but i have never used it."

ZFS is fine, but not for a real HA setup. For HA you need shared storage.


> Also, if I go the Ceph route, should I use Ceph/RBD or Ceph/CephFS? Block or file?
Use Ceph/RBD; it works fine for VMs and is currently the only one supported in the API/CLI/GUI.
CephFS will come in a future release, as I heard, but for backup/ISO/templates.

> I like raw, but with raw can I still have snapshots on Ceph/RBD?
Sure, Ceph handles them.
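For example (pool and storage names are placeholders, and the exact pveceph subcommand names can differ a bit between PVE versions):

# create a pool for VM disks and add it as RBD storage on the hyperconverged cluster
pveceph createpool vm-pool --size 3 --min_size 2
pvesm add rbd ceph-vm --pool vm-pool --content images

# VM disks on RBD are raw images, snapshots still work:
qm snapshot <vmid> mysnapshot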

> Also, is it stable and safe to run Ceph and Proxmox on the same server if I use an SSD just for the Proxmox OS? I like this approach since it looks very easy to deploy.

It is stable, but again let me warn you: spinning disks alone will give you bad disk performance for your VMs. Try to get an additional SSD for the WAL/DB, or run with one SSD for the OS and one for the OSD journals (WAL/DB).

Besides that, your 5-server setup sounds reasonable.
 
Hey there!

:) Klaus Steinberger, you're awesome! :)

I am glad the 5x server setup sounds good to you; I was hoping it would. I plan on growing to 11 servers once I'm happy with the 5.

I can have 2x SSDs on each server, no problem:
1x OS
1x WAL/DB (OSD journals)
and still 12x separate 2TB HDDs for VMs in JBOD. (Sucks that Dell makes you use a $300+ RAID card in these servers when you don't need it.)
1x 1Gbit Ethernet for WAN.
I plan to bond the 2x 10Gbit interfaces.
I will of course use a different VLAN for everything I can, like Ceph public and private. (I love 802.1Q so much.)

What should I use the other 1Gbit for: corosync/management or something else?

Anyway, I am sure I will run into some more questions soon!

Thanks again.
 
Yes, use the other 1 Gbit for corosync and/or management; don't use it for migration traffic (corosync likes low latency).
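When you create the cluster you can pin corosync to that 1 Gbit network and put a second ring on the frontend network, roughly like this (addresses are placeholders; older PVE releases use --ring0_addr/--ring1_addr instead of --link0/--link1):

# on the first node
pvecm create mycluster --link0 10.1.1.11 --link1 192.168.10.11

# on each additional node (giving its own addresses for both links)
pvecm add 10.1.1.11 --link0 10.1.1.12 --link1 192.168.10.12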

Ah, one more point: does your RAID controller support real HBA mode? I hope it's not one of the MegaRAID things, which force you to use a RAID 0 for a single disk.

HBA mode is much better for Ceph and/or ZFS.
 
That's a great question!

They have Dell H700s in them, and you're right, they don't support JBOD. But good news: I have a huge pile of H200 cards that do, so I will use those.

Thanks again
 
Hey there again!

Thanks for the great advice. I am deploying the environment now. I was able to add 2x more 1Gbit Ethernet ports, making it 4x 1Gbit and 2x 10Gbit per node. I will use VLANs to separate everything network-wise. Now I have a few questions again, please, as I am now deploying Ceph and Proxmox.

Could you give me a little guidance on the setup below where I've marked a "?", or let me know if I forgot something?
All nodes have exactly the same specs, 100% identical, but I only have 5 nodes to start with for now.

NETWORK
1Gbit-P1 = VLAN 11, management
1Gbit-P2 = VLAN 12, corosync?
1Gbit-P3 = VLAN 13, no idea?
1Gbit-P4 = VLAN 10, WAN
10Gbit-P1 = bond1, VLAN 20, Ceph public
10Gbit-P2 = bond1, VLAN 30, Ceph private
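For the 10Gbit side I was thinking of something roughly like this in /etc/network/interfaces (interface names and addresses are placeholders):

auto bond1
iface bond1 inet manual
        bond-slaves ens1f0 ens1f1
        bond-miimon 100
        bond-mode 802.3ad            # needs LACP on the switch side
        bond-xmit-hash-policy layer3+4

# Ceph public on VLAN 20
auto bond1.20
iface bond1.20 inet static
        address 10.20.20.11/24

# Ceph private on VLAN 30
auto bond1.30
iface bond1.30 inet static
        address 10.30.30.11/24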

STORAGE
I have 2x SSDs per node and 8x 2TB drives for storage:
SSD1 = OS (PVE)
SSD2 = WAL/DB?

I plan to install a Ceph monitor AND a Ceph manager on each node; is that OK?
And should I use BlueStore or Ceph FileStore? They look like they are different. Are the WAL/DB and the journals the same thing? If not, can they be on the same SSD? I only have 2 SSDs: 1 for the OS and 1 for WAL/DB/journals, I hope.

I am using the guide here for Ceph:
https://pve.proxmox.com/pve-docs/chapter-pveceph.html

I will set up the networks and bonds/bridges in Proxmox first, of course.

Also, any ideas on setting up jumbo frames to help speed up Ceph? (Might leave this for later.)

Do I set up all nodes and add them to a PVE cluster first, then install Ceph on each node in the cluster? Or in a different order?

Any other tips if you think of any?

P.S. I got HBA mode working great on my RAID card.

Thank you very very much.
 
