planning new cluster

vkhera

I am planning a new proxmox cluster for my office network to replace a handful of aging machines. I have been running a test install of Proxmox for a couple of months with some light load. Now I'm ready to take the plunge and move everything.

The problem I'm solving by going virtual is this: I have 4 physical servers, all running FreeBSD. One runs 6 FreeBSD jails. Two of the servers run Postgres databases, one runs a Jira + Confluence environment with multiple Java VMs, and the rest run our own applications written in Perl with Apache. None of this is heavy duty. The older machines exhibit problems when rebooting after power outages, and that is what I want to make simple. I also want the advantage of virtual machines being easily moved among physical hosts, to simplify infrastructure upgrades.

My plan (limited by budget) is to set up the following:

1 FreeNAS host for NFS backups, and to hold the ISO images. This will also serve as backups for the office desktops.
2 Proxmox servers to share the load, and allow for host migration as needed.
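A FreeNAS NFS export like that would typically end up in /etc/pve/storage.cfg as an entry roughly like the sketch below (server address, export path and maxfiles retention are placeholders, not part of the actual plan):
Code:
nfs: freenas
        server 192.168.1.50
        export /mnt/tank/proxmox
        path /mnt/pve/freenas
        content backup,iso
        maxfiles 2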

The Proxmox hosts I'm looking to build will each have:
  • 2 x Intel Xeon E5-2630v3, 2.4 GHz (8-Core, HT, 20MB)
  • 64GB ECC RAM
  • Dual Intel NIC
  • Dual Intel SSD for boot (set up in mirror)
  • 4x 2TB or 3TB data drives

I do not need automatic failover. If one box dies, I'm more than happy to spend the few minutes it takes to recover the virtual machine on the other PVE box from backup.

So my questions are these:
  1. Should I use ZFS for the data drives or should I use LVM and the upcoming 4.0 feature with DRBD replication? I can size the drives as needed if everything is mirrored.
  2. Should I use the on-board SATA controllers on the Supermicro motherboard, or get a SAS RAID card and configure it as JBOD for ZFS, or RAID10 for LVM?
  3. Will the installer let me set up the (z)mirror for the boot drive?
  4. Does Proxmox use ZFS snapshot send/receive when migrating virtual machines from one host to another without shared storage?

My personal inclination is to go with ZFS since I know it well (as an advanced FreeBSD admin). I'm only a casual linux admin in that most of my linux systems are "appliances" that are pre-configured.

I will get the enterprise support for this cluster. Is there any pre-sales type of support I can buy to help me configure and plan the cluster?
 
Hi vkhera,

Comment on 1)
You can already use DRBD with two nodes, and it works best with a hardware RAID controller that has a battery-backed (write-back) cache, for quicker I/O response times. Unfortunately, you have to use LVM in this setup to get "real" network RAID 1, live migration and so on, via clustered LVM. If you do not need that and your backup & restore mechanism is really enough for you, you can skip LVM and use ZFS for everything. If you plan to increase the number of machines, have you considered trying Ceph? It scales out without downtime.
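For illustration, a two-node DRBD resource definition for such a setup looks roughly like the sketch below (node names, backing disk, replication addresses and the shared secret are placeholders, not values from this thread):
Code:
resource r0 {
        protocol C;
        startup {
                wfc-timeout 15;
                degr-wfc-timeout 60;
                become-primary-on both;
        }
        net {
                cram-hmac-alg sha1;
                shared-secret "example-secret";
                # dual-primary mode is what makes live migration possible,
                # since both nodes touch the device during the switch-over
                allow-two-primaries;
        }
        on pve1 {
                device /dev/drbd0;
                disk /dev/sdb1;
                address 10.10.10.1:7788;
                meta-disk internal;
        }
        on pve2 {
                device /dev/drbd0;
                disk /dev/sdb1;
                address 10.10.10.2:7788;
                meta-disk internal;
        }
}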

Comment on 2)
If the on-board SATA controller is a "real" one (meaning you would be doing software RAID on top of it) and it is supported on Linux, then yes. If not, or if you want better performance, I'd suggest buying a decent hardware RAID controller.
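A quick way to check what the Linux kernel actually sees on such a board (plain commands, nothing Proxmox-specific):
Code:
# list the SATA/SAS/RAID controllers the kernel has detected
lspci -nn | grep -iE 'sata|sas|raid'
# check which storage driver is loaded (ahci = plain SATA; megaraid/mpt* = LSI RAID/HBA cards)
lsmod | grep -E 'ahci|megaraid|mpt'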

Comment on the Hardware:
I personally prefer more GHz over more cores. With HT you'll have 32 vCPUs per host, and that is a lot.

Are you planning to migrate 1:1 (physical-to-virtual) or de-jail everything?

Best,
LnxBil
 
Yes, it is an obscene number of cores. It is on the low end for a pro system from Silicon Mechanics with hot-swap drives. I usually turn off HT.

Does Proxmox take any special advantage of ZFS, or is it equivalent to the other storage options?

I plan to ultimately peel off the jails for easier management, though I may not do that on the initial migration.
 
...

The Proxmox hosts I'm looking to build will each have:
  • 2 x Intel Xeon E5-2630v3, 2.4 GHz (8-Core, HT, 20MB)
  • 64GB ECC RAM
  • Dual Intel NIC
  • Dual Intel SSD for boot (set up in mirror)
  • 4x 2TB or 3TB data drives
Hi,
for such a powerful server (CPU-wise) I would recommend 128GB RAM. I/O and RAM are normally the bottleneck.

About SSD RAID 1:
For the system disk it's IMHO useless. If you use the same SSD model, both will die at approximately the same time (both are written with the same amount of data), and the system is quickly reinstalled anyway.
I would go for one good SSD (like an Intel DC S3700), or use a separate small RAID volume (around 20-100GB, depending on your needs) carved from your SATA drives.
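If you do end up with two identical SSDs, one way to keep an eye on the "both wear out together" problem is to check the wear counters now and then (the device name is a placeholder; on Intel drives the relevant SMART attribute is usually Media_Wearout_Indicator):
Code:
# show SMART attributes and filter for wear/endurance counters
smartctl -A /dev/sda | grep -iE 'wear|life'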
I do not need automatic failover. If one box dies, I'm more than happy to spend the few minutes it takes to recover the virtual machine on the other PVE box from backup.
No, once you have used shared storage with live migration you will never want to be without it! The fun begins with shared storage!
So my questions are these:
  1. Should I use ZFS for the data drives or should I use LVM and the upcoming 4.0 feature with DRBD replication? I can size the drives as needed if everything is mirrored.
  2. Should I use the on-board SATA controllers on the Supermicro motherboard, or get a SAS RAID card and configure it as JBOD for ZFS, or RAID10 for LVM?
  3. Will the installer let me set up the (z)mirror for the boot drive?
  4. Does proxmox use ZFS snapshot send/receive when migrating virtual machines from one host to another without shared storage?
I did a test a few weeks ago with 6x 2TB drives in a ZFS RAID-Z2 (journal on an Intel SSD DC S3700).
Write speed was OK, but read speed was much too low.
And ZFS is local storage...
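(For reference, a pool like the one in that test is created roughly as below; pool name, disks and the log SSD are placeholders, not the exact layout from the test:)
Code:
# 6-disk RAID-Z2 pool with a separate log (ZIL) device on an SSD
zpool create tank raidz2 sdb sdc sdd sde sdf sdg log sdh
# in practice use stable /dev/disk/by-id/... names instead of sdX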
I went back to a fast RAID controller (Areca) with BBU, in this case RAID 6 (RAID 10 is much better, but we need more space).
That storage is for a single server, so I use LVM directly on the RAID volume. In your setup you should use two RAID volumes mirrored with DRBD between the two nodes.
I have some servers with this setup (with DRBD volumes of more than 5TB) which run very well.

Udo

edit: you can use the DRBD that is shipped with PVE 3.4 (mirroring between two nodes)
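The LVM-on-DRBD part then boils down to putting a volume group on the synchronized DRBD device and pointing Proxmox at it as shared LVM storage; a rough sketch (device, VG and storage names are placeholders):
Code:
# on one node, once /dev/drbd0 is up and in sync
pvcreate /dev/drbd0
vgcreate drbd0vg /dev/drbd0
# /etc/pve/storage.cfg entry so both nodes can use the VG for VM disks
lvm: drbd0-storage
        vgname drbd0vg
        shared
        content images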
 
Thanks, Udo.

I'm unclear about your suggestion for using two raid volumes mirrored with drbd. With this plan, I'd get 4x 4TB SAS drives for the data. You suggest making a pair of mirrors as separate volumes to mirror with DRBD. What benefit does that have over a single RAID-10 mirrored via DRBD? The amount of storage available is the same still, but I would have to manage two 4TB volumes vs one 8TB volume. -- edit: never mind, I see why in the DRBD wiki page.

I also like your idea of skipping the SSDs and just making a small-ish boot volume on the RAID. That will offset the cost of the RAID card ;). I've not had the issue of simultaneous SSD failure before with my boot volume mirrors, though. I guess I've been lucky given that they are usually from the same manufacturing lot.

I've also reduced the CPU from 8 core to 4 core but bumped the speed to 3.0GHz. This is after all just for a small office development environment...
 
I've also reduced the CPU from 8 core to 4 core but bumped the speed to 3.0GHz. This is after all just for a small office development environment...
Hi,
but does that make sense with a dual-CPU board?
With dual CPUs you can run into trouble with NUMA (e.g. processes switching between CPUs, or the data sitting in the memory of "the wrong" CPU).
Instead of 2 x 4 cores at 3GHz I would take a single-CPU board like the Asus Z10PA-U8 with an E5-1650 v3 (6 cores at 3.5GHz), or a CPU with more cores if you really need them.
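A quick way to see how cores and memory are split across the sockets of such a box (a plain command, not Proxmox-specific):
Code:
# show NUMA nodes, their CPUs and how much memory is local to each
numactl --hardware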

I have had this config running on two systems for a short while now. Performance looks good (regex):
Code:
# pveperf /mnt/local_pve
CPU BOGOMIPS:      83806.68
REGEX/SECOND:      2690932
HD SIZE:           543.34 GB (/dev/mapper/pve_local-data)
BUFFERED READS:    454.94 MB/sec
AVERAGE SEEK TIME: 6.02 ms
FSYNCS/SECOND:     4321.59
DNS EXT:           88.45 ms
DNS INT:           0.39 ms
Udo
 
I run a 16-node Proxmox environment. Each host has two Xeon X5650s + 48GB RAM, and I have no trouble with dual-CPU machines. More cores for me any day... more resources for VMs to share, as opposed to fighting for time on fewer, faster cores.
 
From most virtualization experts you will also hear that core count is generally more important than clock speed. Obviously corner cases exist where clock speed outweighs the number of cores, but in general the number of cores matters more for scaling the number of stable VMs.
 
Hmm... what storage are you (@mir) using? I have never had problems with computational power in any cluster environment I've used (virtualized or not), but I/O is always the first bottleneck, then memory. Still, as stated already, loads and requirements differ.

mir: Have you enabled NUMA? As described earlier, this is a real problem, and it can be dealt with by soft partitioning. NUMA handling was introduced in Proxmox 3.4 and is described here:
http://forum.proxmox.com/threads/21313-numa-config-option
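For anyone landing here later: enabling it on a guest is a one-liner (VM ID 100 is just an example):
Code:
# enable NUMA awareness for VM 100
qm set 100 --numa 1
# equivalent line in /etc/pve/qemu-server/100.conf:
#   numa: 1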
 
I have no experience using multi-socket with Proxmox. My experience is from work with ESXi 5.5, where we have a setup of 16 nodes with dual-socket E5-2695 v2 CPUs (12 cores each, meaning 24 cores and 48 threads in total per node). Each node has 396 GB RAM. For storage we use an EMC SAN with 4 VNX5300 controllers. And yes, we have enabled NUMA.

lscpu from a VM:
Code:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 15
Stepping: 11
CPU MHz: 2398.870
BogoMIPS: 4797.74
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 15360K
NUMA node0 CPU(s): 0,1
 
I run a 16-node Proxmox environment. Each host has two Xeon X5650s + 48GB RAM, and I have no trouble with dual-CPU machines. More cores for me any day... more resources for VMs to share, as opposed to fighting for time on fewer, faster cores.
Hi,
I also have PVE nodes with 4 and with 2 CPU sockets (up to 32 cores), but for this scenario (approx. 8 cores) I would use a single-CPU board (I think it has a better price/performance ratio).
And about NUMA: yes, since PVE 3.4 you can also enable NUMA for the VMs, which is quite useful.
But if you also run Ceph on the nodes, you will perhaps have less trouble with only one CPU. There are some suggestions about this on the Ceph mailing list - of course there are also a lot of people running Ceph on dual-CPU hosts without trouble.

Udo
 
