Cluster in a box

jinjer

Hi all,

I've been working on an idea for a Proxmox-based "cluster in a box".

A couple of years ago, I installed something similar on an Intel modular server that is still running. The main problem is the very low performance of the disk subsystem, which is a real bottleneck.

What I would like to build is an updated version of that, with a single chassis housing a bunch of independent nodes connected only via bonded gigabit links. For the hardware, I've been eyeing the FatTwin nodes from Supermicro (like the SYS-F617R2-RT+), where you get several nodes, each with its own disks, and no way to configure or share the storage between nodes.

Apart from sharing the power supplies and the chassis, the nodes are totally independent.
I think they would make a perfect fit for a clustered filesystem like Ceph or Gluster.

To leverage the hardware, all nodes need to run both the storage-cluster part of the system and the virtualization part.

With dual-processor nodes and RAM going all the way up to 512GB per node, I think this type of install could be possible and would make for easy colocation worldwide: just ship the chassis pre-configured, get it connected to the net and power, and you're set.

Does this sound like a good idea?
 
Yes, that would be the only way to use the disks in the compute nodes.

Ideally Ceph would be disk-intensive and Proxmox would be CPU-intensive, which makes better use of the hardware.
The RAM could be expanded to whatever is necessary for Ceph + KVM/OpenVZ.

There are some question marks however:

1. I don't know how much CPU is required by Ceph. There is some "oprofile" thing built in to obtain this data, but I have no real cluster to run it on.
2. Ceph works best on Ubuntu (12.04 LTS is recommended). Wheezy is almost identical to Ubuntu LTS. Maybe Ceph can run properly on the same metal as Proxmox.
 
I have not tried (yet) to install Ceph on Proxmox 3.1.
The wiki says that it's not currently possible.
The Ceph team has a "howto" for installing Ceph on Debian Wheezy.

Is the wiki not updated, or is it that Ceph won't run on 3.1 (perhaps because of the kernel)?
 
You keep referring to Wheezy, but actually you should refer to Red Hat, since support for Ceph requires kernel support and Proxmox uses a Red Hat kernel.

See: http://ceph.com/docs/next/install/rpm/
 
Thank you for pointing this out. I refer to Wheezy because we're running a Wheezy-based distro with a Red Hat kernel. I guess a recompile of Ceph from source will be necessary anyway, and recompiling the Debian source packages will probably be easier than using the RPM-based stuff.

I've done this with ZFS (i.e. built the kernel modules for Proxmox using the Debian tooling). I still have to try the same thing for Ceph.
 
There's a list of minimum requirements for Ceph HERE, and I remember from a talk that Inktank gave that you should estimate about 1GHz on 1 core per OSD (= disk).

For a node with 4 disks and 1 SSD (Ceph OSD cache), that means roughly 3GB of memory and 4GHz/core of "processing power" will be required by Ceph. Now obviously that doesn't take up all the resources of modern servers, but it's still something to consider, since you need to keep the total CPU usage below 50% (you should always have enough capacity to compensate for a complete node failure).
 
"4GHz/core of "processing power" You mean a Xeon 4GHz ?? I have not heard or seen any reference to a 4GHz xeon. Are you sure they didn't say you should count on using the 1GHz per unit for ceph (1 GHz per socket) given you run all the parts on the same host?
 
Yes, that's what I meant: if you have a 4GHz CPU, it'd fully use one of its cores, plus whatever computing power the mon needs.
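
To make that rule of thumb concrete, here is a minimal sizing sketch in Python. The per-OSD figures follow the ~1GHz/~1GB-per-OSD estimate above; the node specs in the example are made up for illustration, not measured numbers.

Code:
# Rough Ceph overhead estimate per node, based on the rule of thumb above:
# ~1 GHz of CPU and ~1 GB of RAM per OSD (= disk), and keep total CPU usage
# below 50% so a complete node failure can still be absorbed.
def ceph_overhead(num_osds, ghz_per_osd=1.0, ram_gb_per_osd=1.0):
    """CPU (GHz) and RAM (GB) expected to be consumed by the OSDs alone."""
    return num_osds * ghz_per_osd, num_osds * ram_gb_per_osd

def fits_on_node(num_osds, cores, core_ghz, ram_gb,
                 vm_cpu_ghz, vm_ram_gb, max_cpu_fraction=0.5):
    """True if OSDs + planned VM load stay inside the 50% CPU headroom."""
    osd_cpu, osd_ram = ceph_overhead(num_osds)
    total_cpu = cores * core_ghz
    return (osd_cpu + vm_cpu_ghz) <= max_cpu_fraction * total_cpu \
        and (osd_ram + vm_ram_gb) <= ram_gb

# Hypothetical FatTwin-style node: 2x 8-core 2.6 GHz CPUs, 128 GB RAM,
# 4 OSD disks, and 16 GHz / 64 GB budgeted for the KVM/OpenVZ guests.
print(ceph_overhead(4))                          # (4.0, 4.0)
print(fits_on_node(4, 16, 2.6, 128, 16, 64))     # True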
 
We are currently running some tests here to see if running OSDs on PVE nodes makes sense, but we have no results so far - still waiting for hardware.

This is how I started my life with Ceph. Initially I thought: why not use the Proxmox nodes, which are up and running anyway, as Ceph nodes too? I had major failures at several stages. Keep in mind this is just based on my hardware and experience.

#1 Resource consumption. Any time Ceph went into recovery mode, things slowed down "significantly". Proxmox lost connection with the Ceph cluster many times.
#2 Reboot issue. Any time I had to reboot a node, everything had to be restarted, both Proxmox and Ceph, even if the issue was with Ceph only.
#3 Physical server limitation. Initially the Proxmox nodes were just servers with 1 or 2 SSDs to run the OS, so they were all 1U chassis. To add more HDDs, I had to switch to 2U chassis with more drive bays, although combining both clusters in one node would have saved space.
#4 NIC nightmare. Since I tried to keep Ceph traffic away from Proxmox traffic, I had to set up the NICs in a very, very complex way.

There were other issues I cannot think of right now. In the end it just was not worth it. To keep life simple and ensure maximum uptime, the separate-cluster option cannot be beaten, in my opinion. With a separate Ceph cluster, I now even have the option to extend the storage cluster over a remote distance.

 
#1 Resource consumption. Any time Ceph went into recovery mode, things slowed down "significantly". Proxmox lost connection with the Ceph cluster many times.

Sure, such a setup is only for low performance requirements. But Proxmox should not lose connection if you use separate network connections - I will test that.

#2 Reboot issue. Any time I had to reboot a node, everything had to be restarted, both Proxmox and Ceph, even if the issue was with Ceph only.

I fully agree with that argument, but my hope is that Ceph will get more stable in the future.
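
Just to illustrate what I mean by separate network connections: a minimal ceph.conf sketch that keeps OSD replication traffic on its own NIC, away from the network that Proxmox and pvestatd use. The fsid and subnets below are placeholders, not from any real cluster; it's written with Python's configparser only to keep the snippet self-contained.

Code:
# Sketch: generate a minimal ceph.conf that separates client traffic
# (public network) from OSD replication/recovery traffic (cluster network).
# The fsid and subnets are placeholder values for illustration only.
import configparser

conf = configparser.ConfigParser()
conf["global"] = {
    "fsid": "00000000-0000-0000-0000-000000000000",  # placeholder
    # KVM/librbd clients and monitors talk to the OSDs on this network:
    "public network": "10.10.1.0/24",
    # OSD-to-OSD replication and recovery stays on a dedicated NIC/subnet:
    "cluster network": "10.10.2.0/24",
}

with open("ceph.conf", "w") as f:
    conf.write(f)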
 
Sure, such a setup is only for low performance requirements. But Proxmox should not lose connection if you use separate network connections - I will test that.

Yep, all traffic was going in and out through 1 NIC. And I should clarify what I meant by losing connection: it was only pvestatd which could not be accessed. The cluster itself was still running, though.


I fully agree with that argument, but my hope is that Ceph will get more stable in the future.
I really did not and still do not have stability issues. It keeps going without breaking. But because there were updates involved, I had to restart the node once for Ceph and again for Proxmox. A reboot in Ceph is no issue, since even if I reboot a Ceph node the whole Ceph cluster keeps going. Proxmox, on the other hand, either needs to migrate all running VMs to another node or restart all VMs along with the Proxmox node.
I/O performance is the only thing adding negative points for Ceph at this moment.


 
I did some more investigation into running Ceph + KVM on the same hardware, and according to the Ceph documentation it's a big no to run mon/OSD daemons alongside virtualization or other concurrent processes, especially when running on the same disks.

My take is that Ceph will not work properly with limited resources and that Gluster will be the way to go for such a situation.

What I'm envisioning right now is ZFS-based, GlusterFS-replicated, NFS-served shared storage for OpenVZ containers, and ZFS + Gluster + libgfapi for KVM virtual machines.

I would set up each node to be a self-served node (mounting its own filesystems via local Gluster/NFS) and rely on Gluster only for remote replication and high availability.
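
As a rough sketch of that self-served idea (the volume name and mount point below are made up for illustration), each node could check and mount its own replica through localhost before any containers or VMs are started:

Code:
# Sketch: make sure the node's local Gluster mount is up before guests start,
# so the node serves itself and Gluster is only used for replication/HA.
# Volume name and mount point are hypothetical.
import os
import subprocess
import sys

VOLUME = "vmstore"            # assumed replicated Gluster volume name
MOUNTPOINT = "/mnt/vmstore"   # assumed path holding container/VM images

def ensure_local_mount(volume=VOLUME, mountpoint=MOUNTPOINT):
    """FUSE-mount the volume via localhost so no remote node is needed."""
    os.makedirs(mountpoint, exist_ok=True)
    if os.path.ismount(mountpoint):
        return True
    result = subprocess.run(
        ["mount", "-t", "glusterfs", f"localhost:/{volume}", mountpoint])
    return result.returncode == 0

if __name__ == "__main__":
    if not ensure_local_mount():
        sys.exit("local Gluster mount failed - not starting containers/VMs")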
 
