Need tips for performance improvements in cluster

tores

Member
Nov 12, 2014
Hi folks!
I want to start by apologizing for what might be a long and confusing post, but here goes:

We currently have a cluster with 3 frontends running Proxmox, using a shared Nexenta SAN for storage. The storage server is connected to our switches using 10Gb SFP, and each frontend has 2x1Gb links: one for internet and one for the local network to the storage server.
It was a foolish idea to only have one storage server, but that decision was made before I started in the company so I sadly had no control over that, which brings me shortly to our issues...
Lately the storage server has become somewhat unstable, randomly rebooting a few times. That of course causes all of the VMs running on the frontends to kernel panic and stall, and they need a reset from Proxmox once the storage backend is up and running again. We still have not determined the cause, but we have now started looking into our options.

The frontends themselves are not too old, so buying new ones is not an option yet. What we have thought of is to fill them up with disks (each frontend has 6 empty disk slots) and run Ceph or something like that for redundancy. This of course means buying disks and 10Gb NICs, but we still have many years of use left in these frontends. Does this sound like a good option? Or would any of you have done something else? We would then get local storage instead, which also reduces I/O to the Nexenta platform, which we believe is currently a bottleneck for some of the VMs.

If this is deemed the best solution, what is the simplest way of migrating the storage from iSCSI on the shared Nexenta to Ceph (or whatever we choose)? When we first migrated all of the servers from our old platform to Proxmox/Nexenta, we just rsync'd all the data from each VM to the corresponding newly created VM on Proxmox, but I guess there has to be a simpler option?

Thanks in advance for all hints and clues, and I'm sorry again for the long wall of text!
And also, thanks for an amazing product in Proxmox and what seems like a very helpful community (have been lurking and reading a while).

Regards,
Tore
 
Hi,

Yes, you can use Ceph.
But you also have to keep in mind that you need the required resources (CPU, memory, network, disks) on the PVE nodes.

In general, on small Ceph clusters such as 3 nodes, you should use SSDs only.
You also need a 10 Gbit network that is used for Ceph only.
See also:
http://docs.ceph.com/docs/luminous/start/hardware-recommendations/
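
To get a rough feel for why 10 Gbit is suggested, here is a back-of-envelope sketch in Python. It only computes the theoretical line-rate time to move a given amount of data over one link and ignores protocol overhead, disk speed and replication traffic. The 1 TB figure and the function name are made up for illustration and are not taken from this thread.

```python
def transfer_seconds(num_bytes, link_gbit):
    """Theoretical time to push num_bytes over a link of link_gbit Gbit/s.

    Best-case estimate only: real throughput is lower because of
    protocol overhead, disk limits and competing traffic.
    """
    bits = num_bytes * 8
    return bits / (link_gbit * 1e9)


# Hypothetical amount of data to move (1 TB, decimal); adjust to your own case.
data_bytes = 1e12

for link in (1, 10):
    secs = transfer_seconds(data_bytes, link)
    print(f"{link:>2} Gbit/s: {secs:.0f} s ({secs / 3600:.1f} h)")
```

So even in the best case, copying or re-replicating 1 TB takes more than two hours on a 1 Gbit link and about 13 minutes on a 10 Gbit link, which matters both for migration and for recovery after a disk or node failure.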

Migration can be done online with "Move disk", so the VMs can keep running while their disks are moved to the new storage.
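
"Move disk" is also available on the command line as `qm move_disk <vmid> <disk> <storage>`, so moving many disks can be scripted. Below is a minimal Python sketch that only builds and prints the commands by default. The VM IDs, disk names and the storage ID `ceph-vm` are hypothetical placeholders, and newer PVE releases spell the command `qm disk move`, so check `qm help` on your version before running anything.

```python
import shlex
import subprocess


def build_move_disk_cmd(vmid, disk, target_storage, delete_source=False):
    """Return the argument list for moving one VM disk to another storage."""
    cmd = ["qm", "move_disk", str(vmid), disk, target_storage]
    if delete_source:
        # Without this option the old volume is kept as an unused disk.
        cmd += ["--delete", "1"]
    return cmd


# Hypothetical plan: (VM ID, disk name) pairs; replace with your own VMs.
PLAN = [(101, "virtio0"), (102, "virtio0"), (102, "virtio1")]
TARGET_STORAGE = "ceph-vm"  # hypothetical storage ID


def run_plan(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for vmid, disk in PLAN:
        cmd = build_move_disk_cmd(vmid, disk, TARGET_STORAGE)
        print(shlex.join(cmd))
        if not dry_run:
            # Disks are moved one at a time to limit load on the old storage.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_plan(dry_run=True)
```

Keeping the source volume (the default here) leaves a fallback on the old storage until the VM has been verified on the new one.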