Proxmox hardware compatibility - Dell servers and storage

RogerFinch73

New Member
Aug 7, 2024
Hi

I may buy a few PowerEdge 450 servers with Broadcom 57414 dual-port 10/25GbE SFP28 OCP NIC 3.0 networking, 512GB RAM and 1x Intel® Xeon® Gold 5318Y 2.1G, 24C/48T, 11.2GT/s, 36M Cache, Turbo, HT (165W) DDR4-2933, linking them to a PowerVault ME5 storage array with dual 25GbE SFP28 controllers and a bunch of disks.

I can only find the minimum compatibility list online for Proxmox - will it support this latest kit?

Is anyone running this type of setup, or am I forced to go to VMware?

thanks
Roger
 
Hi,

I have a few customers with similar setups.
Most of them had previously used this hardware with vSphere. For new clusters I generally configure local disks with ZFS replication for very small setups, or Ceph from 3 nodes upwards.

If the hardware has already been procured, you can of course still use it, but the iSCSI configuration is not as user-friendly as with VMware, as this setup is rather unusual.

The Dell servers generally run stress-free and we have always installed the same network cards. If the hardware has not yet been purchased, it is better to skip the ME and go with internal NVMe drives instead.
 
I actually just inherited a PowerVault ME5, and I would not recommend buying one for anything. It uses a proprietary RAID controller, so the data, disks, controller etc. cannot just be swapped over to something else in the future, and if you need to replace a disk, expect to pay 4x the market value of the physical drive for the replacement. I can't even get the expansion to work as a generic SAS JBOD, and it doesn't provide SMART statistics on individual drives without going through its own interface.

The Dell kit we have (we have PowerEdge 650 through 750 and everything in between) works flawlessly. I like it better than the SuperMicro stuff because of iDRAC, better overall sensors, great management tools and a great sales team.

I would say if you need a shared storage fabric, use NVMe with Ceph as others say; they have models with close to a PB of storage capacity per node. If you need large, slow storage and the 12x 3.5" bays aren't enough, get their generic SAS JBOD with spinning disks and use Ceph (or ZFS if you don't need a shared fabric). That saves you a ton of money and avoids lock-in. The Dell kit is overall nice, just don't get locked into it - you want to be able to go SuperMicro or HPE in the future, if nothing else to put pressure on getting a better deal.
 
So a bunch of PowerEdge servers with fast disks and Ceph should do? I'll just need 25Gb NICs and a switch to link them all together. The PowerVault ME5 wouldn't be needed then, which would save me a lot of money.

I am replacing two Dell VxRail clusters with vSAN and 25Gb networking - I might be able to reuse the hardware, but they may be locked to VMware in the BIOS etc. Hard to know without breaking one open to see.
 
Great, I'll have a look at getting more storage in the servers then, and avoid the ME5 altogether if I can run 3 or 4 nodes with Ceph instead.

will need to investigate Ceph a bit more.

Anyone have experience of using it in production with 3 or 4 nodes? (Are upgrades on Proxmox straightforward to perform as a result?)
 
I have a new cluster with 5 nodes and an older cluster that grew from 3 to 12 nodes. Proxmox and Ceph are dead easy to manage; the default is to replicate the data between 3 nodes, although if you have 5 nodes or more you could change that with erasure coding. Get the subscription for the enterprise repo. There is now even a VMware conversion program.

The Dell 100G switches are actually pretty cheap, as is a 100G card, and the Dell R760 comes standard with 25/40G networking in the OCP slot. Talk to your sales person.
 
I shall take a look at those ones, cheers.

Ceph - do the disks need to be in a RAID set first, or does that manage it all?

I need to start reading...
 
No, the best way is to give it raw disks and let it handle the replication. So each of my nodes has 12x 2TB NVMe disks; you just hand them to Proxmox as-is and get ~8TB of usable space per node. When a single drive goes down, or an entire node (and large clusters can also handle failure domains configured per rack, per PDU etc.), you are guaranteed the data is still on at least 2 other disks, and it will then rebuild immediately.
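If it helps, here is the arithmetic as a rough Python sketch (the node and disk counts are just my setup, and 3 replicas is the default replicated-pool setting - swap in your own numbers):

# rough usable-capacity estimate for a replicated Ceph pool
nodes = 5             # OSD hosts in the cluster (my setup, adjust as needed)
disks_per_node = 12   # raw NVMe disks handed straight to Ceph, no RAID underneath
disk_tb = 2.0         # capacity per disk in TB
replicas = 3          # default pool size: every object stored on 3 different nodes

raw_tb = nodes * disks_per_node * disk_tb     # 120 TB raw
usable_tb = raw_tb / replicas                 # ~40 TB usable across the cluster
per_node_equiv_tb = usable_tb / nodes         # ~8 TB "usable per node"
print(raw_tb, usable_tb, per_node_equiv_tb)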
 
Cool, OK. Our vSAN was set up by a 3rd party, so I never saw that bit done. We have 8x 1.75TB SSDs on one cluster and 8x 3.6TB hybrid drives on the other cluster, with 70TB and 145TB of free space.

I'll need ~50TB on the two new Proxmox clusters I think. I'll need to do the math. With vSAN each VM currently has a disk on 2 nodes and a witness on a 3rd in case of failure; it looks like Ceph keeps 3 full copies regardless, so I'll need to size it accordingly.
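Something like this, I guess (Python, just to sketch the math; assuming Ceph's default 3 copies and keeping the cluster under ~80% full so it can rebalance after a failure - those numbers are my own assumptions):

# sizing sketch: raw capacity needed for ~50 TB usable
usable_needed_tb = 50
replicas = 3              # assumed Ceph default: 3 full copies on 3 different nodes
fill_target = 0.8         # assumed headroom so a failed node can rebuild elsewhere

raw_needed_tb = usable_needed_tb * replicas / fill_target   # ~187.5 TB raw in total
nodes = 4
raw_per_node_tb = raw_needed_tb / nodes                     # ~47 TB of disk per node
print(raw_needed_tb, raw_per_node_tb)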

thanks for the advice. :)
 
Yes, there are other models using erasure coding (similar to RAID6), but you should always have 3 copies (so your data survives 2 simultaneous node failures), and it is a good idea to have the "RAID" span nodes instead of disks. So you could do k=3, m=2 and get a RAID6-like layout with 5 nodes, where 3 chunks hold data and 2 hold the erasure code, but now you need to access at least 3 nodes instead of 1 to get your data, driving up latency and CPU usage (just like regular RAID, the same latency problem can become noticeable even on NVMe), which is why mirroring (a 3-way mirror like Ceph's) is the gold standard even on other hypervisor platforms.
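To put rough numbers on that trade-off (plain storage math in Python, not Ceph-specific output):

# storage efficiency: 3-way replication vs erasure coding k=3, m=2
def usable_fraction(k, m):
    # k data chunks + m coding chunks per object; usable share of raw capacity
    return k / (k + m)

replica3 = 1 / 3                  # every byte stored 3 times -> ~33% of raw is usable
ec_3_2 = usable_fraction(3, 2)    # 3 data + 2 coding chunks  -> 60% of raw is usable
# both layouts survive losing any 2 chunks (2 nodes, if chunks are spread one per node),
# but the EC pool has to involve at least 3 nodes for every read/write
print(replica3, ec_3_2)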

It all depends on your budget, performance demands etc. Storage is relatively cheap compared with the cost of downtime in most cases; my cluster has had 100% uptime for over 5 years now despite updating and rebooting every node every few months, and upgrading, removing and repairing nodes along the way. I just found a VM that has reached 1200 days of uptime (we do updates, but we also have kpatch, so it never needs rebooting).

Proxmox supports other shared storage like SMB, NFS and iSCSI (if you want to keep using the old VMware storage for some datastores), but everything has its pros and cons (latency, bandwidth, failure modes, redundancy, backup); how you go about this is entirely up to you. You could even do ZFS or LVM on each node and replicate snapshots of the VMs automatically every 15 minutes, provided you can live with potentially losing 15 minutes of data - for some people that is an option, together with ZFS/LVM storage on spinning disks.

Ceph focuses on high throughput and data reliability, with relatively higher costs because it ideally has at most 12-24 disks per node (we have 8 or 12 depending on when the node was purchased), so use it with NVMe storage and you can scale to potentially Terabit throughputs with just a handful of nodes. Then we replicate that to an offsite Proxmox Backup Server with spinning disks, which allows for continuous backups and live restore.
 
Our current servers have 4x 10Gb connections, 2 to each of 2 switches, and the switches are linked together at 100Gb.

what switchgear are you running?

I was thinking of 2x 25Gb per node now, 1 per switch, with both switches connected on the backplane. We have Dell currently, but the rest of the campus is Meraki, which offers great visibility into the traffic and has the HA connection on the back rather than using up more ports on the front.

thanks
 
The standard for new servers is a 25/40G network card; they will run at 10G, but they can do more.

We use a pair of these at 100G: https://www.dell.com/en-us/shop/ipovw/networking-s-series-25-100gbe for the backend (Ceph and cluster) network with Intel X8xx series cards. They are connected in a single plane with 400G between the two switches, and each server runs LACP across both switches. The front end (the customer-facing network) then goes to dual 10/25/40G (Broadcom or Intel OCP NICs) into Juniper or Cisco at the datacenter level, also using LACP to A/B sides.

The reason is that our front-end network is very noisy and busy, with hundreds of neighbors and VMs talking across multiple subnets/VLANs. A Ceph cluster CAN work on that, but we saw the need to reserve a ToR 100G switch with the PowerSwitches because of the mixed networks, the inherent latency due to the datacenter design, the fact that we are doing HPC with GPUs, etc.
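As a rough feel for why the back end gets its own fat pipes (Python; the 20 Gbit/s of guest writes is a made-up figure, but the multiplier is how a replicated pool behaves - the primary OSD forwards the extra copies over the cluster network):

# back-of-the-envelope: backend traffic generated by client writes on a 3-replica pool
client_write_gbps = 20                  # assumed sustained guest writes hitting the pool
replicas = 3
backend_gbps = client_write_gbps * (replicas - 1)   # ~40 Gbit/s of replication traffic
# plus recovery/rebalance traffic on top when a disk or node fails,
# which is exactly when you least want the network to be the bottleneck
print(backend_gbps)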

The PowerSwitch is a Linux-based, ONF/OVS-compatible switch, so it also integrates with Proxmox's SDN, although for most people that is a bit of overkill. I only use it to create VLANs for things like Kubernetes clusters (you define a network in the Proxmox GUI, attach the VM's network interface to that network in the GUI, and it will automatically set up the VLAN on the physical node): https://pve.proxmox.com/pve-docs/chapter-pvesdn.html
 
Cool, makes perfect sense. Our current VMware setup is like that - different switches for storage traffic / vMotion etc., with LACP links to join them together, and VMs on different NICs and front-facing switches.

Thanks for the confirmation on all this. I was a bit worried it wouldn't work, but you have nailed it, and you are running at a much larger scale than we would ever get to - most of our VM workload has gone off to the cloud, so we've dropped down to ~60 VMs now, with more to go.

bye bye VMware ;)
 
I have several Ceph installations at customers, and managing the storage is much easier and more flexible.
You also have to pay less attention to updates and upgrades than with external storage solutions.

The only problem with Ceph is that you need to understand the technology for proper sizing, and many people underestimate the network.
For example, these days I do even 3-node clusters with 100 Gbit.
 