I ran a Proxmox cluster for a while a couple of years back and was reasonably happy running ZFS as local storage for VMs, but I wanted native support for Veeam, so I moved to Hyper-V. I'm not nearly as happy with Hyper-V, and I'm at the point where I'll either reinstall my VMs on bare hardware or move back to Proxmox. I'm liking what I'm seeing with CEPH nowadays, so I thought I'd reach out and see what y'all think.
My overall goal is to minimize downtime for the primary server I host, which is an online forum (recently migrated from Hyper-V to bare metal). Right now we're running at about 75% of our normal load, and I'm seeing about 125 transactions per second on that server. Other VMs I'd like to run are one that just hosts a few low-demand WordPress sites, a Windows Server 2016 VM that really just handles authentication, and a Linux VM to monitor the other VMs. Basically, I don't need a ton of IOPS from the CEPH cluster; I'm really more interested in failover in case hardware fails while I'm out of town.
My cluster right now is:
- 3 Dell T30s with low-end Xeon processors, 64 GB RAM, and two 10G network ports each
- 1 Dell T30 with a non-Xeon processor, 64 GB RAM, and a single 10G network port
- Redundant 10G switches
- A single gigabit switch.
So, my questions:
- I know my hardware is far from powerful, but I believe it's sufficient. Or at least, it's proven sufficient (overpowered, even) so far using Hyper-V with iSCSI mounts. Is there any reason to think the CEPH/Proxmox combo won't offer comparable performance? Like I said, demand on the storage subsystem is low most of the time: right now (again, with 75% of my normal users on) I'm seeing less than 30 IOPS on the iSCSI device (peaking at 62/s in the last 10 minutes) for everything other than my forum, which is only adding about 125 transactions per second.
- Are datacenter SSDs a necessity for my use case, or can I get by with something cheaper? (There's a sketch of the sync-write test I was planning to run on candidate drives after this list.)
- How advisable is it to put all communications on the 10G network: Proxmox-to-Proxmox cluster traffic, storage, and a VLAN for the DMZ machines? Or does dedicating the 10G network to storage alone make more sense? (I ask because if I can do it all on the 10G network, I can essentially remove the gigabit switch as a single point of failure should I add a second firewall later on, and firmware updates on the switches get easier since the active/passive 10G links should keep everything running through switch reboots. There's a rough sketch of the network config I'm picturing after this list.)
- Is there going to be a problem mixing CPU types in the cluster? That non-Xeon is there not as a migration target but as an additional CEPH host, and a host to restart a VM on if one of the other machines should fail. I don't see an issue, but I don't know if there are issues I'm not considering here.
- Is a two-drive ZFS mirror going to be fine for the Proxmox host machines themselves? I'd think this would be fine, but again, y'all are the experts here.
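For the SSD question, this is the kind of sustained sync-write test I was planning to run on any candidate drive before trusting it as an OSD. It's just a sketch: /dev/sdX is a placeholder, and it writes to the raw device, so it should only ever be pointed at a blank disk.

```
# Rough sketch of a 4k sync-write test for a candidate OSD drive.
# WARNING: /dev/sdX is a placeholder and this WRITES TO THE RAW DEVICE -
# only run it against a disk you intend to wipe anyway.
fio --name=osd-sync-test \
    --filename=/dev/sdX \
    --direct=1 --sync=1 \
    --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 \
    --runtime=60 --time_based
```

My understanding is that consumer drives without power-loss protection tend to fall off a cliff on this kind of sustained sync-write workload, which is exactly what I'm trying to gauge before buying.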
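And for the single-network question, here's roughly what I had in mind for /etc/network/interfaces on each of the three dual-port nodes. Interface names, VLAN IDs, and addresses are all placeholders, and the single-port T30 would obviously skip the bond.

```
# Active-backup bond across the two 10G ports, one leg to each switch
auto bond0
iface bond0 inet manual
    bond-slaves ens1f0 ens1f1
    bond-mode active-backup
    bond-miimon 100

# VLAN-aware bridge so the DMZ VMs can get a tagged VLAN on their vNICs
auto vmbr0
iface vmbr0 inet manual
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

# Management / Proxmox cluster traffic
auto vmbr0.10
iface vmbr0.10 inet static
    address 192.168.10.11/24
    gateway 192.168.10.1

# CEPH traffic
auto vmbr0.20
iface vmbr0.20 inet static
    address 192.168.20.11/24
```

If it's smarter to keep CEPH on its own physical network after all, I'd love to hear why.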