[Request for Advice] Proxmox VE + Ceph Private Cloud Setup - Seeking Input from Experienced Users!

ddoom · New Member · May 14, 2025
Hello everyone,


I'm currently working on a project to migrate our legacy on-premises infrastructure to a private cloud solution. I'm planning to repurpose our existing Dell R750 servers (3 nodes) without purchasing new hardware and would greatly appreciate advice from those with practical experience.


Current Infrastructure

  • Servers: 3 × Dell R750
  • All servers are currently hosting production services
  • Migration plan: Backup (P2V) → Reinstall Proxmox VE → Configure Ceph cluster → Restore VMs (rough P2V sketch after this list)
  • Backup solution: Planning to use a Veeam-like solution, also considering Proxmox Backup Server
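
Roughly the P2V flow I have in mind, as a sketch (the VMID, VM name, paths, and the "vmdata" storage ID are all placeholders I still need to finalize):

Code:
# On the legacy server (booted from rescue media): stream a raw image of the
# system disk to a Proxmox node. /dev/sda and the target path are placeholders.
dd if=/dev/sda bs=4M status=progress | ssh root@pve1 'cat > /var/tmp/legacy-web.raw'

# On the Proxmox node: create an empty VM shell, import the image onto the
# Ceph-backed storage, then attach it as the boot disk.
qm create 101 --name legacy-web --memory 8192 --cores 4 --net0 virtio,bridge=vmbr0
qm importdisk 101 /var/tmp/legacy-web.raw vmdata
qm set 101 --scsihw virtio-scsi-pci --scsi0 vmdata:vm-101-disk-0 --boot order=scsi0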

Planned Architecture (Currently Testing)

  • Virtualization: Proxmox VE 8.2 (Planning HA cluster)
  • Storage:
    • Ceph version: Reef (18.2.4, latest stable release)
    • OSD disks: Samsung U.2 NVMe 7.68TB × 6 (2 per node)
    • Pool configuration: Replication (size=3, min_size=2; setup sketch after this list)
  • Network:
    • Dedicated 10G network (Considering Mikrotik CRS312-4C+8XG-RM vs other brands)
  • Expected workload: Approximately 20 VMs (Linux-based DB, Web, API servers, etc.)
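
For reference, the pool setup I'm testing, as a sketch (the "vmdata" pool/storage name is a placeholder, and I still need to verify the exact pveceph options against my version):

Code:
# Create a replicated RBD pool with the planned redundancy and register it
# as a PVE storage in one step.
pveceph pool create vmdata --size 3 --min_size 2 --pg_autoscale_mode on --add_storages 1

# Double-check that size/min_size actually applied
ceph osd pool ls detail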

Questions and Concerns

  1. When repurposing production servers, are there any gotchas or important considerations during the P2V backup → server rebuild → restore process?
  2. Regarding Ceph configuration:
    • With replication (3 copies), how fast is recovery after a failure, and what storage efficiency do you see in practice?
    • Has anyone experienced performance bottlenecks or overhead with replicated pools even in NVMe-based environments?
  3. In Proxmox VE + Ceph environments:
    • How much CPU and RAM does Ceph actually consume? What's the real-world resource usage percentage in your experience?
    • In HA configurations, what's the actual failover time when a node fails?
  4. For those using Mikrotik CRS312-4C+8XG-RM:
    • Could you share your experiences regarding performance, heat management, and traffic handling stability?
  5. Any "must-do tips" or "things I regret not doing" from those who have built similar setups?

I'm currently building a test environment and running various tests with the latest Ceph Reef (18.x). Whether this project succeeds or fails, I promise to share a detailed write-up of the entire process once completed!


Thank you in advance for your advice and experiences!
 
Three nodes is the absolute minimum for Ceph. For me there were some more pitfalls; be sure to know them (and possibly ignore them intentionally) before you go into production: https://forum.proxmox.com/threads/fabu-can-i-use-ceph-in-a-_very_-small-cluster.159671

With replication (3 copies), how fast is recovery after a failure, and what storage efficiency do you see in practice?
10 GBit/s is considered the lowest acceptable speed for the Ceph network. My homelab used a dedicated 2.5 GBit/s link and it worked (actually faster than I expected), but it felt slow under several conditions.
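
Whatever link you end up with, measure it instead of trusting the label; a quick iperf3 sketch (the IP is a placeholder for the first node's Ceph-network address):

Code:
# On one node:
iperf3 -s
# On a second node, against the first node's Ceph-network IP:
iperf3 -c 10.10.10.1 -P 4 -t 30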

Regarding storage efficiency: you will store each bit three times. That's it.

Compare it with ZFS: if I have three nodes with a mirrored ZFS pool and replication to the two other nodes, I store each bit six times. And yes, that's what I do now.

Keep in mind that neither Ceph nor ZFS can use the full capacity - you should always stay below ~80% (rule of thumb, ymmv).
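
Applied to your six 7.68 TB NVMes that works out to roughly:

Code:
6 × 7.68 TB                  = 46.08 TB raw
÷ 3 (replication, size=3)    = 15.36 TB usable
× 0.8 (~80% fill rule)       ≈ 12.3 TB to actually plan with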

How much CPU and RAM does Ceph actually consume?
For my (now gone) production setup: https://forum.proxmox.com/threads/fabu-can-i-use-ceph-in-a-_very_-small-cluster.159671/#-problem-6-ram
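
If RAM gets tight, the per-OSD memory target can be inspected and capped; a sketch (the 3 GiB value is only an example, the BlueStore default is 4 GiB):

Code:
# Show the current per-OSD memory target (in bytes)
ceph config get osd osd_memory_target
# Example: lower it cluster-wide to 3 GiB
ceph config set osd osd_memory_target 3221225472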

In HA configurations, what's the actual failover time when a node fails?
The very same as in any other configuration: expect roughly two minutes before a resource from a crashed node starts on another node.
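
For reference, putting a guest under HA and watching a failover test looks roughly like this (the VMID is a placeholder):

Code:
# Put VM 101 under HA management and request it running
ha-manager add vm:101 --state started
# During a node-failure test, watch fencing and recovery from here
ha-manager status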

Any "must-do tips" or "things I regret not doing" from those who have built similar setups?
Think about and actually build a backup system, preferably using Proxmox Backup Server (PBS). Automate it. Test an actual restore. Then test it again... https://pbs.proxmox.com/docs/installation.html#recommended-server-system-requirements
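
A sketch of what such a restore test can look like on the PVE side (the storage IDs, snapshot name, and VMID are placeholders):

Code:
# List the backup volumes on the PBS-backed storage
pvesm list pbs-store
# Restore one to a brand-new VMID on the Ceph pool, then boot it
qmrestore pbs-store:backup/vm/101/2025-05-14T02:00:00Z 9101 --storage vmdata
qm start 9101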


Important: have fun :-)
 