Hello everyone,
I'm currently working on a project to migrate our legacy on-premises infrastructure to a private cloud solution. I'm planning to repurpose our existing Dell R750 servers (3 nodes) without purchasing new hardware and would greatly appreciate advice from those with practical experience.
Current Infrastructure
- Servers: 3 × Dell R750
- All servers are currently hosting production services
- Migration plan: Backup (P2V) → Reinstall Proxmox VE → Configure Ceph cluster → Restore VMs (rough command sketch below)
- Backup solution: Planning to use a Veeam-like product; also considering Proxmox Backup Server
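For context, this is the rough P2V path I'm planning to test. It's a sketch only, under my own assumptions: the hostname, VM ID 100, /dev/sda, and the "ceph-vm" storage name are all placeholders, not our real values.

```
# Image the source disk over SSH onto a staging machine (placeholder names throughout).
ssh root@legacy-host "dd if=/dev/sda bs=4M status=progress" > legacy-host.raw

# Convert to qcow2 for the transfer; RBD can also take the raw image directly.
qemu-img convert -p -f raw -O qcow2 legacy-host.raw legacy-host.qcow2

# On the rebuilt Proxmox node: create an empty VM, import the disk into Ceph,
# then attach it and make it the boot disk.
qm create 100 --name legacy-host --memory 8192 --cores 4 --net0 virtio,bridge=vmbr0
qm importdisk 100 legacy-host.qcow2 ceph-vm
qm set 100 --scsihw virtio-scsi-pci --scsi0 ceph-vm:vm-100-disk-0 --boot order=scsi0
```

The chicken-and-egg problem is that the source servers are also the future cluster nodes, so the images have to be staged somewhere off-box before each node is wiped.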
Planned Architecture (Currently Testing)
- Virtualization: Proxmox VE 8.2 (Planning HA cluster)
- Storage:
- Ceph version: Reef (18.2.4, the current stable release at time of writing)
- OSD disks: Samsung U.2 NVMe 7.68TB × 6 (2 per node)
- Pool configuration: Replication (size=3, min_size=2); see the setup sketch after this list
- Network:
- Dedicated 10G network (considering the MikroTik CRS312-4C+8XG-RM vs. other brands)
- Expected workload: Approximately 20 VMs (Linux-based DB, Web, API servers, etc.)
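For reference, this is the pool setup I'm testing with the pveceph wrappers. The subnet, pool name, and device paths are placeholders for our actual values:

```
# One-time init pinning Ceph public/cluster traffic to the dedicated 10G subnet (placeholder CIDR).
pveceph init --network 10.10.10.0/24

# A monitor per node (run on each node).
pveceph mon create

# One OSD per NVMe drive, two drives per node (placeholder device paths).
pveceph osd create /dev/nvme0n1
pveceph osd create /dev/nvme1n1

# Replicated RBD pool with size=3 / min_size=2, auto-registered as Proxmox storage.
pveceph pool create vm-pool --size 3 --min_size 2 --application rbd --add_storages 1
```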
Questions and Concerns
- When repurposing production servers, what unexpected issues or important considerations have you hit during the P2V backup → physical rebuild process?
- Regarding Ceph configuration:
- With replication (3 copies), how fast is recovery during a failure, and how has the storage efficiency (roughly 33% usable) worked out in your experience? (Recovery-tuning sketch after this list.)
- Has anyone experienced performance bottlenecks or overhead with replicated pools even in NVMe-based environments?
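On the recovery-speed question, these are the knobs I'm planning to watch and tune during failure tests. The values are placeholders I'm experimenting with, not recommendations:

```
# Watch recovery/backfill progress while simulating an OSD or node failure.
ceph -s
ceph osd pool stats vm-pool

# Reef's mClock scheduler ignores the classic recovery knobs unless overrides are allowed.
ceph config set osd osd_mclock_override_recovery_settings true
ceph config set osd osd_max_backfills 4
ceph config set osd osd_recovery_max_active 8

# For planned maintenance, stop CRUSH from marking OSDs out while a node reboots.
ceph osd set noout
ceph osd unset noout
```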
- In Proxmox VE + Ceph environments:
- How much CPU and RAM does Ceph actually consume in practice? What real-world resource overhead have you seen? (Measurement sketch after this list.)
- In HA configurations, what's the actual failover time when a node fails?
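And on the resource and failover questions, here's how I intend to measure both. VM ID 100 and the memory value are placeholders:

```
# Each OSD targets ~4 GiB of RAM by default; check and adjust if RAM gets tight.
ceph config get osd osd_memory_target
ceph config set osd osd_memory_target 6442450944   # 6 GiB, placeholder value

# Register a VM with the HA manager, then pull power on its node and time the restart.
ha-manager add vm:100 --state started
ha-manager status
```

My understanding is that HA won't restart a VM until the failed node is fenced via the watchdog, so a minute or two seems to be the floor, but I'd love to hear real numbers.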
- For those using the MikroTik CRS312-4C+8XG-RM:
- Could you share your experiences regarding performance, heat management, and traffic handling stability?
- Any "must-do tips" or "things I regret not doing" from those who have built similar setups?
I'm currently building a test environment and running various tests with the latest Ceph Reef (18.x). Whether this project succeeds or fails, I promise to share a detailed write-up of the entire process once completed!
Thank you in advance for your advice and experiences!