Migrating my entire cluster to new hardware

cybertinus

Hello all,

Currently I'm running my Proxmox cluster with 3 nodes. The first node has 2 Intel Xeon Silver 4214 CPUs, and the other two nodes each have 2 Intel Xeon Silver 4210 CPUs. Each host has 4 Samsung PM893 1 TB SATA SSDs in it for the Ceph cluster.
In other words, this is all old crap to run in production :D. So, I bought some new servers. Each new server has 1 AMD EPYC 9355P CPU and 5 Samsung PM9A3 2 TB NVMe drives (PCIe Gen4). 4 of them will be used for Ceph; the 5th will be my backup fleecing device.
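For the fleecing part, the idea is to create a storage on that 5th NVMe and point the backup jobs at it. A rough sketch of what I mean (the storage names and VM ID are just examples, and as far as I know the fleecing option needs PVE 8.2 or newer):

  # One-off backup with fleecing on the dedicated NVMe (example names)
  vzdump 101 --storage backup-nfs --fleecing enabled=1,storage=fleecing-lvm

  # Or enable it for all jobs in /etc/vzdump.conf:
  # fleecing: enabled=1,storage=fleecing-lvm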

My plan for the migration is as follows:
  1. Install the new servers with Proxmox 9
  2. Add those new servers to my existing Proxmox cluster (see the command sketch after this list)
  3. Create OSDs on the 4 PM9A3s and add them to the normal Ceph storage pool
  4. Remove the old OSDs from the pool, one by one, so all the data gets migrated from the old SSDs to the new NVMes
  5. Add Ceph monitors, managers and MDSes on the new cluster nodes
  6. Remove the Ceph monitors, managers and MDSes from the old cluster nodes
  7. Turn the VMs off and migrate them to the new hypervisors (I use 'host' as the CPU type for the VMs, so I need to power them off when I migrate them from Intel to AMD CPUs)
  8. Start the VMs again on the new cluster nodes
  9. Remove the old nodes from the cluster
  10. Physically remove the old nodes from the rack
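For steps 2, 3, 5, 7/8 and 9, this is roughly the command sequence I have in mind (the IP address, device paths, VM ID and node names are just examples):

  # Step 2: on each new node, join the existing cluster via one of the old nodes
  pvecm add 192.0.2.11

  # Step 3: on each new node, create OSDs on the 4 PM9A3s
  pveceph osd create /dev/nvme0n1
  pveceph osd create /dev/nvme1n1
  pveceph osd create /dev/nvme2n1
  pveceph osd create /dev/nvme3n1

  # Step 5: on each new node, add a monitor, manager and MDS
  pveceph mon create
  pveceph mgr create
  pveceph mds create

  # Steps 7/8: migrate a powered-off VM to a new node, then start it there
  qm migrate 101 pve-new1
  qm start 101

  # Step 9: once an old node is completely empty, remove it from the cluster
  pvecm delnode pve-old1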
I will enable the 'norebalance' flag while adding the new OSDs, so Ceph won't start moving PGs the moment the first NVMe OSD comes online; it will only start rebalancing once all the NVMes have been added.
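Concretely, something like this (the OSD ID is an example):

  # Before creating the new OSDs: tell Ceph not to move any data yet
  ceph osd set norebalance

  # ... create all 12 NVMe OSDs (step 3) ...

  # Then let the rebalance onto the NVMes happen in one go
  ceph osd unset norebalance

  # Step 4, per old OSD: drain it, wait until 'ceph -s' is healthy again,
  # then stop and destroy it
  ceph osd out 0
  systemctl stop ceph-osd@0
  pveceph osd destroy 0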

Is this a decent plan? Can I make any optimizations? Can I speed up the Ceph migration without having 13 rebalance operations (1 when all the NVMes are added, plus 12 more, one for each old SSD that gets removed)? Shoot! :)
 
I'm not an expert, but I would rather build a new cluster with the new machines and offline-migrate the VMs from the old cluster to the new one with PDM (Proxmox Datacenter Manager).
Or power off each VM, create a vzdump backup, and restore that on the new cluster.
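Roughly like this (the VM ID, storage names and dump path are just examples):

  # On the old cluster: stop the VM and make a consistent backup
  vzdump 101 --mode stop --storage backup-nfs

  # On the new cluster: restore the dump and put the disks on the new Ceph pool
  qmrestore /mnt/pve/backup-nfs/dump/vzdump-qemu-101-<timestamp>.vma.zst 101 --storage ceph-vm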