How to replace the Proxmox OS disk, for a node in a cluster?


Apr 3, 2018
We have a 3-node Proxmox cluster, with Ceph HA storage.

The primary OS disk (i.e. the Proxmox boot disk) in one of the nodes is failing. We are hoping to replace it - but will be going from a 2.5" SSD to an M.2 SSD.

What is the safest way of doing this?

Will a 1:1 copy of the disk suffice, and we simply change boot order?

Or do we have to do a fresh install, and somehow re-integrate it back into both the Proxmox cluster, as well as the Ceph cluster?
Also - I just realised the new M.2 disk is smaller (i.e. 1TB) versus the old 2.5" disk (2TB).

So a 1:1 bit copy won't work.

Is there another way to migrate a Proxmox installation from one disk to another?

Or an easy way to reinstall Proxmox, say, and re-import all the configuration, including cluster and Ceph details?
Hey there,

From a live ISO, rsync should do it: reinstall the bootloader (including recreating grub.cfg), and update /etc/fstab with the UUID of the new disk.
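A minimal sketch of that rsync route, assuming an ext4 root, an old disk at /dev/sda with root on partition 3, and a new M.2 disk at /dev/nvme0n1 partitioned the same way (all of those names are assumptions - adjust to your layout):

```shell
# Run from a live ISO. Device names below are examples only.
mount /dev/sda3 /mnt/old            # old root filesystem
mount /dev/nvme0n1p3 /mnt/new       # new (already formatted) root filesystem
rsync -aAXH --numeric-ids /mnt/old/ /mnt/new/

# Reinstall the bootloader on the new disk from a chroot
for d in dev proc sys; do mount --bind /$d /mnt/new/$d; done
chroot /mnt/new grub-install /dev/nvme0n1
chroot /mnt/new update-grub

# Point the root entry in /etc/fstab at the new partition's UUID
NEW_UUID=$(blkid -s UUID -o value /dev/nvme0n1p3)
sed -i "s|UUID=[^ ]*\( / \)|UUID=${NEW_UUID}\1|" /mnt/new/etc/fstab
```

The trailing slash on /mnt/old/ matters: it copies the contents of the old root rather than the directory itself.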

OK - so boot to a live Linux DVD/CD.

And then run rsync?

How does this work, though, if Proxmox is set up on an LVM drive etc.?
It's entirely up to you. If you want to continue to use LVM, then you would obviously create the PV, VG, and LVs beforehand. If you want to drop LVM, you can just format using whatever FS you want. Then rsync the data and make sure everything points to the new disk.

Right - so I can re-create the LVM setup, then use something like pvmove.
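The pvmove route could look roughly like this - the VG name "pve" and the device names are assumptions, and it only works if the allocated extents actually fit on the smaller disk:

```shell
# Assumed names: VG "pve", old PV /dev/sda3, new PV /dev/nvme0n1p3.
pvcreate /dev/nvme0n1p3
vgextend pve /dev/nvme0n1p3       # add the new disk to the existing VG
pvmove /dev/sda3 /dev/nvme0n1p3   # migrate all extents off the old PV
vgreduce pve /dev/sda3            # drop the old PV from the VG
pvremove /dev/sda3
```

You would still need to reinstall the bootloader on the new disk afterwards, since pvmove only moves the LVM data, not the boot partition.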

I was thinking of doing ZFS, actually.

Is there some way of re-installing via the Proxmox installer (to get the partitions set up), then somehow overwriting it with the config from the old disk? Or is that a stupid idea?
The safest way is to eject the node from the cluster, reinstall Proxmox, and rejoin.
Alternatively, you can use partclone instead of a 1:1 copy to copy each partition, and restore any disk images from backup (assuming you have any on the boot device) once you've rejoined the cluster.
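A hypothetical partclone example for an ext4 root, again with assumed device names (/dev/sda3 old, /dev/nvme0n1p3 new). partclone only copies used blocks, but the target partition must still be large enough to hold the source filesystem, so shrink the filesystem first (e.g. with resize2fs) if it is bigger than the new partition:

```shell
partclone.ext4 -c -s /dev/sda3 -o pve-root.img       # clone used blocks to an image
partclone.ext4 -r -s pve-root.img -o /dev/nvme0n1p3  # restore onto the new partition
```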
OK, I'm happy to reinstall Proxmox and rejoin.

I noticed Proxmox 5.4 just came out - so we could create a new USB installer for that.

This node was node #1 (i.e. the first node in the cluster) - but I assume that's not relevant now.

I can then rejoin the Proxmox cluster. How about on the Ceph side - how should I rejoin it there?
If you named the node the same, be aware that you'll have SSH key conflicts you will need to resolve manually.
If you named the node differently, you'll need to clean up your CRUSH map to remove mention of the old node, but then the process is the same:
# pveceph init
If you only have 3 nodes, you also want to create a monitor:
# pveceph createmon
Once you've rejoined the cluster, perform the following on each node as root, where [nodename] is the name of the replaced node:
sed -i '/[nodename]/d' /root/.ssh/known_hosts
ssh [nodename]
(will prompt you to accept the node signature)
I was able to do this last night. Do you think this is worth documenting in the Proxmox wiki?

First part was to remove the node from Ceph.
  1. From the Web UI, I went to Ceph, then Monitor, and removed the Ceph monitor daemon on that node.
  2. I then went to Ceph, CephFS, and removed the MDS (Metadata Server) that was on that node as well.
  3. I then went to Ceph, OSD and, for each of the four OSDs on that node, I clicked "Out", then "Stop", then "Destroy".
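For reference, the Out/Stop/Destroy clicks in the Web UI correspond roughly to the following commands - the OSD IDs 0-3 are assumptions, substitute the IDs shown under Ceph → OSD for the failing node:

```shell
for id in 0 1 2 3; do
    ceph osd out osd.$id                       # "Out": stop placing data on it
    systemctl stop ceph-osd@$id                # "Stop" the OSD daemon
    ceph osd purge $id --yes-i-really-mean-it  # "Destroy": remove it from the CRUSH map, auth, and OSD map
done
```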
Second part was to remove the node (following the steps here):

I ran pvecm nodes to list the nodes:
root@vwnode2:~# pvecm nodes

Membership information
    Nodeid      Votes Name
         1          1
         2          1 (local)
         3          1
I then shut down the node.

There are two quirks (or maybe bugs?) here:
  1. When you run pvecm nodes again with the node shut down, that node doesn't appear at all - not even as "offline". There doesn't appear to be any flag to show it?
  2. The value in the "Name" column is the IP address, not the actual "name" of the node you need to pass to pvecm delnode. I just happened to know what it was in this case.
Does anybody know if this is expected behaviour, or if I should file bugs for these two?

After the node was shut down, I then deleted it:
root@vwnode2:~# pvecm delnode vwnode1
Killing node 1

The third part is removing the SSH key - alexskysilk@ mentioned above removing it from /root/.ssh/known_hosts - however, in my case, this file didn't exist. Because I am running a Proxmox cluster, the known_hosts file actually seems to live in the shared PVE cluster filesystem. Hence, on one of the two remaining nodes, I edited /etc/pve/priv/known_hosts and removed the line for that node.
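The same edit can be scripted with sed; the helper below is a hypothetical sketch (the function name is mine, and "vwnode1" is the node from this thread):

```shell
# Delete every known_hosts line mentioning a given node name.
remove_node_key() {  # usage: remove_node_key <nodename> <known_hosts_file>
    sed -i "/\b$1\b/d" "$2"
}

# On a surviving node, against the shared cluster filesystem:
#   remove_node_key vwnode1 /etc/pve/priv/known_hosts
```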

I then restarted the node, and reinstalled Proxmox from scratch.

(I did notice at this point that I was getting a whole bunch of errors at the console about invalid options in a cephpoolname_cephfs.secret file. Not sure why, but it seemed to go away once I got Ceph running again.)

I booted it up, configured the network interfaces and static IPs as before (including for my Ceph interfaces), and was able to rejoin the cluster.

For Ceph, it was simply running:
pveceph install
pveceph init
I then re-created the Ceph monitor daemon and CephFS MDS via the Web UI.

For the OSDs, I had to do this manually via the command-line, as I wanted four OSDs per disk (which isn't possible via the Web UI - maybe that's a feature request).

ceph-volume lvm zap --destroy /dev/nvme0n1
ceph-volume lvm batch --osds-per-device 4 /dev/nvme0n1
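To sanity-check that the node is fully back in, a couple of read-only commands are handy:

```shell
ceph osd tree   # the replaced node should list its new OSDs, all "up"
ceph -s         # watch recovery/backfill progress until HEALTH_OK
```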

