Proper way of migrating to new boot drive

Macsloverd

New Member
Dec 23, 2024
Dear All,

I have a very delicate situation. I have searched everywhere I can, to no avail, so it is time to post a thread and ask for help. Thanks in advance.

Here is my setup:
(Production server, requires minimal downtime, member of a cluster of 3)
PVE8.2.2
1 SATA HDD for PVE/boot, nothing there but the PVE itself;
8 SAS HDD in RAIDZ + 2 SATA SSD as log and cache, ZFS, for only VMs;

The problem:
The boot drive is now failing. It is still running, but random things are happening. I need to replace the boot drive with a SATA SSD which is smaller than the original HDD.

1. I can't use DD or clonezilla, the original HDD refuses to read at a certain point, bad blocks;
2. I can't just copy the whole system at file level to the new SSD, since PVE is on LVM (default PVE installation);
3. I need to keep all configurations and VMs, which means no matter what I do, I need to be able to restore to current setup without losing any (cluster and VMs) data;
4. I am able to DD the boot partitions (sda1, sda2) to the new SSD and copy /etc and /var/lib/pve-cluster/config.db to a backup;

Question:
What is the best method to replace the original HDD with the new SSD, which is 40 GB smaller, without losing any configuration? I can put in additional drives, since there are empty slots; I can shut down the server for an hour or so; I can do file copies (for now), although I don't know if I can do a full copy.

I am desperate and running out of time. Please, any help would be appreciated.
 
Don't try to duplicate data from the old drive as it might already have (silent) data corruption and you are wearing it out faster.
It's easy to make mistakes in each of the following steps, so I advise you to have backups of everything, and I take no responsibility for my mistakes:
1. Make a copy of all the files in /etc/ (including /etc/pve/) to a USB memory stick for later reference. Unfortunately, this needs to be done from a running Proxmox, otherwise /etc/pve/ is empty. These files are human-readable, so you can see corruption if it has happened;
2. Install a fresh Proxmox on the new drive (make sure not to wipe or overwrite the existing drives) and configure it like you did before. Don't copy the old files over the new files; configure storage manually based on the previous configuration;
3. After that, copy the VM/CT configuration files from /etc/pve/qemu-server/ and /etc/pve/lxc/ to restore your VMs/CTs;
4. There are probably other things, like backup schedules and firewall configurations, that you also need to configure again manually.
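The backup step above could be sketched in Python roughly like this, assuming it runs as root on the still-running node and that the destination filesystem accepts ordinary Unix files. The function name `backup_etc` is made up for illustration. The sketch copies the tree and then refuses to report success if the copied pve/ directory is missing or empty, which is the failure mode mentioned above.

```python
import shutil
from pathlib import Path


def backup_etc(src: Path, dest: Path) -> int:
    """Copy src (e.g. /etc) to dest and return the number of files under dest/pve.

    Raises RuntimeError if the copied pve/ directory is missing or empty,
    which is what you would see if /etc/pve was not populated at copy time.
    dest must not exist yet; copytree creates it.
    """
    # Follow symlinks and skip dangling ones, so the copy consists of plain
    # files and does not depend on the destination supporting symlinks.
    shutil.copytree(src, dest, symlinks=False, ignore_dangling_symlinks=True)

    pve = dest / "pve"
    count = sum(1 for p in pve.rglob("*") if p.is_file()) if pve.is_dir() else 0
    if count == 0:
        raise RuntimeError(f"{pve} is missing or empty; was /etc/pve populated?")
    return count
```

A call such as `backup_etc(Path("/etc"), Path("/mnt/usb/etc-backup"))` would be the typical use, where `/mnt/usb` is a hypothetical mount point for the USB stick.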
 
Thank you for your quick reply!

The PVE is still running, and I have copied the entire /etc to a backup.

I can do a fresh install on the new SSD, I am just not so sure about:
1. how to "re-attach" the ZFS volume;
2. how to restore/import the VMs back to the new PVE install;
3. how to restore the "member-identity" of the cluster;
 
The PVE is still running, and I have copied the entire /etc to a backup.
Make sure to check that /etc/pve/ is there also and not empty.
I can do a fresh install on the new SSD, I am just not so sure about:
1. how to "re-attach" the ZFS volume;
Configure it as a storage, like you did the first time after creating it.
2. how to restore/import the VMs back to the new PVE install;
Copy the configuration files (as I said before) or restore from backup.
3. how to restore the "member-identity" of the cluster;
Good question. Sorry, I did not realize you were running a cluster, and running it in production.

Please forget everything I said. Migrate the VMs/CTs to another node (or forget about them and restore from backup). Remove the node from the cluster. Wipe the disks. Install a fresh Proxmox (recreate and set up any storage) and add the new node (with a new name and a new IP address) to the cluster, just like you would when a node dies. This is the safest and the supported way to do it.
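To make the order of operations concrete, here is a small Python sketch that only prints the commands for review and does not run anything. The node names, VM IDs and IP address are placeholders, and `replacement_plan` is a made-up helper. It assumes the VMs live on local storage (hence `--with-local-disks`) and that the target node has a storage of the same name; check the admin guide for the exact conditions around `pvecm delnode`, such as whether the old node has to be powered off first.

```python
import shlex


def replacement_plan(old_node, target_node, vmids, existing_node_ip):
    """Return the shell commands for replacing a cluster node, in order.

    Nothing is executed; the list is meant to be read and run by hand.
    All arguments are placeholders supplied by the caller.
    """
    plan = []
    # 1. On the failing node: live-migrate each VM to a healthy node.
    for vmid in vmids:
        plan.append(["qm", "migrate", str(vmid), target_node,
                     "--online", "--with-local-disks"])
    # 2. On a remaining node: remove the failing node from the cluster.
    plan.append(["pvecm", "delnode", old_node])
    # 3. After the fresh install, on the new node: join through a member.
    plan.append(["pvecm", "add", existing_node_ip])
    # 4. On any node: confirm membership and quorum.
    plan.append(["pvecm", "status"])
    return [shlex.join(cmd) for cmd in plan]


for line in replacement_plan("pve3", "pve1", [100, 101], "192.0.2.10"):
    print(line)
```

Containers would use `pct migrate` instead of `qm migrate`; that case is left out to keep the sketch short.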
 
You are a great man!

So, if I understand you correctly, I should just:
1. Migrate all VMs to another node;
2. Fresh-install a new PVE on this server;
3. Re-attach the ZFS pool or, if that is not possible, just destroy it and redo everything;
4. Add this server to the cluster;
5. Migrate the VMs back to this node.

Am I missing anything?
 
So, if I understand you correctly, I should just:
1. Migrate all VMs to another node;
Remove the node from the cluster (while it is still running): https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_remove_a_cluster_node
2. Fresh-install a new PVE on this server;
Make sure to not reuse the name and IP address.
3. Re-attach the ZFS pool or, if that is not possible, just destroy it and redo everything;
I think the information about the storage is part of the cluster. Maybe selecting the new node for the existing storage might be enough. I'm also assuming the nodes in your cluster are similar and you don't need to do anything special for this one.
4. Add this server to the cluster;
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_join_node_to_cluster
5. Migrate the VMs back to this node.

Am I missing anything?
Did you not do this before (as practice) for when a node dies (which will happen eventually)?
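On the storage point: the cluster-wide storage definitions live in /etc/pve/storage.cfg, and an entry can be restricted to certain nodes with a `nodes` line. The Python sketch below parses that file's text and lists the storages whose `nodes` restriction does not include a given node name, so you can see which entries would need the new node added. The helper names and the sample config are made up for illustration, and the parser only handles the simple `type: id` plus indented `key value` layout.

```python
def parse_storage_cfg(text):
    """Parse storage.cfg-style text into {storage_id: {option: value}}."""
    storages = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if not raw[0].isspace() and ":" in line:
            # Section header such as "zfspool: vmdata"
            stype, sid = (part.strip() for part in line.split(":", 1))
            current = {"type": stype}
            storages[sid] = current
        elif current is not None:
            # Indented option line such as "nodes pve1,pve2"
            key, _, value = line.partition(" ")
            current[key] = value.strip()
    return storages


def storages_missing_node(storages, node):
    """Return IDs of storages that are node-restricted and exclude `node`."""
    missing = []
    for sid, opts in storages.items():
        if "nodes" in opts:
            allowed = [n.strip() for n in opts["nodes"].split(",")]
            if node not in allowed:
                missing.append(sid)
    return sorted(missing)


SAMPLE = """\
dir: local
        path /var/lib/vz
        content iso,vztmpl,backup

zfspool: vmdata
        pool tank
        content images,rootdir
        nodes pve1,pve2
"""

print(storages_missing_node(parse_storage_cfg(SAMPLE), "pve4"))
```

For an entry reported this way, the node list can be changed in the GUI (Datacenter, Storage, Edit) or with `pvesm set <storage> --nodes <list>`. If the data disks were not wiped, the pool itself may first need `zpool import` on the freshly installed node; whether that is appropriate here is an assumption to verify against your setup.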
 
I am currently running these steps, fingers crossed...

No, I have never done this before. I don't know whether that is lucky or unlucky, but in all my time working with Proxmox I have never run into a situation that required replacing a node. A power-down happened once and migration kicked in automatically; that is the closest situation I have encountered.

Thank you for your help! I will report the outcome shortly.
 
