-------------- UPDATE --------------
Summary:
If anyone runs into this problem, please follow @leesteken's method. Step by step:
1. my VMs on the failing node were linked clones, so first back up the VMs and restore them one by one. This makes them "standalone";
2. migrate all the VMs to another node;
3. remove all backups, shared storage, replication jobs, etc. that are related to the failing node from the cluster;
4. remove the failing node from the cluster;
6. since I was doing this through iDRAC, I had no way to physically remove the failing HDD or disable it (the HBA controller is not from Dell), so I had to boot into a live CD first and delete all partitions from the failing HDD to prevent unexpected mounting;
7. boot from the Proxmox installation ISO and do a fresh install with a new host name (important!);
8. after the successful installation of Proxmox, the first boot detected the ZFS pool but was unable to mount it;
9. in the Proxmox web GUI, the ZFS pool was not shown under "Disks->ZFS";
10. since I had already migrated all VMs, I lost interest in recovering the pool. I wiped all 8 SAS HDDs and created a new pool with the ORIGINAL NAME;
11. copy the "Join Information" from the cluster and paste it into the freshly installed node;
12. set up the replication jobs, backups, etc. as they were originally;
13. migrate the VMs back;
14. Done!
And finally, thanks to @leesteken's help, I was able to resolve this situation with a lot less trouble.
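For anyone who prefers the command line, steps 1, 2, 4 and 10 map roughly to the commands below. This is only a sketch in Python that builds the command lines without executing anything, so they can be reviewed first. The VM IDs, node names, storage name, pool name and device paths are hypothetical placeholders (the post does not state them), and using one SSD as log and the other as cache is an assumption about the original layout.

```python
def cluster_removal_plan(vmids, failing_node, target_node, backup_storage, online=False):
    """Build (not run) the commands for steps 1, 2 and 4 as argument lists."""
    plan = []
    for vmid in vmids:
        # Step 1: back up each linked clone. Restoring the archive afterwards
        # (qmrestore) gives a standalone VM; the archive path is only known
        # once vzdump has run, so it is not generated here.
        plan.append(["vzdump", str(vmid), "--storage", backup_storage, "--mode", "snapshot"])
    for vmid in vmids:
        # Step 2: move the VM to another node.
        cmd = ["qm", "migrate", str(vmid), target_node]
        if online:
            # A running VM with disks on local storage needs both flags.
            cmd += ["--online", "--with-local-disks"]
        plan.append(cmd)
    # Step 4: run this on a remaining node once the failing node is shut down.
    plan.append(["pvecm", "delnode", failing_node])
    return plan


def zpool_create_cmd(pool, data_disks, log_dev, cache_dev):
    """Build the command for step 10: a RAIDZ pool plus log and cache devices."""
    return ["zpool", "create", pool, "raidz", *data_disks, "log", log_dev, "cache", cache_dev]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    for cmd in cluster_removal_plan([100, 101], "pve1", "pve2", "backup-store"):
        print(" ".join(cmd))
    disks = [f"/dev/disk/by-id/sas-disk{i}" for i in range(1, 9)]
    print(" ".join(zpool_create_cmd("tank", disks, "/dev/disk/by-id/ssd-log", "/dev/disk/by-id/ssd-cache")))
```

Printing the plan before running anything is deliberate: `pvecm delnode` and `zpool create` are destructive, so it is worth checking every name by hand.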
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Dear All,
I am in a very delicate situation. I have searched everywhere I can, to no avail, so it is time to post a thread and ask for help. Thanks in advance.
Here is my setup:
(Production server, requires minimal downtime, member of a cluster of 3)
PVE 8.2.2
1 SATA HDD for PVE/boot, nothing on it but PVE itself;
8 SAS HDDs in RAIDZ + 2 SATA SSDs as log and cache (ZFS), for VMs only;
The problem:
The boot drive is now failing. It is still running, but random things are happening. I need to replace the boot drive with a SATA SSD that is smaller than the original HDD.
1. I can't use dd or Clonezilla: the original HDD refuses to read past a certain point because of bad blocks;
2. I can't just copy the whole system at file level to the new SSD, since PVE is installed on LVM (default PVE installation);
3. I need to keep all configurations and VMs, which means that no matter what I do, I need to be able to restore the current setup without losing any (cluster and VM) data;
4. I am able to dd the boot partitions (sda1, sda2) to the new SSD and copy /etc and /var/lib/pve-cluster/config.db to a backup;
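The configuration backup from item 4 could look something like the sketch below. It is only an illustration in Python, not a tested recovery procedure: the function name and the destination directory are made up, and it assumes config.db can be opened as an ordinary SQLite file. The database is copied through SQLite's backup API instead of a plain file copy, so the copy stays consistent even if something writes to it in the meantime.

```python
import sqlite3
import tarfile
from pathlib import Path


def backup_pve_config(dest_dir, etc_dir="/etc", config_db="/var/lib/pve-cluster/config.db"):
    """Copy config.db and archive /etc into dest_dir; return both output paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Open the source read-only and let SQLite produce a consistent copy.
    db_copy = dest / "config.db"
    src = sqlite3.connect(f"file:{config_db}?mode=ro", uri=True)
    dst = sqlite3.connect(db_copy)
    try:
        src.backup(dst)
    finally:
        dst.close()
        src.close()

    # Archive the whole /etc tree (this includes /etc/pve while it is mounted).
    etc_archive = dest / "etc.tar.gz"
    with tarfile.open(etc_archive, "w:gz") as tar:
        tar.add(etc_dir, arcname="etc")

    return db_copy, etc_archive


if __name__ == "__main__":
    # Hypothetical destination on a healthy disk; run as root on the node.
    print(backup_pve_config("/mnt/rescue/pve-config-backup"))
```

Writing the backup to a different physical disk (or another machine) matters here, since the boot drive is the one that is failing.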
Question:
What is the best way to replace the original HDD with the new SSD, which is 40 GB smaller, without losing any configuration? I can put in additional drives, since there are empty slots; I can shut down the server for an hour or so; I can do file copies (for now), although I don't know if I can do a full copy.
I am desperate and running out of time. Please, any help would be appreciated.