Hello,
I'll preface this by saying that I'm a half moron for even attempting stuff like this.
With that out of the way, I'll explain the circumstances:
I recently moved, and I still have a Proxmox 7.4 NUC node `tinypve01` at my old house (grandma's), plus a new server at my new house.
The two sites are bridged at Layer 2 over IP using two Cisco 2901s with xconnect. It works fine: latency is not through the roof and traffic goes through well...
My idea was to shut down node `pve01` of my cluster, bring it to the new home, fire it up on the xconnect-bridged interface, and start migrating the VMs, for a zero-downtime 80 km move of my entire infrastructure. Everything (firewall included) is virtualized, and the routers are nothing but a Layer 2 bridge, with only one of them holding the actual IP (already the new one at the new home).
The problem is that when I went to boot my old node, some shit happened with Debian and it dropped into the initramfs prompt (BusyBox), and I have literally no idea how to fix it. I tried fsck from an Arch ISO bootable USB, but nothing: fsck would succeed and I could see all the files from the Arch ISO, but booting it was a no-go.
I decided to screw it and reinstall. I backed up all my VMs under /var/lib/vz to an external USB drive, backed up the configs from /etc/pve on tinypve01, deleted the pve01 folder from /etc/pve/nodes on the last remaining node (I had to restore quorum with `pvecm expected 1`), and tried to re-join the node.
First mistake: I used the same hostname and IP address as the node I had just deleted (pve01, IP 10.60.0.201). No bueno: it tried to join but got stuck after "waiting for quorum - OK". Nothing more happened, and I could see on the console that there was a stuck task and a call trace.
Second mistake: I tried to join it as a new v8 node, while my cluster was almost entirely v7 (the only exception being pve02, added as v8 a couple of days before the move).
So after failing to join because of quorum, the IP, the reused hostname, or the version (which means after trying and reinstalling quite a few times)...
I decided to play it safe and follow the procedure in the PVE guide's "detaching a node without reinstalling" section, except for the last bits. I removed the cluster completely from the still-working tinypve01, reinstalled pve01 from scratch, this time on 7.4 (same version as tinypve01), renamed it pve03 with a different IP address (10.60.0.203), and tried to join a "new cluster" I had just created on the last node still up at the old house... aaand it still doesn't join. It still gets stuck at the line "waiting for quorum - OK". Judging by `pvecm status` on tinypve01 that seems fine, but the new node hangs and still spits out a call trace for the stuck task on the console.
If it wasn't clear: even after nuking and rebuilding the cluster from scratch, I'm at a loss.
Yes I know I'm a f*ing idiot for trying shit like this.
But how else do you migrate 6 VMs over 80 km with zero downtime?
(Yes, I also know that the total time I've spent failing at various things is probably more than the ~1 hour it takes to drive from point A to point B, and by this point I could have just shut down tinypve01 and brought it to the new home...)
But I also wanna be able to create a cluster, you know... lol
Any and all advice, even the stupidest, even calling me names (I deserve it), is highly appreciated.
Please don't say "just f* yourself, drive that 1 stupid hour and go get your old node". I already know, and I'll probably do it at some point.
Thanks in advance,
Bryan.