Clustering: Any way to work round "a joining node can't have any guests"?

Pyromancer

Member
Jan 25, 2021
The question: I see from the documentation at https://pve.proxmox.com/wiki/Cluster_Manager that "All existing configuration in /etc/pve is overwritten when joining a cluster. In particular, a joining node cannot hold any guests, since guest IDs could otherwise conflict, and the node will inherit the cluster’s storage configuration."

I have a node with a very large (3.2TB) guest, and I need to add the node to our main cluster. I know I could back it up, join, and then restore it, but that would cause a significant outage, so I'm wondering if the following would work. Note that the guest's VMID is unique and won't conflict with anything in the cluster.

1. Stop the guest.
2. Remove its config file from /etc/pve/qemu-server, after first copying its contents to Notepad on my management machine.
3. Join the node to the cluster.
4. Recreate the config file in /etc/pve/qemu-server, which should cause the guest to reappear in the GUI.
5. Boot the guest.

Would this work?
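Concretely, the steps could be scripted something like this. This is only a dry-run sketch: VMID 100 and the cluster IP are placeholders, not values from this thread, and the `run` wrapper just echoes each command so nothing destructive happens until you swap it for real execution on the node:

```shell
#!/bin/sh
# Dry-run sketch of the proposed join procedure.
# VMID 100 and 192.0.2.10 are placeholders.
VMID=100
CONF="/etc/pve/qemu-server/${VMID}.conf"

run() { echo "+ $*"; }                   # swap for real execution on the node

run qm stop "$VMID"                      # 1. stop the guest
run cp "$CONF" "/root/${VMID}.conf.bak"  # 2. save the config outside /etc/pve...
run rm "$CONF"                           #    ...then remove it
run pvecm add 192.0.2.10                 # 3. join via an existing cluster node
run cp "/root/${VMID}.conf.bak" "$CONF"  # 4. restore the config
run qm start "$VMID"                     # 5. boot the guest
```

Note the backup copy deliberately goes somewhere outside /etc/pve, since everything under /etc/pve is overwritten on join.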

Full details:
We had a three-node cluster: two large SuperMicro machines with 50TB of disk and over 300GB of memory each, and a smaller blade node with 10TB of disk and 94GB of memory.
We were upgrading the whole cluster to v7.4-16 (from 7.0-11) in preparation for moving to v8.
The blade server was upgraded with no problem and rebooted.
We then attempted to upgrade the first of the large machines; the upgrade itself went OK, but when we rebooted, the boot SSD failed and the node was lost.
I've now rebuilt it (with hardware RAID1 boot disks, after attempting to boot from ZFS raidz1 proved impossible) and managed to re-import its main ZFS pool, tank1. That meant we were able to get the large guest running on it by copying the VM's config file back from our backups: as well as backing up the VMs themselves, we back up the /etc/pve/qemu-server directory of each machine, so we have the configs. Placing the config file in /etc/pve/qemu-server made the guest appear in the GUI, and it then booted normally in response to the start command.

I've removed the old version of this node from the cluster using pvecm delnode, then deleted its directory structure under /etc/pve/nodes/<name>. Now I want to re-add it to the cluster so that I can start migrating production VMs off the second large machine onto it, upgrade that host to 7.4-16, and then move the whole cluster to v8 in a week or two.
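For anyone following along, that dead-node cleanup can be sketched as below. The node name pve1 is hypothetical, and the sketch works against a scratch directory standing in for /etc/pve; on a real cluster you would first run `pvecm delnode pve1` on a surviving node:

```shell
# Scratch directory standing in for /etc/pve on a surviving node.
PVE_ROOT=$(mktemp -d)
mkdir -p "$PVE_ROOT/nodes/pve1/qemu-server"   # leftover tree of the dead node

# After `pvecm delnode pve1`, clear the leftover per-node config tree:
rm -r "$PVE_ROOT/nodes/pve1"

ls "$PVE_ROOT/nodes"   # pve1 no longer listed
```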

The very large guest is a virtual XenServer running multiple guests of its own, hence the huge disk allocation. Long term we plan to replace these Xen VMs with new hosts built directly on Proxmox, and abolish the virtual hypervisor, but for now they are needed, and we'd prefer to avoid excessive downtime.
 
1. Stop the guest.
2. Remove its config file from /etc/pve/qemu-server, after first copying its contents to Notepad on my management machine.
3. Join the node to the cluster.
4. Recreate the config file in /etc/pve/qemu-server, which should cause the guest to reappear in the GUI.
5. Boot the guest.

Would this work?
Yes, conceptually this should work. There may be a few more things to "clean up" before the join; e.g. as a precaution you may want to save and clear storage.cfg, then add it back later.
Also make sure that no ID conflict exists now. You can also test the procedure by installing 3 virtualized PVEs and configuring them to be as close to production as possible, with smaller sizing of course.
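One way to check for ID conflicts is to compare the two ID lists with `comm`. The lists below are stand-in data for the sketch; on the real systems you would fill cluster-ids.txt from `pvesh get /cluster/resources --type vm` on an existing cluster node and local-ids.txt from the filenames under /etc/pve/qemu-server on the joining node:

```shell
# Stand-in data: real IDs would come from the cluster and the joining node.
printf '100\n101\n102\n' > cluster-ids.txt   # VMIDs already in the cluster
printf '102\n2001\n'     > local-ids.txt     # VMIDs on the joining node

# comm needs sorted input; -12 prints only lines common to both files,
# i.e. any VMIDs that would conflict after the join.
comm -12 cluster-ids.txt local-ids.txt
```

With the sample data this prints `102`, flagging a conflict that would have to be resolved before joining.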

Good luck


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 

Thanks for replying. Based on that I went ahead and did as proposed, and it worked perfectly. I took a backup of storage.cfg as well, but it proved not to be needed, as both main nodes have identical layouts with the ZFS pool tank1 as the main storage for VMs. I shut down the VM, copied then removed the .conf file, added the node to the cluster, and re-created the .conf file; the VM immediately reappeared in the GUI, and when started, XenServer came up perfectly and was able to restart all of its VMs.

Thanks for the assistance!
 
