I have a 3-node Proxmox 3.4 HA cluster that has been running for a few months and need to upgrade it to Proxmox 4.0.
I'm not very happy with upgrading Wheezy to Jessie in-place (bad experience), so I'd prefer to reinstall each server one by one with a clean Jessie installation (in order to sleep better afterwards...).
I do have an idea how this could be done with minimal downtime for the VMs (<10 minutes), but please have a look and tell me if there are any pitfalls...
Current situation:
- three Proxmox 3.4 HA nodes (paid subscription)
- Dell PowerEdge R730 Hardware hosted at Hetzner
- 64 GB RAM, 4 TB RAID10 HDD each
- redundant 1 Gbit LAN (192.168.1.x/24)
- GlusterFS used for shared storage (replica 3), accessed by Proxmox via localhost NFS rather than the native GlusterFS API (see the storage definition sketched after this list)
- about 13 VMs, all in HA mode
- no OpenVZ containers
- external Backup Storage, also accessed via NFS
- currently the system load is low and can be handled by a single physical server
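For reference, the relevant entry in /etc/pve/storage.cfg currently looks roughly like this (storage name and export path are placeholders for my actual values; Gluster's built-in NFS server only speaks NFSv3, hence the vers=3 option):

    nfs: gluster-nfs
        path /mnt/pve/gluster-nfs
        server 127.0.0.1
        export /vmdata
        options vers=3
        content images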
Most VMs are used for internal purposes and could accept a longer downtime (max 1 day), but some are critical and their downtime must be kept to a minimum.
My upgrade plan:
A-1) shut down node #1 (the first one to be upgraded)
A-2) remove node #1 from the Proxmox cluster (pvecm delnode "metal1" - see the command sketch after this list)
A-3) remove node #1 from the Gluster volume/cluster (gluster volume remove-brick ... && gluster peer detach "metal1")
A-4) install Debian Jessie on node #1, overwriting all data on the HDD - with the same network settings and hostname as before
A-5) install Proxmox 4.0 on node #1
A-6) install Gluster on node #1 and add it back to the Gluster volume (gluster volume add-brick ...) => shared storage will be complete again (spanning 3.4 and 4.0 nodes)
A-7) configure the Gluster volume as shared storage in Proxmox 4 (node #1)
A-8) configure the external Backup storage on node #1 (Proxmox 4)
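The commands I have in mind for A-2, A-3 and A-6 look roughly like this (volume name and brick paths are placeholders; my understanding is that on a replica-3 volume a brick can only be removed by shrinking the replica count, which requires force):

    # A-2: on one of the remaining nodes, after metal1 is powered off
    pvecm delnode metal1

    # A-3: shrink the replica set from 3 to 2 and drop metal1's brick
    gluster volume remove-brick vmdata replica 2 metal1:/data/brick force
    gluster peer detach metal1

    # A-6: after the reinstall, re-probe the peer and grow back to replica 3
    gluster peer probe metal1
    gluster volume add-brick vmdata replica 3 metal1:/data/brick
    gluster volume heal vmdata full   # trigger a full self-heal onto the fresh brick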
Then, for each VM (starting with some less-critical ones):
B-1) stop the VM in the Proxmox 3.4 cluster
B-2) backup the VM
B-3) restore the VM on the Proxmox 4.0 node (node #1; see the command sketch after this list)
B-4) start the VM on node #1
B-5) check if it is working correctly
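For B-2 through B-4 I'm thinking of something like the following (VMID 100 and the storage names "backup" and "gluster-nfs" are placeholders for my actual ones):

    # B-2: on a 3.4 node, full backup of the stopped VM to the external NFS store
    vzdump 100 --mode stop --compress lzo --storage backup

    # B-3: on node #1 (Proxmox 4.0), restore from the same share
    qmrestore /mnt/pve/backup/dump/vzdump-qemu-100-<timestamp>.vma.lzo 100 --storage gluster-nfs

    # B-4: start it on the new node and check
    qm start 100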
IMHO this should move the VMs from one cluster to "another" and still allow LAN communication between VMs during that operation, no matter where they are (4.0 VMs talking to 3.4 VMs and vice versa).
The remaining 3.4 HA cluster (still having quorum) would then be left without any running VMs, so I can just shut those nodes down, install Debian Jessie + Proxmox 4.0 and rebuild the cluster.
Finally, activate HA again and cross fingers.
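As far as I understand, the HA stack was completely replaced in 4.0 (pve-ha-manager instead of rgmanager), so the old HA configuration won't carry over and each VM has to be registered again, roughly like this (the VMID is a placeholder):

    # register a VM as an HA resource on the new 4.0 cluster
    ha-manager add vm:100

    # verify that the HA stack picked it up
    ha-manager status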
Do you see any problem with this plan? Please note that I'll have Proxmox 3.4 HA and Proxmox 4.0 on the same subnet - could that cause any unwanted side-effects? Any issues with the subscription key being re-used?
Some tests using VirtualBox were promising, but it's hard to test without real hardware.
Thanks for any hint in advance...