Moving PVE cluster to a new switch

brucexx

I have a cluster in production that I need to move to a new switch. We are using MC-LAG redundant interfaces and need to move 6 nodes from one switch stack to the other. That should not take long, but I was wondering if I should execute pvecm expected -1 on each node before I do that. Any advice?

Also, once pvecm expected -1 has been executed, how do I disable/negate that command so the node no longer expects -1?

Thank you
 
pvecm expected -1
That's a typo, right? There is no way to set it to "-1", as far as I know.

What I do know: you can't lower that value as long as all nodes are active. You would get
Code:
~# pvecm expected 3
Unable to set expected votes: CS_ERR_INVALID_PARAM
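
For reference, you can check the current values with pvecm status; a quick sketch:
Code:
~# pvecm status
# the quorum section shows Expected votes, Total votes and whether the cluster is quorate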

Perhaps the "maintenance mode" is what you are looking for - just a hint, I am not really sure how that one works ;-)
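
If I remember correctly, newer PVE versions (7.3 and later) expose it as a CRM command - please double check the docs for your version, this is just from memory:
Code:
# put a node into HA maintenance mode (HA resources get migrated away)
~# ha-manager crm-command node-maintenance enable <nodename>
# and to leave maintenance mode again
~# ha-manager crm-command node-maintenance disable <nodename>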
 
Network wise, this is as simple as interconnecting both switch stacks so devices on the old MLAG switches see those moved to the new MLAG switch using the same VLANs. Things may get slightly complicated if using Ceph, as you will have to keep Ceph's quorum in mind too, but that can be sorted out by moving nodes one by one and waiting for Ceph to resync.

If you force quorum on both "sides" of the same cluster by using pvecm expected, you are forcing a split brain, and when both "sides" rejoin you'll potentially have to deal with conflicts in pmxcfs that will have to be resolved manually. pvecm expected should be used only on one side of the cluster while the other side is fully powered off; then wait for that side to eventually come back and rejoin the cluster.
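
In other words, something roughly like this, and only on the part that keeps running while the other part is completely powered off (single surviving node as an example):
Code:
# only on the surviving side, with the other side fully powered off
~# pvecm expected 1
# once the other nodes are powered on again and have rejoined, check quorum
~# pvecm status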
 
Yes, but I have 3 networks and a separate interface for each:

1. PVE cluster - keeping PVE quorum
2. Ceph public - facing the Proxmox hypervisors
3. Ceph private - syncing storage across Ceph servers

I am running Ceph on separate servers, but Ceph is installed on Proxmox, which is used only to manage it.

Which one is used to keep the Ceph cluster running?

I will be moving only the PVE cluster network; Ceph public and Ceph private are staying on a different 10Gb switch.

Per VictorSTS: how much time do I have when moving a node (one at a time) to the new switch before that node realizes it has been separated from the other PVE nodes?

Also, I should disable HA for VMs running on the PVE node being moved to minimize the chance of them being started on another node - is there a maintenance mode that would disable that?

Thank you
 
So... this has escalated quickly from "I have a cluster in production that I need to move to a new switch" to "I have a cluster with Ceph, with separate networks for quorum, Ceph Public and Ceph Cluster, from which only the Ceph Cluster network of all servers in the PVE cluster will move to another switch, but the Ceph OSDs are only on a subset of all the hosts of the PVE cluster" (if I understand your posts correctly).

Sorry, but it's very, very hard and time consuming to provide accurate answers in this forum if I don't have all the information, especially on a critical change like this.

Anyway, my first recommendation still applies:
Network wise, this is as simple as interconnecting both switch stacks so devices on the old MLAG switches see those moved to the new MLAG switch using the same VLANs.
So when you move a server to the new switch, it will still see the others on the old switch.

Which one is used to keep the Ceph cluster running?
Ceph Public is used for monitor quorum and client access to OSDs, CephFS, etc. Ceph Cluster is used for OSD replication/recovery/backfill only. Depending on your pool configuration, you will need to be able to write to at least n OSDs for the I/O to complete. So you need both networks working for Ceph to work.
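
You can see which subnet is assigned to each role in your ceph.conf (on PVE setups usually /etc/pve/ceph.conf); the subnets below are only placeholders:
Code:
[global]
    public_network = 10.10.10.0/24    # monitors, clients, front-side OSD traffic
    cluster_network = 10.10.20.0/24   # OSD replication/recovery/backfill only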

How much time do I have when moving a node (one at a time) to the new switch before that node realizes it has been separated from the other PVE nodes?
Basically zero. Once a node with OSDs loses connection to the cluster network, it will not be able to create replicas for the primary PGs it holds, so I/O for those PGs will block; make sure to mark those OSDs down before moving that host. If you interconnect the old and new switches, the operation will be as fast as moving the cable and marking the OSDs up again. Use the noout flag so Ceph won't rebalance OSDs if they are down for more than the default 10 minutes.
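
As a rough sketch (the OSD IDs are examples, adjust to the host you are moving):
Code:
~# ceph osd set noout              # don't rebalance while OSDs are down
~# ceph osd tree                   # find the OSD IDs living on that host
~# ceph osd down osd.10 osd.11     # mark those OSDs down (example IDs)
# ... move the host/cables to the new switch ...
~# ceph -s                         # wait until the OSDs are back up and health is OK
~# ceph osd unset noout            # re-enable rebalancing when everything is back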

Also, I should disable HA for VMs running on the PVE node being moved to minimize the chance of them being started on another node - is there a maintenance mode that would disable that?
You must leave that node empty, just in case. From your posts I don't expect HA to fence the node, given that the PVE network (corosync quorum) will not be moved, but the safest thing to do is not to have any VM on the server you are working on.
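
Something along these lines to empty the node first (VM ID and target node name are just examples):
Code:
~# qm migrate 100 targetnode --online   # live-migrate each VM off the node
~# qm list                              # verify nothing is left running locally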
 
Yes, I see it gets complicated.

I have a 6-node Proxmox hypervisor cluster, a 4-node Ceph cluster running on Proxmox, and another 4-node Ceph cluster running on Proxmox. So, three separate clusters in total, with the Proxmox hypervisor cluster only running the VMs and the remaining two clusters only running Ceph (installed under Proxmox). I connected everything with the Proxmox cluster network, Ceph Public network, and Ceph private network.

I am only moving the Proxmox cluster network to the new switch; both Ceph Public and Ceph private stay on their dedicated 10Gbps switches/network.

So the only affected network would be the Proxmox cluster network, which I thought is only there to keep quorum for Proxmox - is that correct?

Thank you
 
