Moving pve cluster to a new switch

brucexx · Oct 24, 2024

I have a cluster in production that I need to move to a new switch. We are using MCLAG redundant interfaces and need to move 6 nodes from one switch cluster to the other. That should not take long, but I was wondering whether I should run pvecm expected -1 on each node before I do that. Any advice?

Also, once pvecm expected -1 has been executed, how do I disable/negate that command so the node no longer expects -1?

Thank you

UdoB · Oct 24, 2024

brucexx said:
pvecm expected -1

That's a typo, right? There is no way to go for "-1", as far as I know.

What I do know: you can't lower that value as long as all nodes are active. You would get

Code:

~# pvecm expected 3
Unable to set expected votes: CS_ERR_INVALID_PARAM

Perhaps the "maintenance mode" is what you are looking for - just a hint, I am not really sure how that one works ;-)

UdoB · Oct 24, 2024

brucexx said:
how to disable/negate that command

As soon as an absent node is online again, the "expected" value rises automatically - at least that's my observation.

VictorSTS · Oct 24, 2024

Network-wise, this is as simple as interconnecting both switch stacks so that devices on the old MLAG switches can see those moved to the new MLAG switches over the same VLANs. Things may get slightly complicated if you use Ceph, as you will have to keep Ceph's quorum in mind too, but that can be sorted out by moving nodes one by one and waiting for Ceph to resync.

If you force quorum on both "sides" of the same cluster by using pvecm expected, you are forcing a split brain, and when both "sides" rejoin you'll potentially have to deal with conflicts in pmxcfs that would have to be solved manually. pvecm expected should be used only on one side of the cluster while the other side is fully powered off; then wait for that side to eventually come back and rejoin the cluster.
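
To make the quorum arithmetic behind this concrete, here is a minimal sketch. It assumes the default of one vote per node, no QDevice, and the usual majority rule (more than half of the expected votes). The function names quorum_needed and is_quorate are made up for illustration and are not part of pvecm.

```shell
# Votes needed for quorum under a simple majority: floor(expected / 2) + 1.
# $1 = expected votes (assumption: one vote per node, no QDevice)
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Report whether a partition is quorate.
# $1 = expected votes, $2 = votes visible inside the partition
is_quorate() {
    if [ "$2" -ge "$(quorum_needed "$1")" ]; then
        echo yes
    else
        echo no
    fi
}

is_quorate 6 5   # one node unplugged, five still see each other: prints "yes"
is_quorate 6 1   # the unplugged node on its own: prints "no"
is_quorate 6 3   # an even 3/3 split: prints "no" for either half
```

With 6 nodes, moving one node at a time leaves 5 votes on the main side, which stays quorate without touching pvecm expected. Only the node being moved loses quorum until its link is back.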

brucexx · Oct 24, 2024

Is Ceph quorum using the PVE cluster network? I thought it was using the Ceph private network along with data syncing.

VictorSTS · Oct 24, 2024

Ceph quorum uses your Ceph Public network only.

brucexx · Oct 25, 2024

Yes, but I have 3 networks, each on a separate interface:

1. PVE cluster - keeping PVE quorum
2. Ceph public - facing the Proxmox hypervisors
3. Ceph private - syncing storage across Ceph servers

I am running Ceph on separate servers; Proxmox is installed on them only to manage Ceph.

Which one is used to keep the Ceph cluster running?

I will be moving only the PVE cluster network; the rest, Ceph public and Ceph private, are staying on a different 10Gb switch.

Per VictorSTS, how much time do I have when moving a node (one at a time) to the new switch before that node realizes that it has been separated from the other PVE nodes?

Also, should I disable HA for the VMs running on the PVE node being moved, to minimize the chance of them being started on another node? Is there something like a maintenance mode that would do that?

thank you

VictorSTS · Oct 25, 2024

So... this has escalated quickly from "I have a cluster in production that I need to move to a new switch" to "I have a cluster with Ceph, with separate networks for quorum, Ceph Public and Ceph Cluster, of which only the Ceph Cluster network of all servers in the PVE cluster will move to another switch, and the Ceph OSDs are only on a subset of the hosts in the PVE cluster" (if I understand your posts correctly).

Sorry, but it's very, very hard and time-consuming to provide accurate answers in this forum if I don't have all the information, especially on a critical change like this.

Anyway, my first recommendation still applies:

Network-wise, this is as simple as interconnecting both switch stacks so that devices on the old MLAG switches can see those moved to the new MLAG switches over the same VLANs.

So when you move a server to the new switch, it will still see the others on the old switch.

brucexx said:
Which one is used to keep the Ceph cluster running?

Ceph Public is used for monitor quorum and client access to OSDs, CephFS, etc. Ceph Cluster is used for OSD replicas/recovery/backfill only. Depending on your pool configuration, you will need to be able to write to at least n OSDs for the I/O to finish. So you need both networks working for Ceph to work.

brucexx said:
how much time do I have when moving a node (one at a time) to the new switch before that node realizes that it has been separated from the other PVE nodes?

Basically zero. Once a node with OSDs loses its connection to the cluster network, it will not be able to create replicas for the primary PGs it holds, so I/O for those PGs will block. Make sure to mark those OSDs down before moving that host. If you interconnect the old and new switches, the operation will be as fast as moving the cable and marking the OSDs up again. Use the noout flag so Ceph won't rebalance OSDs if they are down for more than the default 10 minutes.
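
The noout advice above could be wrapped roughly like this. It is a sketch, not a tested procedure. It stops the OSD daemons on the host through systemd (so that Ceph reports them down), which is a substitution for marking them down by hand, and it assumes the host uses the standard ceph-osd.target unit. The run helper, the DRY_RUN variable, and the function names before_move and after_move are hypothetical.

```shell
# DRY_RUN=1 (the default in this sketch) only prints the commands.
# Set DRY_RUN=0 to actually execute them on a Ceph OSD host.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Before unplugging: keep Ceph from marking OSDs out and rebalancing,
# then stop this host's OSD daemons cleanly.
before_move() {
    run ceph osd set noout
    run systemctl stop ceph-osd.target
}

# After the link is back: start the OSDs again, then clear the flag.
after_move() {
    run systemctl start ceph-osd.target
    run ceph osd unset noout
}
```

Calling before_move, moving the cable, and then calling after_move keeps the flag set only for the duration of the work. Checking ceph -s before unsetting noout is a sensible extra step.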

brucexx said:
Also, should I disable HA for the VMs running on the PVE node being moved, to minimize the chance of them being started on another node? Is there something like a maintenance mode that would do that?

You must leave that node empty, just in case. From your posts I don't expect HA to fence the node, given that the PVE network (corosync quorum) will not be moved, but the safest thing to do is not to have any VM on the server you are working on.
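
On the maintenance-mode question, recent Proxmox VE releases (7.3 and later, as far as I know) have a node maintenance command in ha-manager that asks the HA stack to move HA-managed guests off a node. A minimal sketch, assuming that command is available on your version; the node name pve1, the run helper, the DRY_RUN variable, and the function names are hypothetical.

```shell
# DRY_RUN=1 (the default in this sketch) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Put a node into HA maintenance so HA-managed guests are moved away.
# $1 = node name
enter_maintenance() {
    run ha-manager crm-command node-maintenance enable "$1"
}

# Take the node out of maintenance once it is reconnected.
# $1 = node name
leave_maintenance() {
    run ha-manager crm-command node-maintenance disable "$1"
}

enter_maintenance pve1
leave_maintenance pve1
```

This only covers guests that are HA resources. Guests that are not managed by HA would still have to be migrated by hand to leave the node empty.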

brucexx · Oct 26, 2024

Yes, I see it gets complicated.

I have a 6-node Proxmox hypervisor cluster, a 4-node Ceph cluster running on Proxmox, and another 4-node Ceph cluster running on Proxmox. So a total of 3 separate clusters, with the Proxmox hypervisor cluster only running the VMs and the remaining two clusters only running Ceph (installed under Proxmox). Everything is connected through the Proxmox cluster network, the Ceph Public network and the Ceph private network.

I am only moving the Proxmox cluster network to the new switch; both Ceph Public and Ceph private stay on their dedicated 10Gbps switches/network.

So the only affected network would be the Proxmox cluster network, which I thought is only used to keep quorum for Proxmox. Is that correct?

Thank you

brucexx · Oct 29, 2024

Moving the PVE cluster network one node at a time had no effect on the overall cluster or on Ceph in this setup; Ceph did not even notice.

Proxmox did notice: I saw the nodes being moved go offline, but I had disabled HA on all VMs for the time being. The whole operation completed successfully.

Thx.

pvps1 · Oct 29, 2024

is service uptime that crucial that you cannot have a downtime of 15min at 0300?

that would minimize complexity to zero....

#keepitsimple

brucexx · Oct 30, 2024

pvps1 said:
is service uptime that crucial that you cannot have a downtime of 15min at 0300?

that would minimize complexity to zero....

#keepitsimple

Yes, VoIP hosting; some clients have 24/7 service. Uptime is paramount.

Thank you all for the advice. Proxmox has been a stable environment for us for over 10 years now, starting with version 3.x and updating over the years.
