Shutdown of the Hyper-Converged Cluster (CEPH)

albert_a · Apr 5, 2020

Hi,

Can someone explain how to shut down the hyper-converged cluster properly?
I suppose the steps should be as follows:

1. Shut down all VMs on every node
2. Set the following flags:
# ceph osd set noout
# ceph osd set nobackfill
# ceph osd set norecover
3. Shut down the nodes only after all the VMs on all nodes have been shut down.
4. Restore CEPH flags when all the nodes boot up.

If this sequence is correct, then I have a second question.
What is the proper way to perform step 1? How do I shut down both HA-managed and non-HA-managed VMs on a node?

- Stopping of pve-manager is not allowed:
# systemctl stop pve-manager
Failed to stop pve-manager.service: Operation refused, unit pve-guests.service may be requested by dependency only (it is configured to refuse manual start/stop).
See system logs and 'systemctl status pve-manager.service' for details.

- Stopping via pvesh only affects non-HA VMs:
# pvesh create /nodes/localhost/stopall

Best regards,
Albert

Alwin · Apr 6, 2020

Well, you usually don't shut down the whole cluster, especially since you have/want HA.

albert_a said:
# ceph osd set noout

I don't recommend setting this, since all nodes will boot again and may or may not start properly.

albert_a said:
What is the proper way to perform step 1? How do I shut down both HA-managed and non-HA-managed VMs on a node?

You can use freeze as shutdown policy, so services don't move.
https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#_node_maintenance

albert_a · Apr 8, 2020

Alwin said:
I don't recommend setting this, since all nodes will boot again and may or may not start properly.

Thanks for the advice, it might be reasonable in some circumstances. Currently I have to set it even if some nodes fail to start, and remove the flag only during the period of minimum cluster load.

Alwin said:
Well, you usually don't shut down the whole cluster, especially since you have/want HA.

You are not serious, are you? There are numerous reasons why you might need to shut down the cluster: force majeure, security reasons, maintenance, energy saving, staff negligence, and much more, not to mention power outages at small companies.

Alwin said:
You can use freeze as shutdown policy, so services don't move.
https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#_node_maintenance

I can. But how is it related to my questions?

OK, taking into account the fact that nobody is giving answers, as well as my own searches, it's clear that Proxmox does NOT support hyper-converged clusters natively, although they are mentioned in the manual. I think some scripting is required to make Proxmox support them.

Alwin · Apr 8, 2020

albert_a said:
Thanks for the advice, it might be reasonable in some circumstances. Currently I have to set it even if some nodes fail to start, and remove the flag only during the period of minimum cluster load.

Even with a whole cluster shutdown/boot up, the OSD recovery load should not impact the operation of the cluster.

albert_a said:
I can. But how is it related to my questions?

Your question was:

albert_a said:
How do I shut down both HA-managed and non-HA-managed VMs on a node?

With the shutdown policy set to freeze, services (VM/CT) will not move to other nodes while you shut down all nodes in a cluster. All VMs/CTs will be shut down as well (as long as the ACPI call or the guest agent works). You can also trigger a bulk action before the shutdown.
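
As a rough sketch of the freeze setting described above: the shutdown policy is a datacenter-wide HA option. The helper below (the name `set_ha_shutdown_policy` is made up for illustration) assumes the option can be changed with `pvesh set /cluster/options`, which should end up as a line `ha: shutdown_policy=freeze` in /etc/pve/datacenter.cfg. It only prints the command unless DRY_RUN=0 is exported, so verify it against the linked documentation for your version before relying on it.

```shell
#!/bin/sh
# Sketch only: prints the command unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: set the cluster-wide HA shutdown policy.
# Assumption: the policy is the "ha" property of /cluster/options.
set_ha_shutdown_policy() {
    policy="$1"
    run pvesh set /cluster/options --ha "shutdown_policy=$policy"
}

# Usage (dry run by default):
#   set_ha_shutdown_policy freeze
```

Setting the policy once for the whole datacenter means every node applies it on shutdown, so no per-node step is needed in this sketch.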

albert_a said:
OK, taking into account the fact that nobody is giving answers, as well as my own searches, it's clear that Proxmox does NOT support hyper-converged clusters natively, although they are mentioned in the manual. I think some scripting is required to make Proxmox support them.

What are your specifics that say otherwise? Running Proxmox VE + Ceph is hyper-converged.

kwinz · Jun 16, 2020

Alwin said:
Well, you usually don't shut down the whole cluster, especially since you have/want HA.

I have done some research, but I am still confused as to how I can turn off a Proxmox VE cluster with Ceph
from a script that runs on low UPS battery, safely and without race conditions.

There has to be a better answer than "never shut down the whole cluster".

Is it as simple as setting VM migration policy to "freeze" and then running "shutdown" on each host node?
Or will that trigger a quorum race condition if not all nodes shut down before they notice they have lost quorum?

PS: backlink to a reddit thread that I also found interesting: https://www.reddit.com/r/homelab/comments/5rb6vi/cluster_shutdown_script_for_pve_ceph_cluster/
There the author stops pve-manager.service, whereas albert posted that this gives him "Failed to stop pve-manager.service:" errors.

martin.bork · Apr 15, 2021

Hi Albert,

I now have the same question, because we are moving into a new building, so the PVE cluster has to be shut down and booted again at the new location.

So I'd like to ask: was your solution successful?

best regards
Martin Bork

PlayStation · May 12, 2021

https://ceph.io/planet/how-to-do-a-ceph-cluster-maintenance-shutdown/

ph0x · May 12, 2021

Well, this seems a bit overblown for a hyper converged setup, since with a shutdown of the nodes you automatically shut down the Ceph services as well.
In my opinion it should be enough to shut down all VMs and reboot every node.

martin.bork · May 12, 2021

Thank you for your responses, ph0x and PlayStation.

But the question is about shutting down and later booting up all nodes at the same time: shut them down one by one, then physically move them all together.

Shutting down and booting again one after the other, for kernel updates, is easy.

Also, the procedure in that article says:

Shutdown your service nodes one by one
Shutdown your OSD nodes one by one
Shutdown your monitor nodes one by one
Shutdown your admin node

In Proxmox, what are the admin and service nodes, if any?

Monitor is clear, OSD is no problem. To me it seems that this question needs to be cleared up, and the Proxmox team should bring clarity.

np-prxmx · Jun 4, 2021

Hi,
has anyone found the answer? Which is the best procedure?

thanks

jasonsansone · Jun 4, 2021

OP's originally stated method is the best method. If you don't set Ceph flags, the cluster will begin to rebalance as soon as OSDs are marked out. If more than five minutes pass between the first and the last shutdown of the nodes, or similarly on restart, you will be dealing with an unnecessary amount of rebalancing. Granted, this depends on your Ceph failure domain in the CRUSH map, but it defaults to the node (host) when using Proxmox. Once all nodes are back up and the cluster is stable, clear the Ceph flags. The very small amount of backfilling and rebalancing will then occur.

Caveat: This assumes 3x replicated pools and not EC. You may be in for a totally different animal with EC pools.
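
The flag handling described here could be scripted roughly as follows. This is a minimal sketch, not a tested shutdown script: the function names are made up for illustration, the flags are the three from the original post, and the commands are only printed unless DRY_RUN=0 is exported.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"

# The three flags named in the original post.
FLAGS="noout nobackfill norecover"

# Echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Set the flags before the first node goes down.
set_maintenance_flags() {
    for flag in $FLAGS; do
        run ceph osd set "$flag"
    done
}

# Clear the flags once every node is back up and the cluster is stable.
clear_maintenance_flags() {
    for flag in $FLAGS; do
        run ceph osd unset "$flag"
    done
}

# Usage (dry run by default):
#   set_maintenance_flags      # before shutting down the nodes
#   clear_maintenance_flags    # after all nodes have booted
```

Keeping the flag list in one variable makes sure the same flags that were set are the ones that get cleared afterwards.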

np-prxmx · Jun 4, 2021

jasonsansone said:
OP's originally stated method is the best method

Sorry, but I don't understand. What do you mean?
Thanks

jasonsansone · Jun 4, 2021

albert_a said:
1. Shut down all VMs on every node
2. Set the following flags:
# ceph osd set noout
# ceph osd set nobackfill
# ceph osd set norecover
3. Shut down the nodes only after all the VMs on all nodes have been shut down.
4. Restore CEPH flags when all the nodes boot up.

np-prxmx · Jun 4, 2021

Thanks. But Alwin said about "ceph osd set noout": "I don't recommend setting this, since all nodes will boot again and may or may not start properly."

So, do I need to set it or not?

Thanks

jasonsansone · Jun 4, 2021

Based on guidance from Ceph and experience, this is the best method. If you do not set the Ceph flags and a node does not boot properly within five minutes (this is a tunable which could be changed from the default), the OSDs on that node will be marked out. The rest of the Ceph cluster will begin to rebalance. When the last node finally comes back online, you will have a large amount of unnecessary rebalancing occurring. The OSDs didn't fail and they have good data, but Ceph doesn't know they were unavailable as opposed to truly failed. Using flags is the Ceph method for performing maintenance. You don't "have" to use Ceph flags, but not doing so will trigger lots of data movement you can avoid.
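
To illustrate the timing argument: the tunable in question is presumably `mon_osd_down_out_interval`, which on Ceph releases that have the `ceph config` command can be read with `ceph config get mon mon_osd_down_out_interval` (value in seconds). The helper below is a hypothetical sketch that only does the comparison: given that interval and an expected downtime, it reports whether OSDs would be marked out when `noout` is not set. Treating an interval of 0 as "never mark out automatically" is an assumption, and Ceph may adjust the effective interval, so take the result as approximate.

```shell
#!/bin/sh
# Hypothetical helper: would OSDs on a node be marked out without "noout"?
#   $1 = down-out interval in seconds (e.g. from mon_osd_down_out_interval)
#   $2 = expected downtime of the node in seconds
# Returns 0 (true) if the downtime reaches the interval.
# Assumption: an interval of 0 disables automatic marking out.
would_mark_out() {
    interval="$1"
    downtime="$2"
    [ "$interval" -gt 0 ] && [ "$downtime" -ge "$interval" ]
}

# Usage on a cluster node (not executed here):
#   interval="$(ceph config get mon mon_osd_down_out_interval)"
#   if would_mark_out "$interval" 900; then
#       echo "downtime exceeds the interval, set noout first"
#   fi
```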

np-prxmx · Jun 4, 2021

jasonsansone said:
Based on guidance from Ceph and experience, this is the best method. If you do not set the Ceph flags and a node does not boot properly within five minutes (this is a tunable which could be changed from the default), the OSDs on that node will be marked out. The rest of the Ceph cluster will begin to rebalance. When the last node finally comes back online, you will have a large amount of unnecessary rebalancing occurring. The OSDs didn't fail and they have good data, but Ceph doesn't know they were unavailable as opposed to truly failed. Using flags is the Ceph method for performing maintenance. You don't "have" to use Ceph flags, but not doing so will trigger lots of data movement you can avoid.

Ok, perfect. Thanks for your time.

jsterr · Jan 27, 2022

Usually you can't shut down VMs because HA will automatically move them to a different node. So how do you shut them down properly with HA enabled?

Moayad · Jan 27, 2022

Hi

We recommend stopping the pve-ha-crm & pve-ha-lrm services at the time of maintenance.
FYI, an enhancement request is open in our Bugzilla [0]

[0] https://bugzilla.proxmox.com/show_bug.cgi?id=3839
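
A minimal sketch of stopping the two HA services named above on one node. The function name is made up for illustration, and the order shown (LRM before CRM) is an assumption that is not stated in this thread. The commands are only printed unless DRY_RUN=0 is exported.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Stop the HA services on this node for the maintenance window.
# Assumption: stop the local resource manager first, then the cluster
# resource manager.
stop_ha_services() {
    run systemctl stop pve-ha-lrm
    run systemctl stop pve-ha-crm
}

# Usage (dry run by default), to be repeated on every node:
#   stop_ha_services
```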

np-prxmx · Jan 31, 2022

Moayad said:
Hi

We recommend stopping the pve-ha-crm & pve-ha-lrm services at the time of maintenance.
FYI, an enhancement request is open in our Bugzilla [0]

[0] https://bugzilla.proxmox.com/show_bug.cgi?id=3839

Thanks for your work!

b.miller · May 4, 2022

I'm currently configuring NUT safe-shutdown scripts on Prox cluster w/ CEPH & HA. I am assuming this is the recommended procedure for graceful shutdown?
