What is the right procedure for maintenance and upgrades on PVE and Ceph?

alessice

Hi,

I'm running a PVE cluster with 8 nodes (Supermicro with Intel Xeon), each running PVE 5.4 and Ceph in hyper-converged mode. Ceph is configured with 3 monitors, 3x replication, and 6 OSDs per node (48 in total) with 2048 PGs. Each OSD is an Intel SSD D3-S4510 960GB.

Now it's time to do some regular maintenance and upgrades, so my question is: how do I do this without any issues? For example, is it sufficient to move all VMs from a node to the others, run apt-get update / apt-get upgrade, and reboot the node?

From the PVE side this should be sufficient, but is it OK for Ceph if I just reboot a node? Should some command be run before rebooting a node, or will Ceph handle everything automatically?

Any tips or hint for regular maintenance is appreciated.
Thanks
 
apt-get upgrade
please never do an 'apt-get upgrade', but either an 'apt-get dist-upgrade' or 'apt dist-upgrade', or use the GUI ;)
The reason is that we sometimes add new dependencies, and 'apt-get upgrade' does not handle this.

From the PVE side this should be sufficient, but is it OK for Ceph if I just reboot a node? Should some command be run before rebooting a node, or will Ceph handle everything automatically?
you can set the 'noout' flag so that the OSDs of that node do not get marked out, which prevents unnecessary rebalancing
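
Before rebooting, it can be worth confirming the flag actually took effect. A minimal sketch; the "flags ..." line format of `ceph osd dump` output is an assumption based on typical Ceph releases, and the canned text below is only an illustration (on a live cluster you would capture the real command output):

```shell
#!/bin/sh
# Sketch: confirm the 'noout' flag is set before rebooting a node.
# The "flags noout,..." line format is an assumption - check it
# against your Ceph version.
noout_is_set() {
    # $1 = output of `ceph osd dump`
    printf '%s\n' "$1" | grep '^flags ' | grep -qw noout
}

# On a live cluster you would use: dump="$(ceph osd dump)"
dump='epoch 123
flags noout,sortbitwise,recovery_deletes,purged_snapdirs'
if noout_is_set "$dump"; then
    echo "noout is set - safe to reboot this node"
else
    echo "noout is NOT set" >&2
fi
```

Keeping the parsing in a small function means you can drop it into a maintenance script and abort early if the flag is missing.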
 
Thanks,

so an "# ceph osd set noout" before starting the upgrade is sufficient, and an "# ceph osd unset noout" after the reboot.
 
Thanks,

so an "# ceph osd set noout" before starting the upgrade is sufficient, and an "# ceph osd unset noout" after the reboot.

You could, but I suggest waiting until at least all your nodes are done. Here's my process:

Code:
# Node maintenance

    # stop and wait for scrub and deep-scrub operations

ceph osd set noscrub
ceph osd set nodeep-scrub

ceph status

    # set cluster in maintenance mode with :

ceph osd set noout


    # for node 1..N

    #  migrate VMs and CTs off node

(GUI or CLI)

    # run updates

apt update && pveupgrade

reboot  # if required, e.g., kernel update

    # wait for node to come back online and the cluster to be quorate

    # next N

# restore

ceph osd unset noout


    # when all the PGs are active, re-enable the scrub and deep-scrub operations

ceph status

ceph osd unset noscrub
ceph osd unset nodeep-scrub


    # done
 
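
For the "when all the PGs are active" step, you can script the check instead of eyeballing `ceph status`. A hedged sketch: it parses `ceph pg stat` text, and the exact output format ("N pgs: N active+clean; ...") is an assumption, so verify it against your Ceph release first:

```shell
#!/bin/sh
# Sketch: only unset noscrub/nodeep-scrub once every PG is active+clean.
# Parses `ceph pg stat` output such as:
#   2048 pgs: 2048 active+clean; 1.2 TiB data, 3.5 TiB used, ...
# (this format is an assumption - verify on your Ceph version)
all_pgs_active_clean() {
    # $1 = output of `ceph pg stat`
    total=$(printf '%s\n' "$1" | sed -n 's/^\([0-9][0-9]*\) pgs:.*/\1/p')
    clean=$(printf '%s\n' "$1" | sed -n 's/.* \([0-9][0-9]*\) active+clean[;,].*/\1/p')
    [ -n "$total" ] && [ "$total" = "$clean" ]
}

# On a live cluster: stat="$(ceph pg stat)"
stat='2048 pgs: 2048 active+clean; 1.2 TiB data'
if all_pgs_active_clean "$stat"; then
    echo "all PGs active+clean - OK to unset the scrub flags"
else
    echo "PGs still recovering - wait before unsetting scrub flags"
fi
```

The point of the `[;,]` anchor is to avoid miscounting states like `active+clean+scrubbing` as fully clean; if any PGs are still recovering, the total won't match and the check fails until the cluster settles.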
