Hi all, I meant to report back after doing the upgrade at the end of August... The upgrade went OK with no loss of service.
We had VMs on an EC pool fronted by a 3/2 replicated cache pool. Just to be on the safe side we created a temporary 2/1 pool and migrated the production VMs to it, and left some test VMs on the EC pool to see what would happen.
So before the upgrade, we had:
- 6 nodes, 5 @ 12.2.5 and 1 @ 12.2.7.
- several replication pools and 2 EC pools with cache tier.
- a mixture of SSD and WD Gold disks.
- 3 mon hosts (all 12.2.5).
Pre-upgrade we did the following:
- created a new 2/1 replicated temp pool and migrated the production VMs to it (rough commands below).
- created some test VMs on the EC pool to monitor uptime during the upgrade.
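For anyone wanting to do the same, the temp pool setup boils down to something like the following (pool name, PG count and VM/disk IDs here are just examples, and the pool needs to be added as RBD storage in Proxmox under whatever storage ID you pick):

  # temporary replicated pool, size 2 / min_size 1
  ceph osd pool create temp-migrate 128 128 replicated
  ceph osd pool set temp-migrate size 2
  ceph osd pool set temp-migrate min_size 1
  ceph osd pool application enable temp-migrate rbd

  # then move each VM disk onto the new storage; Proxmox copies the
  # disk while the VM keeps running
  qm move_disk 101 scsi0 temp-migrate --delete 1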
During the upgrade we did the following:
- set the noout, noscrub and nodeep-scrub flags to stop unnecessary IO and rebalancing (see the command sketch after this list).
- ran apt-get update and apt-get dist-upgrade on the 3 mon hosts (one at a time) and rebooted them.
- this upgraded both Proxmox and Ceph.
- ran apt-get update and apt-get dist-upgrade on the remaining 3 nodes (these ran the EC pools), again one at a time, and rebooted.
- confirmed all VMs were still online, including those on the EC pools.
- unset the noout, noscrub and nodeep-scrub flags.
- Ceph showed HEALTH_OK as soon as the pools re-synced after noout was removed.
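For reference, the flag handling around the reboots is just the standard Ceph CLI, roughly like this (ceph commands from any mon/admin host, the apt commands on each node in turn):

  # before touching the first node
  ceph osd set noout
  ceph osd set noscrub
  ceph osd set nodeep-scrub

  # per node: upgrade Proxmox + Ceph packages, then reboot
  apt-get update && apt-get dist-upgrade
  reboot

  # once all nodes are back up
  ceph osd unset noout
  ceph osd unset noscrub
  ceph osd unset nodeep-scrub

  # then watch recovery until HEALTH_OK
  ceph -s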
Post-upgrade we did the following:
- moved the VMs from the temp pool back to the EC pools with their cache tiers.
- removed the temp 2/1 replicated pool.
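Removing the temp pool is the usual (destructive) pool delete; worth double-checking nothing is left on it first, and the mons need mon_allow_pool_delete = true for the delete to be accepted (pool name again is just my example):

  # make sure no RBD images are left behind
  rbd ls -p temp-migrate

  # then drop the pool
  ceph osd pool delete temp-migrate temp-migrate --yes-i-really-really-mean-it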
All in, it was much less painful than I thought it would be (and than the Ceph documentation implied). No data loss and no loss of service.
Thanks to all on here for your comments and pointers,
Stu.