[SOLVED] proxmox ceph upgrade nautilus -> octopus: Ceph no longer functional

hellfire

Renowned Member
Hi,

I've done some major upgrades on a three-node Proxmox hyperconverged setup:
  • VM backups were made beforehand.
  • Proxmox 5.x to 6.x (via the step-by-step guide from the wiki). The upgrade was successful, all nodes were rebooted afterwards, and the status was perfect (checked roughly as sketched after this list).
  • Upgrade from Ceph Luminous to Ceph Nautilus (via the step-by-step guide from the wiki). All nodes were rebooted afterwards and the status was perfect.
  • Upgrade from Ceph Nautilus to Ceph Octopus (via the step-by-step guide from the wiki). After that, Ceph is no longer functioning.
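By "status was perfect" I mean checks roughly like these after each step (just a sketch, not the exact commands):

Code:
# Proxmox/package versions on each node
pveversion -v

# overall Ceph cluster health
ceph -s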
Given the status below, is there any hope of getting the current Ceph pool online again?
If not, what would be the best way to get back to a working state?

A few more notes on the current situation besides the detailed information below:
  • The "ceph osd status" command no longer works. If I enter it, it hangs and can only be interrupted with Ctrl+C (see the sketch after this list for what else I could try).
  • I restarted all OSDs on all nodes at the same time. I assume this was not a good idea.
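As far as I know "ceph osd status" is answered by the manager, so as an alternative (just a sketch of what could be tried) the OSD up/down state should also be readable from the monitors:

Code:
# OSD tree with up/down state, served by the monitors
ceph osd tree

# retry osd status, but give up after 30 seconds instead of hanging
timeout 30 ceph osd status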

Current status information

Versions


https://nopaste.debianforum.de/41274

proxmox status

https://nopaste.debianforum.de/41275

ceph status

Code:
ceph status

  cluster:
    id:     bcfe05fc-6690-4743-850e-ff80837b7cdc
    health: HEALTH_WARN
            noout flag(s) set
            4 osds down
            2 hosts (8 osds) down
            Reduced data availability: 129 pgs inactive
            Degraded data redundancy: 179886/269829 objects degraded (66.667%), 129 pgs degraded, 129 pgs undersized
            1 slow requests are blocked > 32 sec
            1 slow ops, oldest one blocked for 1249 sec, osd.7 has slow ops

  services:
    mon: 3 daemons, quorum kvm10,kvm11,kvm12 (age 21m)
    mgr: kvm10(active, since 20m), standbys: kvm11, kvm12
    osd: 12 osds: 4 up, 8 in
         flags noout

  data:
    pools:   2 pools, 129 pgs
    objects: 89.94k objects, 335 GiB
    usage:   1006 GiB used, 20 TiB / 21 TiB avail
    pgs:     100.000% pgs not active
             179886/269829 objects degraded (66.667%)
             129 undersized+degraded+peered

ceph health detail

https://nopaste.debianforum.de/41276
 
I tried to restart some of the OSDs via the console with a command like this:

Code:
systemctl restart ceph-osd@1

This had no effect.

I consulted the logs in /var/log/ceph and did not find anything that seemed helpful for a diagnosis or a possible solution.

I also checked the journalctl output of ceph-osd@... and that seemed fine.
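Roughly what I looked at for a single OSD (osd.1 here just as an example):

Code:
# state of the systemd unit of one OSD
systemctl status ceph-osd@1 --no-pager

# last log lines of that OSD from the journal
journalctl -u ceph-osd@1 -n 200 --no-pager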

I'll try it again from the web interface.
 
A Ceph specialist reviewed the situation, and the error on my side was that I had not executed one essential command (which is actually in the wiki!):

Code:
ceph osd require-osd-release octopus
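After running it, the change should be verifiable with something like this (just a sketch):

Code:
# should now report "require_osd_release octopus"
ceph osd dump | grep require_osd_release

# all daemons should report an Octopus (15.2.x) version
ceph versions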
 
Yeah, missing steps from the upgrade how-to or changing their order can result in various issues.

But glad you could solve it! I added a note to the wiki to highlight that this command is important, especially when coming from Ceph Luminous or older.
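For anyone landing here later, the rough order from the how-to looks like this (just a sketch; the wiki has the exact steps and the repository changes):

Code:
ceph osd set noout                     # before starting the upgrade
# upgrade the Ceph packages on every node, then restart node by node:
systemctl restart ceph-mon.target
systemctl restart ceph-mgr.target
systemctl restart ceph-osd.target
# once ALL daemons are running Octopus:
ceph osd require-osd-release octopus
ceph osd unset noout                   # the status above still shows the noout flag set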
 
