[SOLVED] proxmox ceph upgrade nautilus -> octopus: Ceph no longer functional

hellfire

Renowned Member
Hi,

I've done some major upgrades on a three-node Proxmox hyperconverged setup:
  • VM backups were made beforehand.
  • Proxmox 5.x to 6.x (via the step-by-step guide from the wiki). The upgrade was successful, all nodes were rebooted afterwards, and the status was perfect (checked roughly as sketched after this list).
  • Upgrade from Ceph Luminous to Ceph Nautilus (via the step-by-step guide from the wiki). All nodes were rebooted afterwards and the status was perfect.
  • Upgrade from Ceph Nautilus to Ceph Octopus (via the step-by-step guide from the wiki). After that, Ceph is no longer functioning.
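By "status was perfect" I mean checks roughly like these after each step (just a sketch, not the exact commands):

Code:
# Proxmox/package versions on each node
pveversion -v

# overall Ceph cluster health
ceph -s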
Given the status below, is there any hope of getting the current Ceph pool online again?
If not, what would be the best way to get back to a working state?

A few more notes on the current situation besides the detailed information below:
  • The "ceph osd status" command no longer works. If I enter it, it hangs and can only be interrupted with Ctrl+C (see the sketch after this list for what else I could try).
  • I restarted all OSDs on all nodes at the same time. I assume this was not a good idea.
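As far as I know "ceph osd status" is answered by the manager, so as an alternative (just a sketch of what could be tried) the OSD up/down state should also be readable from the monitors:

Code:
# OSD tree with up/down state, served by the monitors
ceph osd tree

# retry osd status, but give up after 30 seconds instead of hanging
timeout 30 ceph osd status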

Current status information

Versions


https://nopaste.debianforum.de/41274

proxmox status

https://nopaste.debianforum.de/41275

ceph status

Code:
ceph status

  cluster:
    id:     bcfe05fc-6690-4743-850e-ff80837b7cdc
    health: HEALTH_WARN
            noout flag(s) set
            4 osds down
            2 hosts (8 osds) down
            Reduced data availability: 129 pgs inactive
            Degraded data redundancy: 179886/269829 objects degraded (66.667%), 129 pgs degraded, 129 pgs undersized
            1 slow requests are blocked > 32 sec
            1 slow ops, oldest one blocked for 1249 sec, osd.7 has slow ops

  services:
    mon: 3 daemons, quorum kvm10,kvm11,kvm12 (age 21m)
    mgr: kvm10(active, since 20m), standbys: kvm11, kvm12
    osd: 12 osds: 4 up, 8 in
         flags noout

  data:
    pools:   2 pools, 129 pgs
    objects: 89.94k objects, 335 GiB
    usage:   1006 GiB used, 20 TiB / 21 TiB avail
    pgs:     100.000% pgs not active
             179886/269829 objects degraded (66.667%)
             129 undersized+degraded+peered

ceph health detail

https://nopaste.debianforum.de/41276
 
I tried to restart some of the OSDs via the console with a command like this:

Code:
systemctl restart ceph-osd@1

This had no effect.

I consulted the logs in /var/log/ceph and did not find anything that seemed helpful for a diagnosis or a possible solution.

I also checked the journalctl output of ceph-osd@... and that seemed fine.
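Roughly what I looked at for a single OSD (osd.1 here just as an example):

Code:
# state of the systemd unit of one OSD
systemctl status ceph-osd@1 --no-pager

# last log lines of that OSD from the journal
journalctl -u ceph-osd@1 -n 200 --no-pager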

I'll try it again from the web interface.
 
A Ceph specialist reviewed the situation, and the error on my side was that I had not executed one essential command (which is actually in the wiki!):

Code:
ceph osd require-osd-release octopus
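After running it, the change should be verifiable with something like this (just a sketch):

Code:
# should now report "require_osd_release octopus"
ceph osd dump | grep require_osd_release

# all daemons should report an Octopus (15.2.x) version
ceph versions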
 
Yeah, missing steps from the upgrade how-to or changing their order can result in various issues.

But glad you could solve it! I added a note to the wiki to highlight that this command is important, especially when coming from Ceph Luminous or older.
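For anyone landing here later, the rough order from the how-to looks like this (just a sketch; the wiki has the exact steps and the repository changes):

Code:
ceph osd set noout                     # before starting the upgrade
# upgrade the Ceph packages on every node, then restart node by node:
systemctl restart ceph-mon.target
systemctl restart ceph-mgr.target
systemctl restart ceph-osd.target
# once ALL daemons are running Octopus:
ceph osd require-osd-release octopus
ceph osd unset noout                   # the status above still shows the noout flag set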
 
