Ceph down after upgrade to Pacific

GoZippy

To be honest - I did not even look to see what upgrades happened till it was too late.

The Octopus to Pacific upgrade apparently happened with the automatic GUI updates... I did not read the notes, and now my whole cluster's Ceph pool is dead as a rock.

I noticed timeout after timeout...

I manually installed the new chrony on each node, thinking the clocks were maybe out of sync or something...
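For reference, this is roughly how I checked the time sync on each node afterwards (assuming chrony is what is actually serving NTP now):

Code:
# is the clock actually synchronized?
chronyc tracking        # want "Leap status : Normal" and a small system time offset
chronyc sources -v      # want at least one source marked with '*' (currently selected)
timedatectl             # want "System clock synchronized: yes"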

Then I started seeing that none of the Ceph OSDs were visible - getting Ceph timeout (500) errors all over the place...

Noticed the monitors were all in an unknown state and there was no quorum.

No managers were listed anymore

ceph -s times out
nothing seems to work anymore
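What I have been trying in order to see anything at all: when ceph -s hangs, a monitor can supposedly still be asked directly over its local admin socket (node2 below is just an example of one of my mon hosts):

Code:
# run on a monitor node - talks to the local daemon only, no cluster connection needed
ceph daemon mon.node2 mon_status
ceph daemon mon.node2 quorum_status

# or give the normal client a short timeout instead of letting it hang forever
ceph --connect-timeout 5 -s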

Tried following along with the upgrade guide, but I think it may be too late...

I have 9 nodes

Monitors on nodes 1, 2, 3 and 7, and I can't remember which ones had the managers...

Anyhow, I have a ton of errors spewing into syslog and ceph commands are pretty much all stuck...

ceph: No mds server is up or the cluster is laggy

Bottom line - Ceph is dead... where can I start to try to recover?
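Where I figured I should start: rule out a half-finished upgrade, i.e. some nodes still on Octopus and some already on Pacific. These only look at the locally installed packages, so they work even with the cluster down:

Code:
# run on every node and compare the output
ceph --version
pveversion -v | grep -i ceph
dpkg -l | grep ceph-

Below is a sample of what the cluster log shows in the meantime: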

Code:
2022-02-21T11:23:39.813869-0600 osd.0 (osd.0) 4244644 : cluster [WRN] slow request osd_op(client.471789829.0:1612332 7.dc 7.2b14fadc (undecoded) ondisk+retry+write+known_if_redirected e967374) initiated 2022-02-21T02:30:29.797312-0600 currently delayed
2022-02-21T11:23:39.813873-0600 osd.0 (osd.0) 4244645 : cluster [WRN] slow request osd_op(client.471789829.0:1612332 7.dc 7.2b14fadc (undecoded) ondisk+retry+write+known_if_redirected e967508) initiated 2022-02-21T06:30:29.844746-0600 currently delayed
2022-02-21T11:23:39.813876-0600 osd.0 (osd.0) 4244646 : cluster [WRN] slow request osd_op(client.471789829.0:1612332 7.dc 7.2b14fadc (undecoded) ondisk+retry+write+known_if_redirected e967645) initiated 2022-02-21T10:30:29.892240-0600 currently delayed
2022-02-21T11:23:40.212664-0600 mon.node2 (mon.0) 2000851 : cluster [INF] mon.node2 is new leader, mons node2,stack1,node7 in quorum (ranks 0,1,3)
2022-02-21T11:23:40.219565-0600 mon.node2 (mon.0) 2000852 : cluster [DBG] monmap e14: 4 mons at {node2=[v2:10.0.1.2:3300/0,v1:10.0.1.2:6789/0],node7=[v2:10.0.1.7:3300/0,v1:10.0.1.7:6789/0],node900=[v2:10.0.90.0:3300/0,v1:10.0.90.0:6789/0],stack1=[v2:10.0.1.1:3300/0,v1:10.0.1.1:6789/0]}
2022-02-21T11:23:40.219633-0600 mon.node2 (mon.0) 2000853 : cluster [DBG] fsmap cephfs:1 {0=node2=up:active} 2 up:standby
2022-02-21T11:23:40.219653-0600 mon.node2 (mon.0) 2000854 : cluster [DBG] osdmap e967678: 14 total, 4 up, 10 in
2022-02-21T11:23:40.220140-0600 mon.node2 (mon.0) 2000855 : cluster [DBG] mgrmap e649: stack1(active, since 5d), standbys: node2, node7
2022-02-21T11:23:40.228388-0600 mon.node2 (mon.0) 2000856 : cluster [ERR] Health detail: HEALTH_ERR 1 MDSs report slow metadata IOs; mon node7 is very low on available space; mon stack1 is low on available space; 1/4 mons down, quorum node2,stack1,node7; 6 osds down; 1 host (7 osds) down; Reduced data availability: 169 pgs inactive, 45 pgs down, 124 pgs peering, 388 pgs stale; 138 slow ops, oldest one blocked for 61680 sec, osd.0 has slow ops
2022-02-21T11:23:40.228404-0600 mon.node2 (mon.0) 2000857 : cluster [ERR] [WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
2022-02-21T11:23:40.228409-0600 mon.node2 (mon.0) 2000858 : cluster [ERR]     mds.node2(mds.0): 7 slow metadata IOs are blocked > 30 secs, oldest blocked for 78670 secs
2022-02-21T11:23:40.228413-0600 mon.node2 (mon.0) 2000859 : cluster [ERR] [ERR] MON_DISK_CRIT: mon node7 is very low on available space
2022-02-21T11:23:40.228416-0600 mon.node2 (mon.0) 2000860 : cluster [ERR]     mon.node7 has 1% avail
2022-02-21T11:23:40.228422-0600 mon.node2 (mon.0) 2000861 : cluster [ERR] [WRN] MON_DISK_LOW: mon stack1 is low on available space
2022-02-21T11:23:40.228428-0600 mon.node2 (mon.0) 2000862 : cluster [ERR]     mon.stack1 has 8% avail
2022-02-21T11:23:40.228432-0600 mon.node2 (mon.0) 2000863 : cluster [ERR] [WRN] MON_DOWN: 1/4 mons down, quorum node2,stack1,node7
2022-02-21T11:23:40.228437-0600 mon.node2 (mon.0) 2000864 : cluster [ERR]     mon.node900 (rank 2) addr [v2:10.0.90.0:3300/0,v1:10.0.90.0:6789/0] is down (out of quorum)
2022-02-21T11:23:40.228443-0600 mon.node2 (mon.0) 2000865 : cluster [ERR] [WRN] OSD_DOWN: 6 osds down
2022-02-21T11:23:40.228449-0600 mon.node2 (mon.0) 2000866 : cluster [ERR]     osd.8 (root=default,host=node900) is down
2022-02-21T11:23:40.228454-0600 mon.node2 (mon.0) 2000867 : cluster [ERR]     osd.9 (root=default,host=node900) is down
2022-02-21T11:23:40.228460-0600 mon.node2 (mon.0) 2000868 : cluster [ERR]     osd.10 (root=default,host=node900) is down
2022-02-21T11:23:40.228466-0600 mon.node2 (mon.0) 2000869 : cluster [ERR]     osd.11 (root=default,host=node900) is down
2022-02-21T11:23:40.228471-0600 mon.node2 (mon.0) 2000870 : cluster [ERR]     osd.12 (root=default,host=node900) is down
2022-02-21T11:23:40.228477-0600 mon.node2 (mon.0) 2000871 : cluster [ERR]     osd.13 (root=default,host=node900) is down
2022-02-21T11:23:40.228483-0600 mon.node2 (mon.0) 2000872 : cluster [ERR] [WRN] OSD_HOST_DOWN: 1 host (7 osds) down
2022-02-21T11:23:40.228488-0600 mon.node2 (mon.0) 2000873 : cluster [ERR]     host node900 (root=default) (7 osds) is down
2022-02-21T11:23:40.228527-0600 mon.node2 (mon.0) 2000874 : cluster [ERR] [WRN] PG_AVAILABILITY: Reduced data availability: 169 pgs inactive, 45 pgs down, 124 pgs peering, 388 pgs stale
2022-02-21T11:23:40.228534-0600 mon.node2 (mon.0) 2000875 : cluster [ERR]     pg 7.cd is stuck inactive for 21h, current state stale+down, last acting [0]
2022-02-21T11:23:40.228539-0600 mon.node2 (mon.0) 2000876 : cluster [ERR]     pg 7.ce is stuck peering for 21h, current state peering, last acting [0,7]
2022-02-21T11:23:40.228544-0600 mon.node2 (mon.0) 2000877 : cluster [ERR]     pg 7.cf is stuck stale for 21h, current state stale+active+clean, last acting [6,3,8]
2022-02-21T11:23:40.228550-0600 mon.node2 (mon.0) 2000878 : cluster [ERR]     pg 7.d0 is stuck stale for 21h, current state stale+active+clean, last acting [12,2,6]
2022-02-21T11:23:40.228555-0600 mon.node2 (mon.0) 2000879 : cluster [ERR]     pg 7.d1 is stuck stale for 21h, current state stale+active+clean, last acting [9,1,2]
2022-02-21T11:23:40.228561-0600 mon.node2 (mon.0) 2000880 : cluster [ERR]     pg 7.d2 is stuck stale for 21h, current state stale+active+clean, last acting [3,9,2]
2022-02-21T11:23:40.228567-0600 mon.node2 (mon.0) 2000881 : cluster [ERR]     pg 7.d3 is stuck peering for 21h, current state peering, last acting [0,6]
2022-02-21T11:23:40.228574-0600 mon.node2 (mon.0) 2000882 : cluster [ERR]     pg 7.d4 is stuck stale for 21h, current state stale+active+clean, last acting [8,6,1]
2022-02-21T11:23:40.228580-0600 mon.node2 (mon.0) 2000883 : cluster [ERR]     pg 7.d5 is stuck stale for 21h, current state stale+active+clean, last acting [13,6,7]
2022-02-21T11:23:40.228585-0600 mon.node2 (mon.0) 2000884 : cluster [ERR]     pg 7.d6 is stuck stale for 21h, current state stale+active+clean, last acting [11,1,3]


And now I see mon node space issues... everything is installed to the root partition, I believe... anyhow, that has been an ongoing issue... I have Proxmox installed on an 80 GB SSD and cannot figure out how to expand the root partition beyond the 18 or 20 GB it set up when I originally installed PVE. I've had several issues over the year upgrading PVE and other items... but I do not even know where to start on the Ceph monitor space allocation...
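In case it matters, this is how I've been checking what the monitor store actually takes up versus what else is filling the root partition (as far as I understand, the mon DB lives under /var/lib/ceph/mon/ceph-<hostname>/store.db):

Code:
# how full is the filesystem the mon store sits on
df -h /var/lib/ceph

# size of each local mon store
du -sh /var/lib/ceph/mon/*

# what else is eating the root partition (stays on one filesystem)
du -xh --max-depth=1 / 2>/dev/null | sort -h | tail -n 15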

Help is much appreciated as I am still learning all this fun stuff.

Did I kill my cluster because of that automated update?
 
Any way to recover the OSDs, get the managers back, and rescue the map?

Nodes can see each other fine - just missing managers for Ceph, and no OSDs are showing up.

ceph -s hangs
Timeouts on every GUI screen and with most ceph commands.
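What I've been running on each node to see whether the Ceph daemons are even alive (the unit names follow the usual ceph-mon@ / ceph-mgr@ / ceph-osd@ pattern; node2 is just an example):

Code:
# which ceph units exist on this node and in what state
systemctl list-units 'ceph*' --all

# dig into one that is failed or inactive, e.g. the local monitor
systemctl status ceph-mon@node2
journalctl -b -u ceph-mon@node2 --no-pager | tail -n 50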

Code:
root@node900:/etc/pve/nodes/node2# ha-manager status
quorum OK
master node5 (active, Thu Feb 24 22:40:22 2022)
lrm node2 (active, Thu Feb 24 22:40:25 2022)
lrm node3 (active, Thu Feb 24 22:40:27 2022)
lrm node4 (active, Thu Feb 24 22:40:24 2022)
lrm node5 (idle, Thu Feb 24 22:40:28 2022)
lrm node7 (idle, Thu Feb 24 22:40:28 2022)
lrm node8 (idle, Thu Feb 24 22:40:30 2022)
lrm node900 (idle, Thu Feb 24 22:40:30 2022)
lrm stack1 (idle, Thu Feb 24 22:40:28 2022)
service ct:901 (node3, disabled)
service ct:9103 (node4, disabled)
service vm:105 (node4, stopped)
service vm:108 (node3, stopped)
service vm:99203 (node2, stopped)
root@node900:/etc/pve/nodes/node2#


Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 10.0.1.1/16
     fsid = xxxxxxxxxxxxxxxxxxxxxxbbb
     mon_allow_pool_delete = true
     mon_host = 10.0.1.2 10.0.1.1 10.0.1.6 10.0.1.5 10.0.90.0 10.0.1.7
     ms_bind_ipv4 = true
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 10.0.1.1/16

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
     keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.node2]
     host = node2
     mds_standby_for_name = pve

[mds.node7]
     host = node7
     mds_standby_for_name = pve

[mds.node900]
     host = node900
     mds_standby_for_name = pve

[mds.stack1]
     host = stack1
     mds_standby_for_name = pve

[mon.node2]
     public_addr = 10.0.1.2

[mon.node5]
     public_addr = 10.0.1.5

[mon.node6]
     public_addr = 10.0.1.6

[mon.node7]
     public_addr = 10.0.1.7

[mon.node900]
     public_addr = 10.0.90.0

[mon.stack1]
     public_addr = 10.0.1.1


The node syslog just reports that no MDS server is up, and then lots of timeout errors...

It's like the upgrade deleted my data map and the managers all went bye-bye...

The physical disks still show OSD 0-14 or whatever... but no OSD stores are shown anywhere...
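Since the data still seems to be on the disks, this is what I was planning to try per node to get the OSDs back up (assuming they are normal LVM/bluestore OSDs as created by pveceph - please correct me if this is a bad idea):

Code:
# show which OSDs ceph-volume knows about on this node
ceph-volume lvm list

# re-mount the OSD tmpfs dirs and (re)start the matching ceph-osd@ units
ceph-volume lvm activate --all

# or start a single one explicitly, e.g. osd.8 (just an example id)
systemctl start ceph-osd@8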

How can I rebuild the map and get a manager to start?
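For the managers, my plan was roughly this (node2 is just an example - it used to be one of the standby mgrs according to the log above):

Code:
# see if an existing mgr unit is just stopped, and kick it
systemctl status ceph-mgr@node2
systemctl restart ceph-mgr@node2

# or create a fresh manager on this node via the Proxmox tooling
pveceph mgr create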

any help appreciated
 
That all sounds not good. I had trouble upgrading to Proxmox 7 last weekend. A lot of trouble.

Unfortunately I can't solve your problem, but I have a few hints on what to look for:

* Check your network.

In my case Ceph was dead like Elvis... the issue was that the update to Debian 10 and 11 (I came from Proxmox 5) changed the interface naming to "predictable names":

https://wiki.debian.org/NetworkInterfaceNames

In my case that led to a non-functional Ceph network. Everything was in place, but Ceph was unable to communicate. A quick check for this is sketched after the links below.

My German thread about it: https://forum.proxmox.com/threads/n...e-netzwerk-interfaces-nicht-mehr-hoch.105612/
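A quick way to check for that: compare what the kernel actually named the interfaces after the upgrade against what your config still refers to (file paths are the Debian/PVE defaults):

Code:
# names the kernel assigned after the upgrade
ip -br link

# names your config still expects
grep -E 'iface|bridge-ports|bond-slaves' /etc/network/interfaces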

* You should boot one node, analyse the syslog from start to end, and try to fix every error one after the other.
I found a few problems there that way and fixed them - something like the commands below is enough to get started.
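Walking through the current boot's log is easiest with journalctl (which is what Debian 11 / Proxmox 7 use anyway):

Code:
# warnings and errors from the current boot
journalctl -b -p warning

# or only the Ceph units
journalctl -b -u 'ceph*'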

* Space issue: the problem could be the new "pg autoscale" function. There is a small warning in the upgrade documentation with a little link - I advise you to read it, because it can lead to a big problem. That was the reason for me to disable this function (see the snippet after the next link).

Here you can find a solution: install an additional SSD in the system and temporarily link the database dir to it:

https://forum.proxmox.com/threads/c...think-twice-before-enabling-auto-scale.80105/
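Turning the autoscaler off is a one-liner per pool once the mons answer again (the pool name below is just a placeholder):

Code:
# see what the autoscaler currently wants to do
ceph osd pool autoscale-status

# switch it off (or to "warn") for a pool
ceph osd pool set <pool> pg_autoscale_mode off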


* Lastly, in my case I had a lot of packet collisions. The source was the advice in the upgrade documentation to hardcode a MAC address on the VM bridge (one of the offered options - the one I chose).
After deleting this hardcoded MAC, all my problems were gone. For reference, the line in question looks like the snippet below.
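This is what such a bridge stanza looks like in /etc/network/interfaces (the address, port and MAC below are made up):

Code:
# example bridge stanza - the "hwaddress" line is the hardcoded MAC I removed (then: ifreload -a)
auto vmbr0
iface vmbr0 inet static
        address 192.168.1.10/24
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        hwaddress aa:bb:cc:dd:ee:ff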

* Maybe read the doc: https://pve.proxmox.com/wiki/Ceph_Octopus_to_Pacific, section "Known issues" - some of the situations described there could fit your problem.

I decided not to upgrade to Pacific because of this...

I wish you luck - and maybe you can keep us informed
 