Hi everyone,
I have a Ceph test cluster and a Proxmox test cluster (to try upgrades in test before rolling them out to production).
My Ceph cluster consists of three servers running Debian 11, with two separate networks (cluster_network and public_network, on separate VLANs).
It runs Ceph 16.2.10, deployed with cephadm on Docker.
Each server has one MGR, one MON and 8 OSDs.
  cluster:
    id:     xxx
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph01,ceph03,ceph02 (age 2h)
    mgr: ceph03(active, since 77m), standbys: ceph01, ceph02
    osd: 24 osds: 24 up (since 7w), 24 in (since 6M)

  data:
    pools:   3 pools, 65 pgs
    objects: 29.13k objects, 113 GiB
    usage:   344 GiB used, 52 TiB / 52 TiB avail
    pgs:     65 active+clean

  io:
    client: 1.3 KiB/s wr, 0 op/s rd, 0 op/s wr
The Proxmox cluster is also made up of 3 servers, running Proxmox VE 7.2-7 (with the Proxmox Ceph Pacific packages, version 16.2.9). The storage used is Ceph RBD (over the Ceph public_network); I added the RBD datastores via the GUI.
So far so good: I have several VMs running on each Proxmox node.
Things go wrong when I upgrade Ceph to 16.2.11.
I don't like an upgrade doing everything for me without control, so I did a "staggered upgrade", following the official procedure (https://docs.ceph.com/en/pacific/cephadm/upgrade/#staggered-upgrade). Since the version I'm starting from doesn't support staggered upgrades, I followed the procedure for upgrading to a version that supports them from one that doesn't (https://docs.ceph.com/en/pacific/ce...ports-staggered-upgrade-from-one-that-doesn-t).
Redeploying the two standby MGRs with "ceph orch redeploy" goes fine.
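For reference, that step looked roughly like this (the MGR daemon IDs below are placeholders; the real ones come from "ceph orch ps"):

# list the MGR daemons to get their exact IDs
sudo ceph orch ps --daemon-type mgr
# redeploy each standby MGR on the new image
sudo ceph orch redeploy mgr.ceph01.aaaaaa --image quay.io/ceph/ceph:v16.2.11
sudo ceph orch redeploy mgr.ceph02.bbbbbb --image quay.io/ceph/ceph:v16.2.11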
I do the "sudo ceph mgr fail", everything is fine (it switches well to an mgr which was standby, so I get an MGR 16.2.11).
However, when I run "sudo ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.11 --daemon-types mgr", it upgrades the last MGR that hadn't been updated yet (so far everything is still fine), but it finishes with a final restart of all the MGRs, and at that point Proxmox apparently loses the RBD connection (almost instantly) and all my VMs are shut down.
Here is the message in the Proxmox syslog:
Feb 2 16:20:52 pmox01 QEMU[436706]: terminate called after throwing an instance of 'std::system_error'
Feb 2 16:20:52 pmox01 QEMU[436706]: what(): Resource deadlock avoided
Feb 2 16:20:52 pmox01 kernel: [17038607.686686] vmbr0: port 2(tap102i0) entered disabled state
Feb 2 16:20:52 pmox01 kernel: [17038607.779049] vmbr0: port 2(tap102i0) entered disabled state
Feb 2 16:20:52 pmox01 systemd[1]: 102.scope: Succeeded.
Feb 2 16:20:52 pmox01 systemd[1]: 102.scope: Consumed 43.136s CPU time.
Feb 2 16:20:53 pmox01 qmeventd[446872]: Starting cleanup for 102
Feb 2 16:20:53 pmox01 qmeventd[446872]: Finished cleanup for 102
On the Ceph side everything looks fine: the upgrade runs through and reports that everything is OK at the end.
Ceph is now on 16.2.11 and the health is OK.
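For what it's worth, the end state can be confirmed with standard commands, nothing cluster-specific:

sudo ceph versions
sudo ceph orch ps --daemon-type mgr
sudo ceph health detail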
When I downgrade the MGRs again and rerun the procedure, the problem comes back every time. It is completely reproducible.
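(For the downgrade, the same redeploy works, just pointed at the old image; the daemon ID is again a placeholder:)

sudo ceph orch redeploy mgr.ceph01.aaaaaa --image quay.io/ceph/ceph:v16.2.10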
From my tests, the "sudo ceph orch upgrade" command always triggers the problem, even when doing a genuine staggered upgrade from and to version 16.2.11 with the command:
sudo ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.11 --daemon-types mgr --hosts ceph01 --limit 1
After some more tests, switching the datastore to the kernel RBD client (krbd) in the Proxmox storage config and separating the monitor IPs with commas (they were separated by spaces before), the VMs no longer shut down.
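Concretely, the storage entry in /etc/pve/storage.cfg now looks something like this (the storage name, pool and monitor IPs are placeholders, not my real values):

rbd: ceph-rbd
	content images
	krbd 1
	monhost 10.0.10.1,10.0.10.2,10.0.10.3
	pool rbd-pool
	username admin

The same switch can also be made with "pvesm set ceph-rbd --krbd 1".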
I suspect librbd is causing the problem, but I couldn't find anything relevant in the changelogs for Proxmox 7.* with ceph-common 16.2.9 (the Proxmox-packaged version).
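In case it helps with diagnosis, which librbd a running VM has actually loaded can be checked with something like this ("kvm" is the process name Proxmox uses for QEMU; lsof and dpkg are standard tools):

# show the librbd shared object mapped into a running VM process
sudo lsof -p $(pidof -s kvm) | grep librbd
# show the installed client-side Ceph packages
dpkg -l | grep -E 'librbd|ceph-common'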
I would prefer not to be stuck on krbd.
Does anyone have an idea?
Thank you, everyone!