Monitors won't start after upgrading.

waynevaghi

Hi All,

I am relatively new to Proxmox so any help is appreciated.
I have a 3-node Proxmox cluster. It was on version 6.3, but I upgraded the cluster to 6.4 and everything was working correctly.
Next I upgraded one node from 6.4 to 7.1, but now the Ceph monitors won't start.
In the GUI, next to the monitors, it says a newer version has been installed but the old version is still running and a restart is required.
I have rebooted the node a couple of times and also tried restarting the services, but it doesn't help.
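For reference, restarting and checking a monitor goes through its per-host systemd unit; on the upgraded node that would look roughly like this (the unit name follows the hostname, ITAhost2 here, so adjust as needed):

# restart the monitor unit and check its state
systemctl restart ceph-mon@ITAhost2.service
systemctl status ceph-mon@ITAhost2.service
# look at the monitor's log from the current boot
journalctl -u ceph-mon@ITAhost2.service -b --no-pager | tail -n 50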

I have live VMs on the other 2 nodes that are used in production, so I cannot upgrade the other nodes until I can migrate the VMs away, but I can't do that because of this issue.

Any help on how to resolve this is welcome. Here is some information from the system:

root@ITAhost1:~# ceph mon dump
dumped monmap epoch 3
epoch 3
fsid 3aeca8ac-a7c5-491a-a10d-2ef1947cb0ad
last_changed 2021-05-18T19:29:36.088508+0200
created 2021-05-18T18:02:21.938621+0200
min_mon_release 15 (octopus)
0: [v2:10.18.215.11:3300/0,v1:10.18.215.11:6789/0] mon.ITAhost1
1: [v2:10.18.215.13:3300/0,v1:10.18.215.13:6789/0] mon.ITAhost3
2: [v2:10.18.215.12:3300/0,v1:10.18.215.12:6789/0] mon.ITAhost2
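
For completeness, the overall cluster state and the versions the running daemons report can be checked from any node that still has a working monitor, which should show which nodes are still on Octopus 15.2:

ceph -s
ceph versions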


From one of the hosts on 6.4:

root@ITAhost1:~# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.157-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-11
pve-kernel-helper: 6.4-11
pve-kernel-5.4.157-1-pve: 5.4.157-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 15.2.15-pve1~bpo10
ceph-fuse: 15.2.15-pve1~bpo10
corosync: 3.1.5-pve2~bpo10+1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.22-pve2~bpo10+1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.3-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.6-pve1~bpo10+1


From the 7.1 host:

root@ITAhost2:~# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-2-pve)
pve-manager: 7.1-8 (running version: 7.1-8/5b267f33)
pve-kernel-helper: 7.1-6
pve-kernel-5.13: 7.1-5
pve-kernel-5.4: 6.4-11
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.4.157-1-pve: 5.4.157-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 16.2.7
ceph-fuse: 16.2.7
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-1
openvswitch-switch: 2.15.0+ds1-2
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-3
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-4
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
 

Attachments

  • Screenshot 2022-01-10 at 12.00.10.png

Hi, thank you for the reply.
Yes, I followed those documents, but only one node has been upgraded to Proxmox 7 and the new Ceph.
I can't do the other nodes as I can't migrate the VMs away, because the upgraded node is unavailable.
 
So you upgraded one node to PVE 7, which pulled in the new Ceph (Pacific) as well. There's the problem.

Before the PVE team replies, my possible theoretical solutions:
1] downgrade Ceph on the PVE 7 node
or
2] stop the VMs, back them up, and upgrade the rest of the cluster.

No warranty from me for any of the points written above.
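
For option 2, the backup step would look something along these lines (VMID 100 and the storage name "local" are placeholders, not taken from this cluster):

# stop-mode backup of one VM to a named storage, repeat per VMID
vzdump 100 --storage local --mode stop --compress zstd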
 
Hi,
The guide at https://pve.proxmox.com/wiki/Ceph_Nautilus_to_Octopus says, in particular:

Upgrade on each Ceph cluster node

Upgrade all your nodes with the following commands. It will upgrade the Ceph on your node to Octopus.

apt update
apt full-upgrade

After the update you still run the old Nautilus binaries.
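
The same guide also expects the Ceph repository to point at Octopus before running those commands; on PVE 6 / Debian Buster, /etc/apt/sources.list.d/ceph.list should contain a line like:

deb http://download.proxmox.com/debian/ceph-octopus buster main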

So I guess the first step is to bring all nodes to Octopus while they are still PVE 6.x nodes... and then upgrade PVE 6 to 7...

Not sure whether you can continue the Ceph upgrade while one node is already on PVE 7 and not joining Ceph...

Maybe it is better to restore the third node from backup to 6.x, or to reinstall it and rejoin the cluster...
 
Thank you for the reply.

So it seems my options are to downgrade Ceph on the upgraded node, or to upgrade Ceph on the other nodes; then Ceph will function again on all nodes and I can finish the remaining PVE 6-to-7 upgrades.

I would assume the Ceph upgrade is disruptive to the VMs on the node?
 
In the state your cluster is currently in, I would first make a fresh backup,
and consider doing the rest of the upgrade while the VMs are turned OFF.
But maybe the staff has a better idea what to do next. If you have a Proxmox subscription, I would call support directly about this, since it sounds like the VMs are important, so better safe than sorry...
 
Thank you.

I have a couple of problems. We don't have a Proxmox subscription, and one of the VMs is the site's router. If this VM is off, I lose connection to the site.
So if I need to downgrade Ceph on the PVE 7 node to keep the system up, I will need to go that route.
Is there a document on how to downgrade?
 
Remove the PVE 7 node from the cluster, reinstall 6.x from a USB stick/ISO, rejoin the cluster, and then reconfigure/reinstall Ceph on the node.
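
A rough sketch of that route, assuming the upgraded node is ITAhost2 and using ITAhost1's address from the mon dump above; treat this as the general shape of the procedure, not exact steps, and clean up the node's Ceph services and data first:

# on one of the remaining nodes, remove the old node from the PVE cluster
pvecm delnode ITAhost2
# after reinstalling PVE 6.x on the node, join it to the cluster again (run on the reinstalled node)
pvecm add 10.18.215.11
# then install the matching Ceph release and recreate the monitor
pveceph install --version octopus
pveceph mon create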
 
