Monitors won't start after upgrading.

waynevaghi

Hi All,

I am relatively new to Proxmox so any help is appreciated.
I have a 3-node Proxmox cluster. It was on version 6.3, but I upgraded the cluster to 6.4 and everything was working correctly.
Next I upgraded one node from 6.4 to 7.1, but now the Ceph monitors won't start.
In the GUI, next to the monitors, it says a newer version has been installed but the old version is still running and a restart is required.
I have rebooted the node a couple of times and also tried restarting the services, but it doesn't help.
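For reference, restarting and checking a monitor goes through its per-host systemd unit; on the upgraded node that would look roughly like this (the unit name follows the hostname, ITAhost2 here, so adjust as needed):

# restart the monitor unit and check its state
systemctl restart ceph-mon@ITAhost2.service
systemctl status ceph-mon@ITAhost2.service
# look at the monitor's log from the current boot
journalctl -u ceph-mon@ITAhost2.service -b --no-pager | tail -n 50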

I have live VMs on the other 2 nodes that are used in production, so I cannot upgrade the other nodes until I can migrate the VMs away, but I can't do that because of this issue.

Any help on how to resolve this is welcome. Here is some information from the system:

root@ITAhost1:~# ceph mon dump
dumped monmap epoch 3
epoch 3
fsid 3aeca8ac-a7c5-491a-a10d-2ef1947cb0ad
last_changed 2021-05-18T19:29:36.088508+0200
created 2021-05-18T18:02:21.938621+0200
min_mon_release 15 (octopus)
0: [v2:10.18.215.11:3300/0,v1:10.18.215.11:6789/0] mon.ITAhost1
1: [v2:10.18.215.13:3300/0,v1:10.18.215.13:6789/0] mon.ITAhost3
2: [v2:10.18.215.12:3300/0,v1:10.18.215.12:6789/0] mon.ITAhost2
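
For completeness, the overall cluster state and the versions the running daemons report can be checked from any node that still has a working monitor, which should show which nodes are still on Octopus 15.2:

ceph -s
ceph versions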


From one of the hosts on 6.4:

root@ITAhost1:~# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.157-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-11
pve-kernel-helper: 6.4-11
pve-kernel-5.4.157-1-pve: 5.4.157-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 15.2.15-pve1~bpo10
ceph-fuse: 15.2.15-pve1~bpo10
corosync: 3.1.5-pve2~bpo10+1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.22-pve2~bpo10+1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.3-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.6-pve1~bpo10+1


From the 7.1 host:

root@ITAhost2:~# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-2-pve)
pve-manager: 7.1-8 (running version: 7.1-8/5b267f33)
pve-kernel-helper: 7.1-6
pve-kernel-5.13: 7.1-5
pve-kernel-5.4: 6.4-11
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.4.157-1-pve: 5.4.157-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 16.2.7
ceph-fuse: 16.2.7
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-1
openvswitch-switch: 2.15.0+ds1-2
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-3
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-4
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
 

Attachments

  • Screenshot 2022-01-10 at 12.00.10.png

Hi, thank you for the reply.
Yes, I followed those documents, but only one node has been upgraded to Proxmox 7 and the new Ceph.
I can't do the other nodes as I can't migrate the VMs away, because the upgraded node is unavailable.
 
So you upgraded one node to PVE 7, which pulled in the new Ceph (Pacific) as well. There's the problem.

Before the PVE team replies, my possible theoretical solutions:
1] downgrade Ceph on the PVE 7 node
or
2] stop the VMs, back them up, and upgrade the rest of the cluster.

No warranty from me for any of the points written above.
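
For option 2, the backup step would look something along these lines (VMID 100 and the storage name "local" are placeholders, not taken from this cluster):

# stop-mode backup of one VM to a named storage, repeat per VMID
vzdump 100 --storage local --mode stop --compress zstd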
 
Hi,
The guide at https://pve.proxmox.com/wiki/Ceph_Nautilus_to_Octopus says, in particular:

Upgrade on each Ceph cluster node

Upgrade all your nodes with the following commands. It will upgrade the Ceph on your node to Octopus.

apt update
apt full-upgrade

After the update you still run the old Nautilus binaries.
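
The same guide also expects the Ceph repository to point at Octopus before running those commands; on PVE 6 / Debian Buster, /etc/apt/sources.list.d/ceph.list should contain a line like:

deb http://download.proxmox.com/debian/ceph-octopus buster main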

So I guess the first step is to bring all nodes to Octopus while they are still PVE 6.x nodes... and then upgrade PVE 6 to 7...

Not sure whether you can continue the Ceph upgrade while one node is already on PVE 7 and not joining Ceph...

Maybe it is better to restore the third node from backup to 6.x, or to reinstall it and rejoin the cluster...
 
Thank you for the reply.

So it seems my options are to downgrade Ceph on the upgraded node, or to upgrade Ceph on the other nodes; then Ceph will function again on all nodes and I can finish the remaining PVE 6-to-7 upgrades.

I would assume the Ceph upgrade is disruptive to the VMs on the node?
 
In the state your cluster is currently in, I would first make a fresh backup,
and consider doing the rest of the upgrade while the VMs are turned OFF.
But maybe the staff has a better idea what to do next. If you have a Proxmox subscription, I would call support directly about this, since it sounds like the VMs are important, so better safe than sorry...
 
Thank you.

I have a couple of problems. We don't have a Proxmox subscription, and one of the VMs is the site's router. If this VM is off, I lose connection to the site.
So if I need to downgrade Ceph on the PVE 7 node to keep the system up, I will need to go that route.
Is there a document on how to downgrade?
 
Remove the PVE 7 node from the cluster, reinstall 6.x from a USB stick/ISO, rejoin the cluster, and then reconfigure/reinstall Ceph on the node.
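
A rough sketch of that route, assuming the upgraded node is ITAhost2 and using ITAhost1's address from the mon dump above; treat this as the general shape of the procedure, not exact steps, and clean up the node's Ceph services and data first:

# on one of the remaining nodes, remove the old node from the PVE cluster
pvecm delnode ITAhost2
# after reinstalling PVE 6.x on the node, join it to the cluster again (run on the reinstalled node)
pvecm add 10.18.215.11
# then install the matching Ceph release and recreate the monitor
pveceph install --version octopus
pveceph mon create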
 
