Help for old cluster

jan010

Hi guys,
a few months ago I inherited a 3-node cluster running Proxmox 5.4 and Ceph Luminous (yes... that old) from the previous IT manager who looked after the company.
I have read the "Upgrade from 5.x to 6.0" and "Ceph Luminous to Nautilus" guides, but before upgrading I would like to ask whether, and what kind of, problems I could run into. Problems with the repos, with the NIC names, anything else?
After moving to version 6 I would like to bring PVE at least to 7 and Ceph to Pacific.
Since this is a production cluster with about 12 VMs, I don't want to bring everything down.
Thanks for any valuable information.
- pve-manager/5.4-15/d0ec33c6 (running kernel: 4.15.18-30-pve)
- ceph version 12.2.13 luminous (stable)

 
I haven't been in this space long enough to have seen much of those old versions, but here are a few general tips (some might be obvious):

- Have backups and check that the backups actually work.
- Remember that you can run Proxmox itself as a VM, so you could install Proxmox 5 and Luminous in a test cluster (either on Proxmox itself or on some other virtualization software) to try out the process first.
- I personally prefer to override the default network device names [1], so that even if an upgrade changes what the system detects about the device, the name no longer changes. You could do that before you start upgrading, so you know at least that this won't be an issue in your cluster (a sketch follows after the footnote below).
- Upgrade one node at a time. You can migrate VMs between nodes on different versions, but it is probably best to only migrate "up" (so from a 5.4 node to a 5.5 node, not from 5.5 to 5.4). It SHOULD work either way, but if you can avoid it, why run the risk.

[1] https://pve.proxmox.com/wiki/Network_Configuration#_naming_conventions, section "Overriding network device names". I usually name the ethernet device after the device it is in and the port on that device (end1p2 for device 1, port 2, for example; the "en" prefix ensures Proxmox still recognizes it as a network device).
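
To illustrate [1], a minimal sketch of such an override (the file name, MAC address and chosen name are just examples, adapt them to your hardware):

Code:
# /etc/systemd/network/10-end1p2.link (hypothetical file name and values)
# Match the physical NIC by its MAC address
[Match]
MACAddress=aa:bb:cc:dd:ee:ff

# Pin the name; keep the "en" prefix so PVE still treats it as an ethernet device
[Link]
Name=end1p2

If I remember the wiki correctly, you then rebuild the initramfs (update-initramfs -u -k all) and reboot so the new name is applied early in boot.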
 
Thanks for your suggestions, and sorry for my delay...
- backup of the VMs and node configuration (/etc/pve)... yes, of course
- I don't know if that is possible for me... I need to find a test environment... but you are right, thanks.
- "I personally prefer to override the default port names"... yes, me too.

I hope there are no problems with repos this old (Debian and PVE)... thanks.
 
I think the biggest problem will be the availability of the repositories: those for PVE 6.4 are still available, but for Debian "buster" I will have to use the archives in sources.list (something like deb http://archive.debian.org/debian/ buster main), right?
And for Ceph?
I'm preparing a similar cluster on PVE (nodes as VMs) to perform an upgrade test, but I can't install the "old" Ceph Luminous... will I have to download the packages manually? Which ones?

Thanks to anyone who can help me.
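
For reference, something like this is what I have in mind for the Buster-based test node (the security line and the PVE repo are my guesses, corrections welcome):

Code:
# /etc/apt/sources.list (archived Debian buster base)
deb http://archive.debian.org/debian buster main contrib
deb http://archive.debian.org/debian-security buster/updates main contrib
# PVE 6 no-subscription repo, still served by Proxmox
deb http://download.proxmox.com/debian/pve buster pve-no-subscription
# if apt complains about expired Release files:
#   apt update -o Acquire::Check-Valid-Until=false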
 
Proxmox has provided its own Ceph packages for some time now; the Luminous ones are at http://download.proxmox.com/debian/ceph-luminous. Something like "deb http://download.proxmox.com/debian/ceph-luminous buster main" should work to install it again in your staging environment.
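
Something along these lines should do it on the staging node (adjust the suite to the node's Debian release: stretch for PVE 5, buster for PVE 6):

Code:
# /etc/apt/sources.list.d/ceph.list
deb http://download.proxmox.com/debian/ceph-luminous stretch main

# then refresh and install, e.g.
apt update
apt install ceph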

I did that same upgrade path from 5.x twice, on two different clusters, both with Ceph Luminous. One is currently on PVE 7.4, the other on PVE 8.2. I don't remember any major issues. AFAIR there are some Ceph settings that had to be changed along the way, so make very sure you follow the docs for each upgrade step, both for PVE and for Ceph.

If you use containers, keep in mind that PVE 7 introduced CGroup2, which is incompatible with the systemd versions of some older Linux distros (e.g. Ubuntu 16.04) [1]. There is a compatibility mode in PVE 7, but it is planned to be removed in future kernel versions, so plan to update/replace those containers at some point (a rough sketch of the boot option follows).
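
If I remember the wiki correctly, the compatibility mode is just a kernel command line option; roughly like this on a GRUB-booted node (do verify against [1] before relying on it):

Code:
# /etc/default/grub -- add the option to the default kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet systemd.unified_cgroup_hierarchy=0"

# apply the change and reboot
update-grub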

For VMs, QEMU 5.2 (introduced with PVE 6.4) brought a new virtual hardware layout [2]. Windows usually detects a "new" network card, and you have to manually move the IP configuration from the "old" card to the "new" one. If you are already on QEMU 5.2, the upgrade will not be affected by this.

[1] https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0#Old_Container_and_CGroupv2
[2] https://pve.proxmox.com/wiki/Upgrad...tual_Machines_with_Windows_and_Static_Network
 

Hi,
Thanks for your valuable information. I had read in the Ceph guide about some required changes, and in the PVE one as well (corosync 2 to 3).
The network-card IP change has already happened to me with other upgrades (without Ceph).
There are no containers in this cluster.

In the next few days I will try and then I will update you.

Thanks a lot
 
Corosync changed from v2 to v3 with the upgrade from PVE 3 to PVE 4. It should have been upgraded at that point; a new install of PVE 4 or 5 will set up corosync v3.

Make 100% sure you are on corosync v3 before anything else, as there were breaking changes and you may need to adapt the config file manually to restore quorum after the upgrade. Also fully remove HA from all VMs before touching corosync/quorum. Luckily for you, corosync does not affect the Ceph quorum. Ah! Back up as much and as often as possible, just in case!
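
A few standard commands can help verify that before you start (nothing exotic, just the usual PVE tooling):

Code:
pveversion -v | grep corosync   # installed corosync version
pvecm status                    # cluster membership and quorum
corosync-cfgtool -s             # ring/link status on this node
ha-manager status               # which guests still have HA resources configured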
 

In this cluster corosync is 2.4.4 (from pveversion):

proxmox-ve: 5.4-2 (running kernel: 4.15.18-30-pve)
pve-manager: 5.4-15 (running version: 5.4-15/d0ec33c6)
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
pve-kernel-4.15.18-12-pve: 4.15.18-36
corosync: 2.4.4-pve1

And yes... as the 5-to-6 upgrade guide says, "always upgrade to Corosync 3 first".

thanks
 
I was too curious, so I tried to install Ceph, but it seems there is some problem with the repo/packages...
Unfortunately I have to stop now; tomorrow, if I can find some time, I will try again.

Code:
start installation
Reading package lists... Done
Building dependency tree       
Reading state information... Done
ceph-fuse is already the newest version (12.2.13-pve1~bpo9).
gdisk is already the newest version (1.0.1-1).
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 ceph : Depends: ceph-mgr (= 12.2.13-pve1~bpo9) but it is not going to be installed
        Depends: ceph-mon (= 12.2.13-pve1~bpo9) but it is not going to be installed
        Depends: ceph-osd (= 12.2.13-pve1~bpo9) but it is not going to be installed
 ceph-common : Depends: librbd1 (= 12.2.13-pve1~bpo9) but 10.2.11-2 is to be installed
               Depends: python-cephfs (= 12.2.13-pve1~bpo9) but 10.2.11-2 is to be installed
               Depends: python-prettytable but it is not installable
               Depends: python-rados (= 12.2.13-pve1~bpo9) but 10.2.11-2 is to be installed
               Depends: python-rbd (= 12.2.13-pve1~bpo9) but 10.2.11-2 is to be installed
               Depends: python-rgw (= 12.2.13-pve1~bpo9) but it is not going to be installed
               Depends: libcurl3 (>= 7.28.0) but it is not installable
               Depends: libleveldb1v5 but it is not installable
 ceph-mds : Depends: ceph-base (= 12.2.13-pve1~bpo9) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
apt failed during ceph installation (25600)
 
It seems your repos aren't properly configured and apt is trying to install Ceph Jewel v10.2 instead of Ceph Luminous v12.2. Check the apt repos, and check with apt-cache policy which Ceph packages it is trying to get and from where.
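
For example (plain apt commands, just to see where each Ceph package would come from):

Code:
apt-cache policy ceph ceph-common librbd1             # candidate versions and their origin repos
cat /etc/apt/sources.list /etc/apt/sources.list.d/*   # review the configured repos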
 
Hi,
thanks for the info, but I can't try it right now; I'll be out of the office until September 27th.
I'll update you.
Thanks a lot
 
What's the output of
cat /etc/*release
/etc/apt/sources.list
/etc/apt/sources.list.d/* (all files and their content)?
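
Spelled out as commands (run as root on the node):

Code:
cat /etc/*release
cat /etc/apt/sources.list
cat /etc/apt/sources.list.d/*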
 
Hi guys,
thanks to your suggestions and a bit of work on the repos, Luminous is now installed.
Now I will reconfigure Ceph to match the production cluster, and in the next few days I will attempt the upgrade.

Thanks to all... I will keep you updated
 
