Help for old cluster

jan010

Hi guys,
a few months ago I inherited a 3-node cluster running Proxmox 5.4 and Ceph Luminous (yes... that old) from the previous IT manager who looked after the company.
I have read the "Upgrade from 5.x to 6.0" and "Ceph Luminous to Nautilus" guides, but before upgrading I would like to ask whether, and what kind of, problems I could run into. Problems with the repos, with the NIC names, anything else?
After moving to version 6 I would like to bring PVE at least to 7 and Ceph to Pacific.
Since this is a production cluster with about 12 VMs, I don't want to bring everything down.
Thanks for any valuable information.
- pve-manager/5.4-15/d0ec33c6 (running kernel: 4.15.18-30-pve)
- ceph version 12.2.13 luminous (stable)

 
I haven't been in this space long enough to have seen much of those old versions, but here are a few general tips (some might be obvious):

- Have backups and check that the backups actually work.
- Remember that you can run Proxmox itself as a VM, so you could install Proxmox 5 and Luminous in a test cluster (either on Proxmox itself or on some other virtualization software) to try out the process first.
- I personally prefer to override the default network device names [1], so that even if an upgrade changes what the system detects about the device, the name no longer changes. You could do that before you start upgrading, so you know at least that this won't be an issue in your cluster (a sketch follows after the footnote below).
- Upgrade one node at a time. You can migrate VMs between nodes on different versions, but it is probably best to only migrate "up" (so from a 5.4 node to a 5.5 node, not from 5.5 to 5.4). It SHOULD work either way, but if you can avoid it, why run the risk.

[1] https://pve.proxmox.com/wiki/Network_Configuration#_naming_conventions, section "Overriding network device names". I usually name the ethernet device after the device it is in and the port on that device (end1p2 for device 1, port 2, for example; the "en" prefix ensures Proxmox still recognizes it as a network device).
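
To illustrate [1], a minimal sketch of such an override (the file name, MAC address and chosen name are just examples, adapt them to your hardware):

Code:
# /etc/systemd/network/10-end1p2.link (hypothetical file name and values)
# Match the physical NIC by its MAC address
[Match]
MACAddress=aa:bb:cc:dd:ee:ff

# Pin the name; keep the "en" prefix so PVE still treats it as an ethernet device
[Link]
Name=end1p2

If I remember the wiki correctly, you then rebuild the initramfs (update-initramfs -u -k all) and reboot so the new name is applied early in boot.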
 
Thanks for your suggestions, and sorry for my delay...
- backup of the VMs and node configuration (/etc/pve)... yes, of course
- I don't know if that is possible for me... I need to find a test environment... but you are right, thanks.
- "I personally prefer to override the default port names"... yes, me too.

I hope there are no problems with repos this old (Debian and PVE)... thanks.
 
I think the biggest problem will be the availability of the repositories: those for PVE 6.4 are still available, but for Debian "buster" I will have to use the archives in sources.list (something like deb http://archive.debian.org/debian/ buster main), right?
And for Ceph?
I'm preparing a similar cluster on PVE (nodes as VMs) to perform an upgrade test, but I can't install the "old" Ceph Luminous... will I have to download the packages manually? Which ones?

Thanks to anyone who can help me.
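
For reference, something like this is what I have in mind for the Buster-based test node (the security line and the PVE repo are my guesses, corrections welcome):

Code:
# /etc/apt/sources.list (archived Debian buster base)
deb http://archive.debian.org/debian buster main contrib
deb http://archive.debian.org/debian-security buster/updates main contrib
# PVE 6 no-subscription repo, still served by Proxmox
deb http://download.proxmox.com/debian/pve buster pve-no-subscription
# if apt complains about expired Release files:
#   apt update -o Acquire::Check-Valid-Until=false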
 
Proxmox has provided its own Ceph packages for some time now; the Luminous ones are at http://download.proxmox.com/debian/ceph-luminous. Something like "deb http://download.proxmox.com/debian/ceph-luminous buster main" should work to install it again in your staging environment.
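
Something along these lines should do it on the staging node (adjust the suite to the node's Debian release: stretch for PVE 5, buster for PVE 6):

Code:
# /etc/apt/sources.list.d/ceph.list
deb http://download.proxmox.com/debian/ceph-luminous stretch main

# then refresh and install, e.g.
apt update
apt install ceph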

I did that same upgrade path from 5.x twice, on two different clusters, both with Ceph Luminous. One is currently on PVE 7.4, the other on PVE 8.2. I don't remember any major issues. AFAIR there are some Ceph settings that had to be changed along the way, so make very sure you follow the docs for each upgrade step, both for PVE and for Ceph.

If you use containers, keep in mind that PVE 7 introduced CGroup2, which is incompatible with the systemd versions of some older Linux distros (e.g. Ubuntu 16.04) [1]. There is a compatibility mode in PVE 7, but it is planned to be removed in future kernel versions, so plan to update/replace those containers at some point (a rough sketch of the boot option follows).
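
If I remember the wiki correctly, the compatibility mode is just a kernel command line option; roughly like this on a GRUB-booted node (do verify against [1] before relying on it):

Code:
# /etc/default/grub -- add the option to the default kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet systemd.unified_cgroup_hierarchy=0"

# apply the change and reboot
update-grub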

For VMs, QEMU 5.2 (introduced with PVE 6.4) brought a new virtual hardware layout [2]. Windows usually detects a "new" network card, and you have to manually move the IP configuration from the "old" card to the "new" one. If you are already on QEMU 5.2, the upgrade will not be affected by this.

[1] https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0#Old_Container_and_CGroupv2
[2] https://pve.proxmox.com/wiki/Upgrad...tual_Machines_with_Windows_and_Static_Network
 

Hi,
Thanks for your valuable information. I had read in the Ceph guide about some required changes, and in the PVE one as well (corosync 2 to 3).
The network-card IP change has already happened to me with other upgrades (without Ceph).
There are no containers in this cluster.

In the next few days I will try and then I will update you.

Thanks a lot
 
Corosync changed from v2 to v3 with the upgrade from PVE 3 to PVE 4. It should have been upgraded at that point; a new install of PVE 4 or 5 will set up corosync v3.

Make 100% sure you are on corosync v3 before anything else, as there were breaking changes and you may need to adapt the config file manually to restore quorum after the upgrade. Also fully remove HA from all VMs before touching corosync/quorum. Luckily for you, corosync does not affect the Ceph quorum. Ah! Back up as much and as often as possible, just in case!
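
A few standard commands can help verify that before you start (nothing exotic, just the usual PVE tooling):

Code:
pveversion -v | grep corosync   # installed corosync version
pvecm status                    # cluster membership and quorum
corosync-cfgtool -s             # ring/link status on this node
ha-manager status               # which guests still have HA resources configured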
 

In this cluster corosync is 2.4.4 (from pveversion):

proxmox-ve: 5.4-2 (running kernel: 4.15.18-30-pve)
pve-manager: 5.4-15 (running version: 5.4-15/d0ec33c6)
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
pve-kernel-4.15.18-12-pve: 4.15.18-36
corosync: 2.4.4-pve1

And yes... as the 5-to-6 upgrade guide says, "always upgrade to Corosync 3 first".

thanks
 
I was too curious, so I tried to install Ceph, but it seems there is some problem with the repo/packages...
Unfortunately I have to stop now; tomorrow, if I can find some time, I will try again.

Code:
start installation
Reading package lists... Done
Building dependency tree       
Reading state information... Done
ceph-fuse is already the newest version (12.2.13-pve1~bpo9).
gdisk is already the newest version (1.0.1-1).
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 ceph : Depends: ceph-mgr (= 12.2.13-pve1~bpo9) but it is not going to be installed
        Depends: ceph-mon (= 12.2.13-pve1~bpo9) but it is not going to be installed
        Depends: ceph-osd (= 12.2.13-pve1~bpo9) but it is not going to be installed
 ceph-common : Depends: librbd1 (= 12.2.13-pve1~bpo9) but 10.2.11-2 is to be installed
               Depends: python-cephfs (= 12.2.13-pve1~bpo9) but 10.2.11-2 is to be installed
               Depends: python-prettytable but it is not installable
               Depends: python-rados (= 12.2.13-pve1~bpo9) but 10.2.11-2 is to be installed
               Depends: python-rbd (= 12.2.13-pve1~bpo9) but 10.2.11-2 is to be installed
               Depends: python-rgw (= 12.2.13-pve1~bpo9) but it is not going to be installed
               Depends: libcurl3 (>= 7.28.0) but it is not installable
               Depends: libleveldb1v5 but it is not installable
 ceph-mds : Depends: ceph-base (= 12.2.13-pve1~bpo9) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
apt failed during ceph installation (25600)
 
It seems your repos aren't properly configured and apt is trying to install Ceph Jewel v10.2 instead of Ceph Luminous v12.2. Check the apt repos, and check with apt-cache policy which Ceph packages it is trying to get and from where.
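
For example (plain apt commands, just to see where each Ceph package would come from):

Code:
apt-cache policy ceph ceph-common librbd1             # candidate versions and their origin repos
cat /etc/apt/sources.list /etc/apt/sources.list.d/*   # review the configured repos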
 
Hi,
thanks for the info, but I can't try it right now; I'll be out of the office until September 27th.
I'll update you.
Thanks a lot
 
What's the output of
cat /etc/*release
/etc/apt/sources.list
/etc/apt/sources.list.d/* (all files and their content)?
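
Spelled out as commands (run as root on the node):

Code:
cat /etc/*release
cat /etc/apt/sources.list
cat /etc/apt/sources.list.d/*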
 
Hi guys,
thanks to your suggestions and a bit of work on the repos, Luminous is now installed.
Now I will reconfigure Ceph to match the production cluster, and in the next few days I will attempt the upgrade.

Thanks to all... I will keep you updated
 
