[SOLVED] Reinstall from scratch and licenses

Hi all,
Sorry for the dummy question, but I'm new to the licensed community.

I have a 3-node CephFS HA cluster, each node with its 1-CPU licence, currently on v7.4.17.

If I read the manual correctly, since I'm running Ceph Pacific v16, I need to upgrade Ceph to Quincy v17 before any Proxmox major version upgrade.
Tell me if I'm wrong.
As my Ceph cluster is in HEALTH_WARN state, I can't upgrade Ceph from v16 to v17.
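For reference, this is how I check the versions currently running (standard Ceph and PVE commands, run on any node):

Code:
# Ceph release running on every daemon (mon, mgr, osd, mds)
ceph versions

# Installed Proxmox VE packages and versions
pveversion -v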

Solution 1: Quick and dirty: stop the VMs, verify all the VM backups, save the node configurations, and reinstall from scratch.
Question 1: In this case, as the hardware is the same, will my licences still be valid?
Question 2: Do I have to "release" them before reinstalling from scratch?
Question 3: If the hardware stays the same, does the server ID tied to the subscription change when reinstalling?
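If it helps, I assume I can check the subscription status and server ID of each node from the CLI with something like this (the key below is only a placeholder):

Code:
# Show subscription key, status and server ID of this node
pvesubscription get

# After a reinstall, re-apply the key on the freshly installed node
pvesubscription set <your-subscription-key>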

Solution 2: Kill the Ceph HA cluster (after checking the VM backups, of course), try to upgrade Ceph, and then upgrade Proxmox.

In light of your informed experience, what do you think is the best strategy?

Best regards, and happy to be roaming here as a licensed user now.
 
Why is your Ceph cluster in HEALTH_WARN state? If you can fix it, you would be able to just follow the standard upgrade path.

Please provide the output of ceph status, ceph health and maybe ceph health detail.
 
ceph status
Code:
root@node1 # ceph status
  cluster:
    id:     fe6f11d1-88d9-46c8-ab1f-8df40e04e7e4
    health: HEALTH_WARN
            1 daemons have recently crashed
 
  services:
    mon: 1 daemons, quorum px1 (age 43h)
    mgr: px1(active, since 43h)
    mds: 1/1 daemons up, 2 standby
    osd: 8 osds: 8 up (since 43h), 8 in (since 44h)
 
  data:
    volumes: 1/1 healthy
    pools:   4 pools, 97 pgs
    objects: 44.01k objects, 169 GiB
    usage:   515 GiB used, 5.4 TiB / 5.9 TiB avail
    pgs:     97 active+clean
 
  io:
    client:   7.6 KiB/s wr, 0 op/s rd, 1 op/s wr

ceph health
Code:
# ceph health
HEALTH_WARN 1 daemons have recently crashed

ceph health detail
Code:
# ceph health detail
HEALTH_WARN 1 daemons have recently crashed
[WRN] RECENT_CRASH: 1 daemons have recently crashed
    osd.2 crashed on host px2 at 2023-11-01T18:31:42.334341Z

Perhaps I should destroy osd.2 and recreate it?
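If so, I guess the rough procedure on the affected node (px2) would look like this; /dev/sdX is only a placeholder for the actual disk:

Code:
# Take the crashed OSD out, stop it, and let the cluster rebalance
ceph osd out osd.2
systemctl stop ceph-osd@2.service

# Destroy the OSD and clean the underlying disk
pveceph osd destroy 2 --cleanup

# Recreate an OSD on the freed disk
pveceph osd create /dev/sdX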
 
osd.2 destroyed and recreated.

Code:
ceph crash ls-new              # shows the ID of the crash event
ceph crash info <crash-id>     # shows the event details
ceph crash archive <crash-id>  # archives it

And then: HEALTH_OK status.

I'll do the v16 to v17 migration now.
 
As soon as the Ceph cluster was HEALTH_OK, it could be updated to Ceph Quincy, and then I moved each Proxmox node from version 7 to version 8. I am on the 3rd node now.
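For anyone following along, here is a condensed sketch of the steps (some cluster-wide, some per node), based on the official upgrade guides; repo paths assume the default layout, and the guides remain the reference:

Code:
# Ceph Pacific (v16) -> Quincy (v17)
ceph osd set noout
sed -i 's/pacific/quincy/' /etc/apt/sources.list.d/ceph.list
apt update && apt full-upgrade
systemctl restart ceph-mon.target ceph-mgr.target ceph-osd.target ceph-mds.target
ceph osd require-osd-release quincy
ceph osd unset noout

# Proxmox VE 7 -> 8, once Ceph is on Quincy everywhere
pve7to8 --full
sed -i 's/bullseye/bookworm/g' /etc/apt/sources.list /etc/apt/sources.list.d/pve-enterprise.list
apt update && apt dist-upgrade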

I love Proxmox very much.
 
Good to see you found the solution.

I'd add some remarks and questions:
  • You have only one MON; on a three-node cluster, each node should have its own MON (see the sketch after this list).
  • You have 8 OSDs on 3 nodes, so I assume there are 2 nodes with 3 OSDs and 1 node with only 2 OSDs. This is at least risky and should be avoided: if 1 OSD on the 2-OSD node fails, the other OSD might run into an OSD_FULL state and cause an outage of your Ceph cluster.
Best practices for Ceph include using at least 4 disks per server and, in small clusters, using similar disk sizes.
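For the MON remark, a quick sketch — assuming px2 and px3 are the nodes without a monitor, run on each of them:

Code:
# Add a monitor on this node
pveceph mon create

# Optionally add a standby manager as well
pveceph mgr create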
 
