[SOLVED] Ceph mon won't start after upgrade 7 to 8

Mmicro

Hi guys,

I have a cluster of 3 PVE nodes. Two of them upgraded fine from 7 to 8, but on the last one the Ceph mon won't start after the upgrade.


I have tried injecting a monmap into the monitor, as described here:
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/

I have also tried removing and rebuilding the mon as in this thread:
https://forum.proxmox.com/threads/ceph-cant-remove-monitor-with-unknown-status.63613/
and running: ceph-mon -i <ID> --mkfs

But the problem is still there. Can you help me, please?
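
For reference, the monmap injection I attempted followed the docs roughly like this (the mon ID proxmoxlan1 here is just an example; the broken mon has to be stopped first):
Code:
# stop the broken monitor so its store is not locked
systemctl stop ceph-mon@proxmoxlan1
# fetch the current monmap from the surviving quorum
ceph mon getmap -o /tmp/monmap
# inject it into the broken mon's store, then restart it
ceph-mon -i proxmoxlan1 --inject-monmap /tmp/monmap
systemctl start ceph-mon@proxmoxlan1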

After many manipulations:

[screenshot attachment]

Maybe my CPUs are too old?

[screenshot attachments]
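
In case it is useful, this is how one can check the CPU model and which instruction-set extensions it advertises (a generic check, not taken from my screenshots):
Code:
# show the CPU model
lscpu | grep 'Model name'
# list the SSE4.2/AVX flags the CPU advertises, if any
grep -m1 flags /proc/cpuinfo | tr ' ' '\n' | grep -E 'sse4_2|avx'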
My ceph.conf:
Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 192.168.100.1/24
     mon_allow_pool_delete = true
     mon_host = 192.168.50.12 192.168.50.13 192.168.50.11
     ms_bind_ipv4 = true
     ms_bind_ipv6 = false
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 192.168.50.11/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
     keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.proxmoxlan1]
     host = proxmoxlan1
     mds_standby_for_name = pve

[mds.proxmoxlan2]
     host = proxmoxlan2
     mds_standby_for_name = pve

[mds.proxmoxlan3]
     host = proxmoxlan3
     mds_standby_for_name = pve

[mon.proxmoxlan1]
     public_addr = 192.168.50.11

[mon.proxmoxlan2]
     public_addr = 192.168.50.12

[mon.proxmoxlan3]
     public_addr = 192.168.50.13
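
To double-check that these addresses match what the mon actually has on disk, the stored monmap can be dumped like this (mon ID is an example; the mon must be stopped):
Code:
# extract the on-disk monmap from this mon's store
ceph-mon -i proxmoxlan1 --extract-monmap /tmp/monmap
# print it and compare against mon_host in ceph.conf
monmaptool --print /tmp/monmap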

And the CRUSH map:
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class ssd
device 1 osd.1 class hdd
device 2 osd.2 class ssd
device 3 osd.3 class hdd
device 4 osd.4 class ssd
device 5 osd.5 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host proxmoxlan1 {
    id -3        # do not change unnecessarily
    id -4 class ssd        # do not change unnecessarily
    id -5 class hdd        # do not change unnecessarily
    # weight 1.81940
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 0.90970
    item osd.1 weight 0.90970
}
host proxmoxlan2 {
    id -7        # do not change unnecessarily
    id -8 class ssd        # do not change unnecessarily
    id -6 class hdd        # do not change unnecessarily
    # weight 1.81940
    alg straw2
    hash 0    # rjenkins1
    item osd.2 weight 0.90970
    item osd.3 weight 0.90970
}
host proxmoxlan3 {
    id -10        # do not change unnecessarily
    id -11 class ssd        # do not change unnecessarily
    id -9 class hdd        # do not change unnecessarily
    # weight 1.81940
    alg straw2
    hash 0    # rjenkins1
    item osd.4 weight 0.90970
    item osd.5 weight 0.90970
}
root default {
    id -1        # do not change unnecessarily
    id -2 class ssd        # do not change unnecessarily
    id -12 class hdd        # do not change unnecessarily
    # weight 5.45819
    alg straw2
    hash 0    # rjenkins1
    item proxmoxlan1 weight 1.81940
    item proxmoxlan2 weight 1.81940
    item proxmoxlan3 weight 1.81940
}

# rules
rule replicated_rule {
    id 0
    type replicated
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule rule-hdd {
    id 1
    type replicated
    step take default class hdd
    step chooseleaf firstn 0 type host
    step emit
}
rule rule-ssd {
    id 2
    type replicated
    step take default class ssd
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
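
For completeness, a decompiled map like this can be recompiled and test-mapped offline with crushtool (filenames are examples):
Code:
# recompile the text map back into binary form
crushtool -c crushmap.txt -o crushmap.bin
# simulate 3-replica placements for the ssd rule (id 2)
crushtool -i crushmap.bin --test --rule 2 --num-rep 3 --show-mappings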
 
As was linked in the other thread, the issue is that the new version of GCC applies some optimizations which may not work on old CPUs: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1012935
The Ceph package for Proxmox VE 7 was compiled with an older version of GCC, so it does not yet contain that optimization and therefore still worked.
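
A quick way to confirm this on the affected node is to look for the illegal-instruction trap in the kernel log and check which x86-64 microarchitecture levels the CPU supports (a sketch; the ld.so check assumes glibc >= 2.33, as shipped with Debian 12):
Code:
# a mon killed by an unsupported instruction leaves a trap like this
dmesg | grep -i 'invalid opcode'
# glibc's loader reports which x86-64 levels this CPU supports
/lib64/ld-linux-x86-64.so.2 --help | grep 'x86-64-v'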