[SOLVED] Ceph mon won't start after upgrade 7 to 8

Hi guys,

I have a cluster of 3 PVE nodes.
Two nodes upgraded fine from PVE 7 to 8,
but on the last one the Ceph mon won't start after the upgrade.


I have tried injecting a monmap into the monitor, as described here:
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/

I have also tried removing and rebuilding the mon, as in this thread:
https://forum.proxmox.com/threads/ceph-cant-remove-monitor-with-unknown-status.63613/
and: ceph-mon -i <ID> --mkfs

But the problem is still there.
Can you help me, please?
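
For readers following along, the monmap injection mentioned above can be sketched roughly as below. This is a dry run that only prints the commands, based on the linked Ceph troubleshooting page. The monitor ID `proxmoxlan3`, the path `/tmp/monmap`, and the function name `monmap_inject_cmds` are assumptions for illustration; on Proxmox VE the mon ID is usually the node's hostname, but check your own setup.

```shell
#!/bin/sh
# Dry run: print the monmap-injection steps for one monitor; nothing is executed.
# monmap_inject_cmds is a hypothetical helper, not part of Ceph or Proxmox VE.
monmap_inject_cmds() {
    mon_id="$1"                 # ID of the broken monitor (assumed: its hostname)
    map="${2:-/tmp/monmap}"     # where the monmap is stored (assumed path)

    # 1. On a node that still has quorum: export the current monmap,
    #    then copy the file to the broken node.
    echo "ceph mon getmap -o $map"
    # 2. On the broken node: stop the monitor before touching its store.
    echo "systemctl stop ceph-mon@$mon_id"
    # 3. Inject the exported monmap into the stopped monitor.
    echo "ceph-mon -i $mon_id --inject-monmap $map"
    # 4. Start the monitor again.
    echo "systemctl start ceph-mon@$mon_id"
}

# Example call with an assumed monitor ID.
monmap_inject_cmds proxmoxlan3
```

Printing the commands first makes it easier to review them before running anything against the monitor store by hand.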

After many attempts:
1692809193449.png
Maybe my CPUs are too old?
1692808193613.png
1692808335318.png
1692808372555.png
1692808465203.png
My Ceph config:
Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 192.168.100.1/24
     mon_allow_pool_delete = true
     mon_host = 192.168.50.12 192.168.50.13 192.168.50.11
     ms_bind_ipv4 = true
     ms_bind_ipv6 = false
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 192.168.50.11/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
     keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.proxmoxlan1]
     host = proxmoxlan1
     mds_standby_for_name = pve

[mds.proxmoxlan2]
     host = proxmoxlan2
     mds_standby_for_name = pve

[mds.proxmoxlan3]
     host = proxmoxlan3
     mds_standby_for_name = pve

[mon.proxmoxlan1]
     public_addr = 192.168.50.11

[mon.proxmoxlan2]
     public_addr = 192.168.50.12

[mon.proxmoxlan3]
     public_addr = 192.168.50.13

And the CRUSH map:
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class ssd
device 1 osd.1 class hdd
device 2 osd.2 class ssd
device 3 osd.3 class hdd
device 4 osd.4 class ssd
device 5 osd.5 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host proxmoxlan1 {
    id -3        # do not change unnecessarily
    id -4 class ssd        # do not change unnecessarily
    id -5 class hdd        # do not change unnecessarily
    # weight 1.81940
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 0.90970
    item osd.1 weight 0.90970
}
host proxmoxlan2 {
    id -7        # do not change unnecessarily
    id -8 class ssd        # do not change unnecessarily
    id -6 class hdd        # do not change unnecessarily
    # weight 1.81940
    alg straw2
    hash 0    # rjenkins1
    item osd.2 weight 0.90970
    item osd.3 weight 0.90970
}
host proxmoxlan3 {
    id -10        # do not change unnecessarily
    id -11 class ssd        # do not change unnecessarily
    id -9 class hdd        # do not change unnecessarily
    # weight 1.81940
    alg straw2
    hash 0    # rjenkins1
    item osd.4 weight 0.90970
    item osd.5 weight 0.90970
}
root default {
    id -1        # do not change unnecessarily
    id -2 class ssd        # do not change unnecessarily
    id -12 class hdd        # do not change unnecessarily
    # weight 5.45819
    alg straw2
    hash 0    # rjenkins1
    item proxmoxlan1 weight 1.81940
    item proxmoxlan2 weight 1.81940
    item proxmoxlan3 weight 1.81940
}

# rules
rule replicated_rule {
    id 0
    type replicated
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule rule-hdd {
    id 1
    type replicated
    step take default class hdd
    step chooseleaf firstn 0 type host
    step emit
}
rule rule-ssd {
    id 2
    type replicated
    step take default class ssd
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
 
As was linked in the other thread, the issue is that the newer version of GCC applies some optimizations that might not work on old CPUs: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1012935
The Ceph package for Proxmox VE 7 was compiled with an older version of GCC, so it does not contain that optimization yet and therefore still worked.
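
One way to check whether a node is hit by this is to look for illegal-instruction traps from ceph-mon in the kernel log. The sketch below assumes the failure shows up as an x86 "trap invalid opcode" message, which the thread does not confirm; the function name `find_ceph_mon_sigill` is made up for illustration.

```shell
#!/bin/sh
# Filter kernel log lines on stdin for invalid-opcode traps raised by ceph-mon.
# Assumption: the crash is an illegal instruction (SIGILL) that the kernel logs
# as "ceph-mon[PID] trap invalid opcode ...". Prints matching lines; the exit
# status is non-zero when nothing matches.
find_ceph_mon_sigill() {
    grep -E 'ceph-mon\[[0-9]+\] trap invalid opcode'
}

# Usage on the affected node (needs permission to read the kernel log):
#   dmesg | find_ceph_mon_sigill
#   journalctl -k | find_ceph_mon_sigill
```

If this prints nothing, the monitor log itself (for example via journalctl -u ceph-mon@<ID>) is the next place to look.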
 
