Ceph problems after upgrading from Proxmox 5 to 7.4

Hello everyone!

Yesterday evening I started upgrading from 5.x to 7.4.
I followed the "5 to 6" and "6 to 7" upgrade guides.
The upgrade from 5.x to 7.4, with Ceph at the corresponding intermediate versions, went without problems.
7.4 with Ceph Pacific was also still running fine.
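For reference, each major step followed the guides roughly like this (a minimal sketch from memory; repository names abbreviated, adjust for enterprise vs. no-subscription repos):

Code:
# Pre-upgrade checklist scripts shipped by Proxmox, run before each major step
pve5to6
pve6to7 --full

# Example for the 6 -> 7 step: switch all repos from buster to bullseye in
# /etc/apt/sources.list and /etc/apt/sources.list.d/*.list (including ceph.list),
# then pull in the new packages
apt update && apt dist-upgrade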

Since my home lab runs Quincy, I wanted to update Ceph as well...
Only with the update of Ceph from Pacific to Quincy did the OSDs go down, and the storage is of course unavailable.
Some OSDs show as "down/in", others as "up/in", and others are reported as "unknown".
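The Pacific to Quincy step itself was basically the one from the "Ceph Pacific to Quincy" guide; here is a sketch of what I ran and of how I am checking the state now (osd.0 is just an example ID):

Code:
# Switch the Ceph repository to Quincy (PVE 7 / Debian bullseye, no-subscription)
echo "deb http://download.proxmox.com/debian/ceph-quincy bullseye main" > /etc/apt/sources.list.d/ceph.list
apt update && apt full-upgrade

# Restart the daemons in order: monitors, managers, then OSDs
systemctl restart ceph-mon.target
systemctl restart ceph-mgr.target
systemctl restart ceph-osd.target

# Checking the current state
ceph versions                  # all daemons should report 17.2.x once done
ceph -s                        # overall health
ceph osd tree                  # which OSDs are down / unknown and on which host
journalctl -u ceph-osd@0.service -b --no-pager | tail -n 50   # log of one down OSD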

There are 4 nodes, each with one PCIe NVMe drive and 4 spinning disks.
The 4 NVMe drives are set up as SSD storage in Ceph, and the spinning disks form a second pool on HDDs.

I am currently bringing the cluster back up so I can provide details.

What do you need from me?

Any help is appreciated!

---EDIT---

Screenshot of the OSDs (attached)

pveversion -v:
Code:
root@pve6:~# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.108-1-pve)
pve-manager: 7.4-16 (running version: 7.4-16/0f39f621)
pve-kernel-5.15: 7.4-4
pve-kernel-5.4: 6.4-20
pve-kernel-5.0: 6.0-11
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.4.203-1-pve: 5.4.203-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 17.2.6-pve1
ceph-fuse: 17.2.6-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.3-1
proxmox-backup-file-restore: 2.4.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.2
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-5
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1
 
I have now set the "out" OSDs back to "in" via the GUI.
It now looks like this (see screenshot).

Still, the OSDs do not come back...
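For completeness, the CLI equivalent of what I tried in the GUI, plus the checks I would run next (osd.4 is just a placeholder ID):

Code:
ceph osd in osd.4                      # mark an "out" OSD as "in" again (same as the GUI button)
systemctl restart ceph-osd@4.service   # on the node that owns osd.4: (re)start the daemon
systemctl status ceph-osd@4.service    # does it stay up, or does it crash right away?
ceph osd tree                          # re-check the up/down state afterwards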

---EDIT---

Ceph Config:

Code:
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 192.168.1.6/8
     fsid = c4a86bb1-b010-4195-b73e-19bbf1684f25
     mon_allow_pool_delete = true
     mon_host = 10.0.1.30 10.0.1.31 10.0.1.32
     ms_bind_ipv4 = true
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 10.0.1.35/16

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.pve6]
     public_addr = 10.0.1.30

[mon.pve7]
     public_addr = 10.0.1.31

[mon.pve8]
     public_addr = 10.0.1.32
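If output from the monitor / network side helps, I can post something like the following as well (a sketch, run on one of the nodes; osd.1 is just an example):

Code:
ceph mon dump           # monitor map as the cluster currently sees it
ceph config dump        # centralized configuration database
ceph config show osd.1  # effective runtime config of one OSD (only works while that daemon runs)
ss -tlnp | grep ceph    # which addresses the Ceph daemons on this node actually listen on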

CRUSH MAP
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class ssd
device 1 osd.1 class ssd
device 2 osd.2 class ssd
device 3 osd.3 class ssd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host pve5 {
    id -3        # do not change unnecessarily
    id -4 class ssd        # do not change unnecessarily
    id -11 class hdd        # do not change unnecessarily
    # weight 24.74297
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 2.91100
    item osd.16 weight 5.45799
    item osd.17 weight 5.45799
    item osd.18 weight 5.45799
    item osd.19 weight 5.45799
}
host pve6 {
    id -5        # do not change unnecessarily
    id -6 class ssd        # do not change unnecessarily
    id -12 class hdd        # do not change unnecessarily
    # weight 24.74297
    alg straw2
    hash 0    # rjenkins1
    item osd.1 weight 2.91100
    item osd.12 weight 5.45799
    item osd.13 weight 5.45799
    item osd.14 weight 5.45799
    item osd.15 weight 5.45799
}
host pve7 {
    id -7        # do not change unnecessarily
    id -8 class ssd        # do not change unnecessarily
    id -13 class hdd        # do not change unnecessarily
    # weight 24.74297
    alg straw2
    hash 0    # rjenkins1
    item osd.2 weight 2.91100
    item osd.4 weight 5.45799
    item osd.5 weight 5.45799
    item osd.6 weight 5.45799
    item osd.7 weight 5.45799
}
host pve8 {
    id -9        # do not change unnecessarily
    id -10 class ssd        # do not change unnecessarily
    id -14 class hdd        # do not change unnecessarily
    # weight 24.74297
    alg straw2
    hash 0    # rjenkins1
    item osd.3 weight 2.91100
    item osd.9 weight 5.45799
    item osd.10 weight 5.45799
    item osd.11 weight 5.45799
    item osd.8 weight 5.45799
}
root default {
    id -1        # do not change unnecessarily
    id -2 class ssd        # do not change unnecessarily
    id -15 class hdd        # do not change unnecessarily
    # weight 98.97186
    alg straw2
    hash 0    # rjenkins1
    item pve5 weight 24.74297
    item pve6 weight 24.74297
    item pve7 weight 24.74297
    item pve8 weight 24.74297
}

# rules
rule replicated_rule {
    id 0
    type replicated
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule onlyNvme {
    id 1
    type replicated
    step take default class ssd
    step chooseleaf firstn 0 type host
    step emit
}
rule onlyHdd {
    id 2
    type replicated
    step take default class hdd
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
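For the pool side of the onlyNvme / onlyHdd rules above, this is how I would list the pool-to-rule mapping (pool names are placeholders):

Code:
ceph osd crush rule ls                    # should list replicated_rule, onlyNvme, onlyHdd
ceph osd pool ls detail                   # pools with their crush_rule, size and min_size
ceph osd pool get <poolname> crush_rule   # rule assigned to a single pool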
 

Attachments:

  • OSDs.png
  • Ceph Status.png
  • Ceph Health.png