Ceph: meaning of log entries

ITT

Renowned Member
Mar 19, 2021
Can anyone explain the meaning of the following log entries?
They always appear after a reboot of a node:

Code:
Dec 14 17:50:45 c01-n03 ceph-osd[3191]: 2021-12-14T17:50:45.729+0100 7f064c89df00 -1 osd.7 3318 log_to_monitors {default=true}
Dec 14 17:50:45 c01-n03 ceph-osd[3191]: 2021-12-14T17:50:45.729+0100 7f064c89df00 -1 osd.7 3318 mon_cmd_maybe_osd_create fail: 'osd.7 has already bound to class 'nvme', can not reset class to 'ssd'; use 'ceph osd crush rm-device-class <id>' to remove old class first': (16) Device or resource busy
Dec 14 17:50:45 c01-n03 ceph-osd[3191]: 2021-12-14T17:50:45.733+0100 7f0644da6700 -1 osd.7 3318 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
Dec 14 17:50:58 c01-n03 ceph-osd[3179]: 2021-12-14T17:50:58.248+0100 7f5d436d9f00 -1 osd.8 3318 log_to_monitors {default=true}
Dec 14 17:50:58 c01-n03 ceph-osd[3179]: 2021-12-14T17:50:58.252+0100 7f5d436d9f00 -1 osd.8 3318 mon_cmd_maybe_osd_create fail: 'osd.8 has already bound to class 'nvme', can not reset class to 'ssd'; use 'ceph osd crush rm-device-class <id>' to remove old class first': (16) Device or resource busy
Dec 14 17:50:58 c01-n03 ceph-osd[3179]: 2021-12-14T17:50:58.256+0100 7f5d3bbe2700 -1 osd.8 3318 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
Dec 14 17:51:15 c01-n03 ceph-osd[3183]: 2021-12-14T17:51:15.632+0100 7fb3c59dff00 -1 osd.6 3318 log_to_monitors {default=true}
Dec 14 17:51:15 c01-n03 ceph-osd[3183]: 2021-12-14T17:51:15.636+0100 7fb3c59dff00 -1 osd.6 3318 mon_cmd_maybe_osd_create fail: 'osd.6 has already bound to class 'nvme', can not reset class to 'ssd'; use 'ceph osd crush rm-device-class <id>' to remove old class first': (16) Device or resource busy
Dec 14 17:51:15 c01-n03 ceph-osd[3183]: 2021-12-14T17:51:15.640+0100 7fb3bdee8700 -1 osd.6 3318 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
Dec 14 17:55:05 c01-n03 pmxcfs[3047]: [status] notice: received log
 
Hi,
what kind of hardware, SSDs, NVMes etc. are we talking about here?
Your CRUSH map would certainly be helpful as well.
 
3 nodes, each consisting of:

Supermicro AS -1114S-WN10RT/H12SSW-NTR
CPU: AMD EPYC 7443P
RAM: 512 GB
NVMe: 3 * KIOXIA CD6
SSD: 2 * Micron MTFDDAK
Dual NIC: Mellanox ConnectX-5 Ex
Dual NIC: Intel X550T
Dual NIC: BCM57416 NetXtreme-E

All switches redundant (Mellanox) etc.

Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class nvme
device 1 osd.1 class nvme
device 2 osd.2 class nvme
device 3 osd.3 class nvme
device 4 osd.4 class nvme
device 5 osd.5 class nvme
device 6 osd.6 class nvme
device 7 osd.7 class nvme
device 8 osd.8 class nvme

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host c01-n01 {
    id -3        # do not change unnecessarily
    id -4 class nvme        # do not change unnecessarily
    # weight 17.466
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 5.822
    item osd.1 weight 5.822
    item osd.2 weight 5.822
}
host c01-n02 {
    id -5        # do not change unnecessarily
    id -6 class nvme        # do not change unnecessarily
    # weight 17.466
    alg straw2
    hash 0    # rjenkins1
    item osd.3 weight 5.822
    item osd.4 weight 5.822
    item osd.5 weight 5.822
}
host c01-n03 {
    id -7        # do not change unnecessarily
    id -8 class nvme        # do not change unnecessarily
    # weight 17.466
    alg straw2
    hash 0    # rjenkins1
    item osd.6 weight 5.822
    item osd.7 weight 5.822
    item osd.8 weight 5.822
}
root default {
    id -1        # do not change unnecessarily
    id -2 class nvme        # do not change unnecessarily
    # weight 52.397
    alg straw2
    hash 0    # rjenkins1
    item c01-n01 weight 17.466
    item c01-n02 weight 17.466
    item c01-n03 weight 17.466
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
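
For completeness: the map above can be exported, edited and re-imported with the standard Ceph tooling. This is just a generic sketch of that workflow, not anything specific to this cluster:

Code:
# export the binary CRUSH map and decompile it to text
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# after editing crushmap.txt, compile it and inject it again
crushtool -c crushmap.txt -o crushmap-new.bin
ceph osd setcrushmap -i crushmap-new.bin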
 
Code:
ceph osd tree
ID  CLASS  WEIGHT    TYPE NAME         STATUS  REWEIGHT  PRI-AFF
-1         52.39709  root default                               
-3         17.46570      host c01-n01                           
 0   nvme   5.82190          osd.0         up   1.00000  1.00000
 1   nvme   5.82190          osd.1         up   1.00000  1.00000
 2   nvme   5.82190          osd.2         up   1.00000  1.00000
-5         17.46570      host c01-n02                           
 3   nvme   5.82190          osd.3         up   1.00000  1.00000
 4   nvme   5.82190          osd.4         up   1.00000  1.00000
 5   nvme   5.82190          osd.5         up   1.00000  1.00000
-7         17.46570      host c01-n03                           
 6   nvme   5.82190          osd.6         up   1.00000  1.00000
 7   nvme   5.82190          osd.7         up   1.00000  1.00000
 8   nvme   5.82190          osd.8         up   1.00000  1.00000
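
For reference, the device classes the OSDs are currently bound to in the CRUSH map can also be listed directly (assuming the standard Ceph CLI):

Code:
# list all device classes known to the cluster
ceph osd crush class ls

# show the CRUSH tree including the per-class shadow hierarchy
ceph osd crush tree --show-shadow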
 
Please have a look at this thread:
https://forum.proxmox.com/threads/fragen-zu-ceph-nach-upgrade-5-4-auf-6-0.56209/
It is about Ceph detecting the disks "incorrectly"; it might help to set the device class of the disks manually, see the command sketch below the quoted log.

Code:
'osd.7 has already bound to class 'nvme', can not reset class to 'ssd'; use 'ceph osd crush rm-device-class <id>' to remove old class first': (16) Device or resource busy
Dec 14 17:50:45 c01-n03 ceph-osd[3191]: 2021-12-14T17:50:45.733+0100 7f0644da6700 -1 osd.7 3318 set_numa_affinity unable to identify public interface '' numa node: (2) No such file or directory
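
The message itself looks harmless here: on startup the OSD auto-detects its device as 'ssd' and tries to set that class, but the CRUSH map already pins 'nvme', so the monitors refuse the change. Only if you actually wanted to change the class would you follow the hint from the log, roughly like this (hypothetical example for osd.7, adjust the OSD ID and class to what you really want):

Code:
# remove the currently bound class, then set the desired one
ceph osd crush rm-device-class osd.7
ceph osd crush set-device-class nvme osd.7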

The NUMA topic is also covered in that thread.
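
If you want to look into the NUMA part on your own nodes: recent Ceph releases let you inspect the detected NUMA placement and, if desired, turn off the automatic pinning. This is only a pointer, not a recommendation for your cluster:

Code:
# show the NUMA placement Ceph has detected for each OSD
ceph osd numa-status

# optionally disable automatic NUMA affinity for all OSDs
ceph config set osd osd_numa_auto_affinity false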