Hi,
I've just updated a 3-node PVE 5.0 cluster to the latest Luminous packages.
Everything seems fine after the upgrade and reboot, but on one node I'm seeing weird syslog entries referring to an "osd.12" service.
Code:
Oct 12 20:51:32 dc-prox-13 systemd[1]: ceph-osd@12.service: Service hold-off time over, scheduling restart.
Oct 12 20:51:32 dc-prox-13 systemd[1]: Stopped Ceph object storage daemon osd.12.
Oct 12 20:51:32 dc-prox-13 systemd[1]: Starting Ceph object storage daemon osd.12...
Oct 12 20:51:32 dc-prox-13 systemd[1]: Started Ceph object storage daemon osd.12.
Oct 12 20:51:32 dc-prox-13 ceph-osd[7157]: 2017-10-12 20:51:32.820011 7fc26f9e4e00 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/ceph-12: (2) No such file or directory
Oct 12 20:51:32 dc-prox-13 systemd[1]: ceph-osd@12.service: Main process exited, code=exited, status=1/FAILURE
Oct 12 20:51:32 dc-prox-13 systemd[1]: ceph-osd@12.service: Unit entered failed state.
Oct 12 20:51:32 dc-prox-13 systemd[1]: ceph-osd@12.service: Failed with result 'exit-code'.
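For reference, the mismatch between the systemd units and the OSD data directories can be checked like this (a sketch, assuming the default Ceph data paths and the standard ceph-osd@N unit naming; osd.12 is the stray unit from the log above):

```shell
# List the OSD data directories actually present on this node
# (dc-prox-13 should only have ceph-8 .. ceph-11):
ls /var/lib/ceph/osd/

# List every ceph-osd@N unit systemd knows about, including failed ones:
systemctl list-units 'ceph-osd@*' --all

# If ceph-osd@12.service shows up but /var/lib/ceph/osd/ceph-12 does not
# exist, the unit is stale; stopping and disabling it ends the restart loop:
systemctl stop ceph-osd@12.service
systemctl disable ceph-osd@12.service
systemctl reset-failed ceph-osd@12.service
```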
But I only have 12 OSDs, from osd.0 to osd.11.
My crush map:
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
host dc-prox-06 {
	id -3		# do not change unnecessarily
	id -2 class hdd		# do not change unnecessarily
	# weight 1.089
	alg straw2
	hash 0	# rjenkins1
	item osd.0 weight 0.272
	item osd.1 weight 0.272
	item osd.2 weight 0.272
	item osd.3 weight 0.272
}

host dc-prox-07 {
	id -5		# do not change unnecessarily
	id -4 class hdd		# do not change unnecessarily
	# weight 1.089
	alg straw2
	hash 0	# rjenkins1
	item osd.4 weight 0.272
	item osd.5 weight 0.272
	item osd.6 weight 0.272
	item osd.7 weight 0.272
}

host dc-prox-13 {
	id -7		# do not change unnecessarily
	id -6 class hdd		# do not change unnecessarily
	# weight 1.089
	alg straw2
	hash 0	# rjenkins1
	item osd.8 weight 0.272
	item osd.9 weight 0.272
	item osd.10 weight 0.272
	item osd.11 weight 0.272
}

root default {
	id -1		# do not change unnecessarily
	id -8 class hdd		# do not change unnecessarily
	# weight 3.266
	alg straw2
	hash 0	# rjenkins1
	item dc-prox-06 weight 1.089
	item dc-prox-07 weight 1.089
	item dc-prox-13 weight 1.089
}

# rules
rule replicated_rule {
	id 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type host
	step emit
}
# end crush map
If you have any idea of what's going on, I'd be grateful.
Thanks in advance, guys.