Hi all,
And happy new year to all.
I just took advantage of the new year period, where there were few people in the lab, to finally upgrade my 5.4 clusters to 6.1. I tested the procedure on test clusters, and all went fine. I followed the guide to upgrade from 5.4 to 6.0. I have also ceph clusters, and I followed the guide to upgrade from luminous to nautilus.
https://pve.proxmox.com/wiki/Ceph_Luminous_to_Nautilus
I then went on to apply the procedure on a production cluster and, unfortunately, made a little mistake. I forgot to run the two commands:
ceph-volume simple scan
ceph-volume simple activate --all
on one node. I then got this warning:
Code:
root@prox1orsay:~# ceph -s
  cluster:
    id:     b5a08127-b65a-430c-ad34-810752429977
    health: HEALTH_WARN
            crush map has legacy tunables (require firefly, min is hammer)

  services:
    mon: 3 daemons, quorum 0,1,2 (age 17s)
    mgr: prox1orsay(active, since 35m)
    osd: 24 osds: 24 up, 24 in

  data:
    pools:   3 pools, 1188 pgs
    objects: 169.70k objects, 658 GiB
    usage:   1.9 TiB used, 9.0 TiB / 11 TiB avail
    pgs:     1188 active+clean

  io:
    client: 90 KiB/s wr, 0 op/s rd, 7 op/s wr
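For context, my understanding is that this 'ceph-volume simple' step is what takes over the filestore OSDs from ceph-disk after the Nautilus upgrade: the scan writes a JSON metadata file per OSD under /etc/ceph/osd/, and the activate step enables the systemd units so the OSDs can start without udev. As far as I know it can also be run per OSD; the path and JSON file name below are just an example:
Code:
ceph-volume simple scan /var/lib/ceph/osd/ceph-0
ceph-volume simple activate --file /etc/ceph/osd/0-<osd-fsid>.json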
So I have the warning message I mentioned in the title. I thought I had applied the same commands on every node, but when comparing the shell histories I had to conclude it was not the case. I think that is why I get the warning about legacy tunables: these OSDs have not been upgraded to Nautilus. And I had then already issued the command 'ceph osd require-osd-release nautilus'.
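If it helps, I suppose I can also check which release the OSDs actually report and what the osd map now requires, with something like this (output not pasted here, but I can add it):
Code:
ceph osd versions
ceph osd dump | grep require_osd_release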
It is a three-node cluster with 8 OSDs each; the OSDs are still filestore. On the first and third nodes, the command history is similar:
Code:
468 ceph osd set noout
469 ceph osd dump | grep ^flags
470 cd /etc/pve
471 mc
472 sed -i 's/luminous/nautilus/' /etc/apt/sources.list.d/ceph.list
473 cd
474 cat /etc/apt/sources.list.d/ceph.list
475 apt update
476 apt list --upgradable
477 apt-dist upgrade
478 apt dist-upgrade
479 systemctl restart ceph-mon.target
480 ceph mon dump | grep min_mon_release
481 systemctl restart ceph-mgr.target
482 ceph -s
483 ceph -s
484 ceph -s
485 systemctl restart ceph-osd.target
486 ceph-volume simple scan
487 ceph-volume simple activate --all
488 ceph -s
489 ceph osd require-osd-release nautilus
490 cep osd unset noout
491 ceph osd unset noout
492 ceph -s
493 ceph config set mon mon_crush_min_required_version firefly
494 ceph -s
495 ceph mon enable-msgr2
But on the second node, the history does not contain the two ceph-volume commands:
Code:
482 ceph osd dump | grep ^flags
483 sed -i 's/luminous/nautilus/' /etc/apt/sources.list.d/ceph.list
484 cat /etc/apt/sources.list.d/ceph.list
485 apt update
486 apt list --upgradable
487 apt dist-upgrade
488 systemctl restart ceph-mon.target
489 systemctl restart ceph-mgr.target
490 systemctl restart ceph-mgr.target
491 ceph -s
492 ps aux | grep ceph.mgr
493 systemctl status ceph-mgr.target
494 systemctl restart ceph-osd.target
495 ceph mon dump
496 ceph-volume simple scan
497 ceph -s
So I tried to run 'ceph-volume simple scan' again on node 2, but this time got an error:
Code:
root@prox2orsay:~# ceph-volume simple scan
stderr: lsblk: /var/lib/ceph/osd/ceph-10: not a block device
stderr: Bad argument "/var/lib/ceph/osd/ceph-10", expected an absolute path in /dev/ or /sys or a unit name: Invalid argument
Running command: /sbin/cryptsetup status /dev/sdd1
--> RuntimeError: --force was not used and OSD metadata file exists: /etc/ceph/osd/10-2543a4e3-d8ef-476f-a6fa-2cfaa0b2fb6b.json
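From the error text, I understand that the scan could be re-run with --force to overwrite the JSON metadata files that already exist in /etc/ceph/osd/, something like the commands below, but I have not dared to run it on a production cluster yet:
Code:
ceph-volume simple scan --force
ceph-volume simple activate --all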
At this point I need some advice. Is it still possible to upgrade the OSDs to Nautilus (without removing and re-adding them)? Should I revert 'ceph osd require-osd-release nautilus' back to luminous in order to do so? Is there another possibility?
Here is the osd tree. I am now on Proxmox 6.1, latest version:
Code:
root@prox1orsay:~# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 10.91016 root default
-2 3.63672 host prox1orsay
0 hdd 0.45459 osd.0 up 1.00000 1.00000
3 hdd 0.45459 osd.3 up 1.00000 1.00000
4 hdd 0.45459 osd.4 up 1.00000 1.00000
5 hdd 0.45459 osd.5 up 1.00000 1.00000
6 hdd 0.45459 osd.6 up 1.00000 1.00000
7 hdd 0.45459 osd.7 up 1.00000 1.00000
8 hdd 0.45459 osd.8 up 1.00000 1.00000
9 hdd 0.45459 osd.9 up 1.00000 1.00000
-3 3.63672 host prox2orsay
1 hdd 0.45459 osd.1 up 1.00000 1.00000
10 hdd 0.45459 osd.10 up 1.00000 1.00000
11 hdd 0.45459 osd.11 up 1.00000 1.00000
12 hdd 0.45459 osd.12 up 1.00000 1.00000
13 hdd 0.45459 osd.13 up 1.00000 1.00000
14 hdd 0.45459 osd.14 up 1.00000 1.00000
15 hdd 0.45459 osd.15 up 1.00000 1.00000
16 hdd 0.45459 osd.16 up 1.00000 1.00000
-4 3.63672 host prox3orsay
2 hdd 0.45459 osd.2 up 1.00000 1.00000
17 hdd 0.45459 osd.17 up 1.00000 1.00000
18 hdd 0.45459 osd.18 up 1.00000 1.00000
19 hdd 0.45459 osd.19 up 1.00000 1.00000
20 hdd 0.45459 osd.20 up 1.00000 1.00000
21 hdd 0.45459 osd.21 up 1.00000 1.00000
22 hdd 0.45459 osd.22 up 1.00000 1.00000
23 hdd 0.45459 osd.23 up 1.00000 1.00000
Because of the warning, I ran 'ceph config set mon mon_crush_min_required_version firefly', but it did not help.
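If it is useful, I believe the actual tunables in the crush map can be inspected with the command below, and I can post its output:
Code:
ceph osd crush show-tunables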
As you may have noticed, I also have a small problem with the managers: ceph -s does not display the standby managers. It was already the case on a test cluster (also filestore).
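In case it is relevant, I can also post the manager state, for example the output of these (using prox2orsay as an example):
Code:
ceph mgr dump
systemctl status ceph-mgr@prox2orsay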
I would be grateful if someone could help me solve this issue (which is entirely my fault...).
Thanks in advance
Alain