[SOLVED] crushmap rule for nvme

RobFantini

Hello

I am trying to add a Ceph CRUSH map rule for nvme.

I added this:
Code:
rule replicated_nvme {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default class nvme
    step chooseleaf firstn 0 type host
    step emit
}

Then I ran:
Code:
crushtool -c crushmap.txt -o crushmap.new

No error is returned; however, crushmap.new is not created.
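
The full round trip, as I understand it from the docs, is roughly this (file names are just examples):
Code:
# dump and decompile the current map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt, then recompile
crushtool -c crushmap.txt -o crushmap.new
# inject it back into the cluster (only after testing)
ceph osd setcrushmap -i crushmap.new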

I've tried a few things and have been reading docs like http://docs.ceph.com/docs/master/rados/operations/crush-map/, searching, etc.

I am hoping to set up a rule that groups the nvme OSDs together,
and perhaps something that ceph-volume --crush-device-class could use. I am still researching.

We are using ceph version 12.2.12-pve1.

Does anyone have a suggestion on how to set up a rule for nvme?
 
I was able to add a rule; however, I could not specify the class type.
The following broke Ceph, do not do it like this:
Code:
# rules
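# note: neither rule below has a 'step take ...' line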
rule replicated_rule {
        id 0
        type replicated
        min_size 1
        max_size 10
        step chooseleaf firstn 0 type host
        step emit
}
rule replicated_nvme {
        id 1
        type replicated
        min_size 1
        max_size 10
        step chooseleaf firstn 0 type host
        step emit
}

We have 3 NVMe drives on order. When I add OSDs to use replicated_nvme, perhaps specifying 'step take default class' will work.
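
Once the map compiles, I should be able to sanity-check which OSDs the rule actually picks before injecting it, something like this (rule id and replica count are just examples):
Code:
crushtool -i crushmap.new --test --rule 1 --num-rep 3 --show-mappings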
 
How would I go about setting up a Ceph pool that uses only OSDs set up on NVMe?

From searching this forum, it seems a rule needs to be set up first.

Any suggestions on where to look to figure this out? I'll of course do more research later on.
 
First try:
Code:
# ceph osd crush rule create-replicated replicated_nvme  default  host nvme
Error EINVAL: device class nvme does not exist

So I will figure out how to add device class 'nvme'.

It looks like an OSD needs to be created on an NVMe first? Then the class will get made automatically? The drives come in tomorrow.
 
RobFantini said:
It looks like an OSD needs to be created on an NVMe first? Then the class will get made automatically?

Yes, that is my understanding. You can see your currently recognized classes with:
Code:
ceph osd crush class ls
[
    "hdd"
]

It is possible to "re-class" an existing OSD such that you could create your CRUSH rule, but I don't recommend it.
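
For reference, re-classing would look roughly like this (osd.0 is just a placeholder), but again, I would not do it on a production cluster:
Code:
ceph osd crush rm-device-class osd.0
ceph osd crush set-device-class nvme osd.0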
 

Thank you for the reply!

I think reclassifying could break an existing pool, so I'll stay away from that. We have had Ceph in production since 2017 and do not have a testing cluster.

Also, I'll try this to create OSDs on the NVMe; let's see if the class gets created at the same time:
Code:
ceph-volume lvm batch --osds-per-device 2  --crush-device-class  nvme  /dev/nvme0n1
 
OK, this worked:
Code:
# ceph-volume lvm batch --osds-per-device 2  --crush-device-class  nvme  /dev/nvme0n1

Total OSDs: 2

  Type            Path                                                    LV Size         % of device
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme0n1                                            447.13 GB       50%
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme0n1                                            447.13 GB       50%
--> The above OSDs would be created if the operation continues
--> do you want to proceed? (yes/no) yes
Running command: vgcreate --force --yes ceph-dbd94557-9eaf-4dda-b4cd-ccd985e0152b /dev/nvme0n1
 stdout: Physical volume "/dev/nvme0n1" successfully created.
 stdout: Volume group "ceph-dbd94557-9eaf-4dda-b4cd-ccd985e0152b" successfully created
Running command: lvcreate --yes -l 114464 -n osd-data-6f088d28-a0be-4186-a1b6-29dc7e1de375 ceph-dbd94557-9eaf-4dda-b4cd-ccd985e0152b
 stdout: Logical volume "osd-data-6f088d28-a0be-4186-a1b6-29dc7e1de375" created.
...
Running command: systemctl enable --runtime ceph-osd@4
 stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@4.service → /lib/systemd/system/ceph-osd@.service.
Running command: systemctl start ceph-osd@4
--> ceph-volume lvm activate successful for osd ID: 4
--> ceph-volume lvm create successful for: ceph-dbd94557-9eaf-4dda-b4cd-ccd985e0152b/osd-data-410ce862-7653-473a-bf9a-dfa867982b41
And there is an nvme class:
Code:
# ceph osd crush class ls
[
    "ssd",
    "nvme"
]

ceph -s shows the new OSDs seem to have been added to the existing pool (the default rule does not filter by device class, so data is rebalancing onto them):
Code:
# ceph -s
  cluster:
    id:     220b9a53-4556-48e3-a73c-28deff665e45
    health: HEALTH_WARN
            65352/731910 objects misplaced (8.929%)
  services:
    mon: 3 daemons, quorum pve3,pve10,pve14
    mgr: pve3(active), standbys: sys8, pve14, pve10
    osd: 44 osds: 44 up, 44 in; 221 remapped pgs
  data:
    pools:   1 pools, 1024 pgs
    objects: 243.97k objects, 901GiB
    usage:   2.21TiB used, 13.9TiB / 16.2TiB avail
    pgs:     0.195% pgs not active
             65352/731910 objects misplaced (8.929%)
             798 active+clean
             219 active+remapped+backfill_wait
             5   active+remapped+backfilling
             2   activating
  io:
    client:   0B/s rd, 1.67MiB/s wr, 0op/s rd, 83op/s wr
    recovery: 562MiB/s, 157objects/s

So next: how do I make a new pool that uses just class nvme?
 
Hello RokaKen, thanks for the reply. I was in the middle of writing the following when you posted.
I'll also check the post you just wrote:

This is what I intend to try:
Code:
ceph osd crush rule create-replicated replicated_nvme  default  host nvme

Then make a new pool in PVE using the new rule.
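
From the CLI I think the equivalent would be something like this (pool name and PG count are just examples):
Code:
ceph osd pool create rbd_nvme 128 128 replicated replicated_nvme
ceph osd pool application enable rbd_nvme rbd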
 
OK, getting close. I have an issue with the OSDs created:

The nvme pool shows 0 storage available [I tried to move a VM's disk, it showed 0 bytes available].
From PVE when trying to move the disk:
Code:
create full clone of drive scsi1 (ceph_vm:vm-9109-disk-1)
TASK ERROR: storage migration failed: error with cfs lock 'storage-nvme_vm': rbd error: rbd: list: (95) Operation not supported

df shows strange mount info:
Code:
# df
Filesystem                     Type      Size  Used Avail Use% Mounted on
/dev/sdj1                      xfs        94M  5.5M   89M   6% /var/lib/ceph/osd/ceph-41
/dev/sdb1                      xfs        94M  5.5M   89M   6% /var/lib/ceph/osd/ceph-25
...
tmpfs                          tmpfs     103G   48K  103G   1% /var/lib/ceph/osd/ceph-3
tmpfs                          tmpfs     103G   48K  103G   1% /var/lib/ceph/osd/ceph-4

I had done the following to create the OSDs:
Code:
ceph-volume lvm batch --osds-per-device 2  /dev/nvme0n1


Suggestions?
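
Things I plan to check next, unless someone sees something obvious (rbd_nvme is a placeholder for the actual pool name):
Code:
# which CRUSH rule is the pool actually using?
ceph osd pool get rbd_nvme crush_rule
# is the rbd application enabled on the pool?
ceph osd pool application get rbd_nvme
# can rbd list the pool outside of PVE?
rbd ls -p rbd_nvme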
 
So I will need to rethink how to create 2 OSDs on a disk.
Why would you like to create two OSDs on the same NVMe? For performance? In my tests, an NVMe-only pool with 4x OSDs/NVMe didn't give me better performance. Sadly, I never got an NVMe that actually supports namespaces, which might get things going.
 
With one OSD per NVMe, I still have an issue using the nvme pool.

After removing the 2 OSDs per NVMe, zapping the OSDs and rebooting the nodes [had to], I made one OSD per NVMe.
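
For anyone following along, the removal goes roughly like this per OSD (the ID and device path are examples, not my exact commands):
Code:
ceph osd out 4
systemctl stop ceph-osd@4
ceph osd purge 4 --yes-i-really-mean-it
ceph-volume lvm zap --destroy /dev/nvme0n1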

df shows:
Code:
/dev/nvme0n1p1                 xfs        97M  5.5M   92M   6% /var/lib/ceph/osd/ceph-5
That is good.

Storage setup:
Code:
rbd: nvme-vm
        content images
        krbd 0
        pool rbd_nvme

However, PVE shows 0 space when I tried to move a disk, and there is a '?' next to rbd_nvme on the left in the PVE GUI.
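
Next I will check the storage from the PVE side, something like:
Code:
pvesm status
pvesm list nvme-vm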
 