[SOLVED] crushmap rule for nvme

RobFantini

Hello

I am trying to add a Ceph CRUSH map rule for NVMe.

I added this:
Code:
rule replicated_nvme {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default class nvme
    step chooseleaf firstn 0 type host
    step emit
}

Then I ran:
Code:
crushtool -c crushmap.txt -o crushmap.new

No error is returned; however, crushmap.new is not created.
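For reference, here is a dry-run sketch of the usual round trip for hand-editing a CRUSH map (export, decompile, edit, compile, test, inject), written as a small Python wrapper so the compile step can be checked for a missing output file. The file names, and the rule id 1 with 3 replicas passed to `crushtool --test`, are assumptions for illustration. One plausible cause of the missing file, not confirmed in this thread, is that the rule refers to `class nvme` before any OSD carries that class.

```python
import os
import subprocess

# Hypothetical helpers: the file names are arbitrary choices, not required by Ceph.
def crushmap_export_cmds(workdir="."):
    """Steps before hand-editing: export the binary map and decompile it to text."""
    raw = os.path.join(workdir, "crushmap.bin")
    txt = os.path.join(workdir, "crushmap.txt")
    return [
        ["ceph", "osd", "getcrushmap", "-o", raw],
        ["crushtool", "-d", raw, "-o", txt],
    ]

def crushmap_apply_cmds(workdir=".", rule_id=1, num_rep=3):
    """Steps after editing crushmap.txt: compile, test the rule, then inject."""
    txt = os.path.join(workdir, "crushmap.txt")
    new = os.path.join(workdir, "crushmap.new")
    return [
        ["crushtool", "-c", txt, "-o", new],
        # report inputs the rule cannot map to num_rep OSDs
        ["crushtool", "-i", new, "--test", "--rule", str(rule_id),
         "--num-rep", str(num_rep), "--show-bad-mappings"],
        # injecting a new map can trigger data movement on a live cluster
        ["ceph", "osd", "setcrushmap", "-i", new],
    ]

def run(cmds, dry_run=True):
    for argv in cmds:
        print(" ".join(argv))
        if dry_run:
            continue
        subprocess.run(argv, check=True)  # raises on a non-zero exit status
        # crushtool -c should leave an output file; stop here if it did not
        if argv[:2] == ["crushtool", "-c"] and not os.path.exists(argv[-1]):
            raise RuntimeError("%s was not created" % argv[-1])

if __name__ == "__main__":
    run(crushmap_export_cmds())
    run(crushmap_apply_cmds())
```

With `dry_run=True` it only prints the commands. To run it for real, edit crushmap.txt between the two `run` calls rather than running them back to back.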

I've tried a few things, searched, and have been reading docs such as http://docs.ceph.com/docs/master/rados/operations/crush-map/.

I am hoping to set up a rule that groups the NVMe OSDs together, and that ceph-volume's --crush-device-class option could then use. I am still researching.

We are using Ceph 12.2.12-pve1 (Luminous).

Does anyone have a suggestion on how to set up a rule for NVMe?
 
I was able to add a rule, but could not specify the device class.
The following broke Ceph; do not do this:
Code:
# rules
rule replicated_rule {
        id 0
        type replicated
        min_size 1
        max_size 10
        step chooseleaf firstn 0 type host
        step emit
}
rule replicated_nvme {
        id 1
        type replicated
        min_size 1
        max_size 10
        step chooseleaf firstn 0 type host
        step emit
}

We have 3 NVMe drives on order. Once I add OSDs for replicated_nvme, perhaps specifying 'step take default class nvme' will work.
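The two rules above have no `step take` line, so `chooseleaf` has no starting bucket to descend from and, as far as I understand CRUSH, such a rule maps no OSDs at all. If the existing pool uses `replicated_rule` (id 0), as a default Proxmox setup likely does, that would explain why Ceph got hurt. A minimal sketch, with hypothetical helper names, that renders a rule in the same text syntax (including the device class) and checks that a `take` step comes before the first choose step:

```python
def render_replicated_rule(name, rule_id, root="default", device_class=None,
                           failure_domain="host", min_size=1, max_size=10):
    """Render a replicated rule in crushmap text syntax (hypothetical helper)."""
    take = f"step take {root}"
    if device_class:
        take += f" class {device_class}"  # restricts the rule to one device class
    lines = [
        f"rule {name} {{",
        f"    id {rule_id}",
        "    type replicated",
        f"    min_size {min_size}",
        f"    max_size {max_size}",
        f"    {take}",
        f"    step chooseleaf firstn 0 type {failure_domain}",
        "    step emit",
        "}",
    ]
    return "\n".join(lines)

def has_take_before_choose(rule_text):
    """True only if a 'step take' appears before the first choose/chooseleaf step."""
    seen_take = False
    for raw in rule_text.splitlines():
        words = raw.split()
        if words[:2] == ["step", "take"]:
            seen_take = True
        elif words[:2] in (["step", "choose"], ["step", "chooseleaf"]):
            return seen_take
    return False  # no choose step at all

print(render_replicated_rule("replicated_nvme", 1, device_class="nvme"))
```

The printed rule matches the one from the first post, which is the shape a class-restricted rule should have once the nvme class exists.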
 
How would I go about setting up a Ceph pool that uses only OSDs on NVMe?

From searching this forum, it seems a rule needs to be set up first.

Any suggestions on where to look to figure this out? I'll of course do more research later on.
 
First try:
Code:
# ceph osd crush rule create-replicated replicated_nvme  default  host nvme
Error EINVAL: device class nvme does not exist

So I will figure out how to add device class nvme.

It looks like an OSD needs to be created on an NVMe first, and then the class will get created automatically? The drives come in tomorrow.
 
RobFantini said:
It looks like an osd needs to be created on a nvme 1ST? then the class will get made automatically?

Yes, that is my understanding. You can see your currently recognized classes with:
Code:
ceph osd crush class ls
[
    "hdd"
]

It is possible to "re-class" an existing OSD such that you could create your CRUSH rule, but I don't recommend it.
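A small sketch of checking for a class before running `create-replicated`, which would have avoided the `EINVAL: device class nvme does not exist` error from the first try. It assumes `ceph osd crush class ls` prints a JSON list, as in the output above; the function names are hypothetical.

```python
import json
import subprocess

def device_classes(class_ls_output):
    """Parse the JSON list printed by `ceph osd crush class ls`."""
    return set(json.loads(class_ls_output))

def class_exists(class_ls_output, device_class):
    return device_class in device_classes(class_ls_output)

def current_class_ls():
    """Run the real command (needs a node with a Ceph admin keyring)."""
    result = subprocess.run(["ceph", "osd", "crush", "class", "ls"],
                            stdout=subprocess.PIPE, universal_newlines=True,
                            check=True)
    return result.stdout

# Outputs copied from this thread:
print(class_exists('["hdd"]', "nvme"))          # False
print(class_exists('["ssd", "nvme"]', "nvme"))  # True
```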
 

Thank you for the reply!

I think reclassifying could break an existing pool, so I'll stay away from that. We have had Ceph in production since 2017 and do not have a test cluster.

I'll also try this to create OSDs on the NVMe; let's see if the class gets created at the same time:
Code:
ceph-volume lvm batch --osds-per-device 2  --crush-device-class  nvme  /dev/nvme0n1
 
OK, this worked:
Code:
# ceph-volume lvm batch --osds-per-device 2  --crush-device-class  nvme  /dev/nvme0n1

Total OSDs: 2

  Type            Path                                                    LV Size         % of device
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme0n1                                            447.13 GB       50%
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme0n1                                            447.13 GB       50%
--> The above OSDs would be created if the operation continues
--> do you want to proceed? (yes/no) yes
Running command: vgcreate --force --yes ceph-dbd94557-9eaf-4dda-b4cd-ccd985e0152b /dev/nvme0n1
 stdout: Physical volume "/dev/nvme0n1" successfully created.
 stdout: Volume group "ceph-dbd94557-9eaf-4dda-b4cd-ccd985e0152b" successfully created
Running command: lvcreate --yes -l 114464 -n osd-data-6f088d28-a0be-4186-a1b6-29dc7e1de375 ceph-dbd94557-9eaf-4dda-b4cd-ccd985e0152b
 stdout: Logical volume "osd-data-6f088d28-a0be-4186-a1b6-29dc7e1de375" created.
...
Running command: systemctl enable --runtime ceph-osd@4
 stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@4.service → /lib/systemd/system/ceph-osd@.service.
Running command: systemctl start ceph-osd@4
--> ceph-volume lvm activate successful for osd ID: 4
--> ceph-volume lvm create successful for: ceph-dbd94557-9eaf-4dda-b4cd-ccd985e0152b/osd-data-410ce862-7653-473a-bf9a-dfa867982b41
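The sizes in the batch report can be cross-checked against the `lvcreate -l 114464` call in the log: `-l` counts LVM extents, and assuming the default 4 MiB extent size (the log does not show it), that works out to 447.125 GiB, which the report shows as 447.13 "GB". A tiny sketch of the arithmetic:

```python
def lv_size_gib(extents, extent_mib=4):
    """LV size in GiB for an extent count; 4 MiB is assumed (the vgcreate default)."""
    return extents * extent_mib / 1024

per_osd = lv_size_gib(114464)  # from `lvcreate -l 114464` in the log above
print(f"{per_osd:.3f} GiB per OSD, {2 * per_osd:.2f} GiB for both")
# prints: 447.125 GiB per OSD, 894.25 GiB for both
```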
And there is now an nvme class:
Code:
# ceph osd crush class ls
[
    "ssd",
    "nvme"
]

ceph -s shows that the new OSDs seem to have been added to the existing pool:
Code:
# ceph -s
  cluster:
    id:     220b9a53-4556-48e3-a73c-28deff665e45
    health: HEALTH_WARN
            65352/731910 objects misplaced (8.929%)
  services:
    mon: 3 daemons, quorum pve3,pve10,pve14
    mgr: pve3(active), standbys: sys8, pve14, pve10
    osd: 44 osds: 44 up, 44 in; 221 remapped pgs
  data:
    pools:   1 pools, 1024 pgs
    objects: 243.97k objects, 901GiB
    usage:   2.21TiB used, 13.9TiB / 16.2TiB avail
    pgs:     0.195% pgs not active
             65352/731910 objects misplaced (8.929%)
             798 active+clean
             219 active+remapped+backfill_wait
             5   active+remapped+backfilling
             2   activating
  io:
    client:   0B/s rd, 1.67MiB/s wr, 0op/s rd, 83op/s wr
    recovery: 562MiB/s, 157objects/s
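A quick arithmetic check of the status output (numbers copied from above; the object count is the rounded 243.97k). The misplaced denominator is three times the object count, which suggests a 3-replica pool, and the two `activating` PGs account for the 0.195% not active. The rebalancing itself is expected if the existing pool's rule does `step take default` without a class: such a rule selects OSDs of every class, so the new nvme OSDs start receiving that pool's data.

```python
# Values copied from the `ceph -s` output above.
pg_states = {
    "active+clean": 798,
    "active+remapped+backfill_wait": 219,
    "active+remapped+backfilling": 5,
    "activating": 2,
}
total_pgs = 1024
objects = 243970            # shown rounded as "243.97k objects"
misplaced, copies = 65352, 731910

# Every PG is in exactly one of the listed states.
assert sum(pg_states.values()) == total_pgs

print(f"not active: {pg_states['activating'] / total_pgs:.3%}")  # not active: 0.195%
print(f"misplaced:  {misplaced / copies:.3%}")                    # misplaced:  8.929%
print(f"copies per object: {copies / objects:.2f}")               # copies per object: 3.00
```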

So next: how do I make a new pool that uses only class nvme?
 
Hello RokaKen, thanks for the reply. I was in the middle of writing the following when you sent it.
I'll also check the post you just wrote:

This is what I intend to try:
Code:
ceph osd crush rule create-replicated replicated_nvme  default  host nvme

Then make a new pool in PVE using the new rule.
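The same steps from the CLI, as a dry-run sketch using the Luminous command syntax. The pool name `rbd_nvme` matches the storage config later in the thread; `pg_num=128` is an arbitrary assumption, and the application tag is only there so Luminous does not warn about a pool without an application.

```python
import subprocess

def nvme_pool_cmds(pool="rbd_nvme", rule="replicated_nvme", pg_num=128):
    """Commands to create a class-restricted rule and a pool using it (pg_num is a guess)."""
    return [
        # replicated rule: root "default", failure domain "host", device class "nvme"
        ["ceph", "osd", "crush", "rule", "create-replicated", rule, "default", "host", "nvme"],
        # pool create <name> <pg_num> <pgp_num> replicated <rule>
        ["ceph", "osd", "pool", "create", pool, str(pg_num), str(pg_num), "replicated", rule],
        # mark the pool as used by RBD
        ["ceph", "osd", "pool", "application", "enable", pool, "rbd"],
    ]

def run(cmds, dry_run=True):
    for argv in cmds:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # raises on a non-zero exit status

if __name__ == "__main__":
    run(nvme_pool_cmds())
```

Creating a class-restricted rule does not stop the existing pool from using the nvme OSDs; that pool keeps its classless rule until it is switched, for example with `ceph osd pool set <pool> crush_rule <rule>` to a rule restricted to the ssd class.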
 
OK, getting close. I have an issue with the OSDs I created:

The nvme pool shows 0 storage available (I tried to move a VM's disk, and it showed 0 bytes available).
From PVE, when trying to move the disk:
Code:
create full clone of drive scsi1 (ceph_vm:vm-9109-disk-1)
TASK ERROR: storage migration failed: error with cfs lock 'storage-nvme_vm': rbd error: rbd: list: (95) Operation not supported

df shows strange mount info:
Code:
# df
Filesystem                     Type      Size  Used Avail Use% Mounted on
/dev/sdj1                      xfs        94M  5.5M   89M   6% /var/lib/ceph/osd/ceph-41
/dev/sdb1                      xfs        94M  5.5M   89M   6% /var/lib/ceph/osd/ceph-25
...
tmpfs                          tmpfs     103G   48K  103G   1% /var/lib/ceph/osd/ceph-3
tmpfs                          tmpfs     103G   48K  103G   1% /var/lib/ceph/osd/ceph-4

I had done the following to create the OSDs:
Code:
ceph-volume lvm batch --osds-per-device 2  /dev/nvme0n1


Suggestions?
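As far as I know, the tmpfs mounts are expected: bluestore OSDs created by `ceph-volume lvm` keep their few metadata files on a tmpfs at /var/lib/ceph/osd/ceph-N and rebuild them from LVM tags on activation, while bluestore OSDs created with the older `ceph-disk` use a small (~100 MB) xfs partition. A heuristic sketch, with a hypothetical helper name, that labels each OSD mount from `df -T` style output:

```python
def classify_osd_mounts(df_text):
    """Map OSD id -> guessed OSD layout from `df -T` style lines (heuristic only)."""
    result = {}
    for line in df_text.splitlines():
        fields = line.split()
        # expected columns: Filesystem Type Size Used Avail Use% Mounted-on
        if len(fields) < 7 or not fields[6].startswith("/var/lib/ceph/osd/ceph-"):
            continue  # header, elided lines, and non-OSD mounts
        osd_id = int(fields[6].rsplit("-", 1)[1])
        fs_type = fields[1]
        if fs_type == "tmpfs":
            result[osd_id] = "ceph-volume lvm (bluestore, metadata on tmpfs)"
        elif fs_type == "xfs":
            result[osd_id] = "ceph-disk style (small xfs metadata partition)"
        else:
            result[osd_id] = "unknown (%s)" % fs_type
    return result

sample = """/dev/sdj1  xfs    94M  5.5M  89M  6% /var/lib/ceph/osd/ceph-41
tmpfs      tmpfs 103G   48K 103G  1% /var/lib/ceph/osd/ceph-4"""
for osd_id, kind in sorted(classify_osd_mounts(sample).items()):
    print(osd_id, kind)
```

If that is right, the tmpfs lines alone probably do not explain the 0-byte pool.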
 
So I will need to rethink how to create 2 OSDs on a disk.
Why would you like to create two OSDs on the same NVMe? For performance? In my tests, an NVMe-only pool with 4 OSDs per NVMe didn't give me better performance. Sadly, I never got an NVMe that actually supports namespaces, which might get things going.
 
With one OSD per NVMe, I still have an issue using the nvme pool.

After removing the two OSDs per NVMe, zapping them, and rebooting the nodes (I had to), I made one OSD per NVMe.

df shows:
Code:
/dev/nvme0n1p1                 xfs        97M  5.5M   92M   6% /var/lib/ceph/osd/ceph-5
That is good.

Storage setup:
Code:
rbd: nvme-vm
        content images
        krbd 0
        pool rbd_nvme

However, PVE shows 0 space when I try to move a disk, and there is a '?' next to rbd_nvme in the tree on the left.
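One thing worth ruling out (an assumption, not the confirmed cause here): with `step chooseleaf firstn 0 type host`, each replica must land on a different host, so a size-3 pool needs nvme OSDs on at least three hosts; otherwise its PGs stay undersized, or inactive if fewer hosts than min_size are available. `ceph df` also shows the pool's MAX AVAIL as Ceph computes it. The sketch below counts hosts per class from `ceph osd tree -f json`; the `nodes`/`children`/`device_class` field names are what I expect from Luminous JSON output and should be checked against a real cluster, and the sample tree is made up.

```python
import json

def hosts_with_class(osd_tree_json, device_class):
    """Host names holding at least one OSD of the given device class."""
    tree = json.loads(osd_tree_json)
    by_id = {node["id"]: node for node in tree["nodes"]}
    hosts = set()
    for node in tree["nodes"]:
        if node.get("type") != "host":
            continue
        for child in node.get("children", []):
            osd = by_id.get(child)
            if osd and osd.get("device_class") == device_class:
                hosts.add(node["name"])
    return hosts

def rule_can_place(osd_tree_json, device_class, pool_size):
    """With a host failure domain, every replica needs a distinct host."""
    return len(hosts_with_class(osd_tree_json, device_class)) >= pool_size

# Hypothetical tree: one nvme OSD on one host only.
sample_tree = json.dumps({"nodes": [
    {"id": -1, "name": "default", "type": "root", "children": [-2, -3]},
    {"id": -2, "name": "pve3", "type": "host", "children": [5, 0]},
    {"id": -3, "name": "pve10", "type": "host", "children": [1]},
    {"id": 5, "name": "osd.5", "type": "osd", "device_class": "nvme"},
    {"id": 0, "name": "osd.0", "type": "osd", "device_class": "ssd"},
    {"id": 1, "name": "osd.1", "type": "osd", "device_class": "ssd"},
], "stray": []})

print(sorted(hosts_with_class(sample_tree, "nvme")))  # ['pve3']
print(rule_can_place(sample_tree, "nvme", 3))         # False
```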
 