Preface:
I have a hybrid Ceph environment with 16 SATA spinners and 2 Intel Optane NVMe PCIe cards (intended for DB and WAL). Because of enumeration issues on reboot, the NVMe cards can flip their /dev/ device names, and that flip will cause a full cluster rebalance. Intel's recommendation is to create separate DB and WAL partitions on the NVMe cards and name them in parted so they appear as:
/dev/disk/by-partlabel/osd-device-0-db
/dev/disk/by-partlabel/osd-device-0-wal
Then add to ceph.conf:
bluestore block db path = /dev/disk/by-partlabel/osd-device-0-db
bluestore block wal path = /dev/disk/by-partlabel/osd-device-0-wal
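For reference, here is roughly how I create the labeled partitions; the NVMe device path and sizes below are placeholders, not my actual values:

```shell
# Create named GPT partitions on the Optane card so that
# /dev/disk/by-partlabel/ gives stable paths across reboots.
# /dev/nvme0n1 and the 30G/2G sizes are example values only.
parted --script /dev/nvme0n1 \
  mkpart osd-device-0-db  0%    30GiB \
  mkpart osd-device-0-wal 30GiB 32GiB
partprobe /dev/nvme0n1              # re-read the partition table
ls -l /dev/disk/by-partlabel/       # verify the labels showed up
```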
Issues with the Proxmox GUI:
When creating an OSD in the GUI, you can select only one device for the DB and WAL together; you cannot split the two across separate devices. The only way to size this partition is to set "bluestore_block_db_size" in ceph.conf. This is where my confidence is lost: if the DB and WAL share this one partition, why is the partition Proxmox creates only as large as "bluestore_block_db_size", and not the sum of "bluestore_block_db_size" and "bluestore_block_wal_size"? My belief is that we are not really using the WAL on the Optane drives. On top of that, with this configuration method I am unable to specify a separate "bluestore_block_wal_path" in ceph.conf, which is Intel's recommendation.
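For clarity, this is the sizing I would expect to have to put in ceph.conf for both partitions (the values here are just examples, not my production numbers):

```
[osd]
bluestore_block_db_size  = 32212254720   # 30 GiB
bluestore_block_wal_size = 2147483648    # 2 GiB
```

With both values set, I would expect a shared DB+WAL partition to come out at the sum (~32 GiB in this example), but the partition Proxmox creates matches only the DB size.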
Issues with the Proxmox CLI:
I have tried to create the OSDs from the command line, but that is broken as well. Using "pveceph createosd", I can specify separate "/dev/disk/by-partlabel/" paths for both the WAL and DB, but OSD creation fails. The OSD is left in a partially created state (no device class of HDD set on the OSD, and in the crush map the GUI shows the OSD as FILESTORE rather than BLUESTORE), which does not work. I have also tried the "ceph-disk prepare" and "ceph-volume lvm prepare" methods, which break in the same way as shown below.
Here is an example command I ran, and the limited log output:
Command:
pveceph createosd /dev/sdr --bluestore --wal_dev /dev/disk/by-partlabel/osd-device-79-wal --journal_dev /dev/disk/by-partlabel/osd-device-79-db
Log:
2018-12-04 11:59:25.836454 7f2168ac1e00 0 set uid:gid to 64045:64045 (ceph:ceph)
2018-12-04 11:59:25.836468 7f2168ac1e00 0 ceph version 12.2.8 (6f01265ca03a6b9d7f3b7f759d8894bb9dbb6840) luminous (stable), process ceph-osd, pid 56236
2018-12-04 11:59:25.839160 7f2168ac1e00 1 bluestore(/var/lib/ceph/tmp/mnt.Zljbva) mkfs path /var/lib/ceph/tmp/mnt.Zljbva
2018-12-04 11:59:25.840085 7f2168ac1e00 -1 bluestore(/var/lib/ceph/tmp/mnt.Zljbva) _setup_block_symlink_or_file failed to create block.wal symlink to /dev/disk/by-partlabel/osd-device-79-wal: (17) File exists
2018-12-04 11:59:25.840098 7f2168ac1e00 -1 bluestore(/var/lib/ceph/tmp/mnt.Zljbva) mkfs failed, (17) File exists
2018-12-04 11:59:25.840100 7f2168ac1e00 -1 OSD::mkfs: ObjectStore::mkfs failed with error (17) File exists
2018-12-04 11:59:25.840159 7f2168ac1e00 -1 ** ERROR: error creating empty object store in /var/lib/ceph/tmp/mnt.Zljbva: (17) File exists
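In case it matters: the "(17) File exists" looks to me like leftovers from a previous failed attempt, so between attempts I clean up the half-created OSD state before retrying. This is the cleanup I run (the partlabel paths and /dev/sdr are from my layout; double-check device names before wiping anything):

```shell
# Wipe any leftover signatures on the DB/WAL partitions from a
# previous failed attempt, then zap the data disk itself.
# DESTRUCTIVE: only run against devices you intend to reuse.
wipefs -a /dev/disk/by-partlabel/osd-device-79-db
wipefs -a /dev/disk/by-partlabel/osd-device-79-wal
ceph-volume lvm zap /dev/sdr
```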
Proxmox Version 5.2-9
Any help would be greatly appreciated, I am fairly new to Ceph so forgive me if I missed anything.