Hello all,
I recently decided to use SSDs to improve the performance of my cluster. Here is my cluster setup:
4 Nodes
36 x 465 GB HDDs / node
CPU: 8 x Intel(R) Xeon(R) E5-2609 v2 @ 2.50GHz (2 sockets) / node
RAM: 128 GB / node
I wanted to move all the WAL/DBs to the new SSD to improve performance; originally they were on the HDDs.
The more I read about WAL/DB sizing, recovery and backfilling, the more it boggles my mind. I know these questions have been asked many times, but I am still confused. Here are my questions:
1. Would it be beneficial to increase the WAL/DB size while creating new OSDs, and how do I do that? (See the sketch I put after question 6 below for what I have in mind.)
2. Is there a faster way to recreate OSDs? Each OSD takes 20-30 minutes to backfill when I add it.
3. I am using the attached script to move the WAL/DB to the SSD.
4. The default Ceph config shows these sizes:
ceph --show-config | grep bluestore_block
bluestore_block_create = true
bluestore_block_db_create = false
bluestore_block_db_path =
bluestore_block_db_size = 0
bluestore_block_path =
bluestore_block_preallocate_file = false
bluestore_block_size = 10737418240
bluestore_block_wal_create = false
bluestore_block_wal_path =
bluestore_block_wal_size = 100663296
And these are the partitions it created on my SSD. It only consumed about 30 GB of the 240 GB SSD.
fdisk -l | grep sds
Disk /dev/sds: 223.6 GiB, 240057409536 bytes, 468862128 sectors
/dev/sds1 2048 2099199 2097152 1G unknown
/dev/sds2 2099200 3278847 1179648 576M unknown
/dev/sds3 3278848 5375999 2097152 1G unknown
/dev/sds4 5376000 6555647 1179648 576M unknown
/dev/sds5 6555648 8652799 2097152 1G unknown
/dev/sds6 8652800 9832447 1179648 576M unknown
/dev/sds7 9832448 11929599 2097152 1G unknown
/dev/sds8 11929600 13109247 1179648 576M unknown
/dev/sds9 13109248 15206399 2097152 1G unknown
/dev/sds10 15206400 16386047 1179648 576M unknown
/dev/sds11 16386048 18483199 2097152 1G unknown
/dev/sds12 18483200 19662847 1179648 576M unknown
/dev/sds13 19662848 21759999 2097152 1G unknown
/dev/sds14 21760000 22939647 1179648 576M unknown
/dev/sds15 22939648 25036799 2097152 1G unknown
/dev/sds16 25036800 26216447 1179648 576M unknown
/dev/sds17 26216448 28313599 2097152 1G unknown
/dev/sds18 28313600 29493247 1179648 576M unknown
/dev/sds19 29493248 31590399 2097152 1G unknown
/dev/sds20 31590400 32770047 1179648 576M unknown
/dev/sds21 32770048 34867199 2097152 1G unknown
/dev/sds22 34867200 36046847 1179648 576M unknown
/dev/sds23 36046848 38143999 2097152 1G unknown
/dev/sds24 38144000 39323647 1179648 576M unknown
/dev/sds25 39323648 41420799 2097152 1G unknown
/dev/sds26 41420800 42600447 1179648 576M unknown
/dev/sds27 42600448 44697599 2097152 1G unknown
/dev/sds28 44697600 45877247 1179648 576M unknown
/dev/sds29 45877248 47974399 2097152 1G unknown
/dev/sds30 47974400 49154047 1179648 576M unknown
/dev/sds31 49154048 51251199 2097152 1G unknown
/dev/sds32 51251200 53348351 2097152 1G unknown
/dev/sds33 53348352 55445503 2097152 1G unknown
/dev/sds34 55445504 56625151 1179648 576M unknown
/dev/sds35 56625152 58722303 2097152 1G unknown
5. If a larger WAL/DB partition is beneficial, can I grow the existing partitions on my SSD, or do I have to recreate them?
6. Are there any parameters to speed up recovery/backfilling? I tinkered with these two values, but it did not help much:
ceph tell 'osd.*' injectargs '--osd-max-backfills 10'
ceph tell 'osd.*' injectargs '--osd-recovery-max-active 5'
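For question 1, here is roughly what I have in mind, assuming the DB/WAL partition sizes are read from ceph.conf at OSD creation time and that I remembered the ceph-disk syntax correctly (the sizes and device names below are only placeholders, not recommendations):
# /etc/pve/ceph.conf, [global] or [osd] section (example sizes only)
bluestore_block_db_size = 32212254720    # 30 GiB per block.db partition
bluestore_block_wal_size = 2147483648    # 2 GiB per block.wal partition
# then recreate the OSD with its DB/WAL on the SSD, e.g. on Luminous:
ceph-disk prepare --bluestore /dev/sdb --block.db /dev/sds --block.wal /dev/sds
Please correct me if these config keys are not what actually sizes the partitions at creation time.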
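For question 6, these are the additional knobs I was planning to try next; this is only a guess on my part that they help on Luminous, and the values are examples, not recommendations:
ceph tell 'osd.*' injectargs '--osd-recovery-sleep-hdd 0'
ceph tell 'osd.*' injectargs '--osd-recovery-max-single-start 4'
Is tuning in this direction sensible, or am I looking at the wrong settings?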
Here is the Proxmox version information:
pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.18-5-pve)
pve-manager: 5.2-9 (running version: 5.2-9/4b30e8f9)
pve-kernel-4.15: 5.2-8
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph: 12.2.8-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-40
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-29
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-2
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-27
pve-docs: 5.2-8
pve-firewall: 3.0-14
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-36
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve1~bpo1
Lastly, my biggest concern: what happens if this SSD dies, and how can I prevent a single SSD failure from taking down all of the OSDs that use it?
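To understand the blast radius, I assume I can see which OSDs have their DB/WAL on this SSD with something like the following (these are the standard ceph-disk mount points; please correct me if that is wrong):
# list the block.db / block.wal symlinks of every OSD on this node
ls -l /var/lib/ceph/osd/ceph-*/block.db /var/lib/ceph/osd/ceph-*/block.wal
# resolve the by-partuuid links to the real device; anything on /dev/sds goes down with the SSD
readlink -f /var/lib/ceph/osd/ceph-*/block.db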