ceph - how to add a separate DB/WAL drive

RobFantini

Renowned Member
May 24, 2012
1,674
37
68
Boston,Mass
Hello
We have a Ceph pool.

We are considering adding a drive to each node for a DB/WAL journal.

Can this be done from the GUI?

If not, could someone point me in the direction of the CLI command?

Thanks
Rob Fantini
 

sb-jw

Active Member
Jan 23, 2018
551
49
28
28
I would not recommend this. You would introduce a single point of failure (SPOF) into each node: if that drive fails, every OSD whose DB/WAL lives on it fails with it, so effectively the whole node fails.

It only makes sense if your DB/WAL device is much faster than the OSD itself.
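To put a number on "much faster", one option is a single-job 4k synchronous write test with fio, run against the candidate NVMe device and against a spare S3610 that is not in use as an OSD. The sketch below only prints the fio command unless `DRY_RUN=0` is set. The job name `walsync`, the 60 s default runtime and the device paths are illustrative assumptions, and a real run overwrites whatever is on the target device.

```shell
#!/usr/bin/env bash
# Sketch: 4k synchronous-write test for a candidate DB/WAL device using fio.
# WARNING: with DRY_RUN=0 this writes directly to the raw device and destroys
# any data on it. Only use an empty, unused disk. Job name and runtime are
# arbitrary illustrative choices.

# Build the fio arguments: one job, queue depth 1, direct + synchronous 4k
# writes, an approximation of the small sync writes a WAL device must absorb.
fio_sync_write_args() {
  local dev="$1" runtime="${2:-60}"
  FIO_ARGS=(--name=walsync "--filename=$dev" --direct=1 --sync=1
            --rw=write --bs=4k --numjobs=1 --iodepth=1
            "--runtime=$runtime" --time_based --group_reporting)
}

# DRY_RUN=1 (the default) prints the command; DRY_RUN=0 actually runs fio.
run_sync_write_test() {
  local dev="$1" runtime="${2:-60}"
  fio_sync_write_args "$dev" "$runtime"
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo fio "${FIO_ARGS[@]}"
  else
    fio "${FIO_ARGS[@]}"
  fi
}
```

For example, `DRY_RUN=1 run_sync_write_test /dev/nvme0n1 30` prints the command for a 30 s test; compare the write IOPS and completion latency fio reports for the two devices.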
 

RobFantini

Thank you for the response!



Here is our issue:

We have 7 nodes; 6 of them have 7 SSDs each. The SSDs are Intel S3610s.

We have cron scripts that check for issues that have led to data corruption in the past. There is always a time issue in the VM before corruption, so the script checks for time discrepancies. Shortly before those occur, lines like the following appear in the logs / dmesg:
```
# zgrep "task abort: SUCCESS scmd" */*log | grep "Jun 14"
apt-cacher-10.1.3.8/apt-cacher.log:Jun 14 12:11:20 apt-cacher kernel: [854392.527972] sd 11:0:2:0: task abort: SUCCESS scmd(00000000e3974120)
apt-cacher-10.1.3.8/apt-cacher.log:Jun 14 13:54:03 apt-cacher kernel: [860555.491860] sd 12:0:2:0: task abort: SUCCESS scmd(0000000019a852bc)
bc-sys2-10.1.15.51/bc-sys2.log:Jun 14 12:11:20 bc-sys2 kernel: [854392.527900] sd 11:0:2:0: task abort: SUCCESS scmd(000000002c30a9d9)
bc-sys7-10.1.15.207/bc-sys7.log:Jun 14 12:12:40 bc-sys7 kernel: [855103.992501] sd 12:0:0:0: task abort: SUCCESS scmd(00000000157a0884)
bc-sys7-10.1.15.207/bc-sys7.log:Jun 14 12:12:40 bc-sys7 kernel: [855103.992558] sd 12:0:0:0: task abort: SUCCESS scmd(000000008ff6d5a8)
dhcp-primary-10.1.3.15/dhcp-primary.log:Jun 14 13:52:56 dhcp-primary kernel: [859366.910045] sd 11:0:1:0: task abort: SUCCESS scmd(00000000a09d9f31)
ldap-10.1.3.164/ldap.log:Jun 14 13:52:56 ldap kernel: [859366.910001] sd 11:0:1:0: task abort: SUCCESS scmd(00000000f08213a0)
```
The affected drives are always part of the Ceph pool, never the rpool or the other ZFS pool.
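The cron scripts themselves are not shown in the thread. As an illustration only, a time-discrepancy check inside a VM could parse the output of `chronyc tracking`, assuming the guests use chrony. The function name `check_clock_offset` and the 0.5 s default threshold are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a clock-offset check a cron job could run inside a VM, ASSUMING
# the guest uses chrony. Reads `chronyc tracking` output on stdin, prints the
# offset from NTP time in seconds and returns:
#   0 = within threshold, 1 = over threshold, 2 = no "System time" line found.
# The 0.5 s default threshold is a hypothetical value.
check_clock_offset() {
  local threshold="${1:-0.5}"
  awk -v t="$threshold" '
    $1 == "System" && $2 == "time" {
      # e.g. "System time     : 0.000012000 seconds slow of NTP time"
      off = $4 + 0
      if (off < 0) off = -off
      printf "%.6f\n", off
      found = 1
      rc = (off > t + 0) ? 1 : 0
      exit rc
    }
    END { if (!found) exit 2 }
  '
}
```

A cron entry could then run `chronyc tracking | check_clock_offset 0.5` and send an alert whenever it returns non-zero.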

For some reason, in the last 3 weeks we have had timing / SCSI issues about 3 times per week. In the 4 months before that we got only 3 time-issue emails.
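To check whether the rate of SCSI aborts is really rising, the same kernel messages can be counted per day. A minimal sketch, assuming syslog-format input like the zgrep output above; `count_task_aborts` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count "task abort: SUCCESS scmd" kernel messages per day, so a cron
# job can report whether the rate is rising. Expects syslog-style lines on
# stdin ("Mon DD HH:MM:SS host kernel: ..."), optionally prefixed with
# "file.log:" as zgrep prints them when searching several files.
count_task_aborts() {
  awk '
    /task abort: SUCCESS scmd/ {
      line = $0
      sub(/^[^ ]*log:/, "", line)   # drop a "path/to/file.log:" prefix, if any
      split(line, f, " ")           # whitespace split, also handles "Jun  4"
      n[f[1] " " f[2]]++            # key = "Mon DD"
    }
    END { for (d in n) print d, n[d] }
  ' | sort
}
```

For example, `zcat -f /var/log/kern.log* | count_task_aborts` (the log path assumes a Debian-style guest). The output is sorted lexically, so days from different months are not in calendar order.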

Searching for "task abort: SUCCESS scmd", some results suggest a firmware upgrade.

We cannot easily upgrade the firmware on these drives, because they came from Dell systems and we now use Supermicro. The Intel firmware upgrade program only works with Intel retail [non-OEM] drives, and the Dell upgrade program only works when the drives are in a supported Dell system [we no longer have any Dell systems].

So in our case, a very reliable and fast PCIe NVMe journal device may alleviate the issue.

Using a journal drive is a SPOF. However, with 7 nodes and relatively little storage traffic [mainly accounting programs with 30 concurrent users], we are OK for some time if a node dies [we have shut down a node to replace motherboards a few times].
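Because a failed DB/WAL device takes down every OSD on its node, it behaves like losing the node, so the real question is whether the surviving nodes have room for the re-replicated data. A rough sketch of that arithmetic, assuming equally sized OSD nodes, host as the CRUSH failure domain, and Ceph's default nearfull ratio of 0.85; per the post only 6 of the 7 nodes hold OSDs, so N = 6 is the relevant count. `max_safe_utilisation` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: highest average OSD utilisation at which the data of one failed
# OSD node can be re-replicated onto the remaining nodes without crossing a
# fill ratio. ASSUMES equally sized nodes and host as the failure domain;
# the ratio defaults to 0.85 (Ceph's default nearfull ratio).
#   after one of N nodes fails: u_after = u * N / (N - 1)
#   require u_after <= r   =>   u <= r * (N - 1) / N
max_safe_utilisation() {
  local nodes="$1" ratio="${2:-0.85}"
  awk -v n="$nodes" -v r="$ratio" 'BEGIN { printf "%.4f\n", r * (n - 1) / n }'
}
```

With N = 6 this gives 0.7083, so average utilisation should stay below roughly 70% for one node (or its DB/WAL device) to fail safely. Uneven PG distribution makes some OSDs fill faster than the average, so treat the figure as an upper bound.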

On each node, the LSI HBA also handles traffic from a ZFS mirror used for video recordings. Before adding the journal to Ceph, I am trying things like setting sync=disabled on that mirror. If that does not help, I may use a single XFS-formatted drive for recordings.
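Changing the property is one command per dataset. This sketch uses a hypothetical dataset name `tank/recordings` and prints the commands by default (`DRY_RUN=1`). Note that `sync=disabled` trades durability for speed: synchronous writes are acknowledged before they reach disk.

```shell
#!/usr/bin/env bash
# Sketch: check and change the ZFS "sync" property of the recordings dataset.
# The dataset name is hypothetical. DRY_RUN=1 (the default) only prints the
# commands. sync=disabled acknowledges synchronous writes before they reach
# stable storage, so a crash or power loss can drop the last few seconds of
# recordings.
zfs_run() {
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo zfs "$*"
  else
    zfs "$@"
  fi
}

# mode: standard (default ZFS behaviour), always, or disabled
set_recordings_sync() {
  local dataset="$1" mode="${2:-disabled}"
  zfs_run get sync "$dataset"
  zfs_run set "sync=$mode" "$dataset"
}
```

Running `set_recordings_sync tank/recordings standard` with `DRY_RUN=0` restores the default behaviour.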


So I would still like to know how to add a journal (DB/WAL) drive to an existing Ceph pool, in case we decide to do so.
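DB/WAL devices are attached to individual OSDs rather than to a pool. As far as I know, the Proxmox VE GUI only offers a DB/WAL disk choice when creating a new OSD, so for existing OSDs the usual route is to drain, destroy and re-create them one at a time with the DB on the new device. A hedged sketch of that sequence, assuming Proxmox VE 6 or later (`pveceph osd create` with `--db_dev`); the OSD id and device paths are placeholders, and `rebuild_osd_with_db` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: re-create one existing OSD with its RocksDB on a separate fast device.
# ASSUMES Proxmox VE 6+ (`pveceph osd destroy/create`); the OSD id and device
# paths passed in are placeholders. DRY_RUN=1 (the default) prints the plan;
# DRY_RUN=0 executes it. Handle one OSD at a time, with HEALTH_OK in between.
run() {
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

rebuild_osd_with_db() {
  local id="$1" data_dev="$2" db_dev="$3"
  # 1. Mark the OSD out so Ceph moves its placement groups to other OSDs.
  run ceph osd out "$id"
  # 2. Wait until Ceph confirms no data depends on this OSD any more.
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "# wait until: ceph osd safe-to-destroy $id"
  else
    until ceph osd safe-to-destroy "$id" >/dev/null 2>&1; do sleep 60; done
  fi
  # 3. Stop the daemon and destroy the OSD, cleaning up the old data disk.
  run systemctl stop "ceph-osd@$id"
  run pveceph osd destroy "$id" --cleanup 1
  # 4. Re-create it; without --wal_dev the WAL is kept on the DB device.
  run pveceph osd create "$data_dev" --db_dev "$db_dev"
}
```

`DRY_RUN=1 rebuild_osd_with_db 3 /dev/sdb /dev/nvme0n1` prints the five steps for review. Several OSDs on a node can use the same NVMe as their DB device; check that `ceph -s` reports HEALTH_OK before starting on the next OSD.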
 
