ceph - how to add a separate DB/WAL drive

Discussion in 'Proxmox VE: Installation and configuration' started by RobFantini, Jun 16, 2019.

  1. RobFantini

    RobFantini Active Member
    Proxmox Subscriber

    Joined:
    May 24, 2012
    Messages:
    1,521
    Likes Received:
    21
    Hello
    We have a Ceph pool.

    We are considering adding a drive to each node for a DB/WAL journal.

    Can this be done from the GUI?

    If not, could someone point me in the direction of the CLI command?

    thanks
    Rob Fantini
     
  2. sb-jw

    sb-jw Active Member

    Joined:
    Jan 23, 2018
    Messages:
    551
    Likes Received:
    49
    I would not recommend this. You would introduce a SPOF into your cluster: if that drive fails, every OSD on that node fails with it.

    It only makes sense if your DB/WAL device is much faster than the OSDs themselves.
     
  3. RobFantini

    RobFantini Active Member
    Proxmox Subscriber

    Joined:
    May 24, 2012
    Messages:
    1,521
    Likes Received:
    21
    Thank you for the response!



    Here is our issue:

    We have 7 nodes, 6 of which have 7 SSDs each. The SSDs are Intel S3610s.

    We have cron scripts that check for issues which in the past have led to data corruption. There is always a time issue in the VM before corruption, so the scripts check for time discrepancies. Shortly before those occur, there are lines like the following in the logs / dmesg:
    Code:
    # zgrep "task abort: SUCCESS scmd" */*log|grep "Jun 14"
    apt-cacher-10.1.3.8/apt-cacher.log:Jun 14 12:11:20 apt-cacher kernel: [854392.527972] sd 11:0:2:0: task abort: SUCCESS scmd(00000000e3974120)
    apt-cacher-10.1.3.8/apt-cacher.log:Jun 14 13:54:03 apt-cacher kernel: [860555.491860] sd 12:0:2:0: task abort: SUCCESS scmd(0000000019a852bc)
    bc-sys2-10.1.15.51/bc-sys2.log:Jun 14 12:11:20 bc-sys2 kernel: [854392.527900] sd 11:0:2:0: task abort: SUCCESS scmd(000000002c30a9d9)
    bc-sys7-10.1.15.207/bc-sys7.log:Jun 14 12:12:40 bc-sys7 kernel: [855103.992501] sd 12:0:0:0: task abort: SUCCESS scmd(00000000157a0884)
    bc-sys7-10.1.15.207/bc-sys7.log:Jun 14 12:12:40 bc-sys7 kernel: [855103.992558] sd 12:0:0:0: task abort: SUCCESS scmd(000000008ff6d5a8)
    dhcp-primary-10.1.3.15/dhcp-primary.log:Jun 14 13:52:56 dhcp-primary kernel: [859366.910045] sd 11:0:1:0: task abort: SUCCESS scmd(00000000a09d9f31)
    ldap-10.1.3.164/ldap.log:Jun 14 13:52:56 ldap kernel: [859366.910001] sd 11:0:1:0: task abort: SUCCESS scmd(00000000f08213a0)
    
    The affected drives are always part of the Ceph pool, not rpool or ZFS.

    For some reason, in the last 3 weeks we have had timing / SCSI issues about 3 times per week. In the 4 months before that, we had only 3 time-issue emails.

    From searches on "task abort: SUCCESS scmd", some results suggest a firmware upgrade.

    We cannot easily upgrade the firmware on these drives, as they came from Dell systems and we are now using Supermicro. The Intel firmware upgrade program only works with Intel retail drives [ non-OEM ], and the Dell upgrade program only works when the drives are in a supported Dell system [ we no longer have any Dell systems ].

    So in our case, a very reliable and fast NVMe PCIe journal drive may alleviate the issue.

    Using a journal drive does introduce a SPOF. However, with 7 nodes and relatively little storage traffic [ mainly accounting programs with 30 concurrent users ], we are OK for some time if a node dies. [ We have shut down a node to replace motherboards a few times. ]

    On each node, the LSI HBA also handles traffic from a ZFS mirror used for video recordings. Before adding the journal to Ceph, I am trying things like setting sync=disabled on that pool; see the example below. If that does not help, I may use a single drive formatted with XFS for the recordings.
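    For example, the change I am making is roughly the following (a sketch only; "tank/recordings" is just a placeholder for our actual pool/dataset name):
    Code:
    # disable synchronous writes on the recordings dataset (placeholder name)
    zfs set sync=disabled tank/recordings
    # confirm the new setting
    zfs get sync tank/recordings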


    So I would still like to know how to add a journal/DB drive to an existing Ceph pool, in case we decide to do so.
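    For reference, here is my rough understanding of the CLI side (a sketch only, assuming Bluestore OSDs; /dev/sdc and /dev/nvme0n1p1 are placeholder device names, and the pveceph option names differ between PVE releases, e.g. older versions use "pveceph createosd ... -journal_dev", so check man pveceph for the exact syntax on your version). As far as I can tell, a DB/WAL device cannot simply be attached to a running OSD; the usual route is to drain, destroy, and recreate each OSD one at a time:
    Code:
    # drain one OSD and wait until the cluster reports it safe to remove
    ceph osd out 12
    ceph osd safe-to-destroy osd.12

    # stop and destroy the OSD
    systemctl stop ceph-osd@12
    pveceph osd destroy 12 --cleanup

    # recreate it with the DB (and the WAL, which follows the DB by default) on the NVMe partition
    pveceph osd create /dev/sdc --db_dev /dev/nvme0n1p1

    # roughly equivalent, using ceph-volume directly
    ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/nvme0n1p1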
     