Ceph - Bluestore - Crash - Compressed Erasure Coded Pool

Discussion in 'Proxmox VE: Installation and configuration' started by David Herselman, May 14, 2018.

  1. David Herselman

    David Herselman Active Member
    Proxmox VE Subscriber

    We initially hit this with Ceph 12.2.4 and subsequently reproduced the problem with 12.2.5.

    Using 'lz4' compression on a Ceph Luminous erasure coded pool causes OSD processes to crash. Changing the compressor to snappy lets the crashed OSD start and remain stable thereafter.

    Test cluster environment:
    • 3 hosts
    • 2 BlueStore SSD OSDs per host (they are connected to an HP SmartArray controller, so they are detected as hdd; we overrode the device class to ssd, as sketched below)
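
    A device class override on Luminous can be done along these lines (a sketch; the OSD IDs are placeholders for our SSD OSDs):
    Code:
    ceph osd crush rm-device-class osd.0 osd.1;
    ceph osd crush set-device-class ssd osd.0 osd.1;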

    Creating the erasure coded data pool and test RBD image:
    Code:
    ceph osd erasure-code-profile set ec21_ssd plugin=jerasure k=2 m=1 technique=reed_sol_van crush-root=default crush-failure-domain=host crush-device-class=ssd directory=/usr/lib/ceph/erasure-code;
    ceph osd pool create ec_ssd 16 erasure ec21_ssd;
    ceph osd pool set ec_ssd allow_ec_overwrites true;
    ceph osd pool application enable ec_ssd rbd;
    ceph osd pool set ec_ssd compression_algorithm lz4;
    ceph osd pool set ec_ssd compression_mode aggressive;
    rbd create rbd_ssd/test_ec --size 100G --data-pool ec_ssd;
    
    [root@kvm1 ~]# rbd info rbd_ssd/test_ec
    rbd image 'test_ec':
            size 102400 MB in 25600 objects
            order 22 (4096 kB objects)
            data_pool: ec_ssd
            block_name_prefix: rbd_data.4.67218c74b0dc51
            format: 2
            features: layering, exclusive-lock, data-pool
            flags:
            create_timestamp: Mon May 14 13:06:46 2018
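
    The profile and pool settings can be confirmed with (a sketch using the names from above):
    Code:
    ceph osd erasure-code-profile get ec21_ssd;
    ceph osd pool get ec_ssd compression_algorithm;
    ceph osd pool get ec_ssd compression_mode;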

    Copying test data:
    Code:
    rbd map rbd_ssd/test_ec --name client.admin -k /etc/pve/priv/ceph.client.admin.keyring;
    dd if=/var/lib/vz/template/100G_test_vm of=/dev/rbd0 bs=1G count=20;

    The copy stalled and one of the BlueStore SSD OSDs started flapping. We left it for a while with no change (it boots, comes online, crashes, and repeats continually). Setting the compressor to snappy instead left the OSD stable thereafter:
    Code:
    ceph osd pool set ec_ssd compression_algorithm snappy
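
    Disabling compression on the pool entirely should also avoid the crashing code path (an alternative sketch, not something we kept in place):
    Code:
    ceph osd pool set ec_ssd compression_mode none;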

    Sample log information when OSD is crashing:
    Code:
    2018-05-14 13:27:47.329732 7f2f4b4d2700 -1 *** Caught signal (Aborted) **
     in thread 7f2f4b4d2700 thread_name:tp_osd_tp
    
     ceph version 12.2.5 (dfcb7b53b2e4fcd2a5af0240d4975adc711ab96e) luminous (stable)
     1: (()+0xa31194) [0x5637f831d194]
     2: (()+0x110c0) [0x7f2f640c70c0]
     3: (gsignal()+0xcf) [0x7f2f6308efff]
     4: (abort()+0x16a) [0x7f2f6309042a]
     5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x28e) [0x5637f836509e]
     6: (BlueStore::_do_alloc_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>, boost::intrusive_ptr<BlueStore::Onode>, BlueStore::WriteContext*)+0x352d) [0x5637f820151d]
     7: (BlueStore::_do_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0x545) [0x5637f820eb95]
     8: (BlueStore::_write(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, unsigned long, unsigned long, ceph::buffer::list&, unsigned int)+0xfc) [0x5637f820f64c]
     9: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x19f0) [0x5637f8213580]
     10: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x546) [0x5637f82147f6]
     11: (PrimaryLogPG::queue_transactions(std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x66) [0x5637f7f331b6]
     12: (ECBackend::handle_sub_write(pg_shard_t, boost::intrusive_ptr<OpRequest>, ECSubWrite&, ZTracer::Trace const&, Context*)+0x867) [0x5637f807c507]
     13: (ECBackend::try_reads_to_commit()+0x37db) [0x5637f808d5fb]
     14: (ECBackend::check_ops()+0x1c) [0x5637f808de3c]
     15: (ECBackend::start_rmw(ECBackend::Op*, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&)+0xac0) [0x5637f80978d0]
     16: (ECBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, boost::optional<pg_hit_set_history_t>&, Context*, Context*, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x3b2) [0x5637f8099202]
     17: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0x9fa) [0x5637f7ecdfea]
     18: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x134d) [0x5637f7f17f9d]
     19: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x2ef5) [0x5637f7f1b7d5]
     20: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0xec5) [0x5637f7ed6025]
     21: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3ab) [0x5637f7d4b87b]
     22: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest> const&)+0x5a) [0x5637f7ff613a]
     23: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x102d) [0x5637f7d79d1d]
     24: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ef) [0x5637f8369d7f]
     25: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x5637f836d080]
     26: (()+0x7494) [0x7f2f640bd494]
     27: (clone()+0x3f) [0x7f2f63144acf]
     NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
    
    --- begin dump of recent events ---
        -4> 2018-05-14 13:27:47.326040 7f2f49ccf700  1 get compressor lz4 = 0x56380e1803f0
        -3> 2018-05-14 13:27:47.326854 7f2f49ccf700  1 get compressor lz4 = 0x56380e1803f0
        -2> 2018-05-14 13:27:47.328798 7f2f49ccf700  1 get compressor lz4 = 0x56380e1803f0
        -1> 2018-05-14 13:27:47.328861 7f2f49ccf700  1 get compressor lz4 = 0x56380e1803f0
         0> 2018-05-14 13:27:47.329732 7f2f4b4d2700 -1 *** Caught signal (Aborted) **
     in thread 7f2f4b4d2700 thread_name:tp_osd_tp
    
     [backtrace frames 1-27 identical to the trace above]
    
    --- logging levels ---
       0/ 5 none
       0/ 1 lockdep
       0/ 1 context
       1/ 1 crush
       1/ 5 mds
       1/ 5 mds_balancer
       1/ 5 mds_locker
       1/ 5 mds_log
       1/ 5 mds_log_expire
       1/ 5 mds_migrator
       0/ 1 buffer
       0/ 1 timer
       0/ 1 filer
       0/ 1 striper
       0/ 1 objecter
       0/ 5 rados
       0/ 5 rbd
       0/ 5 rbd_mirror
       0/ 5 rbd_replay
       0/ 5 journaler
       0/ 5 objectcacher
       0/ 5 client
       0/ 0 osd
       0/ 5 optracker
       0/ 5 objclass
       0/ 0 filestore
       0/ 0 journal
       0/ 0 ms
       1/ 5 mon
       0/10 monc
       1/ 5 paxos
       0/ 5 tp
       1/ 5 auth
       1/ 5 crypto
       1/ 1 finisher
       1/ 1 reserver
       1/ 5 heartbeatmap
       1/ 5 perfcounter
       1/ 5 rgw
       1/10 civetweb
       1/ 5 javaclient
       1/ 5 asok
       1/ 1 throttle
       0/ 0 refs
       1/ 5 xio
       1/ 5 compressor
       1/ 5 bluestore
       1/ 5 bluefs
       1/ 3 bdev
       1/ 5 kstore
       4/ 5 rocksdb
       4/ 5 leveldb
       4/ 5 memdb
       1/ 5 kinetic
       1/ 5 fuse
       1/ 5 mgr
       1/ 5 mgrc
       1/ 5 dpdk
       1/ 5 eventtrace
      -2/-2 (syslog threshold)
      -1/-1 (stderr threshold)
      max_recent     10000
      max_new         1000
      log_file /var/log/ceph/ceph-osd.4.log
    --- end dump of recent events ---
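
    As the trace notes, the raw addresses can be resolved against the OSD binary with objdump (a sketch; it assumes the ceph-osd binary has not changed since the crash):
    Code:
    objdump -rdS /usr/bin/ceph-osd > /tmp/ceph-osd.asm;
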
    Object storage utilisation:
    Code:
    [root@kvm1 ~]# rados df
    POOL_NAME       USED   OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS    RD     WR_OPS    WR
    cephfs_data      1767M     445      0   1335                  0       0        0       583 78021k       587  1918M
    cephfs_metadata   889k      58      0    174                  0       0        0       256 14607k       906  1469k
    ec_ssd           3072M     769      0   2307                  0       0        0        94  4168k     14742  3072M
    rbd_hdd          1165G  300427      0 901281                  0       0        0 397694691 25133G 932424410 11349G
    rbd_ssd         59852M   15331      0  45993                  0       0        0   1933348   106G   4721464   160G
    
    total_objects    317030
    total_used       3658G
    total_avail      6399G
    total_space      10058G
     

    #1 David Herselman, May 14, 2018
    Last edited: May 15, 2018
  2. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    This looks like something for the ceph-users mailing list and the Ceph tracker, as I didn't find any similar postings so far. Did you post there too?

    Aside from that, we don't recommend the use of EC pools, as they are usually slower than replicated pools without much space saving either, and tend to cause more trouble than benefit.
     
  3. David Herselman

    David Herselman Active Member
    Proxmox VE Subscriber

    We've been happy with our erasure coded pools so far; they do, however, require non-rotational media.

    Our replicated pools use a size of 3 and min_size of 2. Two device failures would hang writes until the OSDs are marked out, and this layout yields 300% overhead.

    Our erasure coded pool uses k=3 and m=2 with min_size=4, so it provides the exact same level of protection, is actually faster than the replicated pool, and yields roughly 166% overhead (5/3).
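
    The overhead figures follow directly from the layouts (a quick sanity check on the numbers above):
    Code:
    # replicated, size=3    : raw/usable = 3/1       -> 300%
    # erasure coded, k=3 m=2: raw/usable = (3+2)/3   -> ~166%
    echo "scale=3; (3+2)/3*100" | bc    # 166.600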

    The majority of the data in the erasure coded pools is centralised, uncompressed firewall logs, so compression really helps; although it could be faster and smaller with lz4.

    Ceph IRC discussions have suggested that it might be the lz4 library in Debian, but I'll post to the ceph-users list.

    Hope this serves as a warning and quick recovery for anyone else attempting inline compression with BlueStore...
     
  4. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    lz4 needs to be at version 1.7 for Ceph; this is the case on PVE.

    Maybe the list gives some clues.
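
    The installed library version can be checked quickly (a sketch, assuming the Debian package name liblz4-1):
    Code:
    apt-cache policy liblz4-1;
    dpkg -s liblz4-1 | grep Version;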
     
  5. alexskysilk

    alexskysilk Active Member
    Proxmox VE Subscriber

    I'm intrigued. Could you post some benchmarks for your EC pool vs 3P?
     
  6. David Herselman

    David Herselman Active Member
    Proxmox VE Subscriber

    RADOS bench:
    Code:
    rados bench -p ec_nvme 10 write --no-cleanup
    Total time run:         10.060251
    Total writes made:      2187
    Write size:             4194304
    Object size:            4194304
    Bandwidth (MB/sec):     869.561
    Stddev Bandwidth:       240.658
    Max bandwidth (MB/sec): 1056
    Min bandwidth (MB/sec): 268
    Average IOPS:           217
    Stddev IOPS:            60
    Max IOPS:               264
    Min IOPS:               67
    Average Latency(s):     0.0735212
    Stddev Latency(s):      0.0614289
    Max latency(s):         0.513049
    Min latency(s):         0.0143447
    Code:
    rados bench -p rbd_nvme 10 write --no-cleanup
    Total time run:         10.358618
    Total writes made:      1872
    Write size:             4194304
    Object size:            4194304
    Bandwidth (MB/sec):     722.876
    Stddev Bandwidth:       188.168
    Max bandwidth (MB/sec): 948
    Min bandwidth (MB/sec): 376
    Average IOPS:           180
    Stddev IOPS:            47
    Max IOPS:               237
    Min IOPS:               94
    Average Latency(s):     0.0881898
    Stddev Latency(s):      0.12408
    Max latency(s):         1.26187
    Min latency(s):         0.0149534
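
    Since the writes used --no-cleanup, the same objects can be re-read for a sequential-read comparison and removed afterwards (a sketch; pool names as above):
    Code:
    rados bench -p ec_nvme 10 seq;
    rados bench -p rbd_nvme 10 seq;
    rados -p ec_nvme cleanup;
    rados -p rbd_nvme cleanup;
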
    Erasure coding splits the data across multiple devices, which NVMe really excels at...

    There was a nice write up explaining the 40 MB/s limitation often observed within guests running fio in replicated pools and how erasure coding can lessen the bottleneck. I'll try to find it and update this thread...
     
    #6 David Herselman, May 17, 2018
    Last edited: May 17, 2018
  7. David Herselman

    David Herselman Active Member
    Proxmox VE Subscriber

    My point above, where I stated that our erasure coded pool outperforms a replicated pool, should be taken in the context of this thread. We primarily run replicated pools for stability purposes and use an erasure coded pool and a compressed erasure coded pool for specific purposes. Simply dumping data yields 870 MB/s for the erasure coded pool whilst a replicated pool on the same OSDs yields 722 MB/s; the RADOS benchmark is, however, no indication of how virtual machines behave.

    Large sequential writes can benefit from erasure coding, and this has a positive impact on guests as well: writes are sent to the primary OSD, which shards and distributes them to the other OSDs. Each OSD subsequently has to process only a portion of the write, giving lower latency than a replicated pool, where the data is sent to the primary OSD and it then has to transmit the full write to the replica OSDs.
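
    To put rough numbers on that for a full 4 MB object write (an illustration only, ignoring metadata overhead and parity CPU cost):
    Code:
    # replicated size=3 : each of 3 OSDs writes the full 4 MB   -> 12 MB raw
    # EC k=3, m=2       : each of 5 shard OSDs writes 4 MB / 3  -> ~6.7 MB raw
    echo "scale=2; 4/3" | bc    # 1.33 MB per shard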

    Herewith benchmark results using Microsoft's 'diskspd' tool. The pools are:
    • rbd_hdd - FileStore with SSD journal partitions (size 3, 6 servers with 4 OSDs each but failure domain is host)
    • rbd_nvme - BlueStore (size 3, 6 servers with 1 OSD each)
    • ec_nvme - BlueStore (k=3, m=2)

    Test VM:
    • Windows 2012r2 - x64
    • VirtIO SCSI with writeback caching
    • The first diskspd invocation sends flushes (underlying storage performance) whilst the second does not (writeback buffer speed)
    • The first set of tests measures maximum throughput with large 256KB I/Os whilst the second measures IOPS using small 8KB I/Os (a rough fio equivalent for Linux guests is sketched after this list)
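
    For reference, a Linux-guest approximation of the large-block run might look like this (a sketch; the diskspd-to-fio parameter mapping is an assumption and was not part of the tests above):
    Code:
    fio --name=largeblock --filename=/root/io.dat --size=250M --bs=256k \
        --rw=randrw --rwmixwrite=30 --iodepth=2 --numjobs=4 \
        --runtime=120 --time_based --direct=1 --group_reporting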

    Benchmark results:
    Code:
    diskspd -b256K -d120 -h -L -o2 -t4 -r -w30 -c250M c:\io.dat
    diskspd -b256K -d120 -Sb -L -o2 -t4 -r -w30 -c250M c:\io.dat
    Windows 2012r2 - SCSI pass through drivers (vioscsi) with KRBD:
     rbd_hdd read   :  442.82 MBps   1771 IOPs      write   :  189.25 MBps   757 IOPs
                    : 3696.38 MBps  14786 IOPs      write   : 1584.57 MBps  6338 IOPs
     rbd_nvme read  :  643.23 MBps   2573 IOPs      write   :  275.55 MBps  1102 IOPs
                    : 3500.64 MBps  14003 IOPs      write   : 1502.16 MBps  6009 IOPs
     ec_nvme  read  :  475.69 MBps   1903 IOPs      write   :  203.31 MBps   813 IOPs
                    : 3887.80 MBps  15551 IOPs      write   : 1669.62 MBps  6679 IOPs
    
    diskspd -b8K -d120 -h -L -o2 -t4 -r -w30 -c250M c:\io.dat
    diskspd -b8K -d120 -Sb -L -o2 -t4 -r -w30 -c250M c:\io.dat
    Windows 2012r2 - SCSI pass through drivers (vioscsi) with KRBD:
     rbd_hdd read   :    8.32 MBps   1065 IOPs      write   :    3.56 MBps   455 IOPs
                    :  663.01 MBps  81025 IOPs      write   :  271.81 MBps 34792 IOPs
     rbd_nvme read  :   53.97 MBps   6908 IOPs      write   :   23.14 MBps  2962 IOPs
                    :  684.25 MBps  87584 IOPs      write   :  293.78 MBps 37604 IOPs
     ec_nvme  read  :   18.11 MBps   2318 IOPs      write   :    7.74 MBps   991 IOPs
                    :  839.52 MBps 107459 IOPs      write   :  360.31 MBps 46119 IOPs
    
    Disabled Anti-Virus:
    diskspd -b256K -d120 -h -L -o2 -t4 -r -w30 -c250M c:\io.dat
    diskspd -b256K -d120 -Sb -L -o2 -t4 -r -w30 -c250M c:\io.dat
    Windows 2012r2 - SCSI pass through drivers (vioscsi) with KRBD:
     rbd_hdd read   :  511.84 MBps   2047 IOPs      write   :  218.44 MBps   874 IOPs
                    : 3088.03 MBps  12352 IOPs      write   : 1324.08 MBps  5296 IOPs
     rbd_nvme read  :  594.21 MBps   2377 IOPs      write   :  254.11 MBps  1017 IOPs
                    : 3388.33 MBps  13553 IOPs      write   : 1454.38 MBps  5818 IOPs
     ec_nvme  read  :  253.17 MBps   1013 IOPs      write   :  108.78 MBps   435 IOPs
                    : 3168.47 MBps  12674 IOPs      write   : 1359.44 MBps  5438 IOPs
    
    diskspd -b8K -d120 -h -L -o2 -t4 -r -w30 -c250M c:\io.dat
    diskspd -b8K -d120 -Sb -L -o2 -t4 -r -w30 -c250M c:\io.dat
    Windows 2012r2 - SCSI pass through drivers (vioscsi) with KRBD:
     rbd_hdd read   :   19.24 MBps   2463 IOPs      write   :    8.24 MBps  1054 IOPs
                    :  897.44 MBps 114872 IOPs      write   :  385.21 MBps 49307 IOPs
     rbd_nvme read  :   53.97 MBps   6908 IOPs      write   :   23.14 MBps  2962 IOPs (not updated)
                    :  787.30 MBps 100774 IOPs      write   :  337.92 MBps 43254 IOPs
     ec_nvme  read  :   18.11 MBps   2318 IOPs      write   :    7.74 MBps   991 IOPs (not updated)
                    :  839.52 MBps 107459 IOPs      write   :  360.31 MBps 46119 IOPs

    Notes:
    • Cold storing data (e.g. backups) would be most efficient in a replicated pool of spinners.
    • Storing bulk data that needs to be mined aggressively would benefit from a compressed erasure coded SSD or NVMe pool (e.g. centralised firewall logs, which compress well).
    • Database hosting with a lot of IOPS would be best on a replicated NVMe pool.
     
    #7 David Herselman, May 18, 2018
    Last edited: May 18, 2018
  8. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    How does your system perform when Ceph is in recovery? Does it stay up at the above benchmark results?
     
  9. David Herselman

    David Herselman Active Member
    Proxmox VE Subscriber

    It's surprisingly stable; the RADOS benchmarks in yesterday's post were all run with 3 deep scrubs active at the time...

    This is with OVS on an active/backup 10Gbps bond, where we set the active (primary) NIC to a common switch stack member.
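
    The active member of an OVS active/backup bond can be inspected and pinned at runtime along these lines (a sketch; bond and interface names are placeholders):
    Code:
    ovs-appctl bond/show bond0;
    ovs-appctl bond/set-active-slave bond0 eno1;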
     
  10. alexskysilk

    alexskysilk Active Member
    Proxmox VE Subscriber

    @David Herselman thank you for the inspiration. I set out to deploy on an EC pool but cannot (at least not for containers).

    The current behaviour of storage deployment on Ceph is

    rbd create pool/vm-xxx-disk-1

    which fails when issued against an EC pool (librbd::image::CreateRequest: 0x55cd62d558d0 handle_add_image_to_directory: error adding image to directory: (95) Operation not supported).

    The correct command to issue for an EC pool should be

    rbd create rbd/vm-xxx-disk-1 --data-pool pool

    which should also work for replicated pools. I filed a bug on the Proxmox bugtracker (https://bugzilla.proxmox.com/show_bug.cgi?id=1816); hopefully this will be addressed so I can properly test the hypothesis.
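
    Concretely, with the pool names from earlier in this thread (the image name and size are placeholders), the working pattern keeps the image metadata in a replicated pool and its data in the EC pool:
    Code:
    rbd create rbd_ssd/vm-100-disk-1 --size 32G --data-pool ec_ssd;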
     
  11. David Herselman

    David Herselman Active Member
    Proxmox VE Subscriber
