Created an erasure code pool in Ceph, but cannot work with it in Proxmox

Discussion in 'Proxmox VE: Installation and configuration' started by danielc, Jul 4, 2018.

  1. danielc

    danielc New Member

    Joined:
    Feb 28, 2018
    Messages:
    26
    Likes Received:
    1
    Hello

    Created an erasure code pool in Ceph, but cannot work with it in Proxmox.
    I simply used RBD (PVE) to mount it.
    The pool shows up correctly under Proxmox, including its size, but I cannot move a disk to it:

    create full clone of drive virtio0 (hdd:vm-100-disk-1)
    error adding image to directory: (95) Operation not supported
    TASK ERROR: storage migration failed: error with cfs lock 'storage-backup_erasure': rbd create vm-100-disk-1' error: error adding image to directory: (95) Operation not supported

    I also cannot create a VM on it:
    rbd: create error: (22) Invalid argument2018-07-04 16:57:35.778203 7fc5627fc700 -1 librbd::image::CreateRequest: 0x561876d2cb90 handle_validate_overwrite: pool missing required overwrite support
    TASK ERROR: create failed - error with cfs lock 'storage-backup_erasure': rbd create vm-107-disk-1' error: rbd: create error: (22) Invalid argument2018-07-04 16:57:35.778203 7fc5627fc700 -1 librbd::image::CreateRequest: 0x561876d2cb90 handle_validate_overwrite: pool missing required overwrite support


    But if I run a rados benchmark against the pool, it works fine:
    root@ceph1:~# rados -p backup_erasure bench 20 write -t 32 -b 4096 --no-cleanup
    INFO: op_size has been rounded to 12288
    hints = 1
    Maintaining 32 concurrent writes of 12288 bytes to objects of size 12288 for up to 20 seconds or 0 objects
    Object prefix: benchmark_data_ceph1_1265629
    sec  Cur ops  started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
      0        0        0         0         0         0            -           0
      1       32      700       668   7.82797   7.82812    0.0178338   0.0462036
      2       32     1392      1360   7.96808   8.10938    0.0725665       0.046
      3       32     2065      2033   7.94071   7.88672    0.0169972   0.0467111
    .....

    What can I do in this case?
    Thanks
     
  2. udo

    udo Well-Known Member
    Proxmox VE Subscriber

    Joined:
    Apr 22, 2009
    Messages:
    5,736
    Likes Received:
    150
    danielc likes this.
  3. danielc

    danielc New Member

    Joined:
    Feb 28, 2018
    Messages:
    26
    Likes Received:
    1
    Hello Udo

    Thanks for the information. While I was aware of this tier option when creating the erasure pool, I did not know that it was a requirement.
    Now I have created the tier pool, but I am not sure whether the result is correct. Say I moved a 50G image to this pool:

    But it looks like the 50G stays in the tier pool and is not moved to the erasure pool. Is this the expected result?
    Thank you.
     

    Attached Files:

  4. brad_mssw

    brad_mssw Member

    Joined:
    Jun 13, 2014
    Messages:
    116
    Likes Received:
    5
    I'm pretty sure as of Luminous a cache tier is no longer a requirement:
    https://ceph.com/community/new-luminous-erasure-coding-rbd-cephfs/

    However, I think the issue is that the image header and metadata must still be stored in a replicated pool, with only the data in the erasure coded pool. An example rbd image creation is:
    rbd create rbd/myimage --size 1T --data-pool ec42

    To my knowledge, Proxmox does not yet support this, so you'd have to migrate the images manually and then update the VM configuration directly.
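
    Whether the existing pool already allows overwrites can be checked, and if necessary enabled, from the CLI. A rough sketch (the pool name is taken from the first post; note that EC overwrites require BlueStore OSDs):
    ceph osd pool get backup_erasure allow_ec_overwrites
    ceph osd pool set backup_erasure allow_ec_overwrites true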
     
    danielc likes this.
  5. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    1,744
    Likes Received:
    151
    danielc likes this.
  6. danielc

    danielc New Member

    Joined:
    Feb 28, 2018
    Messages:
    26
    Likes Received:
    1
  7. David Herselman

    David Herselman Active Member
    Proxmox VE Subscriber

    Joined:
    Jun 8, 2016
    Messages:
    174
    Likes Received:
    37
    Looks like you simply have to enable overwrites. Herewith my notes from when I set up an erasure coded pool (8 months of production use) and a compressed erasure coded pool (3 months of production use):

    NB: We run the pool with a min_size of 4 (3 data and 1 parity shard) and consequently require a minimum of 6 hosts, of which 3 are monitors.

    Create Erasure Coded pool:
    Code:
    ceph osd erasure-code-profile set ec32_nvme \
      plugin=jerasure k=3 m=2 technique=reed_sol_van \
      crush-root=default crush-failure-domain=host crush-device-class=nvme \
      directory=/usr/lib/ceph/erasure-code;
    ceph osd pool create ec_nvme 16 erasure ec32_nvme;
    ceph osd pool set ec_nvme allow_ec_overwrites true;
    ceph osd pool application enable ec_nvme rbd;
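    # (sketch) the profile and the min_size mentioned above can be checked and set
    # once the pool exists:
    ceph osd erasure-code-profile get ec32_nvme;
    ceph osd pool get ec_nvme min_size;
    ceph osd pool set ec_nvme min_size 4;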
    

    Optionally enable compression (these commands reference our second, compressed erasure coded pool, ec_compr_nvme):
    Code:
    ceph osd pool set ec_compr_nvme compression_algorithm snappy;
    ceph osd pool set ec_compr_nvme compression_mode aggressive;

    Create a replicated pool for metadata:
    Code:
    ceph osd crush rule create-replicated replicated_nvme default host nvme;
    ceph osd pool create rbd_nvme 64 64 replicated replicated_nvme;
    ceph osd pool application enable rbd_nvme rbd;

    Create RBD image in replicated pool (metadata) but place data in erasure coded pool:
    Code:
    rbd create rbd_nvme/test_ec --size 100G --data-pool ec_nvme;
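    # (sketch) verify the data placement: rbd info should list "data_pool: ec_nvme"
    rbd info rbd_nvme/test_ec | grep data_pool;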

    Update Proxmox storage configuration (/etc/pve/storage.cfg):
    Code:
    rbd: rbd_nvme
            monhost 10.254.1.3;10.254.1.4;10.254.1.5
            content images,rootdir
            krbd 1
            pool rbd_nvme
            username admin
    

    PS: Don't forget to copy the 'admin' Ceph keyring:
    Code:
    cp /etc/pve/priv/ceph.client.admin.keyring /etc/pve/priv/ceph/rbd_nvme.keyring;

    Fastest way to manually transfer images (skips unused or trimmed sections):
    Code:
    qemu-img convert -f raw -O raw -t unsafe -T unsafe -nWp rbd:rbd_hdd/vm-213-disk-1 rbd:rbd_nvme/vm-213-disk-1_new
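    # note (sketch): the -n flag skips creating the target, so the destination image
    # must exist first; create it beforehand with its data in the EC pool, sized to
    # match the source (the 50G here is only an example):
    rbd create rbd_nvme/vm-213-disk-1_new --size 50G --data-pool ec_nvme;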

    I've used the following Perl monster for almost 20 years; it essentially reads two block devices in 4 MB chunks and only transfers the chunks that don't match. Great for incremental snapshot backups or for copying between any two block devices:
    Code:
    # map the source (rbd_hdd) and destination (rbd_nvme) images as local block devices
    export dev1=`rbd map rbd_hdd/vm-213-disk-1 --name client.admin -k /etc/pve/priv/ceph.client.admin.keyring;`;
    export dev2=`rbd map rbd_nvme/vm-213-disk-1_new --name client.admin -k /etc/pve/priv/ceph.client.admin.keyring;`;
    
    # stage 1 hashes each 4 MB chunk of the destination ($dev2); stage 2 hashes the
    # source ($dev1), compares and emits "s" (skip) or "c" plus the changed chunk;
    # stage 3 seeks over skipped chunks and writes only the changed chunks into $dev2
    perl -'MDigest::MD5 md5' -ne 'BEGIN{$/=\4194304};print md5($_)' $dev2 |
      perl -'MDigest::MD5 md5' -ne 'BEGIN{$/=\4194304};$b=md5($_);
        read STDIN,$a,16;if ($a eq $b) {print "s"} else {print "c" . $_}' $dev1 |
          perl -ne 'BEGIN{$/=\1} if ($_ eq"s") {$s++} else {if ($s) {
            seek STDOUT,$s*4194304,1; $s=0}; read ARGV,$buf,4194304; print $buf}' 1<> $dev2;
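
    # (sketch) when done, unmap both RBD devices again
    rbd unmap $dev1;
    rbd unmap $dev2;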


     
  8. David Herselman

    David Herselman Active Member
    Proxmox VE Subscriber

    Joined:
    Jun 8, 2016
    Messages:
    174
    Likes Received:
    37
    Hi Alwin,

    We manage erasure coded and compressed erasure coded Ceph pools via the CLI and then manually edit the VM configuration files. It would be nice if the GUI obtained information on the images (rbd info <pool>/<image>) so it knows that the data is located in an alternate pool.


    The following is a screenshot of the pool data utilisation, showing no usage in the metadata rbd pool (rbd_nvme):
    ceph-pools.jpg


    Sample view of RBD images stored in rbd_nvme (the data for all images is in ec_nvme, apart from vm-172-disk-3 and vm-213-disk-3, which store their data in the ec_compr_nvme pool):
    Code:
    [root@kvm5a priv]# rbd ls rbd_nvme -l
    NAME                       SIZE PARENT                            FMT PROT LOCK
    base-210-disk-1           4400M                                     2
    base-210-disk-1@__base__  4400M                                     2 yes
    base-210-disk-2          30720M                                     2
    base-210-disk-2@__base__ 30720M                                     2 yes
    base-210-disk-3          20480M                                     2
    base-210-disk-3@__base__ 20480M                                     2 yes
    vm-100-disk-1            81920M                                     2      excl
    vm-101-disk-1            81920M                                     2      excl
    vm-172-disk-1             4400M rbd_nvme/base-210-disk-1@__base__   2      excl
    vm-172-disk-2            61440M rbd_nvme/base-210-disk-2@__base__   2      excl
    vm-172-disk-3             3072G                                     2      excl
    vm-211-disk-1             4400M rbd_nvme/base-210-disk-1@__base__   2      excl
    vm-211-disk-2            61440M rbd_nvme/base-210-disk-2@__base__   2      excl
    vm-211-disk-3            20480M rbd_nvme/base-210-disk-3@__base__   2      excl
    vm-212-disk-1             4400M rbd_nvme/base-210-disk-1@__base__   2      excl
    vm-212-disk-2            61440M rbd_nvme/base-210-disk-2@__base__   2      excl
    vm-212-disk-3            20480M rbd_nvme/base-210-disk-3@__base__   2      excl
    vm-213-disk-1             4400M rbd_nvme/base-210-disk-1@__base__   2      excl
    vm-213-disk-2            30720M rbd_nvme/base-210-disk-2@__base__   2      excl
    vm-213-disk-3              750G                                     2      excl
    vm-238-disk-1            81920M                                     2      excl

    rbd info rbd_nvme/vm-172-disk-3
    Code:
    rbd image 'vm-172-disk-3':
            size 3072 GB in 786432 objects
            order 22 (4096 kB objects)
            data_pool: ec_compr_nvme
            block_name_prefix: rbd_data.17.279de52ae8944a
            format: 2
            features: layering, exclusive-lock, data-pool
            flags:
            create_timestamp: Fri Jul  6 22:12:29 2018



     