Ceph raw usage grows by itself

Discussion in 'Proxmox VE: Installation and configuration' started by Ozz, Nov 29, 2017.

  1. Ozz

    Hi,

    I have a new cluster of 4 nodes; 3 of them run Ceph.
    Code:
    root@pve3:~# pveversion -v
    proxmox-ve: 5.1-25 (running kernel: 4.13.4-1-pve)
    pve-manager: 5.1-35 (running version: 5.1-35/722cc488)
    pve-kernel-4.13.4-1-pve: 4.13.4-25
    libpve-http-server-perl: 2.0-6
    lvm2: 2.02.168-pve6
    corosync: 2.4.2-pve3
    libqb0: 1.0.1-1
    pve-cluster: 5.0-15
    qemu-server: 5.0-17
    pve-firmware: 2.0-3
    libpve-common-perl: 5.0-20
    libpve-guest-common-perl: 2.0-13
    libpve-access-control: 5.0-7
    libpve-storage-perl: 5.0-16
    pve-libspice-server1: 0.12.8-3
    vncterm: 1.5-2
    pve-docs: 5.1-12
    pve-qemu-kvm: 2.9.1-2
    pve-container: 2.0-17
    pve-firewall: 3.0-3
    pve-ha-manager: 2.0-3
    ksm-control-daemon: 1.2-2
    glusterfs-client: 3.8.8-1
    lxc-pve: 2.1.0-2
    lxcfs: 2.0.7-pve4
    criu: 2.11.1-1~bpo90
    novnc-pve: 0.6-4
    smartmontools: 6.5+svn4324-1
    zfsutils-linux: 0.7.2-pve1~bpo90
    openvswitch-switch: 2.7.0-2
    ceph: 12.2.1-pve3
    Code:
    ceph -v
    ceph version 12.2.1 (1a629971a9bcaaae99e5539a3a43f800a297f267) luminous (stable)
    I'm using a single all-SSD pool.
    Bluestore, no separate RocksDB or WAL device.
    The "journal", or whatever it's called now, is 100 MB per disk.
    The Ceph cache is enabled.
    Cache per VM disk is set to no-cache.

    I transferred 4 VMs over from VMware vSphere and am testing them.
    The machines are doing nothing. I mean, they do have CentOS 6 and Apache on them, but nobody communicates with them.
    I'm doing automatic backups every night to an NFS share.

    Now, I noticed that even though the machines are just sitting there, the raw usage of Ceph is constantly growing.
    This is the output of "ceph df detail":
    Code:
    GLOBAL:
        SIZE      AVAIL     RAW USED     %RAW USED     OBJECTS
        8941G     8888G       54546M          0.60        3960
    POOLS:
        NAME       ID     QUOTA OBJECTS     QUOTA BYTES     USED       %USED     MAX AVAIL     OBJECTS     DIRTY     READ     WRITE     RAW USED
        VMpool     1      N/A               N/A             14337M      0.17         2812G        3960      3960     490k      525k       43013M
    So the pool usage with replica 3 is 43013 MB, which is fine, and it grows very slowly, i.e. several MB a day.
    But the RAW USED of 54546 MB in the GLOBAL section grows much faster - about 1 GB/day.

    If I run fstrim on the VMs it helps a little (5-20 MB in total).


    So what's with the ~11 GB difference between the GLOBAL and POOL usage?
    How is the GLOBAL usage calculated?
    And most importantly - why does it grow by itself?
    If I transfer all 50 of my VMs over, and about 20 of them are 100-800 GB, what are the consequences?

    Code:
    ceph -s
      cluster:
        id:     a1ba7570-38aa-4410-9318-92f3788ef7ef
        health: HEALTH_OK
     
      services:
        mon: 3 daemons, quorum pve1,pve2,pve3
        mgr: pve3(active), standbys: pve2, pve1
        osd: 12 osds: 12 up, 12 in
     
      data:
        pools:   1 pools, 1024 pgs
        objects: 3960 objects, 14337 MB
        usage:   54546 MB used, 8888 GB / 8941 GB avail
        pgs:     1024 active+clean
     
      io:
        client:   1364 B/s wr, 0 op/s rd, 0 op/s wr
    Code:
    ceph -w
      cluster:
        id:     a1ba7570-38aa-4410-9318-92f3788ef7ef
        health: HEALTH_OK
     
      services:
        mon: 3 daemons, quorum pve1,pve2,pve3
        mgr: pve3(active), standbys: pve2, pve1
        osd: 12 osds: 12 up, 12 in
     
      data:
        pools:   1 pools, 1024 pgs
        objects: 3960 objects, 14337 MB
        usage:   54569 MB used, 8888 GB / 8941 GB avail
        pgs:     1024 active+clean
     
      io:
        client:   1023 B/s wr, 0 op/s rd, 0 op/s wr
    Please assist, I must know what I'm getting into before I go on.

    Thanks!
     
  3. Alwin (Proxmox Staff Member)

    To see the Ceph versions of all daemons across the nodes, run 'ceph versions'.
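    The output groups the daemons by version (illustrative output only, matching the version shown above; your daemon counts may differ):
    Code:
    root@pve3:~# ceph versions
    {
        "mon": {
            "ceph version 12.2.1 (1a629971a9bcaaae99e5539a3a43f800a297f267) luminous (stable)": 3
        },
        "mgr": {
            "ceph version 12.2.1 (1a629971a9bcaaae99e5539a3a43f800a297f267) luminous (stable)": 3
        },
        "osd": {
            "ceph version 12.2.1 (1a629971a9bcaaae99e5539a3a43f800a297f267) luminous (stable)": 12
        },
        "overall": {
            "ceph version 12.2.1 (1a629971a9bcaaae99e5539a3a43f800a297f267) luminous (stable)": 18
        }
    }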

    You still have a RocksDB and WAL, just not on a separate device.
    http://ceph.com/community/new-luminous-bluestore/

    That is an XFS partition that holds the metadata and links needed for the OSD.
    http://ceph.com/community/new-luminous-bluestore/

    The librbd cache is activated by default. With the QEMU setting (cache: none/writeback/writethrough) you override the Ceph settings.
    http://docs.ceph.com/docs/master/rbd/qemu-rbd/#qemu-cache-options
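    For reference, a minimal ceph.conf sketch of the librbd cache settings (illustrative values, not this cluster's actual config; with cache=none on the VM disk the librbd cache is disabled regardless of these):
    Code:
    # /etc/ceph/ceph.conf -- client-side librbd cache options (sketch)
    [client]
        rbd cache = true                           # librbd write-back cache
        rbd cache size = 33554432                  # 32 MB per client, the default
        rbd cache writethrough until flush = true  # stay in writethrough until the guest issues a flush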

    Not true, they surely are doing something, like writing log files, moving unused data to swap, and updating files (e.g. in /tmp).

    It not only holds your RAW USED data, but also includes the DB+WAL; by default these are 1 GB + 512 MB, and the 1 GB for the DB is allocated on OSD creation. GLOBAL also reflects the whole cluster and doesn't need to correspond to the RAW AVAIL/USED of the pool. And as more data is added to the OSDs (objects + DB), it grows.
    http://ceph.com/planet/understanding-bluestore-cephs-new-storage-backend/
    http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/

    I guess now you can do the math.
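    A rough sketch for the cluster above, assuming the ~1 GB DB allocation per OSD mentioned earlier:
    Code:
    # BlueStore overhead counted in GLOBAL RAW USED but not in the pool's RAW USED:
    12 OSDs x ~1 GB RocksDB (allocated at OSD creation)  ~= 12 GB
    # plus the WAL, which is allocated as it gets used
    #
    # observed gap: 54546 MB (GLOBAL) - 43013 MB (pool RAW USED) ~= 11.3 GB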
    To calculate how many PGs you might need for your pool: http://ceph.com/pgcalc/
    https://pve.proxmox.com/pve-docs/chapter-pveceph.html
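    The rule of thumb behind pgcalc is roughly this (a sketch; ~100 PGs per OSD is the usual target, not a hard rule):
    Code:
    # total_pgs = (num_osds * target_pgs_per_osd) / replica_size, rounded up to a power of two
    (12 * 100) / 3 = 400  ->  next power of two = 512 PGs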

    As always, if all works well it is straightforward, but if there is a disaster you need to be prepared. Please use the following links as a help to understand Ceph more deeply.

    Our docs to Ceph: https://pve.proxmox.com/pve-docs/
    If you are looking for a PVE support subscription: https://www.proxmox.com/en/proxmox-ve/pricing
    Intro to Ceph: http://docs.ceph.com/docs/master/start/intro/
    Hardware recommendations + useful tips: http://docs.ceph.com/docs/master/start/hardware-recommendations/
    How to operate: http://docs.ceph.com/docs/master/rados/operations/
    Architecture of Ceph, this goes in deep: http://docs.ceph.com/docs/master/architecture/
    To get in touch with Ceph people, mailing lists, IRC: http://docs.ceph.com/docs/master/start/get-involved/

    You can always ask questions here in the PVE forum. But if they are very Ceph specific, you will find a wider audience on the Ceph mailing lists. ;)

  4. Ozz

    Hi,
    Thanks a lot for such a detailed response.

    I'd like to clarify some things though.
    I couldn't find the numbers you mentioned anywhere: that the default size of the RocksDB is 1 GB and of the WAL 512 MB. Can you please point me to these?

    If I have a total of 12 OSDs in the cluster, am I right to assume that the difference between the GLOBAL and POOL usage values will never be larger than 12*(1+0.5) + 12*0.1 = 19.2 GB?
    The 0.1 is the 100 MB XFS partition on each OSD.

    Also, can you please elaborate on the cache aspect? I was under the impression that I have to leave the default "no-cache" on each disk or VM (I don't remember exactly where it's set), but set the option to "true" in ceph.conf.
    Am I wrong?

    Thanks a lot!
     
  5. Alwin (Proxmox Staff Member)

    Sadly it is not in the docs. You can find information on the mailing list and in the source code.

    Yes, but the WAL is allocated on use; AFAIK it is 100 MB at the beginning.
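    If you want to see what the DB and WAL actually occupy on a given OSD, one way (a sketch; osd.0 is a placeholder ID, run it on the node hosting that OSD) is to look at the BlueFS perf counters:
    Code:
    # query the OSD's admin socket and filter the BlueFS byte counters
    ceph daemon osd.0 perf dump | grep -E 'db_(total|used)_bytes|wal_(total|used)_bytes'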

    It is set per disk; see the section QEMU CACHE OPTIONS -> http://docs.ceph.com/docs/master/rbd/qemu-rbd/#qemu-cache-options
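    On the PVE side the cache mode is part of the disk definition, so changing it could look like this (a sketch; the VM ID, bus and volume name are placeholders, check 'qm config <vmid>' for the real ones):
    Code:
    # show the current definition of the disk
    qm config 100 | grep scsi0
    # re-set the same volume with an explicit cache mode
    qm set 100 --scsi0 VMpool:vm-100-disk-1,cache=none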
     