Search results

  1. OSD wont start after Ceph upgrade from Hammer to Jewel

    I just updated one of our Ceph nodes using this tutorial, from Hammer to Jewel. Unfortunately, after the upgrade the OSDs won't start. We use Proxmox 4.4.5. The OSDs have their journal mounted on SSD. The error is: root@ceph03:~# systemctl status ceph-osd@2.service ● ceph-osd@2.service - Ceph object...
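    A common cause of this after the Hammer to Jewel upgrade (not confirmed as the cause in this post) is that the Jewel daemons run as the ceph user instead of root, so the OSD data directories need their ownership changed. A minimal sketch, assuming OSD 2 on the default path:

      journalctl -u ceph-osd@2.service -n 50        # see why the unit failed to start
      chown -R ceph:ceph /var/lib/ceph/osd/ceph-2   # Jewel OSDs run as user ceph, not root
      systemctl start ceph-osd@2.service
      # if the journal is a raw SSD partition, its block device must also be accessible
      # to the ceph user (normally handled by the ceph udev rules)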
  2. Understanding Ceph

    In my case we have 12 OSDs (6 nodes, 2 OSDs per node). Using pg_calc with pool name rbd, size 3, 12 OSDs, 100% data, and a target of 100 PGs per OSD, the resulting PG count is 512. At the moment we have 256. Should I change to 512 or jump to 1024? According to the Ceph documentation, the range is: Less than 5 OSDs set pg_num to...
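    For reference, the pgcalc figure quoted above follows from the usual rule of thumb (total PGs = OSDs * target PGs per OSD / replica size, rounded up to a power of two); a sketch of the arithmetic and of checking the current values, assuming the rbd pool named above:

      # (12 OSDs * 100 target PGs per OSD) / size 3 = 400  ->  next power of two = 512
      ceph osd pool get rbd pg_num   # currently 256 in this case
      ceph osd pool get rbd size     # replica count, 3 here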
  3. Understanding Ceph

    Like they say, It's also important to know that the PG count can be increased, but NEVER decreased without destroying / recreating the pool. However, increasing the PG Count of a pool is one of the most impactful events in a Ceph Cluster, and should be avoided for production clusters if...
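    If the increase is needed anyway, it is applied in two steps (a sketch, again assuming the rbd pool; many operators raise pg_num in small increments on a production cluster to limit the rebalancing load):

      ceph osd pool set rbd pg_num 512
      ceph osd pool set rbd pgp_num 512   # pgp_num must follow pg_num before data actually rebalances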
  4. Understanding Ceph

    The situation is the same. Default pool. The only difference is that we have 6 nodes. Use this to set pg_num. :) http://ceph.com/pgcalc/
  5. Understanding Ceph

    Sorry if I hijacked this thread, it was not my intention. I have the same issue: how to make a small Ceph cluster HA. If you think it's better, I'll open another thread. :)
  6. Understanding Ceph

    Unfortunately I can't tell you at this moment what Ceph says when 2 OSDs are down. We have other problems with the VMs running on Ceph: partition corruption even when Ceph health is green. Now we are moving the VMs out of Ceph until things become clear and they run without problems. No problems in the...
  7. Understanding Ceph

    Here it is, # begin crush map tunable choose_local_tries 0 tunable choose_local_fallback_tries 0 tunable choose_total_tries 50 tunable chooseleaf_descend_once 1 tunable straw_calc_version 1 # devices device 0 osd.0 device 1 osd.1 device 2 osd.2 device 3 osd.3 device 4 osd.4 device 5 osd.5...
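    (For reference, a text dump like the one above can be produced by extracting and decompiling the cluster's CRUSH map; a sketch, assuming crushtool is installed on the node:)

      ceph osd getcrushmap -o crushmap.bin        # grab the compiled CRUSH map
      crushtool -d crushmap.bin -o crushmap.txt   # decompile it to readable text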
  8. Understanding Ceph

    We have 6 nodes, each node running 2 OSDs with the journal on an Intel Enterprise SSD. When a node goes down (2 OSDs out of the 12), 18% of the cluster goes out and there are still a lot of faults. The VMs go down, partition corruption... Still searching for a solution.
  9. Proxmox 4.4.5 kernel: Out of memory: Kill process 8543 (kvm) score or sacrifice child

    The problem on our side was with the Ceph nodes. From time to time, OSD daemons were killed and Ceph marked them down. One morning half of our OSDs were down, the cluster was rebuilding, most of the VM partitions were corrupted, some of them impossible to recover, data loss, absolute horror. After...
  10. Intel Skylake video memory purge kill osd process

    Our cluster uses Ceph as storage; this bug caused a lot of partition corruption, some of it impossible to recover, and the result was data loss. A lot of pain... :(
  11. Intel Skylake video memory purge kill osd process

    Hello, Fabian. Thanks for the answer. Today I found that topic and updated the kernel. I hope that it will be OK.
  12. Intel Skylake video memory purge kill osd process

    Is anyone alive on this forum? Does Proxmox have a living community?
  13. Intel Skylake video memory purge kill osd process

    I guess it is a bug. The ceph-osd daemons consume less than 500MB of RAM each. There are 2 OSDs and 16GB of memory; it should be sufficient. root@ceph05:~# ceph tell osd.6 heap stats osd.6 tcmalloc heap stats:------------------------------------------------ MALLOC: 282135536 ( 269.1 MiB) Bytes in use by...
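    (If the tcmalloc stats show a large amount of freed-but-retained memory, it can be handed back to the OS; a sketch of the usual commands, which will not fix an actual leak:)

      ceph tell osd.6 heap stats     # full tcmalloc breakdown, as quoted above
      ceph tell osd.6 heap release   # ask tcmalloc to return free pages to the kernel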
  14. Intel Skylake video memory purge kill osd process

    Unfortunately the problem persists. Not as often as before adding RAM, but from time to time it appears.
  15. Intel Skylake video memory purge kill osd process

    After some hours of hell, I came to a conclusion that could help someone in the same situation. Our Ceph cluster had 6 nodes, each node with 2 OSDs (2TB HDDs). Four of them have 16GB of RAM, two of them only 8GB each. There are no virtual machines running on the Ceph nodes. According...
  16. Intel Skylake video memory purge kill osd process

    Could it be that 8GB of RAM is not enough for a node with 2 OSD drives (HDD)? In the summary area (Proxmox dashboard) it says that 1.74GB of 7.68GB is in use.
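    (A quick way to see what the OSD daemons themselves are holding at a given moment, rather than the dashboard summary; a sketch:)

      free -h                             # overall memory, including buffers/cache
      ps -C ceph-osd -o pid,rss,vsz,cmd   # resident memory per OSD daemon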
  17. Intel Skylake video memory purge kill osd process

    We had just updated to the latest version of Proxmox 4.4.5 when the problem started. Our setup is a Ceph cluster of 6 servers, 3 of them with Intel Skylake CPUs. On those Skylake-based servers we see this: Jan 4 09:32:20 ceph07 kernel: [139775.594411] Purging GPU memory, 0...
  18. CEPH storage corrupting disks when a CEPH node goes down..

    We changed to min_size=2, size=3 and after that we did not see any special HDD activity. After this modification, shouldn't the cluster rebuild some of the data allocation?
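    (A note on what to expect: changing min_size only affects when client I/O is blocked, it does not move any data; only raising size creates additional replicas and triggers backfill. A sketch of verifying the settings and watching for recovery, assuming the rbd pool from the earlier posts:)

      ceph osd pool get rbd size       # replica count (3 after the change)
      ceph osd pool get rbd min_size   # minimum replicas required to serve I/O
      ceph -s                          # shows backfill/recovery only if size was actually raised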