New 3-nodes cluster suggestion

Discussion in 'Proxmox VE: Installation and configuration' started by Stefano Giunchi, Feb 3, 2019.

  1. Stefano Giunchi

    Proxmox Subscriber

    Joined:
    Jan 17, 2016
    Messages:
    37
    Likes Received:
    1
    I'm about to build a new, small and general purpose cluster.
    The selected hardware is this:
    SuperMicro TwinPro (2029TP-HC0R), with 3 nodes, each with:
    1 CPU XEON SCALABLE (P4X-SKL3106-SR3GL)
    64GB RAM DDR4-2666 (MEM-DR432L-CL01-ER26)
    4 port 10GB (AOC-MTG-I4TM-O SIOM) FOR CEPH TRAFFIC (MESH)
    4 port 1 GB (AOC-UG-I4)
    6 Intel D3-S4510 480GB (SSDSC2KB480G8) SATA/6Gbps
    1 SFT-OOB-LIC IPMI controller advanced management features

    The cluster will be used for a few virtual machines and some data; the main goal is high availability, not extreme performance.
    Do you see anything wrong with this configuration?

    Thanks
     
  2. rdrl

    rdrl New Member

    Joined:
    Jan 16, 2019
    Messages:
    3
    Likes Received:
    2
    Without knowing the parameters of your "few virtual machines and data", it is difficult to give configuration recommendations. But in my opinion, you should increase the RAM to 96 or 128GB - it won't go to waste.
     
  3. Stefano Giunchi

    Proxmox Subscriber

    Joined:
    Jan 17, 2016
    Messages:
    37
    Likes Received:
    1
    It will be 6 VMs to start, 4 Windows and 2 Linux, with no more than 40GB used by the VMs. Your recommendation to increase the RAM is valid, but it's easily upgradeable later. My main concern is the SATA SSDs - has anyone used these for Ceph?
     
  4. rdrl

    rdrl New Member

    Joined:
    Jan 16, 2019
    Messages:
    3
    Likes Received:
    2
    Look at this topic - https://forum.proxmox.com/threads/memory-usage-on-empty-node-with-ceph.50760/

    You have 6 disks, so most likely you will run 5-6 OSDs on each node - approximately 20-24GB will be used by Ceph alone.
    I can add that I have a Proxmox 5.3 test cluster with 3 virtual machines on each node (2 Linux and 1 Windows - 2GB allocated each, about 1.3GB actually used) and 6 Ceph OSDs - the total memory load on each node is about 32GB.
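    That 20-24GB ballpark can be sanity-checked with simple arithmetic; a minimal sketch, assuming the BlueStore osd_memory_target default of roughly 4GB per OSD (the 2GB base overhead for monitor/manager daemons is an illustrative figure, not a measured one):

```python
# Rough per-node Ceph memory estimate (a sketch, not Ceph's actual accounting).
# Assumes the BlueStore osd_memory_target default of ~4GB per OSD; the base
# overhead for mon/mgr daemons is a guessed, illustrative number.

def ceph_node_memory_gb(num_osds, osd_target_gb=4, base_overhead_gb=2):
    """Approximate RAM consumed by Ceph daemons on one node."""
    return num_osds * osd_target_gb + base_overhead_gb

# 5-6 OSDs per node, as discussed above:
print(ceph_node_memory_gb(5))  # 22
print(ceph_node_memory_gb(6))  # 26
```

    Whatever the VMs use comes on top of this, which is why 64GB per node gets tight quickly.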
     
  6. alessice

    alessice New Member
    Proxmox Subscriber

    Joined:
    Sep 18, 2015
    Messages:
    11
    Likes Received:
    1
    Hi Stefano,

    I have just bought the same configuration (Supermicro Twin 2029TP-HC0R) with the intention of running a Proxmox+Ceph cluster. Have you already installed on the new hardware, and does everything work fine? Did you ask Supermicro to set the LSI 3008 in IT Mode?

    Thanks
     
  7. Stefano Giunchi

    Proxmox Subscriber

    Joined:
    Jan 17, 2016
    Messages:
    37
    Likes Received:
    1
    Hi,
    I still haven't bought it; I just got the OK from the customer and I think we'll have it in a couple of weeks.

    I haven't read anywhere that the LSI needs to be flashed in IT mode/passthrough, as it's not a real RAID card. Do you have a link about that?
    Let's both keep this thread updated, I think it will be useful.

    What's your hardware? Mine will be this:

    SuperMicro TwinPro (2029TP-HC0R), with 3 nodes, each with:
    1 CPU XEON SCALABLE (P4X-SKL3106-SR3GL)
    64GB RAM DDR4-2666 (MEM-DR432L-CL01-ER26)
    4 port 10GB (AOC-MTG-I4TM-O SIOM) FOR CEPH TRAFFIC (MESH)
    4 port 1 GB (AOC-UG-I4)
    2 Intel D3-S4510 240GB (SSDSC2KB240G8) SATA/6Gbps (raid1, system)
    4 Intel D3-S4510 960GB (SSDSC2KB960G8) SATA/6Gbps (jbod, ceph)
    1 SFT-OOB-LIC IPMI controller advanced management features

    Stefano
     
  8. alessice

    alessice New Member
    Proxmox Subscriber

    Joined:
    Sep 18, 2015
    Messages:
    11
    Likes Received:
    1
    Stefano, I bought mine yesterday :)

    2 SuperMicro TwinPro (2029TP-HC0R), with 4 nodes, each with:
    2 CPU Xeon 4114
    192 GB RAM
    4 port 10GB SFP
    2 128GB SSD SATADOM for OS
    6 Intel D3-S4510 960GB for Ceph

    Why do you also have 4 1GB ports? My intention is to use 2x10Gbit for Ceph and 2x10Gbit for Internet. I will also buy 2 low-latency switches for the network.

    I asked Supermicro to ship the LSI 3008 in IT Mode, since it can also be configured as (software) RAID:

    https://www.supermicro.com/en/products/storage/cards

    I don't know if it is really necessary.
     
  9. alexskysilk

    alexskysilk Active Member

    Joined:
    Oct 16, 2015
    Messages:
    559
    Likes Received:
    59
    Just FYI, 3 nodes is considered OK for a lab but not for a production Ceph cluster. The reason is simple: taking one node out will render your cluster read-only. At MINIMUM you should have 4 for production.
     
  10. Stefano Giunchi

    Proxmox Subscriber

    Joined:
    Jan 17, 2016
    Messages:
    37
    Likes Received:
    1
    I did not select the SATADOM for OS, because I read this: https://www.supermicro.com/datasheet/datasheet_SuperDOM.pdf (see Use Cases not recommended)

    I don't use a switch for the 10GB Ceph and corosync traffic; I run it as a full mesh, with every connection using two bonded cables for higher reliability. That is:
    NODE1-P1/2==>NODE2-P3/4
    NODE2-P1/2==>NODE3-P3/4
    NODE3-P1/2==>NODE1-P3/4

    The 4 1GB ports are for VM traffic and backup, connected to two stacked switches (two cables to each).
    This is the theory. I hope that trunking with stacked switches works as expected.
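    A full-mesh layout like the one above is usually expressed in Debian's /etc/network/interfaces with one bond per neighbour. This is only an illustration for NODE1 (the enp* interface names, the addresses, and the point-to-point subnets are assumptions, not taken from this thread; the routes between the per-link subnets are omitted):

```
# NODE1 mesh bonds - a sketch; interface names and addresses are assumed
auto bond1
iface bond1 inet static
    address 10.15.15.1/24           # direct link to NODE2 (ports P1/P2)
    bond-slaves enp1s0f0 enp1s0f1
    bond-mode active-backup         # survives a single cable failure

auto bond2
iface bond2 inet static
    address 10.15.16.1/24           # direct link to NODE3 (ports P3/P4)
    bond-slaves enp1s0f2 enp1s0f3
    bond-mode active-backup
```

    active-backup works without any switch support, which is what makes it a natural fit for back-to-back mesh cabling.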
     
  11. Stefano Giunchi

    Proxmox Subscriber

    Joined:
    Jan 17, 2016
    Messages:
    37
    Likes Received:
    1
    alexskysilk,
    it's not what I've read and experienced. My lab consisted of two Ceph nodes, with one copy on each; the pool was 2/1. When I shut down one node, the VMs were migrated (if in HA) and everything kept working. Recovery was then a pain in the ass (old hardware, only two 1GB NICs and three 7200rpm SATA disks per node).

    With three nodes and one data copy on each, it can keep working (both Ceph and corosync) with one node down.
    If I set the pool to 3/1, the cluster can work even with TWO nodes down. That's not advisable, though; it could be a last resort to revive the cluster manually in case disaster occurs.

    With four nodes, I need to keep three up for quorum.
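    The quorum arithmetic behind that last statement is just a strict majority of cluster votes; a minimal sketch:

```python
# Strict-majority quorum, as used by corosync and the Ceph monitors.

def quorum(nodes):
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1

def survives(nodes, failed):
    """True if the remaining nodes still hold quorum."""
    return nodes - failed >= quorum(nodes)

print(quorum(3), survives(3, 1))  # 2 True  -> 3 nodes tolerate 1 failure
print(quorum(4), survives(4, 1))  # 3 True  -> 4 nodes tolerate 1 failure too
print(survives(4, 2))             # False -> hence "keep three up for quorum"
```

    Note that a fourth node does not raise the number of tolerated node failures (that takes five); it adds capacity for Ceph to re-replicate onto while degraded.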
     
    #11 Stefano Giunchi, Feb 22, 2019
    Last edited: Feb 22, 2019
  12. tom

    tom Proxmox Staff Member
    Staff Member

    Joined:
    Aug 29, 2006
    Messages:
    13,538
    Likes Received:
    404
    No, why?
     
  13. alexskysilk

    alexskysilk Active Member

    Joined:
    Oct 16, 2015
    Messages:
    559
    Likes Received:
    59
    By my understanding, Ceph requires quorum to be authoritative down to the PG layer. Consequently, the minimum number of members for a valid authoritative PG is 3. With only three nodes it is not possible to have 3 active chunks in a PG when only 2 nodes are active, and a default replication rule of 3/2 will disable write access.
     
  14. tom

    tom Proxmox Staff Member
    Staff Member

    Joined:
    Aug 29, 2006
    Messages:
    13,538
    Likes Received:
    404
    In a default three-node Ceph cluster, you have all data on all three nodes, so if one host is down you still have 2 replicas, which is enough.

    2 hosts represent 66.66% of the cluster (more than 50%).
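    What decides write availability here is the pool's min_size, not a per-PG majority vote; a simplified model of the default 3/2 pool (a sketch for illustration, not Ceph code):

```python
# Simplified model of Ceph pool write availability: a PG keeps accepting
# writes while at least min_size replicas are up (default pool is size=3,
# min_size=2, one replica per host).

def pg_writable(replicas_up, min_size=2):
    """True if a PG still accepts writes with this many replicas up."""
    return replicas_up >= min_size

print(pg_writable(3))  # True  - all three hosts up
print(pg_writable(2))  # True  - one host down, cluster stays writable
print(pg_writable(1))  # False - two hosts down, I/O on the PG stops
```

    This is why losing one node out of three does not make a default 3/2 pool read-only.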
     
  15. alexskysilk

    alexskysilk Active Member

    Joined:
    Oct 16, 2015
    Messages:
    559
    Likes Received:
    59
    Fair enough, although there is the outside chance of a delta between the two remaining chunks in a PG, and absent a third for quorum the PG will be taken offline. Fine for a lab, but an unnecessary risk for production, since the relative cost of another node is low enough to make the deployment irresponsible otherwise.
     
  16. Stefano Giunchi

    Proxmox Subscriber

    Joined:
    Jan 17, 2016
    Messages:
    37
    Likes Received:
    1
    Thank you for the heads-up on this problem. I didn't find any documentation on it; if you have a link, it's appreciated.
    Anyway, calling a three-node deployment "irresponsible" seems a bit strong to me: the normal situation is with all three nodes up, and thus the PG quorum is respected.

    It's like saying that deploying a server with a three-disk RAID-5 is irresponsible: if it is, I live in a world of irresponsible people...
     
  17. alexskysilk

    alexskysilk Active Member

    Joined:
    Oct 16, 2015
    Messages:
    559
    Likes Received:
    59
    The issue isn't with the number of nodes; the issue is with exposure when degraded. The RAID5 analogy is apt here because it suffers from the exact same issue: yes, the pool continues to function when a disk drops, BUT YOU NO LONGER HAVE PARITY, and will not until the pool has been restored to healthy. That means any media, bus, or host error cannot be trapped, and your data is suspect, as it can be corrupt or broken; in the case of a simple RAID5 you wouldn't even know.

    The "irresponsible" adjective is a function of your role in the design of the system. Since your customer is paying you to put this together, if you knowingly design a system with a hole in it, you are being irresponsible by definition. Your customer can decide they'd rather not invest in curing the defect and live with the risk, but that is their decision to make.

    To quote my mom: just because Billy jumps off a cliff doesn't mean you should too. RAID5 is absolutely irresponsible for anyone who cares about their data, since the cost of removing the risk is ONE DRIVE (RAID6).
     
    guletz likes this.