how to configure disks for a cluster

PiotrDev

Member
Sep 15, 2019
Hi,
Some time ago I used Proxmox on a single server with a few VMs and it worked just fine, so when I recently saw the possibilities of clustering and Ceph integration I got excited - it looks awesome!

I have 3 servers with the following configuration:
AMD Epyc
256 GB memory
4 x 960 GB NVMe, datacenter edition
1 Gb LAN dedicated to the Proxmox cluster and Ceph

I tried this configuration:
2 disks in RAID1 for the system
2 disks as OSDs
but performance looks more like SATA than NVMe :(

Code:
 root@ ~ # rados bench -p cephfs_data 10 seq
 hints = 1
   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
     0       0         0         0         0         0           -           0
     1      16        66        50    199.94       200    0.144333    0.206032
     2      16        98        82   163.965       128    0.651008    0.311796
     3      16       127       111   147.972       116    0.347257    0.338726
     4      16       154       138   137.976       108    0.119017    0.373592
     5      16       194       178   142.376       160    0.112286    0.394227
     6      16       229       213   141.977       140   0.0113202    0.409985
     7      16       272       256   146.263       172    0.230196    0.412742
and the performance is not very good..

I read suggestions to create 4 partitions per NVMe disk, and another suggestion that setting bluestore_shard_finishers=true fixes the problem of running only 1 OSD per NVMe device, but I can't find where I should put that setting..?
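I'm guessing it would go into the [osd] section of /etc/pve/ceph.conf, something like this (just my assumption - the option may not even exist in my Ceph version):

Code:
 [osd]
          bluestore_shard_finishers = true

and then restart the OSDs (systemctl restart ceph-osd@<id>), or set it at runtime with ceph config set osd bluestore_shard_finishers true on newer releases. Is that right?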

The cluster would be used for: 1 VM for a ~200 GB database (some heavy disk operations), and 2-3 more VMs for jobs and/or nginx+php - that part could be cached entirely in memory and use the disks for logs only.

Maybe I should put a partition or two for an OSD on the system disks? Or maybe the journal? What would you recommend?

I'm waiting to hear from my DC whether they can upgrade my LAN to 10 Gb. I can also add some disks, so if it would have a big effect on performance I could add a separate 2 x 240 GB SSD just for the Proxmox system RAID1, and the 4 NVMe drives would be entirely for Ceph.
I'm looking for a way to boost performance right now, because I know ~140 MB/s will not be enough for my needs.
 
I'm waiting to hear from my DC whether they can upgrade my LAN to 10 Gb.

Without that, I would not use Ceph. 10 GbE is mandatory for fast Ceph. I would also use at least two 10 GbE links for Ceph, and you can live with two outbound 1 GbE links.

I have 3 servers with the following configuration:
AMD Epyc
256 GB memory
4 x 960 GB NVMe, datacenter edition
1 Gb LAN dedicated to the Proxmox cluster and Ceph

Add two additional disks for the OS and dedicate the 4 NVMe drives to Ceph.
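With the extra OS disks in place, creating the OSDs is roughly this on each node (device names are just examples, the drives must be unused, and wiping destroys any data on them):

Code:
 # wipe leftover signatures, then create one OSD per NVMe drive
 ceph-volume lvm zap /dev/nvme0n1 --destroy
 pveceph osd create /dev/nvme0n1
 # ...repeat for nvme1n1, nvme2n1, nvme3n1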
 
1 Gb LAN dedicated to the Proxmox cluster and Ceph
[...]
but performance looks more like SATA than NVMe :(

You are being massively limited by your network; a single NVMe can do more than your 1 Gbps network can handle.

You will need to upgrade to at least 10 Gbps to see some of the NVMe performance.
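Rough math, just to show the gap (exact NVMe numbers depend on the model, but the order of magnitude is the point):

Code:
 1 Gbit/s network      ~ 125 MB/s theoretical maximum
 your rados bench      ~ 140 MB/s average
 datacenter NVMe       ~ several 1000 MB/s sequential, per drive

Your benchmark is already at roughly wire speed (slightly above it, most likely because some reads are served from the node's own OSDs), so the disks are barely being touched.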
 
I see. A 10 Gbit LAN would cost me +50 EUR,
and 2 additional 960 GB NVMe disks per server would be 6 * 19 EUR = 114 EUR, so that makes +164 EUR in total. I'm afraid it's too much for my budget.

You will need to upgrade to at least 10 Gbps to see some of the NVMe performance.
What do you think I could expect with the current disks and 10 Gbps? 50% of the NVMe performance, or less?



My other idea is to use ZFS with replication.
The purpose of the cluster is to have:
  1. VM for the DB
  2. VM for backend jobs
  3. VM for the frontend (nginx+php)
  4. VM for DB replication
  5. VM with a high-speed disk that runs a heavy I/O import process once per month - data loss is not a problem
  6. VM for Elasticsearch, which is rebuilt periodically, so data loss is not a problem here

So with ZFS I could set up this disk layout:
2 disks with a 100 GB RAID1 for the system, and the remaining ~700 GB * 2 as RAID0 for the high-speed work (VMs 5 + 6)
2 disks in a ZFS pool for the rest, with replication enabled for VMs 2 and 3
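If I understand the tooling correctly, the ZFS side would look roughly like this (pool names, device/partition names, VM IDs and the node name are just examples, and I'm assuming the second pool would be a mirror):

Code:
 # raid0 across the space left on the two system disks, for the throwaway VMs 5 + 6
 zpool create -o ashift=12 fastpool /dev/nvme0n1p4 /dev/nvme1n1p4
 # mirrored pool on the other two disks for everything else
 zpool create -o ashift=12 tank mirror /dev/nvme2n1 /dev/nvme3n1
 # register both pools as Proxmox storage
 pvesm add zfspool fast --pool fastpool
 pvesm add zfspool tank --pool tank
 # replicate VMs 102 and 103 (jobs + frontend) to another node every 15 minutes
 pvesr create-local-job 102-0 node2 --schedule "*/15"
 pvesr create-local-job 103-0 node2 --schedule "*/15"

Does that look sane?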
 
I see. A 10 Gbit LAN would cost me +50 EUR [...] I'm afraid it's too much for my budget.
[...]
My other idea is to use ZFS with replication.

I wouldn't worry too much about the extra NVMe; the main bottleneck is 100% your network.

If you can add just the 10 Gbps LAN, use a small part of each NVMe as RAID10 for the OS and give the rest to Ceph, and you should see much better performance than you are currently seeing.
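As a rough sketch of that layout (sizes and device names are made up, and as far as I know the pveceph tooling wants whole disks, so an OSD on a partition has to be created with ceph-volume directly):

Code:
 # per NVMe drive: a small partition for the OS RAID10 (done by the installer),
 # plus one big partition for Ceph, appended here as partition 2
 sgdisk -n 2:0:0 -t 2:8300 /dev/nvme0n1
 # create a BlueStore OSD on that partition
 ceph-volume lvm create --data /dev/nvme0n1p2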
 
NVMe tops out at 4 GB/s, your network tops out at 1 Gb/s; that is a factor of 32 slower.... so about 3% of the performance of your NVMe.

Ok, I get that. But what I'd love to know is: what performance could I expect if I order 10 Gbit cards and a switch, with this number of disks?
 
what performance

What does performance mean for you? If you just compare single-threaded random I/O, you will not benefit much from 10 GbE; for sequential reads, the difference is huge.

Ok, I get that.

So NVMe tops out at 4 GB/s and your 10 GbE tops out at 1.25 GB/s, so the maximum throughput is limited to roughly 31% of your NVMe. For Ceph with NVMe, it's best to go with 25, 40 or directly 100 GbE so that the network is no longer the limit.

Please keep in mind that the NVMe top speed is often only reached with multiple parallel I/O threads, so single-file operations will be slower.
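You can see that effect with rados bench by varying the number of concurrent ops (-t), for example (using the pool from your earlier test, and running write with --no-cleanup first so the read tests have objects to read):

Code:
 rados bench -p cephfs_data 30 write -t 1 --no-cleanup
 rados bench -p cephfs_data 30 write -t 16 --no-cleanup
 rados bench -p cephfs_data 30 seq -t 1
 rados bench -p cephfs_data 30 seq -t 16
 # remove the benchmark objects afterwards
 rados -p cephfs_data cleanup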
 