@mira thanks - that makes sense. Here are the results with the test run as you suggested:
Directly against the SSD I'm getting 20.0MiB/s with 5,366 IOPS.
With CephFS I'm getting 420KiB/s with 105 IOPS (so a pretty significant drop).
With the VirtIO SCSI mount I'm getting 144KiB/s with 36 IOPS (a huge drop! However much more...
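To narrow down where the extra drop between CephFS and the VM happens, it may be worth benchmarking the RBD layer directly from the host, which takes QEMU/VirtIO out of the picture. A minimal sketch, assuming a throwaway test image on the ceph_cluster pool (image name and sizes are my assumptions):

rbd create ceph_cluster/fio-test --size 10G
# 4K random writes straight through librbd, limited to 1G of total I/O
rbd bench ceph_cluster/fio-test --io-type write --io-pattern rand --io-size 4K --io-total 1G
rbd rm ceph_cluster/fio-test

If those numbers land near the CephFS result, the remaining drop is likely inside the VM stack (cache mode, iothread, aio settings) rather than Ceph itself.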
Hi,
I've been testing our Proxmox Ceph cluster and have noticed something interesting. I've been running fio benchmarks against a CephFS mount and within a VM using VirtIO SCSI.
CephFS on /mnt/pve/cephfs -
root@pve03:/mnt/pve/cephfs# fio --name=random-write --ioengine=posixaio --rw=randwrite...
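The full invocation is cut off above; for anyone wanting to run a comparable test, a command along these lines works, with everything after --rw=randwrite being my assumptions rather than the exact values used here:

fio --name=random-write --ioengine=posixaio --rw=randwrite \
    --bs=4k --size=1g --numjobs=1 --iodepth=1 \
    --runtime=60 --time_based --end_fsync=1 \
    --directory=/mnt/pve/cephfs   # flags after --rw are illustrative assumptions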
@spirit do you think that moving the journals off to enterprise disks, but keeping the actual data on the Samsung EVOs, will resolve our iowait issue? I'd imagine it should, because the fast disks will very quickly be able to tell Ceph where the data is.
So from what you mention, our current...
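If we do go that route, Proxmox lets the DB/WAL be pointed at a separate device when an OSD is recreated; something along these lines, one OSD at a time, with the device names and OSD ID being placeholders:

pveceph osd destroy <osd-id> --cleanup          # wipe one OSD at a time
pveceph osd create /dev/sdb -db_dev /dev/nvme0n1 # recreate it with RocksDB/WAL on the faster drive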
So I've just run a test from the VM I posted above. Monitoring the 'public' network while running an fio test, I can see it's pushing data across the public network and distributing it to the other nodes.
I wasn't expecting this as I'd assumed (perhaps wrongly?) that it would read...
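For context, that behaviour matches how Ceph splits traffic: clients (VMs, CephFS mounts) always talk to the OSDs and MONs over the public network, while the cluster network only carries replication and recovery between OSDs. The split lives in /etc/pve/ceph.conf; the subnets below are placeholders:

[global]
    public_network = 10.0.10.0/24    # client <-> OSD/MON traffic
    cluster_network = 10.0.20.0/24   # OSD <-> OSD replication and recovery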
No, EVOs fortunately :) Ideally it would be good to have enterprise 12Gbps drives in there, but they're hard to get hold of at the minute!
We monitor the throughput constantly and haven't seen any real issues, although from our tests I can see we could get close to the mark, so it's certainly...
Hi,
So I've been running some fio tests to see what the random read/write performance is like.
I've got a Ceph pool called ceph_cluster (512 PGs) - this is the pool that all of the VMs sit on.
I also have a CephFS mount on the PVE nodes called cephfs_data (32 PGs).
The command I am running is -...
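Alongside fio, it can be useful to hit the pools directly with rados bench, which takes the filesystem and VM layers out entirely; a sketch using the pool name above, with duration and thread count being my assumptions:

rados bench -p ceph_cluster 60 write -b 4096 -t 16 --no-cleanup   # 60s of 4K writes, keep the objects
rados bench -p ceph_cluster 60 rand -t 16                         # random reads over those objects
rados -p ceph_cluster cleanup                                     # remove the benchmark objects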
Hi Aaron,
I made the changes with the ratios and everything has been cleverly recalculated and moved as needed. It has automatically chosen 512 placement groups for the main ceph_cluster pool.
With regards to latency, each machine has 4 x 4TB 6Gbps Samsung 870 SSDs. There's one OSD per disk and the...
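For reference, the ratio change can also be done from the CLI; a minimal sketch, assuming the bulk of the data lands in ceph_cluster (the ratio values here are illustrative, not the ones actually used):

ceph osd pool set ceph_cluster target_size_ratio 0.9   # tell the autoscaler this pool will hold most of the data
ceph osd pool set cephfs_data target_size_ratio 0.1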
Thanks Aaron - appreciate your help on this. I'll go ahead and change the scaling around a bit. With regards to making these changes, I presume it'll be a fairly intensive operation to reshuffle the 600GB or so we've got into the new PGs, and we're likely to see some increased latency?
Thanks...
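If the rebalance does end up hurting client latency, the backfill can be throttled while it runs, at the cost of it taking longer. A hedged sketch with illustrative values rather than anything from this thread:

ceph config set osd osd_max_backfills 1          # cap concurrent backfills per OSD
ceph config set osd osd_recovery_max_active 1    # cap active recovery ops per OSD
# once the cluster is back to HEALTH_OK, drop the overrides
ceph config rm osd osd_max_backfills
ceph config rm osd osd_recovery_max_active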
Hi aaron,
Thanks for your reply. In total I have four pools -
device_health_metrics - 128 PGs / 128 Optimal PGs - 16MB used
ceph-cluster - 64 PGs / 128 Optimal PGs - 499GB used (the one where we'll be storing the majority of the data, ~7TB)
cephfs_data - 128 PGs / 128 Optimal PGs - 4.6 GB (max of 500GB...
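Those per-pool figures can be pulled in one go, for anyone following along:

ceph osd pool autoscale-status   # per-pool size, target ratio, current PG_NUM and the NEW PG_NUM the autoscaler wants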
Hi,
We're in the process of moving from a cluster that used networked Dell ScaleIO storage to Proxmox using Ceph. Since moving a couple of our VMs across, we've noticed quite an increase in iowait. Is this something that is typical of Ceph due to the nature of its replication?
For reference...
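A quick way to see whether the extra iowait is coming from the OSD side is to compare per-OSD latency with what the guests report; a sketch using standard tools:

ceph osd perf    # commit/apply latency per OSD, in ms
iostat -x 5      # per-device utilisation and await on each node (sysstat package)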
Hi oguz, I'm using the enterprise repository. There were some updates available so I've updated and rebooted all three nodes.
I can now add a user with no problem (or at least it would seem so).
However, when I go to set the password, I get "change password failed: user 'chris' does not exist (500)"...
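If it happens again, it might be worth setting the password from the CLI with the realm spelled out, since the error reads as if the user ID and realm aren't matching up; the @pve realm here is my assumption:

pveum user list          # confirm how the user was actually stored
pveum passwd chris@pve   # set the password with the realm made explicit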
Hi oguz, thanks for your message. Everything is freshly installed, up to date and quorum is happy. I'm really not sure what the problem is here...
pve01:
proxmox-ve: 7.1-1 (running kernel: 5.13.19-1-pve)
pve-manager: 7.1-4 (running version: 7.1-4/ca457116)
pve-kernel-5.13: 7.1-4...
I've setup a new user from Datacenter > Permissions > Users
However, on creating the user and attempting to set the password, I get "change password failed: user 'Chris' does not exist (500)".
If I then attempt to delete the user I get the below:
delete user failed: cannot update tfa config...
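For anyone debugging the same thing, the raw cluster config files show what state the user actually ended up in; on a PVE 7 install these should be (the tfa.cfg path is my assumption):

cat /etc/pve/user.cfg       # user and group definitions as the cluster sees them
cat /etc/pve/priv/tfa.cfg   # the TFA config the failed delete complains about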
I ran into this thread when searching Google. I've been trying my best to break our test-bed. The error I was getting was
service 'vm:100' in error state, must be disabled and fixed first.
The fix for me was to remove VM 100's HA entry from Datacenter > HA. After this it was stuck in a migrate state...
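For anyone else hitting this, the same cleanup can be done from the CLI with ha-manager; the VM ID matches the error above:

ha-manager status                        # shows the state of vm:100
ha-manager set vm:100 --state disabled   # clear the error state
ha-manager remove vm:100                 # drop the resource from HA entirely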
Thanks very much for your input - I'll go with switches rather than a mesh between the nodes, for redundancy and the ability to scale.
- 'Public' networking (public, private & management) separated by VLAN on 10Gbps LACP
- Ceph on 10Gbps LACP
- Corosync with redundant ring, one 1Gbps network on one switch & one...
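For the corosync part, the redundant ring ends up as two links per node in /etc/pve/corosync.conf; a minimal sketch of what one node entry might look like, with the addresses being placeholders:

nodelist {
  node {
    name: pve01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.1.1   # 1Gbps link on switch A
    ring1_addr: 10.10.2.1   # 1Gbps link on switch B
  }
}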