All hardware was recommended by SMC engineering after consultation. I have two clusters; each contains:
Ceph Storage Node Hardware: SMC SSG-6029P-E1CR12L (x3)
24 x Xeon Gold 6128 CPU @ 3.40 GHz
192GB memory
2x Samsung SM863a 1.9TB SSD (one for system, one for CephDB)
10x Toshiba...
I have upgraded my 6-node cluster (3 Ceph-only plus 3 compute-only nodes) from 5.4 to 6.0. The Ceph config was created on the Luminous release, and I am following the upgrade instructions provided at https://pve.proxmox.com/wiki/Ceph_Luminous_to_Nautilus. During the upgrade the OSDs were...
Thanks for confirming that. I was about to post a follow-up with my observation that the bluestore setting was the only way I could get it to create the partition size I needed.
The other thing that got me is that you can't just delete one converted OSD from Luminous and re-add it with...
I thought I would try re-creating the OSDs under Nautilus, but now it creates the DB LV at about 370 GB, which I guess is 10% of the OSD. However, my SSD is only 1.7 TB, so after creating 4 of the 10 OSDs in that server, it runs out of space on the SSD.
I have tried using the size...
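The arithmetic behind the problem is worth spelling out: a 10%-of-OSD default simply cannot fit ten DBs on one shared SSD. Here is a minimal sketch of the sizing calculation; the function and the 2% LVM-metadata reserve are my own illustration, not anything Ceph or Proxmox computes for you. The resulting figure is what you would hand to whatever explicit DB-size option your tooling accepts instead of the default.

```python
def db_size_per_osd(ssd_bytes, num_osds, reserve=0.02):
    """Largest equal DB partition (bytes) that lets num_osds DBs share one
    SSD, keeping a small reserve for LVM metadata (illustrative only)."""
    usable = ssd_bytes * (1 - reserve)
    return int(usable // num_osds)

TB = 1000**4   # drive vendors use decimal terabytes
GiB = 1024**3

# 1.7 TB usable SSD, 10 OSDs per node, as in the post above.
size = db_size_per_osd(1.7 * TB, 10)
print(round(size / GiB, 1))  # roughly 155 GiB per OSD DB
```

With the 370 GB default, the SSD is exhausted after the fourth OSD (4 x 370 GB > 1.7 TB), which matches the observed failure.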
Hi all,
After an upgrade on one cluster from 5.4 to 6.0, I performed the Ceph upgrade procedures listed here:
https://pve.proxmox.com/wiki/Ceph_Luminous_to_Nautilus
Somewhere along the way, in the midst of all the messages, I got the following WARN: BlueFS spillover detected on 30 OSD(s). In...
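For context on that warning: BlueFS spillover means RocksDB metadata has overflowed the fast DB device onto the slow main device. A commonly cited rule of thumb for Nautilus-era BlueStore is that only DB sizes around ~3, ~30, or ~300 GiB are fully used, because RocksDB only migrates a whole level down to the DB device if it fits. The sketch below derives those figures from the default level sizing (256 MiB base, 10x multiplier); exact behavior varies by release, so treat this as an approximation, not a guarantee.

```python
MiB = 1024**2
GiB = 1024**3

def cumulative_level_sizes(levels=4, base=256 * MiB, multiplier=10):
    """Cumulative RocksDB level capacities under the default sizing:
    each level is 10x the previous, starting from a 256 MiB base."""
    sizes, total, level = [], 0, base
    for _ in range(levels):
        total += level
        sizes.append(total)
        level *= multiplier
    return sizes

sizes = cumulative_level_sizes()
for s in sizes:
    print(round(s / GiB, 2))  # ~0.25, ~2.75, ~27.75, ~277.75 GiB
```

A DB partition between those break points (say, 60 GiB) gives no more usable headroom than the next-lower one, which is why spillover warnings often persist until the DB is sized past the next boundary.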
I have three Ceph nodes in my Proxmox cluster that I do NOT want users creating VMs on, or the system automatically moving VMs to. At first I thought I could do that through the permissions system, but after reading some posts it looks like removing storage should do it. I'm just posting this...
Just a quick update. In one cluster I was able to simply delete all the OSDs and pools, add new OSDs and create new pools. That worked perfectly. On a second cluster, where all the VM images were already on Ceph, I added all the new OSDs and removed the OSDs from one of the existing nodes...
Thanks for that advice. I am definitely replacing the older Ceph nodes, not adding to them, and would like to avoid the rebalancing act at all costs.
I may have another way to go... The reason the current Ceph cluster is being replaced is that it never quite worked right and had issues when I started...
Hi all.
I have an existing PVE/Ceph cluster that I am currently upgrading. The PVE portion is rather straightforward: I'm adding the new nodes to the PVE cluster, moving VMs off the old ones, then removing the old ones from the cluster. Easy peasy.
However, what I don't know is the best...
Thanks for the input; big message to follow.
It would be nice if there were a concise guide specifically for troubleshooting Ceph, especially what to check and in what order: how to verify Ceph network, OSD, and physical hardware performance; how to read the dashboard (Reads/Writes/IOPS)...
There are two pools, one RBD with 512 PGs, and now a cephFS with 128 PGs.
Health reports as HEALTH_OK, but that changes as the problem emerges:
2018-12-11 10:34:02.014813 mon.belle mon.0 192.168.201.241:6789/0 114595 : cluster [WRN] Health check failed: 13 slow requests are blocked > 32 sec...
It's been a while since I've done that with LVM. You will need to look closer at renaming LVs. I would expect you would need to shut down the VM, rename the LV, change the config file to match, then restart the VM. If you are in a cluster, you have to run that on the node that owns the VM.
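The config-file half of that sequence can be scripted. Here is a minimal sketch against a throwaway mock of a VM config; the VM ID, VG name, and disk names are hypothetical, and in real life the file lives at /etc/pve/qemu-server/&lt;vmid&gt;.conf and the LV itself must be renamed first.

```shell
# On the real system the LV is renamed first (names hypothetical):
#   lvrename pve/vm-100-disk-0 pve/vm-100-disk-1
# Here we only demonstrate the config rewrite, on a temporary copy.
conf=$(mktemp)
printf 'scsi0: local-lvm:vm-100-disk-0,size=32G\n' > "$conf"

# Update every reference to the old LV name in the config.
sed -i 's/vm-100-disk-0/vm-100-disk-1/g' "$conf"

cat "$conf"
```

Doing the rename with the VM shut down avoids QEMU holding the old device node open mid-rename.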
I'm trying to understand how the Ceph pools all work together (this was before Proxmox 5.3). On this cluster, I have three nodes with four 2 TB drives in each node (roughly 22 TB total disk space after overhead). I had a single pool with 512 PGs in a 3/2 configuration that was used to create a...
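For what it's worth, 512 PGs on 12 OSDs matches the classic sizing rule of thumb: about 100 PGs per OSD, divided by the replica count, rounded to a power of two. A quick sketch of that calculation follows; the function name is mine, not a Ceph API.

```python
import math

def target_pg_count(num_osds, pgs_per_osd=100, replicas=3):
    """Rule-of-thumb total PG count for a pool:
    OSDs * ~100 / replica count, rounded to the nearest power of two."""
    raw = num_osds * pgs_per_osd / replicas
    lower = 2 ** math.floor(math.log2(raw))
    upper = lower * 2
    return upper if (raw - lower) > (upper - raw) else lower

print(target_pg_count(12))  # 12 OSDs, size 3 -> 512
```

Note the rule is per-cluster headroom, not per-pool: adding a second pool (like the cephFS pool mentioned elsewhere in this thread) raises the total PGs-per-OSD, which is why the combined count across pools is what Ceph's health checks actually watch.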
Hi all,
I've been running Proxmox for a number of years and now have a 13-node cluster where, last year, I added Ceph to the mix (after a 5.2 upgrade) using the empty drive bays in some of the Proxmox nodes. Last Friday I upgraded all nodes to version 5.3.
The Ceph system has always felt slow...
In the first cluster, there are three nodes with four OSDs each. All HDDs (2 TB SAS) and SSD units are identical.
I have looked back over the configuration for the first node in the first cluster (OSDs 0-3) and compared it to the configuration on the other nodes and there doesn't appear to be...
I am now getting back to this issue. I haven't found anything that would explain why the OSDs in the first server (OSDs 0, 1, 2, 3) show the write time to be on average 9 seconds, while the other OSDs (4 through 11) all have write times on average about 1.5 seconds.
I have a different cluster...
While investigating OSD performance issues on a new ceph cluster, I did the same analysis on my "good" cluster. I discovered something interesting and fixing it may be the solution to my new cluster issue.
For the "good" cluster, I have three nearly identical servers. Each server has four...
I built a ceph cluster earlier this year for one of my Proxmox clusters and it has been working just fine. I had enough drive slots in each storage node to include a dedicated SSD for the OSD journals and that cluster is working fine in terms of performance.
On a second cluster, I only had...