No. As soon as the MONs lose quorum, i.e. a majority of MONs can no longer see each other, the cluster stops working.
And for the number of MONs: The cephadm orchestrator deploys 5 by default for the reasons I outlined...
The current recommendation from the Ceph project is to run 5 MONs.
With only three MONs you are in a high-risk situation after losing just one MON: lose another and your cluster stops.
With five MONs you can lose two and the cluster will...
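The quorum arithmetic behind these numbers can be sketched in a few lines (plain Python, not Ceph code; the function names are mine):

```python
# Quorum arithmetic for Ceph MONs: the cluster only keeps working
# while a strict majority of MONs can see each other.

def quorum_size(n_mons: int) -> int:
    """Smallest strict majority of n_mons."""
    return n_mons // 2 + 1

def tolerable_failures(n_mons: int) -> int:
    """MON failures survivable before quorum is lost."""
    return n_mons - quorum_size(n_mons)

for n in (3, 5):
    print(f"{n} MONs: quorum={quorum_size(n)}, "
          f"tolerates {tolerable_failures(n)} failure(s)")
# 3 MONs: quorum=2, tolerates 1 failure(s)
# 5 MONs: quorum=3, tolerates 2 failure(s)
```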
Please post the output of
ceph status
ceph mon dump
ceph config dump
ceph osd df tree
ceph osd crush rule dump
ceph osd pool ls detail
from each node: ip addr show
Of course the RAID controller slows things down. I would never put NVMe drives behind a RAID controller. RAID controllers usually have 8 PCIe lanes, while each NVMe already uses 4, so the RAID controller is always a bottleneck with more than 2 NVMe drives.
Then...
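The lane arithmetic above is easy to check (illustrative numbers; the per-lane throughput is an assumed round figure in the ballpark of PCIe 4.0):

```python
# Why 8 controller lanes bottleneck more than 2 NVMe drives (4 lanes each).
# GB_PER_LANE is an assumed round figure (~PCIe 4.0 usable bandwidth).
GB_PER_LANE = 2.0
CONTROLLER_LANES = 8
LANES_PER_NVME = 4

controller_bw = CONTROLLER_LANES * GB_PER_LANE  # ceiling imposed by the controller

for n_drives in (1, 2, 3, 4):
    drives_bw = n_drives * LANES_PER_NVME * GB_PER_LANE
    note = " <- bottleneck" if drives_bw > controller_bw else ""
    print(f"{n_drives} NVMe: drives want {drives_bw:.0f} GB/s, "
          f"controller caps at {controller_bw:.0f} GB/s{note}")
```

With these assumptions the controller tops out at two drives; from the third drive on, the drives can collectively deliver more than the controller's lanes can carry.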
Clearly the wrong CPU was chosen for the host. The backup is performed by the Proxmox Backup Client on the PVE host, which also has to compress the data, and that depends on the single-core performance of the host. Since the CPU only manages 525 MB/s of compression and...
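As a rough illustration of why single-core compression caps the backup rate (525 MB/s is the figure from the post; the backup size is hypothetical):

```python
# Backup duration when compression on one core is the bottleneck.
compression_mb_s = 525    # single-core compression rate from the post
backup_size_gb = 1000     # hypothetical amount of data to back up

seconds = backup_size_gb * 1024 / compression_mb_s
print(f"~{seconds / 60:.1f} minutes at best, "
      f"no matter how fast the disks or network are")
```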
You might be able to achieve what you want using the single file restore feature, which allows you to download parts of a VM's filesystem as a zip or tar archive. You will, however, have to copy and place the files into your VM manually; there is...
16 consumer NVMe drives. Every write Ceph does is synchronous, and any drive without PLP (power-loss protection) will show high latency and, once its cache fills, poor sequential performance. Keep in mind that each write goes to 3 disks, and besides the data itself Ceph has to write...
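A back-of-the-envelope sketch of the write amplification implied here (the metadata factor is an assumption for illustration; real overhead varies with block size and BlueStore settings):

```python
# How a single client write multiplies across a 3x-replicated Ceph pool.
REPLICAS = 3        # each write lands on 3 OSDs
META_FACTOR = 2.0   # assumed: data + RocksDB/WAL writes roughly double device I/O

client_write_mb = 100
device_write_mb = client_write_mb * REPLICAS * META_FACTOR
print(f"{client_write_mb} MB from the client -> ~{device_write_mb:.0f} MB "
      f"hitting the drives, all of it synchronous")
```

This is why drives without PLP suffer: every one of those device writes must be acknowledged as durable before Ceph considers the client write complete.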
Write back always writes to RAM first.
The Ceph benchmark always runs multiple streams; your DiskMark runs only one, on a single vDisk. Several VMs in parallel then also perform nicely.
In production I have no performance problems anywhere, not even with...
Try using CPU x86-64-v2-AES or x86-64-v3-AES instead of "host."
A status update on this:
Two corosync parameters that are especially relevant for larger clusters are the "token timeout" and the "consensus timeout". When a node goes offline, corosync (or rather the totem protocol it implements) will need to...
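For illustration, both parameters live in the totem section of corosync.conf. The values below are examples, not recommendations; consult the corosync and Proxmox documentation before changing them:

```
totem {
  # time (ms) without receiving the token before a node is suspected
  # failed; larger clusters often need a higher value
  token: 3000
  # time (ms) to reach consensus before starting a new membership
  # round; corosync requires this to be at least 1.2 * token
  consensus: 3600
}
```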
Could you please describe your setup in more detail?
ceph osd df tree
qm config <VMID>
ceph osd pool ls detail
ceph osd pool autoscale-status
Please always put your output in [ CODE ] tags (the "</>" in the menu at the top), then it is much more...
It is good practice to separate Ceph data traffic onto another interface, or at least another VLAN, and to use 10 Gb or more for this traffic. You should keep in mind that modern SSDs have a very high transfer rate and need a network to match this...
It is not a critical error; BlueStore has recovered from the failure. The cause can be a sporadic problem with the network or some hardware element (controller, disk), but it appears to be due to slow responses from these OSDs.
To remove the warning:
ceph...
Hi, from a quick look, the "Retransmit" messages may be a symptom of network stability issues (e.g. lost packets, increased latency etc) that are more likely to occur if corosync shares a physical network with other traffic types -- I'd expect...
There is some documentation available for Corosync network setup: https://pve.proxmox.com/wiki/Cluster_Manager#pvecm_cluster_network
Using bonds is not advised, and sharing links with other traffic types is not advised either.
Yep, not a whole lot to say really - we have a 5-node Ceph cluster (only 4 of which hold storage) and the SQL server's drives are on enterprise-SSD-backed Ceph storage. Zero complaints about stability or performance.
Couldn't see myself moving...