Isn't the issue here that the client process requesting the data from the OSDs needs to know where in the CRUSH topology it runs?
In a hyperconverged cluster with only three nodes this may be a very good optimization, because there all data is...
Hey everyone,
I'd like to draw your attention (some of you surely already know me from ProxLB, PegaProx and the Ansible modules for PVE) to the community event Proxday 2026, which is hosted by credativ GmbH. The concept is...
Hi Fabian,
Sorry for not getting back to you sooner. After a thorough investigation, we found out that one (1) of the cores in our Intel Xeon Gold 6154 CPU was faulty, causing the unexpected reboots.
Yes. For Kubernetes nodes to interact with Ceph directly (via RBD or CephFS), they need to be on the Ceph public network. Without that direct line of sight, the Kubernetes workers won't be able to map the block devices or mount the filesystems...
OMG, you won't believe this ..
For some reason, I installed the new version onto an M.2 drive in my chassis, but the system was booting from an old attached disk instead - that old disk has an old (broken) Proxmox install on it, so...
Hello Proxmox community,
I’d like to share an open-source project I maintain: Proxbox, a NetBox plugin for synchronizing Proxmox VE infrastructure data into NetBox.
The goal is simple: keep NetBox updated with inventory data from real Proxmox...
manual modprobing yes, autoloading by the kernel (e.g. by virtue of opening a socket of a certain type) no. but in recent pve-container versions, opening AF_ALG sockets is also forbidden via seccomp...
the upstream patches are still in flux, but the manual mitigations for copyfail and dirtyfrag (forbidding the affected modules) protect against the issue according to the reports so far. there will be an announcement as usual once fixed kernel...
Also, thanks for letting me know!! I didn't realize my setup was that far from the network requirements; user error on my part. There's no way for me to get a better internet service for the buildings (once again whining about Xfinity being the only real...
Ping times are way too high for corosync.
And do I read that correctly that there is an Internet connection involved? Do you use some kind of VPN between the sites?
As can be seen with the round trip times (rtt min/avg/max/mdev =...
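To make the latency check above concrete, here is a minimal sketch that parses the summary line printed by Linux `ping` and compares the worst round trip against a limit. The 5 ms limit is an assumption based on the commonly cited LAN-class latency guidance for corosync, and the function names and sample values are hypothetical, not taken from the thread.

```python
import re

# Assumed limit: LAN-class latency commonly cited for corosync links.
MAX_RTT_MS = 5.0

# Matches the summary line of Linux (iputils) ping output.
RTT_RE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_rtt(summary):
    """Return min/avg/max/mdev in milliseconds from a ping summary line."""
    match = RTT_RE.search(summary)
    if match is None:
        raise ValueError("no rtt summary line found")
    lo, avg, hi, mdev = (float(value) for value in match.groups())
    return {"min": lo, "avg": avg, "max": hi, "mdev": mdev}


def latency_ok(summary, limit_ms=MAX_RTT_MS):
    """True if even the slowest observed round trip stays within the limit."""
    return parse_rtt(summary)["max"] <= limit_ms


if __name__ == "__main__":
    # Made-up sample lines for illustration only.
    lan = "rtt min/avg/max/mdev = 0.210/0.305/0.412/0.051 ms"
    wan = "rtt min/avg/max/mdev = 18.2/24.7/61.3/9.8 ms"
    print(latency_ok(lan), latency_ok(wan))
```

Checking the maximum rather than the average is a deliberate choice in this sketch, since occasional spikes can matter as much as the typical value on a link carrying cluster traffic.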
that's an understatement.
the Crucial BX series is one of the worst performing SSDs i have ever seen, even on clients.
you may use it as cold storage, but anything warm or hot will perform terribly on it.
more so if it's used with zfs/ceph.
even...
Additionally, with just 3 nodes in a Ceph cluster, make sure you have at least 4 OSDs in each, because with just 2 per node you will likely have issues if one of the OSDs fails: Ceph will then recover the lost replicas to the only node it can...
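The arithmetic behind that advice can be sketched as follows. This is a rough model, not Ceph's actual placement logic: it assumes equally sized, evenly filled OSDs, and that recovery has to stay on the same node, so the surviving OSDs there absorb the failed OSD's data. The function names are hypothetical, and the 0.95 value is used only as an assumed "full" threshold for illustration.

```python
def utilization_after_osd_failure(osds_per_node, utilization):
    """Estimate per-OSD fill level on a node after one of its OSDs fails.

    Assumes equal-size OSDs filled evenly, with the failed OSD's data
    redistributed across the remaining OSDs of the same node.
    """
    if osds_per_node < 2:
        raise ValueError("need at least 2 OSDs per node to recover locally")
    return utilization * osds_per_node / (osds_per_node - 1)


def survives_failure(osds_per_node, utilization, full_ratio=0.95):
    """True if the remaining OSDs stay below the assumed full threshold."""
    return utilization_after_osd_failure(osds_per_node, utilization) < full_ratio


if __name__ == "__main__":
    for count in (2, 4):
        after = utilization_after_osd_failure(count, 0.5)
        print(f"{count} OSDs per node at 50% -> {after:.0%} after one failure")
```

Under these assumptions, a node with 2 OSDs that are each half full ends up with a single OSD at 100%, while a node with 4 such OSDs ends up at roughly 67% on the remaining three.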
Those are on the cheaper and slower side of consumer SSDs. They will not perform well with sustained load and the primarily sync writes that Ceph does.
The recommendation for enterprise SSDs with power loss protection (PLP) is there for good...
Try to update the firmware of your Samsung drives.
Try to disable PCIe ASPM in BIOS or any other power saving feature.
Check if the drives overheat -> improve cooling.
Could also be an issue with the PCIe -> M.2 adapter ...
In the end, we were unable to find a satisfactory solution with the ring topology. We kept experiencing the same issues over and over again.
We have now switched to a star topology using a 10 Gbit switch. Since then, the problems have disappeared...
Thanks for the detailed network diagram. Pulling together everything you've shared, I think the picture is fairly clear now -- though some targeted measurements would still help confirm the timing.
The core reason VMs freeze
During Ceph...