Manual modprobing, yes; autoloading by the kernel (e.g. by virtue of opening a socket of a certain type), no. But in recent pve-container versions, opening AF_ALG sockets is also forbidden via seccomp...
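A quick way to see which of these cases applies inside a given container is to try creating an AF_ALG socket and look at the error. This is a sketch assuming Linux and Python 3; `probe_af_alg` is a hypothetical helper, and treating EPERM/EACCES as "blocked by seccomp" is an assumption, since a seccomp filter can be configured to return a different errno.

```python
import errno
import socket


def probe_af_alg() -> str:
    """Try to create an AF_ALG socket and classify the outcome.

    Assumption: EPERM/EACCES means a filter (e.g. seccomp) refused the call,
    EAFNOSUPPORT means the kernel has no AF_ALG support available (module
    not loaded and not autoloadable). Anything else is reported verbatim.
    """
    family = getattr(socket, "AF_ALG", None)
    if family is None:
        # Python only exposes AF_ALG on Linux builds.
        return "unsupported-platform"
    try:
        sock = socket.socket(family, socket.SOCK_SEQPACKET, 0)
    except OSError as exc:
        if exc.errno in (errno.EPERM, errno.EACCES):
            return "blocked"
        if exc.errno == errno.EAFNOSUPPORT:
            return "unavailable"
        return "error:" + errno.errorcode.get(exc.errno, str(exc.errno))
    sock.close()
    return "allowed"


if __name__ == "__main__":
    print(probe_af_alg())
```

Run it once on the host and once inside the container; a difference between the two results points at the container's restrictions rather than the kernel.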
The upstream patches are still in flux, but according to the reports so far, the manual mitigations for copyfail and dirtyfrag (forbidding the affected modules) protect against the issue. There will be an announcement as usual once fixed kernel...
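To confirm that such a mitigation is actually in effect, it helps to check that none of the affected modules is currently loaded: blocking a module from being loaded (for example with an `install <module> /bin/false` line in a file under /etc/modprobe.d/) does not unload one that is already loaded. The sketch below assumes you pass in the module names from the official mitigation instructions; `loaded_modules` and the names in the usage note are hypothetical placeholders.

```python
from pathlib import Path


def loaded_modules(names, proc_modules_text=None):
    """Return the subset of `names` that is currently loaded.

    Each line of /proc/modules starts with the module name. Pass
    `proc_modules_text` to check a captured copy instead of the live file.
    """
    if proc_modules_text is None:
        proc_modules_text = Path("/proc/modules").read_text()
    # Module names are listed with underscores; normalize hyphenated input.
    wanted = {n.replace("-", "_") for n in names}
    loaded = {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}
    return sorted(wanted & loaded)
```

For example, `loaded_modules(["module_a", "module_b"])` returns whichever of those (placeholder) names still appear in /proc/modules; an empty list means none of them is loaded.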
Also, thanks for letting me know!! I didn't realize my setup was that far from the network requirements; user error on my part. There's no way for me to get a better internet service for the buildings (once again whining about Xfinity being the only real...
Ping times are way too high for corosync.
And do I read that correctly that there is an Internet connection involved? Do you use some kind of VPN between the sites?
As can be seen with the round trip times (rtt min/avg/max/mdev =...
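The rtt summary line that ping prints can be checked mechanically. A minimal sketch, assuming iputils-style (or BSD-style) ping output; the 5 ms budget mirrors the LAN-latency figure in the Proxmox VE cluster documentation but should be treated as an adjustable assumption, and `parse_rtt`/`corosync_latency_ok` are hypothetical helpers. Checking the maximum rather than the average is a deliberately conservative choice, since corosync reacts badly to latency spikes.

```python
import re

# Matches e.g. "rtt min/avg/max/mdev = 0.045/0.061/0.089/0.012 ms"
# (iputils) or "round-trip min/avg/max/stddev = ..." (BSD-style ping).
RTT_RE = re.compile(
    r"(?:rtt|round-trip) min/avg/max/(?:mdev|stddev) = "
    r"(?P<min>[\d.]+)/(?P<avg>[\d.]+)/(?P<max>[\d.]+)/(?P<mdev>[\d.]+) ms"
)

# Assumed latency budget in milliseconds; adjust to your own requirements.
MAX_LATENCY_MS = 5.0


def parse_rtt(ping_output: str) -> dict:
    """Extract min/avg/max/mdev (in ms) from ping's summary line."""
    m = RTT_RE.search(ping_output)
    if m is None:
        raise ValueError("no rtt summary line found in ping output")
    return {k: float(v) for k, v in m.groupdict().items()}


def corosync_latency_ok(ping_output: str, limit_ms: float = MAX_LATENCY_MS) -> bool:
    """True if even the worst observed round trip stays within the budget."""
    return parse_rtt(ping_output)["max"] <= limit_ms
```

For instance, capture `ping -c 100 <other-node>` with `subprocess.run(..., capture_output=True, text=True).stdout` and pass the result to `corosync_latency_ok`.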
That's an understatement.
The Crucial BX series is one of the worst-performing SSDs I have ever seen, even in client machines.
You may use it as cold storage, but anything warm or hot will perform terribly on it,
even more so if it's used with ZFS/Ceph.
Even...
Additionally, with just 3 nodes in a Ceph cluster, make sure you have at least 4 OSDs in each. With just 2 per node, you will likely run into issues if one of the OSDs fails, because Ceph will then recover the lost replicas to the only node it can...
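The arithmetic behind this is simple. Assuming a size=3 replicated pool with host as the failure domain, each of the 3 nodes holds one full copy, so a failed OSD's data can only be re-replicated onto the surviving OSDs of the same node, whose usage grows by a factor of n/(n-1). The sketch assumes equal-sized, evenly filled OSDs and uses Ceph's default nearfull/full ratios; `util_after_osd_loss` is a hypothetical helper.

```python
NEARFULL_RATIO = 0.85  # Ceph default mon_osd_nearfull_ratio
FULL_RATIO = 0.95      # Ceph default mon_osd_full_ratio


def util_after_osd_loss(util: float, osds_per_node: int) -> float:
    """Utilization of the surviving OSDs on a node after one of its OSDs
    fails and its data is recovered within that same node.

    Assumes 3 nodes, a size=3 pool, failure domain = host, and equal-sized
    OSDs with data spread evenly.
    """
    if osds_per_node < 2:
        raise ValueError("need at least 2 OSDs per node for in-node recovery")
    return util * osds_per_node / (osds_per_node - 1)


def classify(util: float) -> str:
    if util >= FULL_RATIO:
        return "above full ratio"
    if util >= NEARFULL_RATIO:
        return "above nearfull ratio"
    return "ok"


if __name__ == "__main__":
    for n in (2, 3, 4, 6):
        u = util_after_osd_loss(0.65, n)
        print(f"{n} OSDs/node at 65%: survivors end up at {u:.0%} ({classify(u)})")
```

In practice Ceph stops backfilling onto OSDs above the backfillfull ratio (0.90 by default), so with 2 OSDs per node recovery would stall rather than actually fill the remaining OSD.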
Those are on the cheaper and slower side of consumer SSDs. They will not perform well with sustained load and the primarily sync writes that Ceph does.
The recommendation for enterprise SSDs with power loss protection (PLP) is there for good...
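To see the effect of sync writes on a given drive, a rough test is to time small writes that are each followed by fsync(). This is a simplified sketch, not a substitute for a proper benchmark tool such as fio; it assumes a scratch filesystem on the drive under test (never an in-use OSD), and the `SSD_TEST_DIR` environment variable and `sync_write_iops` helper are hypothetical names. Drives with PLP can typically acknowledge flushes from their protected cache and sustain far higher rates here than consumer drives.

```python
import os
import tempfile
import time


def sync_write_iops(path: str, count: int = 1000, block_size: int = 4096) -> float:
    """Rough rate of small writes that are each flushed with fsync()."""
    buf = os.urandom(block_size)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, buf)
            os.fsync(fd)  # force the data to stable storage before continuing
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed


if __name__ == "__main__":
    # Hypothetical variable: point it at a scratch filesystem on the drive
    # under test. Falls back to the system temp dir (often tmpfs, meaningless).
    target_dir = os.environ.get("SSD_TEST_DIR", tempfile.gettempdir())
    with tempfile.TemporaryDirectory(dir=target_dir) as scratch:
        rate = sync_write_iops(os.path.join(scratch, "sync_test.bin"))
        print(f"~{rate:.0f} fsync'd 4 KiB writes per second")
```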
Try to update the firmware of your Samsung drives.
Try to disable PCIe ASPM and any other power-saving features in the BIOS.
Check if the drives overheat -> improve cooling.
It could also be an issue with the PCIe -> M.2 adapter...
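For the overheating check, recent Linux kernels expose NVMe drive temperatures through hwmon in sysfs. The sketch below assumes that support is present (a hwmon device whose `name` file reads `nvme`, with `tempN_input` values in millidegrees Celsius); `nvme_temperatures` is a hypothetical helper, and `nvme smart-log` from nvme-cli shows the same kind of data interactively.

```python
from pathlib import Path


def nvme_temperatures(root: str = "/sys/class/hwmon") -> dict:
    """Map each NVMe hwmon device to its temperature readings in degrees C."""
    result = {}
    base = Path(root)
    if not base.is_dir():
        return result
    for hw in sorted(base.iterdir()):
        name_file = hw / "name"
        if not name_file.is_file() or name_file.read_text().strip() != "nvme":
            continue  # not an NVMe sensor (CPU, chipset, ...)
        temps = []
        for t in sorted(hw.glob("temp*_input")):
            try:
                temps.append(int(t.read_text().strip()) / 1000.0)
            except (OSError, ValueError):
                continue  # some sensors may be unreadable
        result[hw.name] = temps
    return result


if __name__ == "__main__":
    for dev, temps in nvme_temperatures().items():
        print(dev, temps)
```

Running it while the drives are under sustained load and comparing the readings with the drive's rated temperature limits shows whether thermal throttling is a plausible cause.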
In the end, we were unable to find a satisfactory solution with the ring topology. We kept experiencing the same issues over and over again.
We have now switched to a star topology using a 10 Gbit switch. Since then, the problems have disappeared...
Thanks for the detailed network diagram. Pulling together everything you've shared, I think the picture is fairly clear now, though some targeted measurements would still help confirm the timing.
The core reason VMs freeze
During Ceph...
Hi!
we're working on the VRF support for fabrics and the ability to leak into VRFs from other VRFs.
We thought about adding a GUI for sysctls a while back; maybe you could create a feature request in our Bugzilla (https://bugzilla.proxmox.com/)?
Move your cluster_network IP to the 100G network too (when cluster_network is defined, it is used for OSD replication [0], so it is currently limiting your write speed).
auto nic4_100G00
iface nic4_100G00 inet static
address 10.180.194.211/24
address...
Either use only the public network or move the cluster network to one of the other 100G ports (on all nodes, of course).
[0] https://docs.ceph.com/en/squid/rados/configuration/network-config-ref
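To verify on each node which interface actually carries each Ceph network, you can check which local addresses fall inside the configured subnets. A minimal sketch: the `nic4_100G00` name and its address come from the config above, while the public/cluster subnets and the `eno1` interface are hypothetical examples, as is the `interface_for_networks` helper.

```python
import ipaddress


def interface_for_networks(ceph_networks: dict, iface_addrs: dict) -> dict:
    """For each Ceph network (name -> CIDR), list the local interfaces that
    have an address inside it (interface -> list of "addr/prefix")."""
    result = {}
    for net_name, cidr in ceph_networks.items():
        net = ipaddress.ip_network(cidr, strict=False)
        result[net_name] = sorted(
            iface for iface, addrs in iface_addrs.items()
            if any(ipaddress.ip_interface(a).ip in net for a in addrs)
        )
    return result


if __name__ == "__main__":
    # Hypothetical subnets; take the real ones from your ceph.conf.
    ceph = {"public_network": "10.180.194.0/24", "cluster_network": "10.10.10.0/24"}
    # nic4_100G00 is from the thread; eno1 is a hypothetical slower NIC.
    ifaces = {"nic4_100G00": ["10.180.194.211/24"], "eno1": ["10.10.10.11/24"]}
    print(interface_for_networks(ceph, ifaces))
    # {'public_network': ['nic4_100G00'], 'cluster_network': ['eno1']}
```

If cluster_network maps to anything other than a 100G interface on any node, replication traffic is still going over the slower link.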
Hello Tom,
Thanks for the details. The interoperability between both technologies is much clearer to me now. We will assess whether it makes sense to proceed with building the plugin to implement the integration.