Issue resolved.
It seems that even though I had 0.140ms latency on the corosync network, having 2-3-5% packet loss was the source of the issue.
I've replaced the link and magically everything healed and worked again.
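For reference, a quick way to tell link quality apart from latency (standard tools; the target IP is just a placeholder for another node's corosync address):
corosync-cfgtool -s                                # per-link status as corosync/knet sees it
ping -c 10000 -i 0.01 <other-node-corosync-ip>     # a long, fast ping surfaces loss that a casual ping hides
journalctl -u corosync | grep -i retransmit        # retransmit warnings are a giveaway for a bad link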
Hi,
I'm having the same problem: https://forum.proxmox.com/threads/one-node-in-cluster-brings-everything-down.128862/
I suspect the source of the problem is the inability to write to the /etc/pve folder after the node boots and joins the cluster.
Is there a way to write files there when the...
So, I have a 14 node cluster.
We had a switch failure, so we had to move all the frontend networking (public and PVE cluster) endpoints to a backup switch; after a couple of days, we moved them back.
Now one node in the cluster misbehaves: I can't write to the /etc/pve folder.
I've...
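For anyone else debugging this: /etc/pve goes read-only when pmxcfs loses quorum, so these are the standard checks I'd run on the misbehaving node first:
pvecm status                              # do we have quorum, and does this node see all members?
systemctl status pve-cluster corosync     # pmxcfs is provided by the pve-cluster service
journalctl -b -u pve-cluster -u corosync  # look for sync/quorum errors since boot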
I'm having the same behavior: the dashboard works except for the object storage. It worked for a few months, and now I can't get it to work again.
Error on dashboard:
The Object Gateway Service is not configured
Error connecting to Object Gateway: RGW REST API failed request with status code 403...
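Not a fix, but what I'd check first: a 403 from the RGW REST API usually means the credentials the dashboard uses no longer match an RGW user. A rough sketch (the uid "dashboard" is just an example, yours may differ):
radosgw-admin user list                   # list RGW users
radosgw-admin user info --uid=dashboard   # inspect the user the dashboard is configured with
ceph dashboard set-rgw-credentials        # on recent Ceph releases this (re)generates the dashboard credentials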
Ah... by "pinning" I mean forcing the GRUB menu to load that specific kernel version:
first, install the package:
apt install pve-kernel-5.11.22-7-pve
then update the grub menu to load it:
GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 5.11.22-7-pve"
after...
I have Debian + PVE on top so your config may differ.
Install the kernel:
apt install pve-kernel-5.11.22-7-pve
Update this line in /etc/default/grub:
GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 5.11.22-7-pve"
Run update-grub, then reboot.
also, don't forget to...
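Put together, the whole procedure is roughly (kernel version as in my case, adjust to yours):
apt install pve-kernel-5.11.22-7-pve
# edit /etc/default/grub and set:
# GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 5.11.22-7-pve"
update-grub
reboot
uname -r      # confirm the pinned kernel is the one running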
I'm doing it from scratch.
Nevertheless, my question remains: on a three-node cluster with replica 2, if one node goes down (let's say unrecoverably), I'm unable to access the buckets on the object storage.
This is a new cluster that I will use for storing cold data. All the data on this cluster I also have nearby on a different Ceph cluster, which is performing as expected (with some question marks regarding the R/W ratios).
root@pve--1:~# ceph -s
cluster:
id...
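For context on why replica 2 bites when a node dies: it comes down to the pool's min_size, which I'd check and adjust like this (the pool name is a placeholder; RGW uses several pools):
ceph osd pool ls detail              # shows size/min_size for every pool
ceph osd pool get <pool> min_size
# size=2 + min_size=2: losing one copy makes the PGs inactive until recovery finishes
# size=2 + min_size=1: I/O keeps going on one copy, but a second failure means data loss
ceph osd pool set <pool> min_size 1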
Right now I'm rebalancing the cluster; I've marked out and stopped all OSDs on one node. The problem is that even though I have a 10G link between the nodes, recovery speed is slow (110MiB/s), with max_backfills 8 and osd_recovery_max_active 16.
After the cluster is healthy I will go to the next server and...
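For reference, the recovery knobs mentioned above, set cluster-wide (option names as in current Ceph; newer releases split osd_recovery_max_active into _hdd/_ssd variants, and mclock-scheduled OSDs may ignore these):
ceph config set osd osd_max_backfills 8
ceph config set osd osd_recovery_max_active 16
ceph config show osd.0 | grep -E 'backfills|recovery_max_active'   # confirm what an OSD actually uses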
I don't fully understand the concept of "4-6 OSDs per WAL/DB disk".
You mean that on a single SSD, I should partition/assign only 4-6 OSDs?
For this setup, I put 35 OSDs per WAL/DB SSD disk :)
For other clusters, I put 6 OSDs per NVMe.
I use two Intel Datacenter 960GB SSDs that I had...
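If it helps, the concrete command shape for that ratio: ceph-volume can carve block.db for several data disks out of one shared SSD/NVMe (device names below are placeholders):
ceph-volume lvm batch --bluestore /dev/sd[b-g] --db-devices /dev/nvme0n1 --report   # dry run: 6 OSDs sharing one DB device
ceph-volume lvm batch --bluestore /dev/sd[b-g] --db-devices /dev/nvme0n1            # actually create them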
Hi,
I have a three-node cluster; each node has 35 OSDs + 1 SSD as BlueStore DB.
Replica 2, failure domain host.
The cluster is used for cold storage, and I'm starting to dump data onto it.
I know that my bottleneck is the single SSD on each node; all the incoming traffic will hit it.
Now, my...
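For completeness, how I'd confirm that the shared DB/WAL SSD really is the choke point while dumping data (standard tools, nothing Proxmox-specific):
iostat -x 5       # watch %util on the DB SSD vs. the spinning disks
ceph osd perf     # per-OSD commit/apply latency from Ceph's side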