Recent content by brosky

  1. [SOLVED] One node in cluster brings everything down.

    Issue resolved. It seems that even though I had only 0.140 ms latency on the corosync network, a 2-3-5% packet loss was the source of the issue. I've replaced the link and magically everything healed and worked again.
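    Latency alone doesn't tell the whole story; packet loss on the corosync link matters just as much. A hedged sketch of how one might check both (hostnames are placeholders; `omping` must be installed and run on all nodes in parallel):

    ```shell
    # Show corosync (knet) link state and latency per node
    corosync-cfgtool -s

    # Measure latency AND packet loss on the cluster network for ~10 minutes
    # (run simultaneously on every node, listing all cluster node names)
    omping -c 600 -i 1 -q pve1 pve2 pve3
    ```

    Sustained packet loss in the omping summary, even with sub-millisecond latency, is enough to destabilize corosync membership.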
  2. Adding node #10 puts the whole cluster out of order

    Hi, I'm having the same problem: https://forum.proxmox.com/threads/one-node-in-cluster-brings-everything-down.128862/ I suspect the source of the problem is the inability to write to the /etc/pve folder after the node boots and joins the cluster. Is there a way to write files there when the...
  3. [SOLVED] One node in cluster brings everything down.

    So, I have a 14-node cluster. We had a switch failure and had to move all the frontend networking (public and PVE cluster) endpoints to a backup switch; after a couple of days we moved them back. Now one node in the cluster misbehaves: I can't write to the /etc/pve folder. I've...
  4. Ceph Dashboard (RADOS GW management problem)

    I'm seeing the same behavior: the dashboard works except for object storage. It worked for a few months and now I can't get it to work again. Error on the dashboard: The Object Gateway Service is not configured. Error connecting to Object Gateway: RGW REST API failed request with status code 403...
  5. Backup - libceph: read_partial_message 00000000df61d3e0 signature check failed

    Ah, by "pinning" I mean forcing the GRUB menu to load that specific kernel version: first, install the package: apt install pve-kernel-5.11.22-7-pve then update the GRUB menu to load it: GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 5.11.22-7-pve" after...
  6. Backup - libceph: read_partial_message 00000000df61d3e0 signature check failed

    You don't have the package available in the repo?
  7. Backup - libceph: read_partial_message 00000000df61d3e0 signature check failed

    I have Debian with PVE on top, so your config may differ. Install the kernel: apt install pve-kernel-5.11.22-7-pve Update the /etc/default/grub line: GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 5.11.22-7-pve" then run update-grub and reboot. Also, don't forget to...
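    The pinning steps above can be collected into one sequence (a sketch for a Debian-based PVE install; the kernel version string must match a kernel actually present in your repo, and the submenu title must match what `grep menuentry /boot/grub/grub.cfg` shows):

    ```shell
    # Install the specific kernel build
    apt install pve-kernel-5.11.22-7-pve

    # In /etc/default/grub, set the default entry to that kernel (one line):
    # GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 5.11.22-7-pve"

    # Regenerate the GRUB config and reboot into the pinned kernel
    update-grub
    reboot
    ```

    After the reboot, `uname -r` should report 5.11.22-7-pve.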
  8. Bluestore SSD wear

    Yes, you are right - the replica was 2/2, so if I set 2/1 I should still have access to the data.
  9. Bluestore SSD wear

    I'm doing it from scratch. Nevertheless, my question remains: on a three-node cluster with replica 2, if one node goes down (let's say unrecoverably), I'm unable to access the buckets from the object storage.
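    The behavior described comes from the pool's min_size: with size 2 / min_size 2, losing one replica leaves PGs below min_size and I/O stops. Lowering min_size to 1 restores access while degraded, at the cost of running with no redundancy. A hedged sketch (the pool name is a placeholder):

    ```shell
    # Inspect the current replication settings of a pool
    ceph osd pool get mypool size
    ceph osd pool get mypool min_size

    # Allow I/O with a single surviving replica
    # (risky: any further failure during recovery means data loss)
    ceph osd pool set mypool min_size 1
    ```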
  10. Bluestore SSD wear

    This is a new cluster that I will use for storing cold data. All the data that's on this cluster I also have nearby on a different Ceph that's performing as expected (with some question marks regarding the R/W ratios). root@pve--1:~# ceph -s cluster: id...
  11. Bluestore SSD wear

    Right now I'm rebalancing the cluster; I've marked out and stopped all OSDs on one node. The problem is that even though I have a 10G link between the nodes, recovery speed is slow (110 MiB/s) - max_backfills 8, osd_recovery_max_active 16. After the cluster is healthy I will go to the next server and...
  12. Bluestore SSD wear

    I don't fully understand the concept of "4-6 OSDs per WAL/DB disk". You mean that on a single SSD I should partition/assign only 4-6 OSDs? For this setup, I put 35 OSDs per WAL/DB SSD :) For other clusters, I put 6 OSDs per NVMe. I use two Intel Datacenter 960 GB SSDs that I had...
  13. Bluestore SSD wear

    Hi, I have a three-node cluster; each node has 35 OSDs + 1 SSD as the BlueStore DB. Replica 2, failure domain host. The cluster is used for cold storage, and I'm starting to dump data onto it. I know that my bottleneck is the single SSD in each node - all the incoming traffic will hit it. Now, my...
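    In a layout like this, each OSD's DB/WAL lives in a slice of the shared SSD. A sketch of how one such OSD could be created with an external DB device (device paths are placeholders, not from the original posts):

    ```shell
    # Create a BlueStore OSD with data on an HDD and its DB/WAL
    # on a partition of the shared SSD
    ceph-volume lvm create --data /dev/sdb --block.db /dev/sda1
    ```

    With 35 OSDs sharing one SSD, every write's metadata funnels through that single device, which is why it becomes both the throughput bottleneck and the wear hotspot.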
  14. Proxmox Ceph without PVE Cluster

    Thank you, all clear. At what latency does corosync start to behave badly?