Search results

  1. [SOLVED] One node in cluster brings everything down.

    Issue resolved. It seems that even with 0.140 ms latency on the corosync network, 2-5% packet loss was the source of the issue. I've replaced the link and magically everything healed and worked again.
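The fix above came from spotting packet loss, not latency. A minimal way to check both on a corosync link might look like this (the peer address is a placeholder; the post does not name one):

```shell
# Placeholder address; substitute another node's corosync-ring IP.
PEER="127.0.0.1"
# 20 quick pings; the summary lines report packet loss and min/avg/max latency.
ping -c 20 -i 0.2 -q "$PEER" | tail -n 2
```

For multi-node checks, Proxmox's documentation also suggests `omping`, which measures loss between several hosts at once.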
  2. Adding node #10 puts the whole cluster out of order

    Hi, I'm having the same problem: https://forum.proxmox.com/threads/one-node-in-cluster-brings-everything-down.128862/ I suspect the source of the problem is the inability to write to the /etc/pve folder after the node boots and joins the cluster. Is there a way to write files there when the...
  3. [SOLVED] One node in cluster brings everything down.

    So, I have a 14-node cluster. We had a switch failure and had to move all the frontend networking (public and pve cluster) endpoints to a backup switch; after a couple of days we moved them back. Now one node in the cluster misbehaves and I can't write to the /etc/pve folder. I've...
  4. Ceph Dashboard (RADOS GW management problem)

    I'm having the same behavior: the dashboard works except for object storage. It worked for a few months and now I can't get it to work again. Errors on the dashboard: "The Object Gateway Service is not configured" and "Error connecting to Object Gateway: RGW REST API failed request with status code 403"...
  5. Backup - libceph: read_partial_message 00000000df61d3e0 signature check failed

    Ah... by "pinning" I mean forcing the GRUB menu to load that specific version. First, install the package: apt install pve-kernel-5.11.22-7-pve. Then update the GRUB menu to load it: GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 5.11.22-7-pve" after...
  6. Backup - libceph: read_partial_message 00000000df61d3e0 signature check failed

    You don't have the package available in the repo?
  7. Backup - libceph: read_partial_message 00000000df61d3e0 signature check failed

    I have Debian + PVE on top, so your config may differ. Install the kernel: apt install pve-kernel-5.11.22-7-pve. Update the /etc/default/grub line: GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 5.11.22-7-pve" then update-grub and reboot. Also, don't forget to...
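Put together, the pinning recipe from the posts above looks roughly like this (a sketch: the apt step needs root plus the PVE repo, and the GRUB_DEFAULT text must match the entry that update-grub actually generates on your host, so here it is written to a scratch file instead of /etc/default/grub):

```shell
KVER="5.11.22-7-pve"
# 1) install the kernel package (commented out: needs root + Proxmox repo)
#    apt install pve-kernel-${KVER}
# 2) pin GRUB to the matching submenu entry
printf 'GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux %s"\n' "$KVER" \
  > /tmp/grub-default.sketch
cat /tmp/grub-default.sketch
# 3) on a real host: put the line into /etc/default/grub, then update-grub && reboot
```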
  8. Bluestore SSD wear

    Yes, you are right - the replica was 2/2, so if I set 2/1 I should still have access to the data.
  9. Bluestore SSD wear

    I'm doing it from scratch. Nevertheless, my question remains: on a three-node cluster with replica 2, if one node goes down (let's say unrecoverably), I'm unable to access the buckets from the object storage.
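The behavior described here is governed by the pool's min_size: with size=2/min_size=2, Ceph stops serving I/O as soon as one replica is lost. A sketch of inspecting and lowering it (the pool name is hypothetical, the commands need a live cluster so they are only written out here, and min_size=1 trades availability for a real risk of data loss):

```shell
POOL="cold-data"   # hypothetical pool name
# Written to a file rather than executed, since they require a running cluster:
cat > /tmp/ceph-minsize.sketch <<EOF
ceph osd pool get ${POOL} min_size
ceph osd pool set ${POOL} min_size 1
EOF
cat /tmp/ceph-minsize.sketch
```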
  10. Bluestore SSD wear

    This is a new cluster that I will use for storing cold data. All the data on this cluster I also have nearby on a different Ceph, which is performing as expected (with some question marks regarding the R/W ratios). root@pve--1:~# ceph -s cluster: id...
  11. Bluestore SSD wear

    Right now I'm rebalancing the cluster; I've marked out & stopped all OSDs on one node. The problem is that even though I have a 10G link between the nodes, recovery speed is slow (110 MiB/s) with max_backfills 8 and osd_recovery_max_active 16. After the cluster is healthy I will go to the next server and...
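For reference, the two throttles named in the post are plain OSD options; on a recent Ceph they can be raised cluster-wide as below (the values are the poster's own, not recommendations, and the commands need a live cluster, so this sketch only writes them out):

```shell
# Quoted heredoc: nothing to expand, just the literal commands.
cat > /tmp/ceph-recovery.sketch <<'EOF'
ceph config set osd osd_max_backfills 8
ceph config set osd osd_recovery_max_active 16
EOF
cat /tmp/ceph-recovery.sketch
```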
  12. Bluestore SSD wear

    I don't fully understand the concept of "4-6 OSDs per WAL/DB disk". You mean that on a single SSD I should partition/assign only 4-6 OSDs? For this setup, I put 35 OSDs per WAL/DB SSD :) For other clusters, I put 6 OSDs per NVMe. I use two Intel Datacenter 960 GB SSDs that I had...
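The "4-6 OSDs per WAL/DB device" rule of thumb is about bandwidth sharing: every incoming write passes through the DB/WAL SSD, so its throughput is divided among the OSDs behind it. Rough arithmetic, assuming a hypothetical 500 MB/s SSD:

```shell
SSD_MBPS=500   # assumed sequential write throughput of one DB/WAL SSD
for OSDS in 6 35; do
  # integer division: each OSD's share of the shared device's bandwidth
  echo "${OSDS} OSDs per SSD -> ~$((SSD_MBPS / OSDS)) MB/s each"
done
```

At 35 OSDs per SSD each OSD gets roughly 14 MB/s of DB bandwidth versus about 83 MB/s at 6, which is why the single SSD becomes the bottleneck described in the original post.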
  13. Bluestore SSD wear

    Hi, I have a three-node cluster; each node has 35 OSDs + 1 SSD as the bluestore DB. Replica 2, failure domain host. The cluster is used for cold storage, and I'm starting to dump data onto it. I know that my bottleneck is the single SSD in each node - all incoming traffic will hit it. Now, my...
  14. Proxmox Ceph without PVE Cluster

    Thank you, all clear. At what latency does corosync start to behave badly?
  15. Proxmox Ceph without PVE Cluster

    @mr44er As I understand it, there are issues with large PVE clusters - over 16 machines - due to increased chatter on the cluster VLAN. My scope here is to grow the Ceph cluster without growing the PVE cluster.
  16. Proxmox Ceph without PVE Cluster

    Hi, I have a Proxmox cluster, 14 nodes, with Ceph. Is it possible to add additional Ceph members (with PVE installed) that are not members of the cluster? I wonder about real-life setups of large clusters and whether there are any issues.
  17. [SOLVED] pve 7.3-3 - lxc template ubuntu 22.04 not starting

    Hi, fresh install of Debian+PVE from the wiki; Ubuntu 22.04 downloaded from the templates page. lxc-start -n 100 --logfile x.x --logpriority=DEBUG --foreground cat x.x INFO confile - ../src/lxc/confile.c:set_config_idmaps:2267 - Read uid map: type u nsid 0 hostid 100000 range 65536 INFO...
  18. Node freeze with 5.15.xx kernel

    Hi, I have a few nodes in a cluster with the latest PVE, running k8s in KVMs with CentOS 7. When running high-CPU tasks, the node freezes without errors and only a hard reboot resolves it. The hardware differs, yet the same behavior occurs on different hw hosts. I've downgraded the kernel to 5.11.22-7...