Recent content by brucexx

  1. corosync entries in journalctl link: 0 is down etc.

    Update: I see some retransmits, sometimes 1-3 daily on one or two nodes, and nothing on the others. Sometimes it happens daily and sometimes there are no corosync logs for 5 months. I actually have another cluster on that subnet (different cluster name), same version, with 5 nodes configured with LACP...
  2. corosync entries in journalctl link: 0 is down etc.

    I get the misalignment of the times the links were down, but the time frames before the node was fenced and after it rejoined are telling. Since February, when we had the ECC RAM issue, I do not see any corosync entries for "link down", and the node rejoined yesterday, 18 hours ago. I will try to do some...
  3. corosync entries in journalctl link: 0 is down etc.

    We had node 5 in a 6-node cluster fenced due to excessive RAM ECC errors. HA worked great and all VMs started on other nodes. The cluster has had no corosync issues in the year since it was put into production (we had ECC errors in February, but the RAM was replaced). I should mention that...
  4. 3 node cluster with nodes in two different server rooms

    Thank you for your advice, IINerd; it is going to be a similar setup but with replication on Proxmox.
  5. 3 node cluster with nodes in two different server rooms

    Does anybody have experience with putting cluster nodes in different server rooms? I have several buildings and was wondering what the acceptable latency is for a cluster to operate without any issues. The buildings are connected via 10 Gbps fiber and latency is very low, 1-2 ms. What is the max...
  6. pvestatd keeps restarting

    I see that pvestatd keeps restarting itself. I have 3 servers in a 6-server cluster showing that log. See example below: pvestatd[2352]: restarting server after 72 cycles to reduce memory usage. Everything seems to be working properly. How do we troubleshoot it or stop it? Thank you...
  7. Pulling local backup between two PBSs from the same datastore

    The scenario I have is with 2 Proxmox Backup servers doing sync between each other. There are no naming conflicts: in one location there is one naming convention and in the other there is a different one. When I pull the backups, I just want to back up the VMs that are not on the other side. I just...
  8. Pulling local backup between two PBSs from the same datastore

    I have two locations and back up to a PBS in each location. Is it possible to sync between locations only the VMs that are local to that PBS, while using the same datastore? In other words, sync between locations but only the VMs that are not in the other PBS datastore? Thank you
  9. Using auto scaler for the first time

    So I set it back to "on" from "warn", and now the "too many PGs per OSD" warning has disappeared, with the health status being OK. The number of PGs is still showing 2048. Is this because of what you wrote in the first post: "The autoscaler will only start changing the pg_num automatically if the...
  10. Using auto scaler for the first time

    Thank you again. I did not think about that: if you have a replicated pool with a size of 3, you will have pg_num * 3 replicas in total. 30 OSDs with ~100 PGs per OSD -> 30 * 100 / 3 = 1000 -> 1024; that makes more sense now. In other words, I forgot about the * 3 replicas. Fortunately...
  11. Using auto scaler for the first time

    After exhaustive reading about PGs and how to calculate them, I decided to turn off the autoscaler (mainly because it can start rebalancing during business hours). I set the PGs to 2048 (I was tempted to use 4096, per the 100 PGs per SSD). I have enough CPU/RAM resources to handle the OSDs...
  12. Using auto scaler for the first time

    Thank you Aaron, this is all good advice. Should the target ratio be increased gradually, let's say from 0.0 now to 0.2, then 0.5, then 0.7, etc., up to 1? I assume that 1 is the final ratio in my case, as this is going to be the only pool in this cluster; is that a correct assumption? Thank you
  13. Using auto scaler for the first time

    I have 1 pool and will have just one pool in this cluster: 5 nodes, 30 OSDs (1.6 TB drives). I might add one node with 6 additional OSDs within 2 years, but that is not 100% certain. The cluster will be about 75-80% full. The cluster I just decommissioned had PGs assigned statically, with 24...
  14. Using auto scaler for the first time

    So I should turn it off and set it manually to 1024? At 27%, should I do this gradually: increase by 128 (or perhaps 64) and wait, then increase by another 64 or 128? Thank you
  15. Using auto scaler for the first time

    I tested the autoscaling option on a test system and now I am using it in production. I have 5 nodes and 30 OSDs (all SSDs). I set the target size to 80% of the total size of the pool. Ceph shows the pool has 512 PGs and that the optimal number of PGs is 1024; the autoscaler is on. I checked...
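
The PG arithmetic discussed in the "Using auto scaler" thread above (30 OSDs with ~100 PGs per OSD and a size-3 replicated pool -> 30 * 100 / 3 = 1000 -> 1024) can be sketched roughly like this. This is only an illustration of the rule of thumb quoted in the thread, not Ceph's actual autoscaler logic; the function name and the round-to-nearest-power-of-two step are my own assumptions.

```python
import math

def suggested_pg_num(osds, pgs_per_osd=100, replica_size=3):
    """Rough rule-of-thumb pg_num for a single replicated pool.

    Hypothetical helper: each PG in a size-N replicated pool stores N
    copies, so the raw target is osds * pgs_per_osd / replica_size,
    then rounded to a power of two (as Ceph expects pg_num to be).
    """
    raw = osds * pgs_per_osd / replica_size
    # Round to the nearest power of two (an approximation for illustration).
    return 2 ** round(math.log2(raw))

# 30 OSDs, ~100 PGs per OSD, size-3 pool: 30 * 100 / 3 = 1000 -> 1024
print(suggested_pg_num(30))  # -> 1024
```

With the 24-OSD cluster mentioned earlier in the thread, the same rule gives 24 * 100 / 3 = 800, which rounds up to 1024 as well, which is why adding a few OSDs often leaves the suggested pg_num unchanged.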