Recent content by lifeboy

  1.

    Node went down - unclear why - log attached

    That's inconsequential. The node was down and had started up, but PBS1 and InfluxDB had not started yet. I have attached all the records from journalctl between 12:00 and the completed shutdown of the node. I don't see any reason why the node shut down. It looks like an orderly shutdown, so...
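
    A time-bounded extract like the one described can be produced with journalctl's --since/--until options (a generic sketch; the timestamps and output file name are placeholders):

        # Pull all journal records from noon up to the completed shutdown
        journalctl --since "2025-09-08 12:00" --until "2025-09-08 12:35" > node-shutdown.log

        # List prior boots to confirm whether the shutdown was recorded as orderly
        journalctl --list-boots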
  2.

    Node went down - unclear why - log attached

    We had a node go down two days ago and I'm at a loss figuring out why. I attached the log. This happened at 12:30. The other nodes simply show that the OSDs went down, and they feverishly started rebalancing the cluster. Is there any indication as to why? Sep 8 12:29:56 FT1-NodeA...
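
    For reference, the cluster-side view of such an event can be checked with the standard Ceph status commands (a generic sketch, not taken from the thread):

        ceph -s              # overall health; shows active recovery/rebalancing
        ceph osd tree        # which OSDs are down, and on which host
        ceph health detail   # details on degraded placement groups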
  3.

    NVMe OSD generates crc error. Failing drive?

    I have a relatively new Samsung enterprise NVMe in a node that is generating the following error: ... 2025-08-26T15:56:43.870+0200 7fe8ac968700 0 bad crc in data 3326000616 != exp 1246001655 from v1:192.168.131.4:0/1799093090 2025-08-26T16:03:54.757+0200 7fe8ad96a700 0 bad crc in data...
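
    Drive health for a suspect NVMe can be checked with nvme-cli or smartmontools (a sketch; /dev/nvme0 is an assumed device name, adjust to the actual drive):

        nvme smart-log /dev/nvme0   # media errors, wear level, temperature
        smartctl -a /dev/nvme0      # full SMART report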
  4.

    enabling ceph image replication: how to set up host addresses?

    No, @birdflewza. I didn't pursue this any further, since the customer who requested it no longer wanted it. It's on our list though, so we'll revisit this some time.
  5.

    CPU frequency

    Proxmox 7 also runs in "Performance mode", so unless you're on an older version this tweak should not be necessary. You can check this with the following: cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor performance performance ...
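
    To complete the picture: the governor can also be forced to "performance" at runtime (a generic sketch; run as root, and note the setting does not survive a reboot):

        echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor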
  6.

    Perplexing: When a node is turned off, the whole cluster loses its network

    Yes, we have checked that in great detail. Every active port on the Mellanox switch is joined to every VLAN, so no matter where the VM runs, the VLANs are active there.
  7.

    Perplexing: When a node is turned off, the whole cluster loses its network

    Indeed, some VMs crashed. However, the two pfSense VMs are 100 and 101, and neither crashed. That was the first thing I checked for in the logs. The reason for taking the nodes down was exactly that: we doubled the RAM in each node.
  8.

    Perplexing: When a node is turned off, the whole cluster loses its network

    Logs of 10 Nov attached. The first node to be shut down was NodeC at about 12:40, then NodeD, then A, and last B.
  9.

    Perplexing: When a node is turned off, the whole cluster loses its network

    That's why I have two instances of pfSense that poll each other with CARP. If I shut one down, the other takes over within seconds. So it's not that. The VMs on the nodes stay on, but they don't communicate with the control plane anymore as far as I can tell. So if I check the logs on a...
  10.

    Perplexing: When a node is turned off, the whole cluster loses its network

    Hmm... that is the only clue I have been able to find about what happens. Or maybe it's unrelated then?
  11.

    Perplexing: When a node is turned off, the whole cluster loses its network

    I think I have finally found a possible source of the problem: when I shut down a node, although I have disabled most of the Ceph rebalancing and checking functions, the kernel crashes due to lack of memory. We have now doubled the amount of RAM, so I don't believe it will happen again. Nov...
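
    The flags alluded to are typically set before taking a node down (a generic sketch of the usual Ceph maintenance flags; unset each one again afterwards):

        ceph osd set noout         # don't mark stopped OSDs out (avoids triggering rebalancing)
        ceph osd set norebalance   # suspend data rebalancing
        ceph osd set nobackfill    # suspend backfill
        ceph osd set norecover     # suspend recovery
        # after the node is back:
        ceph osd unset noout       # ...and likewise for the other flags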
  12.

    Perplexing: When a node is turned off, the whole cluster loses its network

    On a 4-node Proxmox cluster, when one node is shut down, network access on the others somehow goes away. Here is the configuration: each node is set up similarly, but with the LAN, corosync and other addresses changed for each node. The enlan2.25 and enlan2.35 are legacy setups...
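
    When the network drops on the surviving nodes, the usual first checks look something like this (a generic sketch; the interface names follow the post, the gateway address is a placeholder):

        ip -br link show        # state of physical and VLAN interfaces (e.g. enlan2.25)
        ip -br addr show        # addresses still assigned
        bridge vlan show        # VLAN membership of bridge ports
        ping -c 3 <gateway-ip>  # reachability of the pfSense gateway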
  13.

    Windows Server licensing

    It's been a couple of years, but the issue is still the same. I'd like some clarification on this, please. If I have 4 nodes in a pmx cluster with 2 x 10-core CPUs in each and I want to license, for example, a Windows Server 2022, then would I have to: Pay MS for a license for 16 cores for...
  14.

    enabling ceph image replication: how to set up host addresses?

    The file rbd-mirror-capetown.conf contains the config of the capetown cluster on the remote cluster, so from that I assume that I have to create a VPN link between the two sites so that the replication service on the LAN at the remote site is able to reach the local / LAN address given in that...
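
    Once such a VPN link is up, reachability of the monitor address in the copied conf can be verified from the remote site (a sketch; <mon-ip> stands for the address given in rbd-mirror-capetown.conf):

        ping -c 3 <mon-ip>
        nc -vz <mon-ip> 6789   # Ceph monitor, msgr1 port
        nc -vz <mon-ip> 3300   # Ceph monitor, msgr2 port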
  15.

    enabling ceph image replication: how to set up host addresses?

    I'm attempting a test to replicate a ceph image to a remote cluster by following this HOWTO. However, what I'm missing is the detail of how or where to specify where "site-a" is, in terms of IP address, in the examples given. When I follow the instructions, I see this in the status logs...
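
    For comparison, the mirroring state can be inspected on either side with the standard rbd subcommands (a generic sketch; <pool> is a placeholder for the mirrored pool):

        rbd mirror pool info <pool>               # configured peers for the pool
        rbd mirror pool status <pool> --verbose   # per-image replication status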