Search results

  1.

    Node went down - unclear why - log attached

    That's inconsequential. That Node was down and had started up, but PBS1 and InfluxDB were not started yet. I have attached all the records from journalctl between 12:00 and the completed shutdown of the Node. I don't see any reason why the Node shut down. It looks like an orderly shutdown so...
  2.

    Node went down - unclear why - log attached

    We had a node go down two days ago and I'm at a loss figuring out why. I attached the log. This happened at 12:30. The other nodes simply show that the OSDs went down and feverishly started rebalancing the cluster. Is there any indication as to why? Sep 8 12:29:56 FT1-NodeA...
  3.

    NVMe OSD generates crc error. Failing drive?

    I have a relatively new Samsung Enterprise NVMe in a node that is generating the following error: ... 2025-08-26T15:56:43.870+0200 7fe8ac968700 0 bad crc in data 3326000616 != exp 1246001655 from v1:192.168.131.4:0/1799093090 2025-08-26T16:03:54.757+0200 7fe8ad96a700 0 bad crc in data...
  4.

    enabling ceph image replication: how to set up host addresses?

    No, @birdflewza. I didn't pursue this any further, since the customer that requested it didn't want it anymore. It's on our list though, so we'll visit this again some time.
  5.

    CPU frequency

    Proxmox 7 also runs in "Performance mode", so unless you're on an older version this tweak should not be necessary. You can check this with the following: cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor performance performance ...
  6.

    Perplexing: When a node is turned off, the whole cluster looses it's network

    Yes, we have checked that in great detail. On the Mellanox switch, every active port is joined to every VLAN, so no matter where the VM runs, the VLANs are active there.
  7.

    Perplexing: When a node is turned off, the whole cluster looses it's network

    Indeed, some VMs crashed. However, the 2 pfSense VMs are 100 and 101 and neither crashed. That was the first thing I checked for in the logs. The reason for taking the nodes down was exactly that: We doubled the RAM in each node.
  8.

    Perplexing: When a node is turned off, the whole cluster looses it's network

    Logs of 10 Nov attached. The first node to be shut down was NodeC at about 12:40, then NodeD, then A, and last B.
  9.

    Perplexing: When a node is turned off, the whole cluster looses it's network

    That's why I have two instances of pfSense that poll each other with CARP. If I shut one down, the other takes over within seconds. So it's not that. The VMs on the nodes stay on, but they don't communicate with the control plane anymore as far as I can tell. So if I check the logs on a...
  10.

    Perplexing: When a node is turned off, the whole cluster looses it's network

    Hmm... that is the only clue I have been able to find about what happens. Or maybe it's unrelated then?
  11.

    Perplexing: When a node is turned off, the whole cluster looses it's network

    I think I have finally found a possible source of the problem: When I shut down a node, although I have disabled most of the Ceph rebalancing and checking functions, the kernel crashes due to lack of memory. We have now doubled the amount of RAM, so I don't believe it will happen again. Nov...
  12.

    Perplexing: When a node is turned off, the whole cluster looses it's network

    On a 4-node Proxmox cluster, when one node is shut down, access to the network on the others goes away somehow. Here is the configuration: Each node is set up similarly, but with the LAN, corosync and other addresses changed for each node. The enlan2.25 and enlan2.35 are legacy setups...
  13.

    Windows Server licensing

    It's been a couple of years, but the issue is still the same. I'd like some clarification on this please. If I have 4 nodes in a pmx cluster with 2 x 10-core CPUs in each and I want to license, for example, Windows Server 2022, then would I have to: Pay MS for a license for 16 cores for...
  14.

    enabling ceph image replication: how to set up host addresses?

    The file rbd-mirror-capetown.conf contains the config of the capetown cluster on the remote cluster, so from that I assume that I have to create a VPN link between the two sites so that the replication service on the LAN at the remote site is able to get to the local / LAN address given in that...
  15.

    enabling ceph image replication: how to set up host addresses?

    I'm attempting a test to replicate a Ceph image to a remote cluster by following this HOWTO. However, what I'm missing is the detail of how or where to specify the IP address of "site-a" in the examples given. When I follow the instructions, I see this in the status logs...
  16.

    [SOLVED] 2 stuck OSD's in ceph database

    I recreated the manager on a node (after deleting all the managers) and that resolved the issue, so I can now add the OSDs again.
  17.

    [SOLVED] 2 stuck OSD's in ceph database

    That just hangs, since the OSDs were on a node that doesn't exist anymore. Here it also says :~# pveceph osd destroy 1 OSD osd.1 does not belong to node pmx2! at /usr/share/perl5/PVE/API2/Ceph/OSD.pm line 952, <DATA> line 960. This zapped the OSDs, but they are still shown in the ceph...
  18.

    [SOLVED] 2 stuck OSD's in ceph database

    I tried to remove all OSDs from a cluster and recreate them, but 2 of them are still stuck in the ceph configuration database. I have run all the standard commands to remove them, but the reference stays. # ceph osd crush remove osd.1 removed item id 1 name 'osd.1' from crush map # ceph osd...
  19.

    New install pve 8.2 on Debian 12 certificate blocks GUI

    # cat /etc/hosts 127.0.0.1 localhost 154.65.99.47 pmx1 ::1 localhost ip6-localhost ip6-loopback ff02::1 ip6-allnodes ff02::2 ip6-allrouters # pvecm updatecerts --force (re)generate node files generate new node certificate merge authorized SSH keys creating directory...
  20.

    New install pve 8.2 on Debian 12 certificate blocks GUI

    This host gets a dynamic IP address as per the cloud provider's settings. Do I have to set the address in the hosts file? inet 154.65.99.47/20 metric 100 brd 154.65.111.255 scope global dynamic ens3