Search results

  1.

    Perplexing: When a node is turned off, the whole cluster loses its network

    Yes, we have checked that in great detail. Every VLAN on the Mellanox switch has all the active ports joined, so no matter where the VM runs, the VLANs are active there.
  2.

    Perplexing: When a node is turned off, the whole cluster loses its network

    Indeed, some VMs crashed. However, the two pfSense VMs are 100 and 101, and neither crashed. That was the first thing I checked for in the logs. The reason for taking the nodes down was exactly that: we doubled the RAM in each node.
  3.

    Perplexing: When a node is turned off, the whole cluster loses its network

    Logs of the 10 Nov attached. The first node to be shut down was NodeC at about 12:40, then NodeD, then A, and last B
  4.

    Perplexing: When a node is turned off, the whole cluster loses its network

    That's why I have two instances of pfSense that poll each other with CARP. If I shut one down, the other takes over within seconds. So it's not that. The VMs on the nodes stay on, but they don't communicate with the control plane anymore as far as I can tell. So if I check the logs on a...
  5.

    Perplexing: When a node is turned off, the whole cluster loses its network

    Hmm... that is the only clue I have been able to find about what happens. Or maybe it's unrelated then?
  6.

    Perplexing: When a node is turned off, the whole cluster loses its network

    I think I have eventually found a possible source of the problem: when I shut down a node, although I have disabled most of the ceph rebalancing and checking functions, the kernel crashes due to lack of memory. We have now doubled the amount of RAM, so I don't believe it will happen again. Nov...
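
    The "rebalancing and checking functions" are not listed in the excerpt; as a hedged sketch, these are the Ceph flags commonly set before a planned node shutdown and cleared afterwards (whether the poster used exactly these is an assumption):

        # before shutting the node down: stop Ceph from moving or scrubbing data
        ceph osd set noout
        ceph osd set norebalance
        ceph osd set norecover
        ceph osd set nobackfill
        ceph osd set noscrub
        ceph osd set nodeep-scrub

        # once the node is back and the cluster is healthy again, clear the flags
        ceph osd unset noout
        ceph osd unset norebalance
        ceph osd unset norecover
        ceph osd unset nobackfill
        ceph osd unset noscrub
        ceph osd unset nodeep-scrub
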
  7.

    Perplexing: When a node is turned off, the whole cluster loses its network

    On a 4-node Proxmox cluster installation, when one node is shut down, access to the network on the others somehow goes away. Here is the configuration: each node is set up similarly, but with the LAN, corosync and other addresses changed on each node. The enlan2.25 and enlan2.35 are legacy setups...
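
    The actual per-node network configuration is cut off in the excerpt; the following is only an illustrative sketch of what VLAN subinterfaces such as enlan2.25 and enlan2.35 typically look like in /etc/network/interfaces, with placeholder addresses and an assumed (not stated) mapping of VLANs to roles:

        auto enlan2
        iface enlan2 inet manual

        # VLAN 25: LAN address, host part differs on each node (placeholder address)
        auto enlan2.25
        iface enlan2.25 inet static
            address 192.0.2.11/24
            gateway 192.0.2.1

        # VLAN 35: e.g. corosync/cluster traffic, again a different address per node (placeholder)
        auto enlan2.35
        iface enlan2.35 inet static
            address 198.51.100.11/24
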
  8.

    Windows Server licensing

    It's been a couple of years, but the issue is still the same. I'd like some clarification on this, please. If I have 4 nodes in a pmx cluster with 2 x 10-core CPUs in each and I want to license, for example, a Windows Server 2022, then would I have to: Pay MS for a license for 16 cores for...
  9.

    enabling ceph image replication: how to set up host addresses?

    The file rbd-mirror-capetown.conf contains the config of the capetown cluster on the remote cluster, so from that I assume that I have to create a VPN link between the two sites so that the replication service on the LAN at the remote site is able to get to the local / LAN address given in that...
  10.

    enabling ceph image replication: how to set up host addresses?

    I'm attempting to do a test to replicate a ceph image to a remote cluster by following this HOWTO. However, what I'm missing is the detail of how or where to specify where "site-a" is, in terms of IP address, in the examples given. When I follow the instructions, I see this in the status logs...
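
    The HOWTO itself is not quoted here, but result 9 mentions an rbd-mirror-capetown.conf on the remote cluster; as a hedged sketch, the peer cluster's address is normally given through the mon_host entry of that per-cluster conf file (all values below are placeholders), which is why those monitor addresses must be reachable from the remote site, e.g. over a VPN:

        # /etc/ceph/rbd-mirror-capetown.conf on the remote cluster
        [global]
            # fsid of the capetown (source) cluster
            fsid = 00000000-0000-0000-0000-000000000000
            # monitor addresses of the capetown cluster; the rbd-mirror daemon on the
            # remote site connects to these, so they must be routable from there
            mon_host = 10.0.10.1,10.0.10.2,10.0.10.3

        # plus a matching keyring so the daemon can authenticate, e.g.
        # /etc/ceph/rbd-mirror-capetown.client.rbd-mirror-peer.keyring
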
  11.

    [SOLVED] 2 stuck OSD's in ceph database

    I recreated the manager on a node (after deleting all the managers) and that resolved the issue, so I can now add the OSDs again.
  12.

    [SOLVED] 2 stuck OSD's in ceph database

    That just hangs, since the OSDs were on a node that doesn't exist anymore. Here it also says: ~# pveceph osd destroy 1 OSD osd.1 does not belong to node pmx2! at /usr/share/perl5/PVE/API2/Ceph/OSD.pm line 952, <DATA> line 960. This zapped the OSDs, but they are still shown in the ceph...
  13.

    [SOLVED] 2 stuck OSD's in ceph database

    I tried to remove all OSDs from a cluster and recreate them, but 2 of them are still stuck in the ceph configuration database. I have run all the standard commands to remove them, but the reference stays. # ceph osd crush remove osd.1 removed item id 1 name 'osd.1' from crush map # ceph osd...
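
    The full command sequence is truncated in the excerpt; as a sketch, the usual steps for removing an OSD such as osd.1 (assuming it is already stopped) are:

        # take the OSD out of service and drop it from the CRUSH map
        ceph osd out osd.1
        ceph osd crush remove osd.1
        # remove its authentication key and the OSD entry itself
        ceph auth del osd.1
        ceph osd rm osd.1

        # newer Ceph releases can combine the last steps:
        ceph osd purge 1 --yes-i-really-mean-it
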
  14.

    New install pve 8.2 on Debian 12 certificate blocks GUI

    # cat /etc/hosts
    127.0.0.1 localhost
    154.65.99.47 pmx1
    ::1     localhost ip6-localhost ip6-loopback
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
    # pvecm updatecerts --force
    (re)generate node files
    generate new node certificate
    merge authorized SSH keys
    creating directory...
  15.

    New install pve 8.2 on Debian 12 certificate blocks GUI

    This host gets a dynamic IP address as per the cloud provider's settings. Do I have to have the address set in the hosts file? inet 154.65.99.47/20 metric 100 brd 154.65.111.255 scope global dynamic ens3
  16.

    New install pve 8.2 on Debian 12 certificate blocks GUI

    # cat /etc/hosts
    127.0.0.1 localhost
    ::1     localhost ip6-localhost ip6-loopback
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
  17.

    New install pve 8.2 on Debian 12 certificate blocks GUI

    I have done a fresh install on a Debian 12 cloud host and all went well, I thought, except that port 8006 is not responding. (I followed the documentation here.) In the logs I find this: Jun 04 17:52:23 pmx1 pveproxy[12734]: /etc/pve/local/pve-ssl.pem: failed to use local certificate chain...
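
    The other excerpts in this thread point towards making the node name resolve to the node's address and then regenerating the node certificate; a minimal sketch, using the hostname and address that appear in result 14:

        # map the node name to the address the GUI is reached on
        echo "154.65.99.47 pmx1" >> /etc/hosts

        # regenerate the node certificates and restart the web proxy
        pvecm updatecerts --force
        systemctl restart pveproxy
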
  18.

    Windows Server 2022 reports disk errors on ceph volume

    Sorry, it's now called retrim, which differs from defrag. However, it still kills the machine when doing it on a 1TB drive. It's NVMe storage, so it can't be too slow. It doesn't happen on any other machine, only on the new WS 2022 installation.
  19.

    Windows Server 2022 reports disk errors on ceph volume

    The driver update made no difference. However, the scsi-virtio driver allows thin provisioning of the disk volume, which is what we use, so Windows starts a defrag (Edit: it's actually an "optimization") once a week by default and it uses 75% of the available RAM. When some users also log on...
  20.

    Windows Server 2022 reports disk errors on ceph volume

    We installed a new Windows Server 2022 on a cluster that uses an SSD-based ceph volume. All seemed to be going well, when suddenly the Windows event log reported: "An error was detected on device \Device\Harddisk0\DR0 during a paging operation". It's Windows error #51. There are other Windows...
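
    Result 19 ties the weekly Windows "optimization" (a retrim) to thin provisioning on the virtio-scsi disk; as a sketch of how such a disk is typically declared on the Proxmox side, with a placeholder VMID, storage and volume name that are not taken from the thread:

        # excerpt from /etc/pve/qemu-server/123.conf (placeholder VMID 123)
        scsihw: virtio-scsi-single
        # discard=on lets the guest's TRIM/retrim reach the thin-provisioned RBD image;
        # ssd=1 presents the disk to Windows as non-rotational, so the weekly optimize
        # pass retrims instead of defragmenting it
        scsi0: ceph-ssd:vm-123-disk-0,discard=on,ssd=1
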
