aaron's latest activity

  • aaron
    Thanks. That looks good and is as it should be. So I will have to take a look at the code that is receiving and processing that data.
  • aaron
    This is curious. Would it be okay for you to gather a bit more information? It seems that, for some reason, the pvestatd service still collects and distributes the old pre-PVE 9 metric format, but under the new key... So to further see...
  • aaron
    See also https://pve.proxmox.com/wiki/Upgrade_from_8_to_9#VM_Memory_Consumption_Shown_is_Higher
  • aaron
    aaron replied to the thread 100% Swap Usage.
    Swap is more than just an escape for low memory: https://chrisdown.name/2018/01/02/in-defence-of-swap.html But given that the host has ~185G of memory, you could consider disabling swap, as that is a lot of memory, and if you run out of memory...
  • aaron
    Hmm, those versions look new enough. Can you please restart the pvestatd service on the hosts? Either in the Node→System panel, or with systemctl restart pvestatd. Does that help to get rid of the log messages?
  • aaron
    Yeah. I assume you have one interface for everything on the hosts, that goes to the switch, right? The single point of failure there is the switch. If you can add a direct cable between the hosts, without a switch in between, you can configure a...
  • aaron
    Well, those long timeouts are most likely the explanation. If corosync takes too long to form a new quorum with just the QDevice, it might take longer than the 60s timeout of the LRM! Please set it back to defaults, from one of my test clusters...
  • aaron
    Can you please post your /etc/pve/corosync.conf file? And make sure that the /etc/pve/corosync.conf and /etc/corosync/corosync.conf files are the same.
  • aaron
    aaron replied to the thread how P2V in Proxmox ?.
    Any tool that allows booting a live system on the physical host and the target VM to transfer disk contents. That can be just a regular Linux live system and dd + ssh on both sides. Or something more guided like Clonezilla. There are surely also...
  • aaron
    aaron replied to the thread PVE9 Memory management problems.
    Was that host installed a while ago? Because, IIRC, since about 8.1, the installer limits the ARC by default. If you installed earlier, you can manually set a limit on the ARC...
  • aaron
    Yeah, if you can do another test, I am interested in the pvecm status output of node pve/Node1 a few seconds after you disconnect pve1/Node2, but before it eventually fences itself (if something is wrong). I think that info is not yet in...
  • aaron
    There is a misunderstanding in how fencing works. It is handled by the LRM on each node. If it is in "active" mode and the host lost the connection to the quorum for more than 60 seconds, it will not renew the watchdog. Once the watchdog runs...
  • aaron
    Can you disable the HA resource, wait ~10 minutes until all LRMs are idle, and then do the following please? With no active LRM, the nodes won't fence.
    1. get pvecm status while all nodes are up and working
    2. disconnect one of the nodes
    3. get...
  • aaron
    Looks a lot better :)
  • aaron
    Not great, because the Ceph Cluster network is only used for the replication between the OSDs. Everything else, including IO from the guests to their virtual disks, goes via the Ceph Public network. Which is probably why you see the rather meh...
  • aaron
    Does the Ceph network actually provide 10Gbit? Check with ethtool if they autonegotiated to the expected speed, and if so, run some iperf tests between the nodes to see how much bandwidth it can provide. Are both Ceph Public and Ceph Cluster...
  • aaron
    Hmm, can you post the output of pveversion -v on the host where you have/had guest 130? Ideally inside a code block (use the formatting options in the buttons above, </> for a code block). Is guest 130 a VM or a CT?
  • aaron
    aaron replied to the thread Lost data in ceph.
    In the ceph pg dump_stuck output you have the columns ACTING_PRIMARY, where the replica(s) currently are, and UP_PRIMARY, where they should be.
  • aaron
    Okay, that is curious. Are all guests powered on, or are some powered off? For example, guest VMID 130 in that error message from the first post: was it on or off at that time?
  • aaron
    Since this also just came up in the English forum, here is my answer there with a few details regarding the now simpler pinning: https://forum.proxmox.com/threads/network-drops-on-boot.65210/#post-793255