Search results

  1. aaron

    syslog getting spammed with "notice: RRD update error"s

    Hmm, those versions look new enough. Can you please restart the pvestatd service on the hosts? Either in the Node→System panel, or with systemctl restart pvestatd. Does that help to get rid of the log messages?
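
    A quick way to confirm the restart helped (the 10-minute window is an arbitrary example) would be something like:

        systemctl restart pvestatd
        # check whether new errors still show up afterwards
        journalctl --since "10 minutes ago" | grep "RRD update error"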
  2. aaron

    [TUTORIAL] [High Availability] Watchdog reboots

    Yeah. I assume you have one interface on the hosts that handles everything and goes to the switch, right? The single point of failure there is the switch. If you can add a direct cable between the hosts, without a switch in between, you can configure a second IP subnet on it and add it as second...
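
    As a sketch, the direct link could look like this in /etc/network/interfaces (the interface name and subnet are made up for illustration; the other host would get 10.10.10.2/24, and the new subnet can then be added as an additional corosync link):

        auto enp2s0
        iface enp2s0 inet static
            address 10.10.10.1/24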
  3. aaron

    [TUTORIAL] [High Availability] Watchdog reboots

    Well, those long timeouts are most likely the explanation. If corosync takes too long to form a new quorum with just the QDevice, it might take longer than the 60s timeout of the LRM! Please set it back to the defaults; for reference, from one of my test clusters: quorum { provider: corosync_votequorum } totem...
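
    To compare the currently active totem timings against those defaults, the runtime values can be read from corosync, for example:

        corosync-cmapctl | grep totem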
  4. aaron

    [TUTORIAL] [High Availability] Watchdog reboots

    Can you please post your /etc/pve/corosync.conf file? And make sure that the /etc/pve/corosync.conf and /etc/corosync/corosync.conf files are the same.
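
    One way to verify the two files match, run on each node:

        diff /etc/pve/corosync.conf /etc/corosync/corosync.conf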
  5. aaron

    how P2V in Proxmox ?

    Any tool that allows booting a live system on both the physical host and the target VM to transfer the disk contents. That can be just a regular Linux live system and dd + ssh on both sides. Or something more guided like Clonezilla. There are surely also other, more commercial tools out there that can do...
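
    A minimal sketch of the dd + ssh variant, run from the live system on the physical host (device names and the target address are placeholders; double-check them first, since dd overwrites the target disk):

        # stream the whole source disk to the VM's disk over SSH
        dd if=/dev/sda bs=4M status=progress | ssh root@target-vm 'dd of=/dev/vda bs=4M'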
  6. aaron

    PVE9 Memory management problems

    Was that host installed a while ago? Because, IIRC, since about 8.1, the installer limits the ARC by default. If you installed earlier, you can manually set a limit on the ARC: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysadmin_zfs_limit_memory_usage What has changed with the...
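
    Per the linked docs, a manual limit goes into a modprobe option; for example, capping the ARC at 8 GiB (the value is just an example, and the initramfs should be refreshed afterwards):

        # /etc/modprobe.d/zfs.conf
        options zfs zfs_arc_max=8589934592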
  7. aaron

    [TUTORIAL] [High Availability] Watchdog reboots

    Yeah, if you can do another test, I am interested in the pvecm status output of node pve/Node1 a few seconds after you disconnected pve1/Node2, but before it eventually fences itself (if something is wrong). I think that info is not yet in the previous posts, unless I missed it. Because...
  8. aaron

    [TUTORIAL] [High Availability] Watchdog reboots

    There is a misunderstanding in how fencing works. It is handled by the LRM on each node. If it is in "active" mode and the host loses the connection to the quorum for more than 60 seconds, it will not renew the watchdog. Once the watchdog runs out, the host will reboot/fence. So you would see a...
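
    The current LRM mode of each node can be checked with:

        ha-manager status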
  9. aaron

    [TUTORIAL] [High Availability] Watchdog reboots

    Can you disable the HA resource, wait ~10 minutes until all LRMs are idle, and then do the following please? With no active LRM, the nodes won't fence.
      1. get pvecm status while all nodes are up and working
      2. disconnect one of the nodes
      3. get pvecm status from the node that can still talk to...
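
    Disabling the HA resource for the test also works on the CLI; for a hypothetical resource vm:100:

        ha-manager set vm:100 --state disabled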
  10. aaron

    Low Ceph Performance on 3-Node Proxmox 9 Cluster with SATA SSDs

    Not great, because the Ceph Cluster network is only used for the replication between the OSDs. Everything else, including IO from the guests to their virtual disks, goes via the Ceph Public network. Which is probably why you see the rather meh performance in the benchmark, as the benchmark client...
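
    Both networks are defined in ceph.conf; a sketch with made-up subnets:

        # /etc/pve/ceph.conf (excerpt)
        [global]
            public_network = 192.168.10.0/24
            cluster_network = 192.168.20.0/24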
  11. aaron

    Low Ceph Performance on 3-Node Proxmox 9 Cluster with SATA SSDs

    Does the Ceph network actually provide 10Gbit? Check with ethtool whether the links autonegotiated to the expected speed, and if so, run some iperf tests between the nodes to see how much bandwidth they can provide. Are both the Ceph Public and Ceph Cluster networks on a 10Gbit link, or is one on a slower link?
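
    For example (the NIC name and peer address are placeholders):

        ethtool enp3s0 | grep Speed   # expect: Speed: 10000Mb/s
        iperf3 -s                     # on the first node
        iperf3 -c 192.168.20.11       # on the second node, against the first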
  12. aaron

    syslog getting spammed with "notice: RRD update error"s

    Hmm, can you post the output of pveversion -v on the host where you have/had guest 130? Ideally inside of code tags (or use the formatting options in the buttons above; </> for a code block). Is guest 130 a VM or a CT?
  13. aaron

    Lost data in ceph

    In the ceph pg dump_stuck output you have the columns ACTING_PRIMARY, where the replica(s) currently are, and UP_PRIMARY, where they should be.
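
    That output comes from, for example:

        ceph pg dump_stuck unclean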
  14. aaron

    syslog getting spammed with "notice: RRD update error"s

    Okay, that is curious. Are all guests powered on, or are some powered off? For example, guest VMID 130 in that error message from the first post: was it on or off at that time?
  15. aaron

    [SOLVED] VE 9 - No network connection after installing an NVMe disk

    Since this just came up in the English forum as well, here is my answer there, with a few details regarding the now simpler pinning: https://forum.proxmox.com/threads/network-drops-on-boot.65210/#post-793255
  16. aaron

    [SOLVED] Network drops on boot

    Since the Proxmox VE 9 release, and I think in the very latest 8.4, there is now the pve-network-interface-pinning tool. This makes it a lot easier to pin NICs to a specific name. And you can even choose a more fitting name, for example enphys0 or enguest0. This avoids automatic renames if the...
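
    A sketch of the invocation; the exact flags may differ between versions, so check the tool's help output first:

        pve-network-interface-pinning generate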
  17. aaron

    syslog getting spammed with "notice: RRD update error"s

    Hey, are all nodes running on Proxmox VE 9 by now? If so, do you see files for all guests (VMs and CTs) on all hosts in the /var/lib/rrdcached/db/pve-vm-9.0 directory?
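
    For example, to list the per-guest files on one node:

        ls /var/lib/rrdcached/db/pve-vm-9.0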
  18. aaron

    Lost data in ceph

    Interesting, even though you set a size/min_size of 2/2 (better would be 3/2, but that needs more space), many PGs currently only have one replica o_O. All affected PGs want to be on OSD 5 with one replica, but apparently can't. Have you tried restarting OSD 5? Then, what is it with the different...
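
    Restarting that OSD on its host would be, with the standard systemd unit:

        systemctl restart ceph-osd@5.service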
  19. aaron

    ACME with custom validation script

    Not that I am aware of, but others might know more. Ideally you could contribute an integration with your DNS provider upstream to acme.sh. Then it will also be available in Proxmox VE.