Search results

  1. Watchdog Reboots

    For my side, had another reboot this Friday on 2 PVE hosts. We traced the issue to our Ceph HDD pool being slow at that moment. We're spreading out the copy/sync jobs that go to the HDD pool so that not all hosts hit it at once, but it still feels silly that a node has to...
  2. Watchdog Reboots

    The current version of pve-ha-manager is 5.1.0, which does not contain any of these patches. There is also no 'testing' version available yet, and the patches seem a bit much to apply manually. Do we have any timeline for when a 5.1.1 would land in testing?
  3. Watchdog Reboots

    it seems, but it is absolutely not: our normal CPU load during the day is kept very low, same with memory. This 'overloaded' state is caused purely by IO delay on the mounted backup volume, which understandably is slower during backup windows. This should, however, not cause a complete PVE node to reboot...
  4. Watchdog Reboots

    `journalctl -b -1` (previous boot log), cleaned up and anonymized, from ~15 min before the restart. Jan 24 01:49:28 pve25 vzdump[2026007]: <root@pam> starting task UPID:pve25:001EEA18:06F4E8D3:69741718:vzdump::root@pam: Jan 24 01:49:29 pve25 vzdump[2026008]: INFO: starting new backup job: vzdump...
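
    A minimal sketch of how to pull such a log yourself, assuming persistent journaling is enabled (`Storage=persistent` in `/etc/systemd/journald.conf`); the line count is arbitrary:

    ```bash
    # List recorded boots; index -1 is the boot before the current one
    journalctl --list-boots

    # Last ~500 lines of the previous boot, i.e. what the node logged
    # right before the watchdog reset it
    journalctl -b -1 -n 500 --no-pager
    ```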
  5. Watchdog Reboots

    Good to know this affects everyone equally :-) We have had discussions on this topic on this forum in the past. It would be nice to get a way to see the softdog status, and logging of when the watchers decide NOT to ping the watchdog, for whatever reason. So far this whole thing is a big...
  6. Watchdog Reboots

    We have had this same issue since replacing our Intel-based nodes with AMD ones. Lately we have had unexpected reboots at least weekly on one or more nodes. For us this always happens during the backup window (lucky?), and we see high IO delay right before the PVE host decides to shit itself. Still...
  7. create a VLAN without having a physical switch or changing anything in the router

    That is unfortunate. But yes, you might want to make a 2nd bridge device, plus a VM or container to act as a router between the 2 bridges.
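
    A minimal sketch of such a second, isolated bridge in `/etc/network/interfaces` (the name `vmbr1` and the lack of a physical port are assumptions for this example):

    ```
    # Second bridge with no physical uplink; VMs attached here can only
    # reach each other, or the outside world via a routing VM/container
    # that has a NIC on both vmbr0 and vmbr1
    auto vmbr1
    iface vmbr1 inet manual
        bridge-ports none
        bridge-stp off
        bridge-fd 0
    ```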
  8. create a VLAN without having a physical switch or changing anything in the router

    You often do not have to, but I would consider it good practice to do so anyway, so you have a clear indication of which LAN these VMs are on.
  9. create a VLAN without having a physical switch or changing anything in the router

    Hi moshe, you have 2 options here. You can enable VLAN support on the first (default) bridge, vmbr0. If you put VMs on VLAN 1, they will be able to communicate with the router; VLAN 1 is the default VLAN in most (all) network equipment. If you put them on any VLAN besides 1, they can only communicate...
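
    A minimal sketch of a VLAN-aware vmbr0 in `/etc/network/interfaces` (the NIC name `eno1` and the addresses are placeholders):

    ```
    auto vmbr0
    iface vmbr0 inet static
        address 192.168.1.10/24
        gateway 192.168.1.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
    ```

    With this in place, the VLAN is chosen per guest via the VLAN tag set on the VM's network device.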
  10. Uploading ISO's to a different server in the cluster fails.

    This is a bit of a 'meh' issue, easy to work around, but it would be nice to find out what's going on. Affects: at least PVE 8 & 9; the exact patch version does not matter. File size does not matter either in this case; it happens with any size image. Uploading an ISO to a server in the cluster that is not...
  11. Opt-in Linux 6.17 Kernel for Proxmox VE 9 available on test & no-subscription

    Crosspost reply here. I also noticed problems with 6.17.2-2 which are not an issue on 6.17.2-1: https://forum.proxmox.com/threads/super-slow-timeout-and-vm-stuck-while-backing-up-after-updated-to-pve-9-1-1-and-pbs-4-0-20.176444/post-822997 On top of that, in that thread it does not look like...
  12. [SOLVED] Super slow, timeout, and VM stuck while backing up, after updated to PVE 9.1.1 and PBS 4.0.20

    Yes, we have determined as a group that the problem is on the PBS (kernel) side, affecting all versions of PVE. I would also like to add that 6.17.2-2 has problems on the PVE side: we have noticed VM disks halting randomly, with 'watchers' being stuck on the Ceph side. This happens with live migrations...
  13. Homelab/Home Office 2-Node Proxmox Setup: When to Use PDM vs PVE-CM (Clustering)?

    No! While this technically might work, it invalidates the cluster; you shouldn't have 2 votes on a single node... Might as well install the QDevice on 1 of the hosts directly (is that even possible?). What exactly do you disagree with? @SInisterPisces can build a valid 2-node cluster, and therefore...
  14. Homelab/Home Office 2-Node Proxmox Setup: When to Use PDM vs PVE-CM (Clustering)?

    In our internal documentation, a 'just' 2-node setup is not recommended. The recommendation we make is 2 PVE servers + 1 PBS server (hardware), or something else that can act as a voting node: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_qdevice_technical_overview In all cases we end up...
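
    A sketch of wiring up such an external voter, following the linked QDevice docs (`<QDEVICE-IP>` is a placeholder for the PBS box or whatever machine casts the extra vote):

    ```bash
    # On the external voter (e.g. the PBS machine):
    apt install corosync-qnetd

    # On every cluster node:
    apt install corosync-qdevice

    # On one PVE node, register the voter with the cluster:
    pvecm qdevice setup <QDEVICE-IP>

    # Confirm the cluster now has a third vote:
    pvecm status
    ```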
  15. [SOLVED] Super slow, timeout, and VM stuck while backing up, after updated to PVE 9.1.1 and PBS 4.0.20

    Yes, I agree with fabian here. If you look at your 'read' speed, that is stable; it is scanning the full disk to find the blocks to back up, which can result in parts of the backup actually writing 0 bytes, and in nearly all cases it'll write less than it reads. This looks normal.
  16. [SOLVED] Super slow, timeout, and VM stuck while backing up, after updated to PVE 9.1.1 and PBS 4.0.20

    @Staff: I see the 6.17 kernel is still in the enterprise repository for PBS. With these problems resulting in broken VM disks, I'd have expected it to be pulled for now, until a fix is available. Right now, if I or anyone else in a similar situation were to update, their env blows up.
  17. [SOLVED] Super slow, timeout, and VM stuck while backing up, after updated to PVE 9.1.1 and PBS 4.0.20

    It is in my earlier message, but on our main production clusters we have both 9 and 8 versions, both fully updated as of last weekend. All PBS installs are version 4 though (4.0 with the 6.14 kernel).
  18. [SOLVED] Super slow, timeout, and VM stuck while backing up, after updated to PVE 9.1.1 and PBS 4.0.20

    Can confirm downgrading the kernel worked for us; no more hanging backups or broken VMs.
  19. [SOLVED] Super slow, timeout, and VM stuck while backing up, after updated to PVE 9.1.1 and PBS 4.0.20

    Welcome to the party. Do not wait to finish upgrading your PVE cluster. PBS is the problem here, so either do not upgrade PBS, or 'downgrade' PBS to the 6.14 kernel. The PVE machines are not an issue here and seem to be safe to upgrade to 9. For right now, in our env I have: - Disabled all...
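
    A sketch of pinning a PBS host back to a 6.14 kernel (the exact version string below is an example; check `proxmox-boot-tool kernel list` for what is actually installed on your box):

    ```bash
    # Make sure a 6.14 kernel is present
    apt install proxmox-kernel-6.14

    # List installed kernels, then pin the bootloader to the 6.14 one
    proxmox-boot-tool kernel list
    proxmox-boot-tool kernel pin 6.14.11-4-pve   # example version, adjust
    reboot
    ```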
  20. [SOLVED] Super slow, timeout, and VM stuck while backing up, after updated to PVE 9.1.1 and PBS 4.0.20

    Same issue here: PBS on ZFS fully upgraded, PVE 8 and 9 (2 separate clusters). With both sides upgraded this weekend, I woke up to an absolute shitshow this morning, with VMs detached from their Ceph disks, Linux complaining about SCSI & ext4 errors, and generally a bad time. After fixing all of...