Search results

  1.

    Proxmox 7.4, ZFS storage doesn't show up as option for container

    OK, I feel like an idiot now. It was for the selection of the template. No issues. Thanks for the wake-up call.
  2.

    Proxmox 7.4, ZFS storage doesn't show up as option for container

    I have a cluster of 8 nodes. ZFS storage shows up just fine as an option when creating VMs, but not when creating a container. The only storage that shows up as an option for containers is "local". I have this same config saved from an older (working) version of proxmox (6.4) and the only...
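
    A minimal sketch, assuming a hypothetical ZFS storage named 'local-zfs', of checking which storages are enabled for container root disks (only storages whose content types include rootdir are offered when creating a container):

      # list storages as the storage manager sees them
      pvesm status
      # a container-capable ZFS entry in /etc/pve/storage.cfg would look roughly like:
      #   zfspool: local-zfs
      #       pool rpool/data
      #       content images,rootdir
      grep -A 3 'zfspool:' /etc/pve/storage.cfg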
  3.

    What now? - Please avoid using zpool upgrade on the "rpool" (root pool) itself, when upgrading to ZFS 2.0

    I have this server running PVE 6.3 with ZFS 2.0.4 that I did 'zpool upgrade -a' on and didn't expect it to be able to reboot, but it did: Now I'm wondering why it worked and if there is something specific that I can check to see if the others will also work? Is not booting with GRUB...
  4.

    What now? - Please avoid using zpool upgrade on the "rpool" (root pool) itself, when upgrading to ZFS 2.0

    Just to clarify, does PVE with ZFS on rpool only work with UEFI boot as of PVE 6.4?
  5.

    What now? - Please avoid using zpool upgrade on the "rpool" (root pool) itself, when upgrading to ZFS 2.0

    In the Proxmox VE 6.4 release notes known issues section it says: Please avoid using zpool upgrade on the "rpool" (root pool) itself, when upgrading to ZFS 2.0 on a system booted by GRUB in legacy mode, as that will break pool import by GRUB. ZFS 2.0.4 is also available in 6.3-6 which I'm...
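
    A minimal sketch of checking whether a node boots via UEFI or legacy BIOS/GRUB before touching rpool with zpool upgrade (proxmox-boot-tool is assumed to be available, as it is on PVE 6.4 and later):

      # UEFI firmware exposes this directory; a legacy BIOS/GRUB boot does not
      if [ -d /sys/firmware/efi ]; then echo "UEFI boot"; else echo "legacy BIOS/GRUB boot"; fi
      # reports how the bootloader and ESPs are managed on newer PVE installs
      proxmox-boot-tool status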
  6.

    ZFS on NVMe massive performance drop after update from Proxmox 6.2 to 6.3

    Just to conclude this thread: TL;DR: Proxmox 6.2-4 -> ZFS 0.8.3: normal IO delay; Proxmox 6.3-3 -> ZFS 0.8.5: has problems causing excessive IO delay, a huge performance penalty; Proxmox 6.3-6 -> ZFS 2.0.4: normal IO delay. I moved all VMs to the identically configured host and had the identical...
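
    A quick sketch of how to reproduce the version mapping above on any given node:

      # Proxmox VE package versions, including the ZFS userland packages
      pveversion -v | grep -Ei 'pve-manager|zfs'
      # userland and kernel-module ZFS versions (the subcommand exists in ZFS 0.8 and later)
      zfs version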
  7.

    ZFS on NVMe massive performance drop after update from Proxmox 6.2 to 6.3

    So https://forum.proxmox.com/threads/zfs-on-hdd-massive-performance-drop-after-update-from-proxmox-6-2-to-6-3.81820/ doesn't count? Same before and after versions and exactly the same problem. I will try this, thanks. Yes, it's unfortunate that I replaced a drive and upgraded at the same time...
  8.

    ZFS on NVMe massive performance drop after update from Proxmox 6.2 to 6.3

    Hello Proxmox community, TL;DR: Huge performance hit after upgrading to 6.3-3, probably having to do with IO delay. Replication from system to system will stall completely for up to a minute at a time. This graph shows the sharp rise in IO delay. The rise corresponds precisely with the...
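
    A hedged sketch of watching for these stalls from the shell, assuming the affected pool is rpool (the "IO delay" graphed by the PVE GUI roughly corresponds to CPU iowait):

      # per-second pool throughput; stalls show up as long runs of near-zero activity
      zpool iostat -v rpool 1
      # per-device latency and %iowait (requires the sysstat package)
      iostat -x 1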
  9.

    [SOLVED] ZFS on HDD massive performance drop after update from Proxmox 6.2 to 6.3

    I have the EXACT same symptoms, caused by upgrading to 6.3-3 from 6.2-4. That sharp rise in IO delay happened when I upgraded. Same load before and after. I also started having LONG hangs when doing replication between hosts, almost a minute long. I have an extremely fast pool of NVMe...
  10.

    [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    That's what I thought initially too. But the reason SSH is inaccessible is that there are files SSHD needs that live in the /etc/pve hierarchy. /etc/pve becomes inaccessible when corosync/pve-cluster stop functioning, so anything that touches it hangs. *and* restarting...
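
    A minimal sketch of checking that symptom without hanging the current shell, by wrapping anything that touches /etc/pve in a timeout and looking at the cluster services directly:

      # a plain "ls /etc/pve" blocks indefinitely while pmxcfs is wedged; timeout keeps the shell usable
      timeout 5 ls /etc/pve || echo "/etc/pve is not responding"
      systemctl status corosync pve-cluster --no-pager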
  11.

    [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    Hopefully this is still being looked at. I still have nodes that go offline and can be brought back by restarting corosync, then restarting pve-cluster. I have two 4-node clusters. One node in each cluster has never gone offline - those have 128G of RAM. The other 3 nodes in each cluster...
  12.

    pvesr Call Trace: (servers going offline)

    I just figured out from this post https://forum.proxmox.com/threads/pve-5-4-11-corosync-3-x-major-issues.56124/post-262788 that I can bring my 'grey' node back online by restarting corosync and pve-cluster on all nodes in the cluster. After doing this, I can now SSH into and out of the once...
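
    The recovery described above, as a sketch of the command to run on each node in turn (corosync first, then pve-cluster, which restarts pmxcfs and brings /etc/pve back):

      # run on every node of the affected cluster, one node at a time
      systemctl restart corosync && systemctl restart pve-cluster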
  13.

    pvesr Call Trace: (servers going offline)

    I'm reviving this thread. I've moved the clustering network to its own physical NIC on each node, going across an isolated switch. This did not fix the problem. I currently have 1 node that is in the 'grey' state - VMs continuing to run and function normally, but I can't SSH into or out of...
  14.

    pvesr Call Trace: (servers going offline)

    I am using ZFS as my storage... However, the thread you referenced doesn't seem to be related. When my nodes go offline, the VMs and containers keep running just fine. My heaviest node has 83 VMs/containers that are very active on the filesystem and network. They all keep running... for days...
  15.

    pvesr Call Trace: (servers going offline)

    Yes, all traffic is on X.Y.241 (10GE, but only about 15% utilization max). I will try putting corosync on its own network, which will hopefully fix it - but it seems like a pretty serious bug in corosync if missing a packet can make a server become unresponsive. It seems like missing a packet...
  16.

    pct snapshot <vmid> <snapname> not working reliably

    I would like to conclude this thread by apologizing for wasting anyone's time. It turns out that 'pct snapshot' is reliable. Here's what was happening: The script goes through and snapshots all the VMs with 'qm snapshot', then snapshots all the containers with 'pct snapshot'. The script then...
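
    A simplified sketch of the kind of script described here (the snapshot name is hypothetical; the original script is not shown in the snippet):

      #!/bin/bash
      # snapshot every VM, then every container, with a timestamped snapshot name
      snapname="auto$(date +%Y%m%d%H%M%S)"
      for vmid in $(qm list | awk 'NR>1 {print $1}'); do
          qm snapshot "$vmid" "$snapname"
      done
      for ctid in $(pct list | awk 'NR>1 {print $1}'); do
          pct snapshot "$ctid" "$snapname"
      done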
  17.

    pvesr Call Trace: (servers going offline)

    HA is not enabled. Another data point, and this may be the most relevant one: I have 14 nodes running Proxmox 6.0. 6 of them are standalone - all stable. 4 of them are in the first cluster - only 3 of them have gone offline. 4 of them are in the second cluster - only 3 of them have gone...
  18.

    pvesr Call Trace: (servers going offline)

    root@vsys07:/etc/pve# cat corosync.conf
    logging {
      debug: off
      to_syslog: yes
    }
    nodelist {
      node {
        name: vsys06
        nodeid: 1
        quorum_votes: 1
        ring0_addr: X.Y.241.2
      }
      node {
        name: vsys07
        nodeid: 2
        quorum_votes: 1
        ring0_addr: X.Y.241.3
      }
      node {
        name...
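
    For the dedicated corosync network discussed earlier in this thread, a hedged example of what a second link could look like in corosync.conf (addresses are placeholders; corosync 3 with knet supports multiple links per node):

      # ring0_addr: existing shared 10GE network
      # ring1_addr: hypothetical isolated corosync-only network (placeholder address)
      node {
        name: vsys06
        nodeid: 1
        quorum_votes: 1
        ring0_addr: X.Y.241.2
        ring1_addr: 10.10.10.2
      }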
  19.

    pvesr Call Trace: (servers going offline)

    Another data point: if I'm logged into the console I can NOT SSH out of the server, but all the VMs and containers continue to run without issue - reading/writing to disk, reading/writing to the network.
  20.

    pvesr Call Trace: (servers going offline)

    Another data point: if I 'systemctl restart sshd', it restarts, but the old hung sessions are still hung and I still can't SSH into the server. If there are any other commands you want me to run on the console while it's in this hung state, let me know.
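
    A sketch of read-only diagnostics that are generally safe to run from the console in this state:

      # processes stuck in uninterruptible sleep, typically the ones blocked on /etc/pve
      ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'
      # recent corosync and pmxcfs log lines
      journalctl -u corosync -u pve-cluster --since "30 min ago" --no-pager | tail -n 50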