Search results

  1. What now? - Please avoid using zpool upgrade on the "rpool" (root pool) itself, when upgrading to ZFS 2.0

    I have this server running PVE 6.3 with ZFS 2.0.4 that I did 'zpool upgrade -a' on and didn't expect it to be able to reboot, but it did. Now I'm wondering why it worked and if there is something specific that I can check to see if the others will also work. Is not booting with GRUB...
  2. What now? - Please avoid using zpool upgrade on the "rpool" (root pool) itself, when upgrading to ZFS 2.0

    Just to clarify, does PVE with ZFS on rpool only work with UEFI boot as of PVE 6.4?
  3. What now? - Please avoid using zpool upgrade on the "rpool" (root pool) itself, when upgrading to ZFS 2.0

    In the Proxmox VE 6.4 release notes known issues section it says: Please avoid using zpool upgrade on the "rpool" (root pool) itself, when upgrading to ZFS 2.0 on a system booted by GRUB in legacy mode, as that will break pool import by GRUB. ZFS 2.0.4 is also available in 6.3-6 which I'm...
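
    A minimal, hedged sketch of checks worth running before any 'zpool upgrade' that touches rpool (assumes a stock PVE host; proxmox-boot-tool ships with PVE 6.4 and may be absent on older installs):

        # Is the host booted via UEFI or legacy BIOS (GRUB)?
        [ -d /sys/firmware/efi ] && echo "UEFI boot" || echo "legacy BIOS/GRUB boot"
        # How the kernel is actually being loaded (PVE 6.4+ only).
        proxmox-boot-tool status
        # List pools whose feature flags are not all enabled - review before upgrading rpool.
        zpool upgrade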
  4. ZFS on NVMe massive performance drop after update from Proxmox 6.2 to 6.3

    Just to conclude this thread: TL;DR
    proxmox 6.2-4 -> zfs 0.8.3: normal IO delay
    proxmox 6.3-3 -> zfs 0.8.5: has problems causing excessive IO delay, huge performance penalty
    proxmox 6.3-6 -> zfs 2.0.4: normal IO delay
    I moved all VMs to the identically configured host and had the identical...
  5. ZFS on NVMe massive performance drop after update from Proxmox 6.2 to 6.3

    So https://forum.proxmox.com/threads/zfs-on-hdd-massive-performance-drop-after-update-from-proxmox-6-2-to-6-3.81820/ doesn't count? Same before and after versions, and exactly the same problem. I will try this, thanks. Yes, it's unfortunate that I replaced a drive and upgraded at the same time...
  6. ZFS on NVMe massive performance drop after update from Proxmox 6.2 to 6.3

    Hello Proxmox community, TL;DR Huge performance hit after upgrading to 6.3-3, probably having to do with IO delay. Replication from system to system will stall completely for up to a minute at a time. This graph shows the sharp rise in IO delay. The rise corresponds precisely with the...
  7. [SOLVED] ZFS on HDD massive performance drop after update from Proxmox 6.2 to 6.3

    I have the EXACT same symptoms caused by upgrading to 6.3-3 from 6.2-4. That sharp rise in IO delay happened when I upgraded. Same load before and after. I also started having LONG hangs when doing replication between hosts, almost a minute long. I have an extremely fast pool of NVMe...
  8. [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    That's what I thought initially too. But the reason SSH is inaccessible is that there are files SSHD needs that live in the /etc/pve hierarchy. /etc/pve becomes inaccessible when corosync/pve-cluster stop functioning, so anything that touches it hangs. *and* restarting...
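
    A hedged illustration of that dependency (a sketch; the exact symlink targets vary by PVE version):

        # Check whether /etc/pve (the pmxcfs FUSE mount) responds without hanging the shell.
        timeout 5 ls /etc/pve >/dev/null && echo "pmxcfs OK" || echo "/etc/pve is hung or unavailable"
        # SSH-related files that are symlinked into /etc/pve will hang together with it.
        ls -l /root/.ssh /etc/ssh 2>/dev/null | grep /etc/pve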
  9. [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    Hopefully this is still being looked at. I still have nodes that go offline and can be brought back by restarting corosync, then restarting pve-cluster. I have two 4-node clusters. One node in each cluster has never gone offline - they have 128G of RAM in them. The other 3 nodes in each cluster...
  10. pvesr Call Trace: (servers going offline)

    I just figured out from this post https://forum.proxmox.com/threads/pve-5-4-11-corosync-3-x-major-issues.56124/post-262788 that I can bring my 'grey' node back online by restarting corosync and pve-cluster on all nodes in the cluster. After doing this, I can now SSH into and out of the once...
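
    A hedged sketch of that workaround as a loop (node names are placeholders, not from the thread; on a node whose SSH is already hung, the same two commands need to be run from its local console instead):

        for node in node1 node2 node3 node4; do
            ssh root@"$node" 'systemctl restart corosync && systemctl restart pve-cluster'
        done
        # Afterwards, confirm quorum and membership:
        pvecm status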
  11. pvesr Call Trace: (servers going offline)

    I'm reviving this thread. I've moved the clustering network to its own physical NIC on each node going across an isolated switch. This did not fix the problem. I currently have 1 node that is in the 'grey' state - VMs continuing to run and function normally, but I can't SSH into or out of...
  12. pvesr Call Trace: (servers going offline)

    I am using ZFS as my storage... However, the thread you referenced doesn't seem to be related. When my nodes go offline, the VMs and containers keep running just fine. My heaviest node has 83 VMs/containers that are very active on the filesystem and network. They all keep running... for days...
  13. pvesr Call Trace: (servers going offline)

    Yes, all traffic is on X.Y.241 (10GE, but only about 15% utilization max). I will try putting corosync on its own network, which will hopefully fix it - but it seems like a pretty serious bug in corosync if missing a packet can make a server become unresponsive. It seems like missing a packet...
  14. pct snapshot <vmid> <snapname> not working reliably

    I would like to conclude this thread by apologizing for wasting anyone's time. It turns out that 'pct snapshot' is reliable. Here's what was happening: The script goes through and snapshots all the VMs with 'qm snapshot', then snapshots all the containers with 'pct snapshot'. The script then...
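
    A minimal sketch of that kind of snapshot loop, assuming guests are enumerated with 'qm list'/'pct list' and an arbitrary snapshot name; this is not the author's actual script:

        snap="nightly-$(date +%Y%m%d)"
        # Snapshot every VM first...
        for vmid in $(qm list | awk 'NR>1 {print $1}'); do
            qm snapshot "$vmid" "$snap"
        done
        # ...then every container.
        for ctid in $(pct list | awk 'NR>1 {print $1}'); do
            pct snapshot "$ctid" "$snap"
        done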
  15. pvesr Call Trace: (servers going offline)

    HA is not enabled. Another data point and this may be the most relevant one... I have 14 nodes running proxmox 6.0:
    6 of them are standalone - all stable.
    4 of them are in the first cluster - only 3 of them have gone offline.
    4 of them are in the second cluster - only 3 of them have gone...
  16. pvesr Call Trace: (servers going offline)

    root@vsys07:/etc/pve# cat corosync.conf
    logging {
      debug: off
      to_syslog: yes
    }
    nodelist {
      node {
        name: vsys06
        nodeid: 1
        quorum_votes: 1
        ring0_addr: X.Y.241.2
      }
      node {
        name: vsys07
        nodeid: 2
        quorum_votes: 1
        ring0_addr: X.Y.241.3
      }
      node {
        name...
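
    For context, a few standard commands (not from the thread) that show whether the running cluster matches this configuration:

        corosync-cfgtool -s      # ring/link status as corosync on this node sees it
        corosync-quorumtool -s   # membership and quorum state
        pvecm status             # Proxmox's view of the cluster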
  17. pvesr Call Trace: (servers going offline)

    Another data point: If I'm logged into the console I can NOT SSH out of the server, but all the VMs and containers continue to run without issue - reading/writing to disk, reading/writing to the network.
  18. pvesr Call Trace: (servers going offline)

    Another data point: if I run 'systemctl restart sshd', it restarts, but the old hung sessions are still hung and I still can't SSH into the server. If there are any other commands you want me to run on the console while it's in this hung state, let me know.
  19. pvesr Call Trace: (servers going offline)

    Another data point: I was able to log into the console and run 'systemctl restart corosync', and corosync seemed to come back up and regain quorum - but I still can't SSH into the box, and that node in the GUI then changes from a grey question mark to a red X, just as if it were powered down.
  20. pvesr Call Trace: (servers going offline)

    I've attached those files. It appears that corosync and/or pvesr are having issues. Thanks for your attention to this.
