Search results

  1. W

    Opt-in Linux Kernel 5.15 for Proxmox VE 7.x available

    Supermicro, 4 near-identical hosts, though one was purchased much more recently. They should all be on the latest BIOS or close to it, particularly the more recent purchase. Coincidentally, today was my last day with this company, so I don't have screenshots of the kernel stack trace on hand but...
  2. W

    Opt-in Linux Kernel 5.15 for Proxmox VE 7.x available

    Besides the GPU passthrough issues on my home lab, this 5.15.30 kernel has now caused 2 of our 4 commercial hosting dual-socket EPYC servers to crash and dump kernel stack traces repeatedly, and we are forced to downgrade to restore any kind of working hosting environment on these...
  3. W

    [SOLVED] GPU Passthrough Issues After Upgrade to 7.2

    Same problem here with 5.15.30 using AMD CPUs and Nvidia cards. I've noticed that the mainline kernels 5.15.33 to .36 mention a lot of IOMMU and VFIO changes/fixes, but I'm unsure if any are relevant. I'm going to test 5.15.37 on my home lab this weekend and will report my findings to Proxmox...
  4. W

    Opt-in Linux Kernel 5.15 for Proxmox VE 7.x available

    I've had to downgrade to 5.13 to get any kind of GPU passthrough working to my guest VMs again. On 5.15 the error messages below flood syslog *rapidly*, to the point of filling the root partition with a multi-gigabyte /var/log/syslog. NB: this is with simplefb disabled as well as all other...
  5. W

    [SOLVED] GPU Passthrough throws driver error 43

    I've been having errors and breakage with GPU passthrough on all the hosts I've upgraded to 7.2 with kernel 5.15.30. Downgrading the PVE servers to kernel 5.13.x fixes the issue for me. I'm doing some testing and will file a bug when I have some concrete data to give the PVE devs. I'll also note...
  6. W

    Unable to boot OVMF VM on Proxmox 6

    Regarding the EFI boot issue, I can reproduce this reliably on other Proxmox clusters using normal disk images. Make a new VM: Debian 10 ISO, 32GB disk, OVMF 'bios' and q35 machine type. Add a second 32GB disk image. Boot the VM. In the Debian installer select 'manual' partitioning. Create a 500MB...
  7. W

    Unable to boot OVMF VM on Proxmox 6

    Yep, done, https://bugzilla.proxmox.com/show_bug.cgi?id=3010 Thanks
  8. W

    Unable to boot OVMF VM on Proxmox 6

    Yes, several times to be sure. The entries created by efibootmgr under linux weren't 'sticking' apparently with 'q35' as machine type, as well as only 1 drive showing up in early efi boot and grub was unable to assemble the mdadm raid device to read its main grub.cfg - unless exited back out to...
  9. W

    Unable to boot OVMF VM on Proxmox 6

    I'm hitting this issue now too. I thought I had fixed it by destroying and remaking the EFI disk, but this apparently only works for 1 boot and then the problem manifests again. The *only* reliable workaround I've found is to manually change the machine type in the VM .conf from 'q35' to...
  10. W

    [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    Hi @Fabio, @fabian, This morning sanctuary dropped out of the cluster with a divide error, load would have been negligible at the time: Aug 7 00:10:37 sanctuary kernel: [739314.684408] show_signal: 6 callbacks suppressed Aug 7 00:10:37 sanctuary kernel: [739314.684410] traps...
  11. W

    [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    I can report our cluster is still green again this morning for all 3 nodes
  12. W

    [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    Hi Apollon77, Marin Bernard's logs did show pmtud log entries, which led me to suspect their issue was the same as mine, but it's possible there are multiple things at work here (in which case I apologize for any potential thread hijacking!). You might want to grep your logs for it to confirm...
  13. W

    [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    I did try that the night before you posted the test version of libknet, and that morning the cluster was green and not reporting the PMTUD issue. I undid this change to test that version of libknet however
  14. W

    [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    I think previously, on the original libknet version, other hosts were flapping too. I can go back further in the logs to find out if needed. The workload on scramjet isn't very different in nature but it is higher; being an EPYC system with more RAM than the other hosts, it'll typically run more...
  15. W

    [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    Hi Fabio, we have some hardware coming in this week for a new production cluster, so we're more than happy to do whatever we can to help fix this issue. Attached are the logs from all three hosts from 12am Saturday morning until I manually restarted corosync on scramjet around midday. Scramjet is the...
  16. W

    [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    This morning the cluster is green, no hosts marked as offline. I hope this means the specific issue with knet pmtud and crypto has been resolved, and the floating point exception I saw over the weekend was an anomaly. When I get to the office this morning I'll dig through logs for any sign of...
  17. W

    [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    Aug 4 00:18:16 scramjet corosync[1472229]: [KNET ] pmtud: possible MTU misconfiguration detected. kernel is reporting MTU: 1500 bytes for host 2 link 0 but the other node is not acknowledging packets of this size. Aug 4 00:18:16 scramjet corosync[1472229]: [KNET ] pmtud: This can be...
  18. W

    [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    Hi @fabian, sorry to report this version still has issues. I don't see the PMTUD issue in the logs today but one host still had issues keeping quorum, reported lost tokens for about an hour, (all of which might be a separate issue elsewhere) but then its kernel reported a crash in libknet.so...
  19. W

    [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    Hi Fabian, I'd be happy to test. I'll report back after the weekend if it's stable, or sooner if not.
  20. W

    [SOLVED] PVE 5.4-11 + Corosync 3.x: major issues

    *maybe* possibly related to this issue on knet with pmtud when using crypto? ://github.com/kronosnet/kronosnet/pull/242/commits if so it looks like a possible fix is in the pipeline already then (had to remove the https because the forum won't let me post links)
