Hello,
Note: I am not that good at reading logs and knowing which file logs what, so just let me know if you need something.
What is the Problem:
My host has been crashing or freezing daily for a long time now. I would say it started around the time I upgraded to PVE 7 (shortly after release); I also installed a GPU around then for passthrough to my Plex VM. I have tried a lot of things, but the only thing I "achieved" is that the host no longer crashes every time. Instead it now sometimes freezes, or at first only some random VMs or CTs freeze.
Pictures of some screenshotted crashes/freezes:
This was no crash/freeze but seemed weird so I screenshotted it:
Before reinstalling Proxmox on the host, I got these GRUB messages (now I use UEFI):
What is my Setup:
- AMD Ryzen 5 PRO 4650G
- 64GB RAM non ECC
- Gigabyte Aorus AMD x570 PRO Motherboard (newest BIOS - F36)
- NVIDIA Quadro P400
- All VMs and containers run on M.2 SSDs (2x Samsung 970 Evo 1TB, ZFS mirror)
- Some have a data pool attached (2x Seagate IronWolf Pro 6TB, ZFS mirror, plus Log & ZIL as 2 partitions on the same Seagate FireCuda M.2 drive)
- Host is running on 2x Seagate IronWolf 510 SATA SSDs (ZFS mirror)
- PVE Packages:
proxmox-ve: 7.4-1 (running kernel: 5.15.104-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.4-1
pve-kernel-5.15.104-1-pve: 5.15.104-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
- Currently, I have 4 VMs and 2 CTs
What I think the problem could have something to do with and what I have done so far:
- My GPU passthrough works perfectly, but at some point I thought the crashes had to do with it because of the BIOS ERROR, so I reconfigured everything. I also tried deactivating the passthrough.
- BIOS - Updated to the newest BIOS version (2 times now)
- Because of the thread "kernel-panic-whole-server-crashes-about-every-day", I:
  - Installed microcode
  - Set Async IO on all my storage from "io_uring" to "native"
- Tried the optional Linux kernel (6.x)
- Because of the APPARMOR DENIED messages (screenshots), I gave my privileged Nextcloud container the features nesting, nfs and cifs. Now I get the STATUS messages from APPARMOR (profile_replace, error=-13, apparmor_parser) (screenshots)
- Maybe a motherboard problem? But then, why was it running just fine before upgrading PVE and installing a GPU?
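To double-check that the Async IO change really applies to every VM disk, a small script could read the VM config files. This is only a minimal sketch: it assumes the configs are the usual text files under /etc/pve/qemu-server/ and that a disk line carries an `aio=` option only when it was set explicitly. The function name `aio_settings` and the sample config are hypothetical and not taken from this host.

```python
import re

# Disk keys used in qemu-server VM configs, e.g. scsi0, virtio1, sata0, ide2.
DISK_KEY = re.compile(r"^(scsi|virtio|sata|ide)\d+$")


def aio_settings(config_text):
    """Return {disk_key: aio_mode} for the current (non-snapshot) config.

    Disks without an explicit aio= option are reported as "default".
    """
    result = {}
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # Snapshot sections start here; only the current config matters.
            break
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        key = key.strip()
        if not DISK_KEY.match(key):
            continue
        options = value.strip().split(",")
        if "media=cdrom" in options:
            continue  # CD-ROM drives are not relevant for this check
        mode = "default"
        for opt in options:
            if opt.startswith("aio="):
                mode = opt.split("=", 1)[1]
        result[key] = mode
    return result


# Hypothetical sample config, NOT copied from the affected host.
SAMPLE = """\
boot: order=scsi0
cores: 4
scsi0: local-zfs:vm-100-disk-0,aio=native,size=32G
scsi1: local-zfs:vm-100-disk-1,size=100G
ide2: none,media=cdrom
scsihw: virtio-scsi-pci

[before-upgrade]
scsi0: local-zfs:vm-100-disk-0,aio=io_uring,size=32G
"""

if __name__ == "__main__":
    for disk, mode in aio_settings(SAMPLE).items():
        print(disk, mode)
```

Any disk reported as "default" has no explicit `aio=` option and would still use whatever the installed PVE version picks by default, so it may be worth a second look.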
I hope someone has an idea. Thank you so much for helping!