For the past few days I have been getting these errors:
Code:
Feb 12 04:42:59 pve kernel: ahci 0000:01:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000a address=0x80040000 flags=0x0000]
-- Boot 9d008e871b754521822776480c2b9a01 --
Feb 12 12:38:46 pve kernel: ahci 0000:01:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000a address=0xc83f8000 flags=0x0000]
-- Boot 34bc63ef87fd4de2b262329e5e738864 --
-- Boot 653e9b64e57446e4888762403f9f5d23 --
Feb 15 05:47:49 pve kernel: ahci 0000:01:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000a address=0xa5f80000 flags=0x0000]
-- Boot f26145dd092d4273a37b339e489a5d2f --
-- Boot 3a13feb2c33c4a2fbc987886c5b28ff9 --
-- Boot 0244ca83800c469ba48d0fdd3bb0cae1 --
Feb 16 08:42:45 pve kernel: ahci 0000:01:00.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x000a address=0xb884a000 flags=0x0000]
After this message my 2 SATA SSDs are unusable; see the attachment for the full (but truncated) logs, which continue until I restart.
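The faults in the excerpt are spread over several boots but always name the same device (ahci 0000:01:00.1) and IOMMU domain (0x000a), while the address changes each time. To tabulate them per boot, here is a minimal Python sketch, assuming the journal text was saved in the same format as the excerpt above; the script name, file name and helper names are hypothetical.

```python
import re
import sys
from dataclasses import dataclass
from typing import Iterable, List, Optional

# PCI address such as 0000:01:00.1 (domain:bus:device.function)
PCI = r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]"

# Matches AMD-Vi fault lines in the same shape as the journal excerpt above.
FAULT_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"(?P<driver>\S+) (?P<device>" + PCI + r"): AMD-Vi: Event logged "
    r"\[IO_PAGE_FAULT domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) flags=(?P<flags>0x[0-9a-f]+)\]"
)
# journalctl separator line that starts a new boot
BOOT_RE = re.compile(r"^-- Boot (?P<boot_id>[0-9a-f]+) --$")


@dataclass
class Fault:
    boot_id: Optional[str]  # None if the fault comes before any boot marker
    timestamp: str
    driver: str
    device: str
    domain: str
    address: int


def parse_faults(lines: Iterable[str]) -> List[Fault]:
    """Collect IO_PAGE_FAULT events, tagging each with the boot it was logged in."""
    faults: List[Fault] = []
    boot_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        boot = BOOT_RE.match(line)
        if boot:
            boot_id = boot.group("boot_id")
            continue
        m = FAULT_RE.match(line)
        if m:
            faults.append(Fault(
                boot_id=boot_id,
                timestamp=m.group("ts"),
                driver=m.group("driver"),
                device=m.group("device"),
                domain=m.group("domain"),
                address=int(m.group("address"), 16),
            ))
    return faults


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 iopf_summary.py saved-journal.txt  (names are hypothetical)
    with open(sys.argv[1]) as fh:
        for f in parse_faults(fh):
            print(f"{f.timestamp}  boot={f.boot_id or '?'}  {f.driver} {f.device}  "
                  f"domain={f.domain}  address={f.address:#x}")
```

Each output line is one fault with the boot it was logged in, which makes it easy to compare occurrences over time; lines that are not AMD-Vi faults or boot markers are simply skipped.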
- I don't see anything relevant before this error. At first I thought it was an SSD overheating issue, but I had misinterpreted the smartd message: the drives were actually at 27 °C.
- After this I added better monitoring, but the issue occurs even with very little load (e.g. < 1 MB/s read and < 4 MB/s write). The only thing I can see at the last occurrence is the drive temperature suddenly dropping to 6.6 °C. This is probably just a read error, as the room it sits in is at 17 °C or more.
- It took me a few crashes to realize the temperature wasn't the problem, and I made some mistakes with the monitoring setup (it was writing to those SATA drives). Once I realized that, I updated Proxmox and also updated the BIOS. The guest is Debian 13, fully up to date.
- I noticed that linux-image-amd64 was updated in the VM the day before the crash. Could this be the cause? I don't expect so, since it's the guest kernel, but am I wrong here?
Setting up linux-image-amd64 (6.12.69-1) ...
Removing linux-image-6.12.57+deb13-amd64 (6.12.57-1) ...
- I added a few things for iGPU passthrough.
GRUB:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet video=vesafb:off video=efifb:off video=simplefb:off nofb initcall_blacklist=sysfb_init nomodeset iommu=pt"
I also added this to the modprobe blacklist:
Code:
blacklist amdgpu
blacklist radeon
The VM using this hasn't been run for a while. Is it worth trying without all these options?
- The hardware is from December 2021, but it sat unused for about 3 of those 4 years; it has been running non-stop for roughly the last 3 to 6 months. Hardware summary:
AMD Ryzen 7 5700G with Radeon Graphics
Asus ROG STRIX B550-I GAMING
M.2 NVMe disk (no issues)
SATA: 2x Samsung SSD 870 QVO 2TB
- I couldn't find a similar issue, but from what I've read, could this be either a virtualization issue or a SATA controller issue?
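On trying without the passthrough options: after changing them and rebooting, it is worth confirming what the host kernel actually booted with. /proc/cmdline holds the running kernel's command line, and /proc/modules lists the loaded modules. A modprobe blacklist entry only stops a module from being loaded automatically through its aliases, so checking what is actually loaded is the more direct test. Below is a minimal Python sketch that only reads those two files; the helper names are hypothetical.

```python
import os
from typing import Dict, List, Set

# Options from the GRUB_CMDLINE_LINUX_DEFAULT line above (excluding "quiet").
PASSTHROUGH_OPTIONS: List[str] = [
    "video=vesafb:off", "video=efifb:off", "video=simplefb:off", "nofb",
    "initcall_blacklist=sysfb_init", "nomodeset", "iommu=pt",
]
# Modules named in the modprobe blacklist above.
BLACKLISTED_MODULES: List[str] = ["amdgpu", "radeon"]


def active_options(cmdline: str, wanted: List[str]) -> Dict[str, bool]:
    """Map each wanted option to whether it appears verbatim on the command line."""
    tokens = set(cmdline.split())
    return {opt: opt in tokens for opt in wanted}


def loaded_modules(proc_modules: str) -> Set[str]:
    """Module names from /proc/modules text (the first field on each line)."""
    return {line.split()[0] for line in proc_modules.splitlines() if line.strip()}


if (__name__ == "__main__" and os.path.exists("/proc/cmdline")
        and os.path.exists("/proc/modules")):
    with open("/proc/cmdline") as fh:
        for opt, present in active_options(fh.read(), PASSTHROUGH_OPTIONS).items():
            print(f"{'set    ' if present else 'not set'}  {opt}")
    with open("/proc/modules") as fh:
        loaded = loaded_modules(fh.read())
    for mod in BLACKLISTED_MODULES:
        print(f"{'loaded    ' if mod in loaded else 'not loaded'}  {mod}")
```

Comparing the output before and after removing the options would show whether the change actually took effect on the host before drawing conclusions from the next crash (or the absence of one).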