iGPU causes VM crash

crc-error-79

Hello,
I am having problems with Proxmox 8.3 and the iGPU of my Lenovo M90q. Since the update, a Debian VM to which I pass through the iGPU has been crashing frequently.
The VM crashes every 24-48 hours, sometimes during the daily backup and sometimes while idle during the day or night.
How can I solve this?

host
Code:
root@kronos:~# pveversion -v
proxmox-ve: 8.3.0 (running kernel: 6.8.12-4-pve)
pve-manager: 8.3.0 (running version: 8.3.0/c1689ccb1065a83b)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-4
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
proxmox-kernel-6.8.12-3-pve-signed: 6.8.12-3
proxmox-kernel-6.8.12-2-pve-signed: 6.8.12-2
proxmox-kernel-6.8.12-1-pve-signed: 6.8.12-1
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.0
libpve-storage-perl: 8.2.9
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
proxmox-backup-client: 3.2.9-1
proxmox-backup-file-restore: 3.2.9-1
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.1
pve-cluster: 8.0.10
pve-container: 5.2.2
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-1
pve-ha-manager: 4.0.6
pve-i18n: 3.3.1
pve-qemu-kvm: 9.0.2-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.0
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.6-pve1


root@kronos:~# lspci -s 00:02.0 -k
00:02.0 VGA compatible controller: Intel Corporation CometLake-S GT2 [UHD Graphics 630] (rev 03)
    DeviceName: Onboard - Video
    Subsystem: Lenovo CometLake-S GT2 [UHD Graphics 630]
    Kernel driver in use: vfio-pci
    Kernel modules: i915

root@kronos:~# uname -a
Linux kronos 6.8.12-4-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-4 (2024-11-06T15:04Z) x86_64 GNU/Linux

root@kronos:~# dmesg | grep -e DMAR -e IOMMU
[    0.011896] ACPI: DMAR 0x000000008FA22000 0000C8 (v01 LENOVO TC-M2W   000015E0      01000013)
[    0.011944] ACPI: Reserving DMAR table memory at [mem 0x8fa22000-0x8fa220c7]
[    0.112474] DMAR: IOMMU enabled
[    0.325654] DMAR: Host address width 39
[    0.325655] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[    0.325667] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 1c0000c40660462 ecap 19e2ff0505e
[    0.325671] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[    0.325676] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
[    0.325678] DMAR: RMRR base: 0x000000901c9000 end: 0x00000090412fff
[    0.325681] DMAR: RMRR base: 0x00000093000000 end: 0x0000009f7fffff
[    0.325682] DMAR: RMRR base: 0x0000008f8ab000 end: 0x0000008f92afff
[    0.325685] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 1
[    0.325687] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[    0.325689] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.327981] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    0.628809] DMAR: No ATSR found
[    0.628810] DMAR: No SATC found
[    0.628811] DMAR: IOMMU feature fl1gp_support inconsistent
[    0.628813] DMAR: IOMMU feature pgsel_inv inconsistent
[    0.628814] DMAR: IOMMU feature nwfs inconsistent
[    0.628815] DMAR: IOMMU feature pasid inconsistent
[    0.628816] DMAR: IOMMU feature eafs inconsistent
[    0.628817] DMAR: IOMMU feature prs inconsistent
[    0.628818] DMAR: IOMMU feature nest inconsistent
[    0.628819] DMAR: IOMMU feature mts inconsistent
[    0.628820] DMAR: IOMMU feature sc_support inconsistent
[    0.628821] DMAR: IOMMU feature dev_iotlb_support inconsistent
[    0.628822] DMAR: dmar0: Using Queued invalidation
[    0.628826] DMAR: dmar1: Using Queued invalidation
[    0.629474] DMAR: Intel(R) Virtualization Technology for Directed I/O

root@kronos:~# cat /etc/modprobe.d/blacklist.conf
blacklist i915

root@kronos:~# dmesg | grep -i vfio
[    2.106645] VFIO - User Level meta-driver version: 0.3
[    2.132616] vfio-pci 0000:00:02.0: vgaarb: deactivate vga console
[    2.132620] vfio-pci 0000:00:02.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    2.132749] vfio_pci: add [8086:9bc8[ffffffff:ffffffff]] class 0x000000/00000000
[  899.729744] vfio-pci 0000:00:02.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xcb43
[284391.236288] vfio-pci 0000:00:02.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xcb43

dmesg inside the VM
Code:
[    1.557050] i915 0000:01:00.0: [drm] VT-d active for gfx access
[    1.557087] i915 0000:01:00.0: [drm] Using Transparent Hugepages
[    1.561356] i915 0000:01:00.0: BAR 6: can't assign [??? 0x00000000 flags 0x20000000] (bogus alignment)
[    1.561359] i915 0000:01:00.0: [drm] Failed to find VBIOS tables (VBT)
[    1.561368] i915 0000:01:00.0: [drm] *ERROR* DC state mismatch (0x0 -> 0x2)
[    1.561653] i915 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[    1.565210] i915 0000:01:00.0: firmware: direct-loading firmware i915/kbl_dmc_ver1_04.bin
[    1.565579] i915 0000:01:00.0: [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
[    1.594712] sr 1:0:0:0: Attached scsi CD-ROM sr0
[    2.746842] i915 0000:01:00.0: [drm] failed to retrieve link info, disabling eDP
[    2.753984] [drm] Initialized i915 1.6.0 20201103 for 0000:01:00.0 on minor 0
[    2.786415] i915 0000:01:00.0: [drm] Cannot find any crtc or sizes
[    2.789858] Console: switching to colour dummy device 80x25
[    2.789928] bochs-drm 0000:00:01.0: vgaarb: deactivate vga console
[    2.790091] bochs-drm 0000:00:01.0: enabling device (0004 -> 0006)
[    2.791083] [drm] Found bochs VGA, ID 0xb0c5.
[    2.791084] [drm] Framebuffer size 16384 kB @ 0x80000000, mmio @ 0x8304b000.
[    2.792028] [drm] Found EDID data blob.
[    2.792154] [drm] Initialized bochs-drm 1.0.0 20130925 for 0000:00:01.0 on minor 1

lspci
Code:
olimpo@persefone:~$ sudo lspci -s 01:00.0 -k
01:00.0 VGA compatible controller: Intel Corporation CometLake-S GT2 [UHD Graphics 630] (rev 03)
    Subsystem: Lenovo CometLake-S GT2 [UHD Graphics 630]
    Kernel driver in use: i915
    Kernel modules: i915


crash during the backup
Code:
INFO: Starting Backup of VM 254 (qemu)
INFO: Backup started at 2024-11-26 10:00:50
INFO: status = running
INFO: VM Name: persefone
INFO: include disk 'virtio0' 'local-btrfs:254/vm-254-disk-1.raw' 48G
INFO: include disk 'efidisk0' 'local-btrfs:254/vm-254-disk-0.raw' 528K
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/254/2024-11-26T09:00:50Z'
INFO: started backup task '9be2454c-f4a8-481b-88a0-045e835aa228'
INFO: resuming VM again
ERROR: VM 254 qmp command 'cont' failed - Resetting the Virtual Machine is required
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 254 failed - VM 254 qmp command 'cont' failed - Resetting the Virtual Machine is required
INFO: Failed at 2024-11-26 10:00:50
INFO: Starting Backup of VM 255 (qemu)
INFO: Backup started at 2024-11-26 10:00:50
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
 
Hi,

can you post the vm config and the host journal from the time of the crash too?
 
Hi and thanks for the reply!

Sure, here is the config:

Code:
root@kronos:~# cat /etc/pve/qemu-server/254.conf
## persefone
#
#unprivileged vm, multimedia, depends on **truenas**
#
### docker services
#- jellyfin
#- audiobookshelf
#- handbrake
#- portainer agent
#
### protocols
#- nfs
#- smb
balloon: 0
bios: ovmf
boot: order=virtio0;ide2;net0
cores: 8
cpu: host
efidisk0: local-btrfs:254/vm-254-disk-0.raw,efitype=4m,pre-enrolled-keys=1,size=528K
hostpci0: 0000:00:02,pcie=1
ide2: none,media=cdrom
machine: q35
memory: 8192
meta: creation-qemu=9.0.2,ctime=1726163790
name: persefone
net0: virtio=BC:24:11:C8:B9:13,bridge=vmbr0,firewall=1,tag=202
net1: virtio=BC:24:11:AA:75:6D,bridge=vmbr0,firewall=1,tag=203
numa: 0
onboot: 1
ostype: l26
scsihw: virtio-scsi-single
smbios1: uuid=ca95f78e-9556-44fa-8ae5-773b25623908
sockets: 1
startup: order=8,up=30
tags: docker;nas
virtio0: local-btrfs:254/vm-254-disk-1.raw,discard=on,iothread=1,size=48G
vmgenid: ceae0840-3db9-49f0-acbe-f2cf85d1e4e6

Hi,

can you post the vm config and the host journal from the time of the crash too?
How can I get the journal?
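One common way to pull the host journal for the time of a crash is `journalctl` with a time window, run as root on the host. A minimal sketch: the wrapper name `journal_between` is hypothetical, and the timestamps in the example are only placeholders around the failed backup.

```shell
# Hypothetical helper (not a Proxmox tool): print the host journal
# between two timestamps, without a pager so it can be redirected.
journal_between() {
    # $1 = start, $2 = end, e.g. "2024-11-26 09:30:00"
    journalctl --since "$1" --until "$2" --no-pager
}

# Example: save the hour around the failed backup to a file for posting
# journal_between "2024-11-26 09:30:00" "2024-11-26 10:30:00" > journal.txt
```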
 
Here are 2 logs (November 23 and 26). During the weekend I ran some tests on both host and guest, so the logs from those days are not valid.

On both days, at 3 am, I see this:


Code:
Nov 23 03:01:24 kronos QEMU[360722]: error: kvm run failed Bad address
Nov 23 03:01:24 kronos QEMU[360722]: RAX=0000000000000000 RBX=ffffa39502000000 RCX=0000000000000001 RDX=ffffa395027ffd98
Nov 23 03:01:24 kronos QEMU[360722]: RSI=00000000000fffb3 RDI=ffff92621168c828 RBP=00000000000fffb3 RSP=ffffa3950a4a7d10
Nov 23 03:01:24 kronos QEMU[360722]: R8 =0000000000000002 R9 =000000008020000f R10=0000000000001000 R11=0000000000000000
Nov 23 03:01:24 kronos QEMU[360722]: R12=000000010052d001 R13=000000000000004d R14=ffff926208c86000 R15=0000000000000000
Nov 23 03:01:24 kronos QEMU[360722]: RIP=ffffffffc0801111 RFL=00010202 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
Nov 23 03:01:24 kronos QEMU[360722]: ES =0000 0000000000000000 ffffffff 00c00000
Nov 23 03:01:24 kronos QEMU[360722]: CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
Nov 23 03:01:24 kronos QEMU[360722]: SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
Nov 23 03:01:24 kronos QEMU[360722]: DS =0000 0000000000000000 ffffffff 00c00000
Nov 23 03:01:24 kronos QEMU[360722]: FS =0000 0000000000000000 ffffffff 00c00000
Nov 23 03:01:24 kronos QEMU[360722]: GS =0000 ffff926377c40000 ffffffff 00c00000
Nov 23 03:01:24 kronos QEMU[360722]: LDT=0000 0000000000000000 00000000 00000000
Nov 23 03:01:24 kronos QEMU[360722]: TR =0040 fffffe000003e000 00004087 00008b00 DPL=0 TSS64-busy
Nov 23 03:01:24 kronos QEMU[360722]: GDT=     fffffe000003c000 0000007f
Nov 23 03:01:24 kronos QEMU[360722]: IDT=     fffffe0000000000 00000fff
Nov 23 03:01:24 kronos QEMU[360722]: CR0=80050033 CR2=00007ffdc73daff8 CR3=0000000183e10004 CR4=00370ee0
Nov 23 03:01:24 kronos QEMU[360722]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Nov 23 03:01:24 kronos QEMU[360722]: DR6=00000000ffff0ff0 DR7=0000000000000400
Nov 23 03:01:24 kronos QEMU[360722]: EFER=0000000000000d01
Nov 23 03:01:24 kronos QEMU[360722]: Code=d5 72 27 45 85 c0 74 17 31 c0 48 63 d0 48 01 ea 48 8d 14 d3 <4c> 89 22 83 c0 01 41 39 c0 75 eb 5b 5d 41 5c 41 5d c3 cc cc cc cc 44 89 e9 48 c7 c7 90 65
Nov 23 03:10:01 kronos CRON[772847]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 23 03:10:01 kronos CRON[772848]: (root) CMD (test -e /run/systemd/system || SERVICE_MODE=1 /sbin/e2scrub_all -A -r)
Nov 23 03:10:01 kronos CRON[772847]: pam_unix(cron:session): session closed for user root
Nov 23 03:17:01 kronos CRON[775566]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 23 03:17:01 kronos CRON[775567]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Nov 23 03:17:01 kronos CRON[775566]: pam_unix(cron:session): session closed for user root
Nov 23 03:26:52 kronos systemd[1]: Starting pve-daily-update.service - Daily PVE download activities...
 

Thanks again,
I have already seen those posts, but neither the host nor the guest has /usr/bin/fwupdmgr.

could the issue be related to this?
host
Code:
root@kronos:~# dmesg | grep -i vfio
[    2.106645] VFIO - User Level meta-driver version: 0.3
[    2.132616] vfio-pci 0000:00:02.0: vgaarb: deactivate vga console
[    2.132620] vfio-pci 0000:00:02.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    2.132749] vfio_pci: add [8086:9bc8[ffffffff:ffffffff]] class 0x000000/00000000
[  899.729744] vfio-pci 0000:00:02.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xcb43
[284391.236288] vfio-pci 0000:00:02.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xcb43
see the last 2 lines

and on the
guest
Code:
root@persefone:~# dmesg | grep "drm"
[    1.002626] ACPI: bus type drm_connector registered
[    1.330516] i915 0000:01:00.0: [drm] VT-d active for gfx access
[    1.330555] i915 0000:01:00.0: [drm] Using Transparent Hugepages
[    1.338909] i915 0000:01:00.0: [drm] Failed to find VBIOS tables (VBT)
[    1.342698] i915 0000:01:00.0: [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4)
[    2.505642] i915 0000:01:00.0: [drm] failed to retrieve link info, disabling eDP
[    2.512619] [drm] Initialized i915 1.6.0 20201103 for 0000:01:00.0 on minor 0
[    2.544984] i915 0000:01:00.0: [drm] Cannot find any crtc or sizes
[    2.549816] bochs-drm 0000:00:01.0: vgaarb: deactivate vga console
[    2.549870] bochs-drm 0000:00:01.0: enabling device (0004 -> 0006)
[    2.551287] [drm] Found bochs VGA, ID 0xb0c5.
[    2.551289] [drm] Framebuffer size 16384 kB @ 0x80000000, mmio @ 0x8304b000.
[    2.552313] [drm] Found EDID data blob.
[    2.552577] [drm] Initialized bochs-drm 1.0.0 20130925 for 0000:00:01.0 on minor 1
[    2.553433] fbcon: bochs-drmdrmfb (fb0) is primary device
[    2.579109] i915 0000:01:00.0: [drm] Cannot find any crtc or sizes
[    2.612040] i915 0000:01:00.0: [drm] Cannot find any crtc or sizes
[    2.733831] bochs-drm 0000:00:01.0: [drm] fb0: bochs-drmdrmfb frame buffer device
[    3.572570] systemd[1]: Starting modprobe@drm.service - Load Kernel Module drm...
[    3.587414] systemd[1]: modprobe@drm.service: Deactivated successfully.
[    3.587518] systemd[1]: Finished modprobe@drm.service - Load Kernel Module drm.

Note:
On the host, after running update-grub and update-initramfs -u and rebooting, everything seems fine, but at 3 am history repeats itself: the VM crashes and those messages return.

EDIT:
Secure Boot is disabled in the VM's BIOS.
 
I just noticed that every time the crash occurs, the QEMU agent in the VM config becomes disabled, so I have to shut down the VM, re-enable it, and start the VM again.

I am not an expert, but maybe this can help you.
For comparison, I have a VM (argos) with a similar configuration on another Lenovo mini PC.
On that one, to see the terminal I had to add a serial device and change the GRUB config; otherwise the VNC screen goes blank as soon as Linux starts booting.

On this one (persefone), however, the system (I think) added another virtual video card, so I don't need the serial device. Maybe this is the cause of the issues.

persefone
Code:
olimpo@persefone:~$ lspci -nnk | grep -iA3 "vga"
00:01.0 VGA compatible controller [0300]: Device [1234:1111] (rev 02)
        Subsystem: Red Hat, Inc. Device [1af4:1100]
        Kernel driver in use: bochs-drm
        Kernel modules: bochs
--
01:00.0 VGA compatible controller [0300]: Intel Corporation CometLake-S GT2 [UHD Graphics 630] [8086:9bc8] (rev 03)
        Subsystem: Lenovo CometLake-S GT2 [UHD Graphics 630] [17aa:316a]
        Kernel driver in use: i915
        Kernel modules: i915

argos
Code:
olimpo@argos:~$ lspci -nnk | grep -iA3 "vga"
01:00.0 VGA compatible controller [0300]: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630] [8086:3e92]
        Subsystem: Lenovo CoffeeLake-S GT2 [UHD Graphics 630] [17aa:3136]
        Kernel driver in use: i915
        Kernel modules: i915


 
could the issue be related to this?
Possibly. Do you boot the host with BIOS or UEFI?
Could you try setting the VM to SeaBIOS instead of OVMF? (It should not make a difference, but it can't hurt to try.)

I just noticed that every time the crash occurs, the QEMU agent in the VM config becomes disabled, so I have to shut down the VM, re-enable it, and start the VM again.
What do you mean by this? The config does not change by itself; it must be changed by someone or by some tool.
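One way to check this is to read the agent option straight from the VM config on the host. A minimal sketch using `qm config`: the function name `agent_setting` is hypothetical, and 254 is the VM ID used in this thread.

```shell
# Hypothetical helper: show whether the guest agent option is set for a VM.
agent_setting() {
    # $1 = VM ID; prints the "agent:" line, or a note when the option is absent
    qm config "$1" | grep '^agent:' || echo "agent: not set"
}

# Example, run as root on the host:
# agent_setting 254
```

The 254.conf posted earlier in the thread contains no `agent:` line, which would be consistent with the option not being set at that point.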
 
Possibly. Do you boot the host with BIOS or UEFI?
It should be UEFI
Code:
olimpo@persefone:~$ sudo ls /sys/firmware/efi/
config_table  efivars  fw_platform_size  fw_vendor  mok-variables  runtime  runtime-map  systab

olimpo@persefone:~$ sudo dmesg | grep -i efi
[    0.000000] efi: EFI v2.70 by Proxmox distribution of EDK II
[    0.000000] efi: SMBIOS=0x7e9d4000 ACPI=0x7eb7d000 ACPI 2.0=0x7eb7d014 MEMATTR=0x7ca20018 MOKvar=0x7e97f000
[    0.036827] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[    0.145217] pci 0000:00:01.0: BAR 0: assigned to efifb
[    0.313665] Registered efivars operations
[    0.636464] efifb: probing for efifb
[    0.636474] efifb: framebuffer at 0x80000000, using 1408k, total 1408k
[    0.636475] efifb: mode is 800x600x24, linelength=2400, pages=1
[    0.636476] efifb: scrolling: redraw
[    0.636477] efifb: Truecolor: size=0:8:8:8, shift=0:16:8:0
[    0.637995] fb0: EFI VGA frame buffer device
[    0.673924] integrity: Loading X.509 certificate: UEFI:db
[    0.673974] integrity: Loading X.509 certificate: UEFI:db
[    0.673987] integrity: Loaded X.509 cert 'Microsoft Corporation UEFI CA 2011: 13adbf4309bd82709c8cd54f316ed522988a1bd4'
[    3.363147] systemd[1]: Starting modprobe@efi_pstore.service - Load Kernel Module efi_pstore...
[    3.372677] pstore: Registered efi as persistent store backend
[    3.377937] systemd[1]: modprobe@efi_pstore.service: Deactivated successfully.
[    3.378033] systemd[1]: Finished modprobe@efi_pstore.service - Load Kernel Module efi_pstore.

olimpo@persefone:~$ sudo cat /boot/grub/grub.cfg | grep -i "uefi"
### BEGIN /etc/grub.d/30_uefi-firmware ###

menuentry 'UEFI Firmware Settings' $menuentry_id_option 'uefi-firmware' {
### END /etc/grub.d/30_uefi-firmware ###
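
Note that the commands above were run inside the guest (persefone), so they describe the VM's firmware (OVMF), not the host's. A minimal sketch of the same check, meant to be run on the host (kronos). The function name `boot_mode` and its optional argument are illustrative only.

```shell
# Hypothetical helper: report whether the running Linux system was booted via UEFI.
# The kernel exposes /sys/firmware/efi only on UEFI boots.
boot_mode() {
    # $1: sysfs root to inspect; defaults to /sys (the argument exists only for testing)
    if [ -d "${1:-/sys}/firmware/efi" ]; then
        echo "UEFI"
    else
        echo "BIOS"
    fi
}

# Example, run on the host rather than in the VM:
# boot_mode
```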


Could you try setting the VM to SeaBIOS instead of OVMF? (It should not make a difference, but it can't hurt to try.)
I can try

What do you mean by this? The config does not change by itself; it must be changed by someone or by some tool.
I am not totally sure, but when the VM started its "crashing tour" (sad), I noticed that the QEMU agent was disabled, so I re-enabled it.
During the weekend I ran many tests to try to understand the cause, including deleting and restoring the VM... maybe I restored a backup in which it was already disabled. I have now enabled it.

PS: do you know why this VM has 2 video cards (virtual and real), while the other one, with the same installation procedure, same OS (Debian 12), and same config, has only the real one?
 
With SeaBIOS the VM doesn't boot and stays stuck on "Booting from hard disk"
Depending on the guest OS, you can't simply change the BIOS mode. I meant, for example, installing a second (similar) VM on SeaBIOS to test.

PS: do you know why this VM has 2 video cards (virtual and real), while the other one, with the same installation procedure, same OS (Debian 12), and same config, has only the real one?
One has the display set to serial and one to default (default adds a standard virtual display device).
 
One has the display set to serial and one to default (default adds a standard virtual display device).
I know, and it is strange because on that VM (on another PC), to see anything in the VNC view, I had to add the serial port and some lines to the grub.conf.

Here, instead (same Debian 12, installed with the same procedure), this behaviour does not occur.

Here is the new log. The VM was up from Monday until early this morning.
The problem that causes the crash is always the same.

Code:
root@kronos:~# journalctl --since "2024-11-27 03:00:00" --until "2024-11-27 04:00:00"
Nov 27 03:10:01 kronos CRON[2109881]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 27 03:10:01 kronos CRON[2109882]: (root) CMD (test -e /run/systemd/system || SERVICE_MODE=1 /sbin/e2scrub_all -A -r)
Nov 27 03:10:01 kronos CRON[2109881]: pam_unix(cron:session): session closed for user root
Nov 27 03:17:01 kronos CRON[2112661]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 27 03:17:01 kronos CRON[2112662]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Nov 27 03:17:01 kronos CRON[2112661]: pam_unix(cron:session): session closed for user root

root@kronos:~# journalctl --since "2024-11-28 03:00:00" --until "2024-11-28 04:00:00"
Nov 28 03:00:38 kronos pvedaemon[2408352]: <root@pam> successful auth for user 'root@pam'
Nov 28 03:10:01 kronos CRON[2693588]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 28 03:10:01 kronos CRON[2693589]: (root) CMD (test -e /run/systemd/system || SERVICE_MODE=1 /sbin/e2scrub_all -A -r)
Nov 28 03:10:01 kronos CRON[2693588]: pam_unix(cron:session): session closed for user root
Nov 28 03:15:39 kronos pvedaemon[2408928]: <root@pam> successful auth for user 'root@pam'
Nov 28 03:17:01 kronos CRON[2696354]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 28 03:17:01 kronos CRON[2696355]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Nov 28 03:17:01 kronos CRON[2696354]: pam_unix(cron:session): session closed for user root
Nov 28 03:29:05 kronos systemd[1]: Starting pve-daily-update.service - Daily PVE download activities...
Nov 28 03:29:07 kronos pveupdate[2701142]: <root@pam> starting task UPID:kronos:0029375B:02645206:6747D573:aptupdate::root@pam:
Nov 28 03:29:08 kronos pveupdate[2701147]: update new package list: /var/lib/pve-manager/pkgupdates
Nov 28 03:29:09 kronos pveupdate[2701142]: <root@pam> end task UPID:kronos:0029375B:02645206:6747D573:aptupdate::root@pam: OK
Nov 28 03:29:09 kronos pveupdate[2701142]: Custom certificate does not expire soon, skipping ACME renewal.
Nov 28 03:29:09 kronos systemd[1]: pve-daily-update.service: Deactivated successfully.
Nov 28 03:29:09 kronos systemd[1]: Finished pve-daily-update.service - Daily PVE download activities.
Nov 28 03:29:09 kronos systemd[1]: pve-daily-update.service: Consumed 3.171s CPU time.
Nov 28 03:30:42 kronos pvedaemon[1947758]: <root@pam> successful auth for user 'root@pam'
Nov 28 03:45:43 kronos pvedaemon[2408928]: <root@pam> successful auth for user 'root@pam'


root@kronos:~# journalctl --since "2024-11-29 03:00:00" --until "2024-11-29 04:00:00"
Nov 29 03:01:24 kronos QEMU[2459043]: error: kvm run failed Bad address
Nov 29 03:01:24 kronos QEMU[2459043]: RAX=0000000000000000 RBX=ffffc19502800000 RCX=0000000000000001 RDX=ffffc19502fffd98
Nov 29 03:01:24 kronos QEMU[2459043]: RSI=00000000000fffb3 RDI=ffff9db3b0c20828 RBP=00000000000fffb3 RSP=ffffc195066bbd10
Nov 29 03:01:24 kronos QEMU[2459043]: R8 =0000000000000002 R9 =0000000080200017 R10=0000000000001000 R11=0000000000000000
Nov 29 03:01:24 kronos QEMU[2459043]: R12=0000000109d0e001 R13=000000000000004d R14=ffff9db423365180 R15=0000000000000000
Nov 29 03:01:24 kronos QEMU[2459043]: RIP=ffffffffc0b0a111 RFL=00010202 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
Nov 29 03:01:24 kronos QEMU[2459043]: ES =0000 0000000000000000 ffffffff 00c00000
Nov 29 03:01:24 kronos QEMU[2459043]: CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA]
Nov 29 03:01:24 kronos QEMU[2459043]: SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS   [-WA]
Nov 29 03:01:24 kronos QEMU[2459043]: DS =0000 0000000000000000 ffffffff 00c00000
Nov 29 03:01:24 kronos QEMU[2459043]: FS =0000 0000000000000000 ffffffff 00c00000
Nov 29 03:01:24 kronos QEMU[2459043]: GS =0000 ffff9db4f7d00000 ffffffff 00c00000
Nov 29 03:01:24 kronos QEMU[2459043]: LDT=0000 0000000000000000 00000000 00000000
Nov 29 03:01:24 kronos QEMU[2459043]: TR =0040 fffffe00000ef000 00004087 00008b00 DPL=0 TSS64-busy
Nov 29 03:01:24 kronos QEMU[2459043]: GDT=     fffffe00000ed000 0000007f
Nov 29 03:01:24 kronos QEMU[2459043]: IDT=     fffffe0000000000 00000fff
Nov 29 03:01:24 kronos QEMU[2459043]: CR0=80050033 CR2=00007f3f90000020 CR3=000000010558a006 CR4=00370ee0
Nov 29 03:01:24 kronos QEMU[2459043]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Nov 29 03:01:24 kronos QEMU[2459043]: DR6=00000000ffff0ff0 DR7=0000000000000400
Nov 29 03:01:24 kronos QEMU[2459043]: EFER=0000000000000d01
Nov 29 03:01:24 kronos QEMU[2459043]: Code=d5 72 27 45 85 c0 74 17 31 c0 48 63 d0 48 01 ea 48 8d 14 d3 <4c> 89 22 83 c0 01 41 39 c0 75 eb 5b 5d 41 5c 41 5d c3 cc cc cc cc 44 89 e9 48 c7 c7 90 f5
Nov 29 03:10:01 kronos CRON[3277385]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 29 03:10:01 kronos CRON[3277386]: (root) CMD (test -e /run/systemd/system || SERVICE_MODE=1 /sbin/e2scrub_all -A -r)
Nov 29 03:10:01 kronos CRON[3277385]: pam_unix(cron:session): session closed for user root
Nov 29 03:17:01 kronos CRON[3280158]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Nov 29 03:17:01 kronos CRON[3280159]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Nov 29 03:17:01 kronos CRON[3280158]: pam_unix(cron:session): session closed for user root
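
To spot these crashes without reading the whole journal, the output can be filtered for the `kvm run failed` line that opens each register dump above. A minimal sketch: the function name `kvm_failures` is made up for illustration, and the dates in the example are placeholders.

```shell
# Hypothetical helper: keep only the lines where QEMU reports a failed KVM run.
# Reads journal text on stdin, so it works on live output or a saved file.
kvm_failures() {
    grep -F 'kvm run failed'
}

# Example on the host:
# journalctl --since "2024-11-23" --until "2024-11-30" --no-pager | kvm_failures
```

In the logs posted in this thread, both matches fall at 03:01:24 (Nov 23 and Nov 29), which may point at something scheduled around 3 am.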


Anyway, I have now removed the iGPU to see whether it crashes even in this condition.
In the meantime I created a new VM (again Debian 12) and passed the iGPU through to it.
Let's see what happens during the weekend.
 
Small update:
despite the warning/error about the drm, the machine ran idle over the weekend without any issue.

Yesterday evening I watched some media on Jellyfin, which runs in Docker on this VM, and today at 3 am I got the usual crash.

I suspect Jellyfin, but the content I watched wasn't transcoded, just played directly (since the TV is compatible with the video format), so the iGPU was not used...

Anyway, to rule it out I am rolling back to version 10.10.1 (from about a month ago, before all these crashes) and I will see what happens.
 
