Hi,
I am seeing more or less random freezes of the complete host (Debian with Proxmox). They seem to be related to gaming on the Windows 10 VM.
Host System:
CPU | AMD Phenom II X6 1090T @ 3.2 GHz (6 cores) |
Board | ASRock 890FX Deluxe4 |
GPU (passed through) | AMD Radeon R9 285 |
RAM | 32 GB @ 667 MHz |
The config of the Windows VM:
Code:
audio0: device=ich9-intel-hda,driver=spice
balloon: 0
boot: order=ide0;ide2;net0
cores: 4
cpu: host
hostpci0: 0000:07:00,pcie=1,x-vga=1
ide0: local:100/vm-100-disk-0.qcow2,size=120G
ide2: none,media=cdrom
machine: pc-q35-6.1
memory: 12288
meta: creation-qemu=6.1.0,ctime=1637058178
name: pWin10
net0: e1000=3E:4D:9B:B7:98:1B,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=c9547027-5d60-4960-9280-8ee54a480cd7
sockets: 1
usb0: host=0a12:0001,usb3=1
vga: none
vmgenid: e3b10431-1cb4-4e7b-8a96-0fd21a3e71f4
The whole system freezes completely and the only thing I can do is a hard reset. So far it has happened mostly while I was gaming on the Windows VM. Apart from these occasional freezes it runs flawlessly.
I have watched the dmesg output while the VM was running (dmesg -w), but it doesn't show anything of value (at least nothing that I can see).
Here, as an example, is the output from the last crash:
Code:
[42355.726529] fwbr100i0: port 2(tap100i0) entered disabled state
[42355.749345] fwbr100i0: port 1(fwln100i0) entered disabled state
[42355.749535] vmbr0: port 2(fwpr100p0) entered disabled state
[42355.749680] device fwln100i0 left promiscuous mode
[42355.749690] fwbr100i0: port 1(fwln100i0) entered disabled state
[42355.768896] device fwpr100p0 left promiscuous mode
[42355.768905] vmbr0: port 2(fwpr100p0) entered disabled state
[42356.096438] usb 9-2: reset full-speed USB device number 2 using ohci-pci
[42483.976444] device tap100i0 entered promiscuous mode
[42484.030109] vmbr0: port 2(fwpr100p0) entered blocking state
[42484.030119] vmbr0: port 2(fwpr100p0) entered disabled state
[42484.030222] device fwpr100p0 entered promiscuous mode
[42484.030299] vmbr0: port 2(fwpr100p0) entered blocking state
[42484.030302] vmbr0: port 2(fwpr100p0) entered forwarding state
[42484.036676] fwbr100i0: port 1(fwln100i0) entered blocking state
[42484.036685] fwbr100i0: port 1(fwln100i0) entered disabled state
[42484.036814] device fwln100i0 entered promiscuous mode
[42484.036882] fwbr100i0: port 1(fwln100i0) entered blocking state
[42484.036885] fwbr100i0: port 1(fwln100i0) entered forwarding state
[42484.044604] fwbr100i0: port 2(tap100i0) entered blocking state
[42484.044614] fwbr100i0: port 2(tap100i0) entered disabled state
[42484.044747] fwbr100i0: port 2(tap100i0) entered blocking state
[42484.044750] fwbr100i0: port 2(tap100i0) entered forwarding state
[42489.564400] vfio-pci 0000:07:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
[42489.564415] vfio-pci 0000:07:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
[42503.634902] usb 9-2: reset full-speed USB device number 2 using ohci-pci
The last line was logged well before the freeze. journalctl doesn't show anything of value to me either:
Code:
May 01 12:16:57 center sudo[90219]: matze : TTY=pts/4 ; PWD=/home/matze ; USER=root ; COMMAND=/usr/bin/dmesg -w
May 01 12:16:57 center sudo[90219]: pam_unix(sudo:session): session opened for user root(uid=0) by matze(uid=1000)
May 01 12:17:01 center CRON[90240]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 01 12:17:01 center CRON[90241]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
May 01 12:17:01 center CRON[90240]: pam_unix(cron:session): session closed for user root
May 01 12:17:13 center sshd[90264]: Accepted password for matze from 192.168.1.103 port 47466 ssh2
May 01 12:17:13 center sshd[90264]: pam_unix(sshd:session): session opened for user matze(uid=1000) by (uid=0)
May 01 12:17:13 center systemd-logind[730]: New session 24 of user matze.
May 01 12:17:13 center systemd[1]: Started Session 24 of user matze.
May 01 12:17:19 center gnome-software[82915]: Only 0 apps for recent list, hiding
May 01 12:17:19 center gnome-software[82915]: hiding category games featured applications: found only 0 to show, need at least 9
May 01 12:17:19 center gnome-software[82915]: hiding category productivity featured applications: found only 0 to show, need at least 9
May 01 12:17:19 center gnome-software[82915]: automatically prevented from changing kind on system/package/debian-stable-main/generic/org.gphoto.libgphoto2/* from generic>
May 01 12:17:19 center gnome-software[2265]: hiding category productivity featured applications: found only 0 to show, need at least 9
May 01 12:17:19 center gnome-software[2265]: hiding category games featured applications: found only 0 to show, need at least 9
May 01 12:17:19 center gnome-software[2265]: Only 0 apps for recent list, hiding
May 01 12:17:19 center gnome-software[2265]: automatically prevented from changing kind on system/package/debian-stable-main/generic/org.gphoto.libgphoto2/* from generic >
May 01 12:17:20 center PackageKit[2194]: get-updates transaction /4441_aecbabce from uid 1000 finished with success after 669ms
May 01 12:17:20 center PackageKit[2194]: get-updates transaction /4442_bdeaaaea from uid 1001 finished with success after 657ms
May 01 12:23:28 center pvedaemon[89324]: <root@pam> successful auth for user 'matze@pam'
May 01 12:23:36 center smartd[720]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 61 to 47
May 01 12:23:36 center smartd[720]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 62 to 60
May 01 12:24:03 center pvedaemon[2109]: <root@pam> successful auth for user 'matze@pam'
May 01 12:24:11 center pveproxy[88476]: worker exit
May 01 12:24:11 center pveproxy[2329]: worker 88476 finished
May 01 12:24:11 center pveproxy[2329]: starting 1 worker(s)
May 01 12:24:11 center pveproxy[2329]: worker 91183 started
May 01 12:28:35 center pveproxy[88958]: worker exit
May 01 12:28:35 center pveproxy[2329]: worker 88958 finished
May 01 12:28:35 center pveproxy[2329]: starting 1 worker(s)
May 01 12:28:35 center pveproxy[2329]: worker 91702 started
May 01 12:33:08 center pveproxy[89669]: worker exit
May 01 12:33:08 center pveproxy[2329]: worker 89669 finished
May 01 12:33:08 center pveproxy[2329]: starting 1 worker(s)
May 01 12:33:08 center pveproxy[2329]: worker 92248 started
May 01 12:38:29 center pvedaemon[89324]: <root@pam> successful auth for user 'matze@pam'
May 01 12:39:03 center pvedaemon[89324]: <root@pam> successful auth for user 'matze@pam'
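After a hard reset, the kernel messages of the boot that froze can only be read back if the journal is stored persistently. A minimal sketch of that check, assuming the default journald location /var/log/journal; the function name journal_is_persistent is made up for illustration:

```shell
#!/bin/sh
# Return 0 if the given journal directory exists and contains at least one entry.
# The default path is where systemd-journald keeps persistent logs.
journal_is_persistent() {
    dir="${1:-/var/log/journal}"
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Usage after rebooting from a freeze:
#   journal_is_persistent && journalctl -k -b -1 -e
#   -k: kernel messages only, -b -1: previous boot, -e: jump to the end
```

If the check fails, journalctl -b -1 has nothing to show, and the last messages before a freeze are lost with the reset.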
The htop instance that was open showed a CPU temperature of 61 °C and a load average of 3.8-4.2.
The IOMMU itself works fine and the groups are also fine.
Nevertheless, here are my configs:
Code:
cat /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on vfio-pci.ids=1002:6939,1002:aad8,10de:13c2,10de:0fbb"
Code:
cat /etc/modules
vfio
vfio_iommu_type1
vfio_pci ids=1002:6939,1002:aad8,10de:13c2,10de:0fbb
vfio_virqfd
Code:
cat /etc/modprobe.d/vfio_pci.conf
options vfio-pci ids=1002:6939,1002:aad8,10de:13c2,10de:0fbb
softdep amdgpu pre: vfio-pci
softdep nvidia pre: vfio-pci
softdep nouveau pre: vfio-pci
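Since the device IDs are listed in three places, it may be worth confirming which driver actually ends up bound to the card. A minimal sketch that reads the driver symlink from sysfs; the function name pci_driver and its optional second argument are illustrative only, and 0000:07:00.1 is assumed to be the HDMI audio function of the GPU:

```shell
#!/bin/sh
# Print the kernel driver currently bound to a PCI device, or "none".
# $1: PCI address such as 0000:07:00.0
# $2: sysfs root (optional, defaults to /sys; overridable for testing)
pci_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# Usage on the host:
#   pci_driver 0000:07:00.0    # vfio-pci is expected here
#   pci_driver 0000:07:00.1    # assumed HDMI audio function
```

If this prints amdgpu or snd_hda_intel instead of vfio-pci, the host driver claimed the device before vfio-pci did.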
Don't mind the NVIDIA entries; they are from when I added an NVIDIA GPU. The problem also showed up before I had the entries for the NVIDIA card.
Any ideas how to debug this? The only suggestion I have found in similar threads is to run a memtest.
Any help is appreciated, thanks in advance.