Hi,
Due to some stability issues, I reinstalled my whole VM last weekend to get rid of the BSODs I was getting in my Windows 10 VM. Unfortunately they are still there, and things have actually gotten worse: when I leave the system idle for a while, it locks up. The second time it locked up, I was able to get the following from the kernel log. Hopefully someone can shed some light on this. By the way, this is now a total system lockup, not just the VM:
[44963.608464] fpu exception: 0000 [#1] SMP NOPTI
[44963.608467] CPU: 15 PID: 24366 Comm: kworker/15:0 Tainted: P O 5.3.10-1-pve #1
[44963.608468] Hardware name: System manufacturer System Product Name/ROG STRIX X570-E GAMING, BIOS 1408 04/01/2020
[44963.608520] Workqueue: events dm_irq_work_func [amdgpu]
[44963.608564] RIP: 0010:dcn_bw_ceil2+0x40/0x60 [amdgpu]
[44963.608565] Code: 89 e5 48 83 ec 08 dd 05 6e be 19 00 f3 0f 2c c2 0f 57 d2 f3 0f 2a d0 f3 0f 59 d1 f3 0f 11 55 fc d9 45 fc f3 0f 11 45 fc de c1 <d9> 45 fc d9 c9 df f1 dd d8 72 02 c9 c3 c9 f3 0f 58 ca 0f 28 c1 c3
[44963.608566] RSP: 0018:ffffb987432379c0 EFLAGS: 00010286
[44963.608567] RAX: 0000000000000004 RBX: 0000000000000000 RCX: 0000000000000a00
[44963.608567] RDX: ffff8df1b5ff2ce4 RSI: ffff8df1b5ff2ce4 RDI: ffff8df1b5ff1ed8
[44963.608568] RBP: ffffb987432379c8 R08: ffff8df1b5ff3b98 R09: 0000000000000550
[44963.608569] R10: 0000000000000020 R11: 0000000000000000 R12: ffff8df1b5ff2030
[44963.608569] R13: 0000000000000000 R14: 0000000000000002 R15: ffff8df1b5ff1ed8
[44963.608570] FS: 0000000000000000(0000) GS:ffff8dfa3ebc0000(0000) knlGS:0000000000000000
[44963.608571] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[44963.608571] CR2: 0000000000000000 CR3: 0000000fbbbb8000 CR4: 0000000000340ea0
[44963.608572] Call Trace:
[44963.608613] dml20_ModeSupportAndSystemConfigurationFull+0x3a7/0x5a50 [amdgpu]
[44963.608616] ? prep_new_page+0x129/0x160
[44963.608617] ? get_page_from_freelist+0x7b8/0x1370
[44963.608657] dml_get_voltage_level+0x137/0x1d0 [amdgpu]
[44963.608697] dcn20_validate_bandwidth+0x305/0x1720 [amdgpu]
[44963.608733] dc_validate_global_state+0x27e/0x330 [amdgpu]
[44963.608773] amdgpu_dm_atomic_check+0x5ad/0x7b0 [amdgpu]
[44963.608783] drm_atomic_check_only+0x462/0x830 [drm]
[44963.608790] drm_atomic_commit+0x18/0x50 [drm]
[44963.608828] dm_restore_drm_connector_state+0xfc/0x130 [amdgpu]
[44963.608865] handle_hpd_irq+0xc1/0x100 [amdgpu]
[44963.608901] dm_irq_work_func+0x53/0x70 [amdgpu]
[44963.608904] process_one_work+0x20f/0x3d0
[44963.608905] worker_thread+0x34/0x400
[44963.608906] kthread+0x120/0x140
[44963.608907] ? process_one_work+0x3d0/0x3d0
[44963.608908] ? __kthread_parkme+0x70/0x70
[44963.608910] ret_from_fork+0x22/0x40
[44963.608911] Modules linked in: veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter ax25 edac_mce_amd kvm_amd nfnetlink_log nfnetlink kvm zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel gspca_vc032x uvcvideo gspca_main videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 input_leds aes_x86_64 joydev videobuf2_common crypto_simd eeepc_wmi cryptd snd_hda_codec_realtek glue_helper videodev asus_wmi snd_hda_codec_generic snd_usb_audio sparse_keymap ledtrig_audio btusb video snd_usbmidi_lib btrtl snd_rawmidi btbcm snd_seq_device wmi_bmof mc iwlmvm btintel snd_hda_codec_hdmi bluetooth pcspkr mac80211 libarc4 snd_hda_intel ecdh_generic snd_hda_codec ecc snd_hda_core snd_hwdep snd_pcm snd_timer k10temp ccp snd soundcore iwlwifi cfg80211 mac_hid zcommon(PO) znvpair(PO) spl(O) vfio_pci vfio_virqfd irqbypass vfio_iommu_type1 vfio vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core
[44963.608941] iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c hid_microsoft ff_memless hid_generic usbkbd usbmouse usbhid hid amdgpu mxm_wmi amd_iommu_v2 gpu_sched ttm i2c_piix4 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm ahci libahci igb dca i2c_algo_bit wmi
[44963.608955] ---[ end trace b56b613f7fb5520c ]---
[44963.608993] RIP: 0010:dcn_bw_ceil2+0x40/0x60 [amdgpu]
[44963.608994] Code: 89 e5 48 83 ec 08 dd 05 6e be 19 00 f3 0f 2c c2 0f 57 d2 f3 0f 2a d0 f3 0f 59 d1 f3 0f 11 55 fc d9 45 fc f3 0f 11 45 fc de c1 <d9> 45 fc d9 c9 df f1 dd d8 72 02 c9 c3 c9 f3 0f 58 ca 0f 28 c1 c3
[44963.608994] RSP: 0018:ffffb987432379c0 EFLAGS: 00010286
[44963.608995] RAX: 0000000000000004 RBX: 0000000000000000 RCX: 0000000000000a00
[44963.608995] RDX: ffff8df1b5ff2ce4 RSI: ffff8df1b5ff2ce4 RDI: ffff8df1b5ff1ed8
[44963.608996] RBP: ffffb987432379c8 R08: ffff8df1b5ff3b98 R09: 0000000000000550
[44963.608996] R10: 0000000000000020 R11: 0000000000000000 R12: ffff8df1b5ff2030
[44963.608997] R13: 0000000000000000 R14: 0000000000000002 R15: ffff8df1b5ff1ed8
[44963.608998] FS: 0000000000000000(0000) GS:ffff8dfa3ebc0000(0000) knlGS:0000000000000000
[44963.608998] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[44963.608999] CR2: 0000000000000000 CR3: 0000000fbbbb8000 CR4: 0000000000340ea0