Proxmox doing strange reboots / getting stuck

escaparrac · New Member · Jun 2, 2022
I've been trying to install a win10 VM for some days. It's been nearly impossible.

Every time I reboot or shut down the VM, Proxmox gets stuck. I can't stop or reset the VM, even after running the unlock command. After a few minutes, Proxmox disconnects and just dies with no output on the display. I have to hard-reset the machine after that.

After that hard reset, the CTs and the TrueNAS VM take about 5 minutes to boot up.
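For reference, this is roughly what I try when it locks up (the VM is ID 102; these are the standard qm CLI commands):

Code:
qm unlock 102      # clear a stale config lock
qm stop 102        # force-stop instead of a clean shutdown
qm status 102      # check what state the VM reports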

Image when win10 VM gets stuck:
[screenshot attachment]

win10 VM config:
[screenshot attachment]

Image when booting up:
[screenshot attachment]


Code:
pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.39-3-pve)
pve-manager: 7.2-7 (running version: 7.2-7/d0dd0e85)
pve-kernel-5.15: 7.2-8
pve-kernel-helper: 7.2-8
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.39-3-pve: 5.15.39-3
pve-kernel-5.15.35-2-pve: 5.15.35-5
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 15.2.15-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-3
libpve-storage-perl: 7.2-7
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.5-1
proxmox-backup-file-restore: 2.2.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-2
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.5-1
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-11
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.5-pve1


I hope you can help me diagnose this. Should I do a clean Proxmox install?

I can back up the CTs and just reassign the IPs. The problem might be the TrueNAS VM, but as far as I know, it will recognize the pools already created on the disks, right?
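If a clean install is the way to go, I guess I'd back the CTs up with vzdump first, something like this (the CT ID and storage name here are just examples):

Code:
# back up container 101 to a storage named 'backup' (ID and name are examples)
vzdump 101 --storage backup --mode snapshot --compress zstd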
 
journalctl when it died (I guess)

Code:
Aug 16 22:32:55 pve login[21459]: pam_unix(login:session): session opened for user r>

Aug 16 22:32:55 pve pvedaemon[21451]: starting termproxy UPID:pve:000053CB:000073B3:>
Aug 16 22:32:55 pve pvedaemon[2408]: <root@pam> successful auth for user 'root@pam'
Aug 16 22:32:55 pve pvedaemon[2408]: <root@pam> successful auth for user 'root@pam'
Aug 16 22:30:41 pve kernel: br-1261d231e4f7: port 1(veth52d00a2) entered disabled st>
Aug 16 22:28:07 pve smartd[1890]: Device: /dev/nvme0, opened
Aug 16 22:28:07 pve smartd[1890]: Device: /dev/nvme0, Samsung SSD 980 500GB, S/N:S64>
Aug 16 22:28:07 pve smartd[1890]: Device: /dev/nvme0, is SMART capable. Adding to "m>
Aug 16 22:28:07 pve smartd[1890]: Device: /dev/nvme0, state read from /var/lib/smart>
Aug 16 22:28:07 pve smartd[1890]: Monitoring 4 ATA/SATA, 0 SCSI/SAS and 1 NVMe devic>
Aug 16 22:28:07 pve systemd-udevd[1327]: Using default interface naming scheme 'v247>
Aug 16 22:28:07 pve systemd-udevd[1327]: ethtool: autonegotiation is unset or enable>
Aug 16 22:28:07 pve kernel: vmbr0: port 1(enp2s0) entered blocking state
Aug 16 22:28:07 pve kernel: vmbr0: port 1(enp2s0) entered disabled state
Aug 16 22:28:07 pve kernel: Generic FE-GE Realtek PHY r8169-0-200:00: attached PHY d>
Aug 16 22:28:07 pve smartd[1890]: Device: /dev/sda [SAT], state written to /var/lib/>
Aug 16 22:28:07 pve smartd[1890]: Device: /dev/sdb [SAT], state written to /var/lib/>
Aug 16 22:28:07 pve smartd[1890]: Device: /dev/sdc [SAT], state written to /var/lib/>
Aug 16 22:28:07 pve smartd[1890]: Device: /dev/sdd [SAT], state written to /var/lib/>
Aug 16 22:28:07 pve smartd[1890]: Device: /dev/nvme0, state written to /var/lib/smar>
Aug 16 22:28:07 pve systemd[1]: Started Self Monitoring and Reporting Technology (SM>
Aug 16 22:28:07 pve kernel: device enp2s0 entered promiscuous mode
Aug 16 22:28:07 pve kernel: r8169 0000:02:00.0 enp2s0: Link is Down
Aug 16 22:28:07 pve kernel: vmbr0: port 1(enp2s0) entered blocking state
Aug 16 22:28:07 pve kernel: vmbr0: port 1(enp2s0) entered forwarding state
Aug 16 22:28:07 pve systemd[1]: Finished Network initialization.
Aug 16 22:28:07 pve systemd[1]: Reached target Network.
Aug 16 22:28:07 pve systemd[1]: Reached target Network is Online.
Aug 16 22:28:05 pve kernel: [drm:drm_core_init [drm]] Initialized
Aug 16 22:28:05 pve kernel: RPC: Registered named UNIX socket transport module.
Aug 16 22:28:05 pve kernel: RPC: Registered udp transport module.
Aug 16 22:28:05 pve kernel: RPC: Registered tcp transport module.
Aug 16 22:28:05 pve kernel: RPC: Registered tcp NFSv4.1 backchannel transport module.
Aug 16 22:28:05 pve systemd[1]: modprobe@drm.service: Succeeded.
Aug 16 22:28:05 pve systemd[1]: Finished Load Kernel Module drm.
Aug 16 22:28:05 pve systemd[1]: Mounted RPC Pipe File System.
Aug 16 22:28:05 pve systemd[1]: Finished Create System Users.
Aug 16 22:28:05 pve systemd[1]: Starting Create Static Device Nodes in /dev...
Aug 16 22:28:05 pve systemd[1]: Finished Load/Save Random Seed.
Aug 16 22:28:05 pve systemd[1]: Condition check resulted in First Boot Complete being skipped.
Aug 16 22:28:05 pve kernel: thermal_sys: Registered thermal governor 'fair_share'
Aug 16 22:28:05 pve kernel: thermal_sys: Registered thermal governor 'bang_bang'
Aug 16 22:28:05 pve kernel: thermal_sys: Registered thermal governor 'step_wise'
Aug 16 22:28:05 pve kernel: thermal_sys: Registered thermal governor 'user_space'
Aug 16 22:28:05 pve kernel: thermal_sys: Registered thermal governor 'power_allocator'
Aug 16 22:28:05 pve kernel: EISA bus registered
Aug 16 22:28:05 pve kernel: cpuidle: using governor ladder
Aug 16 22:28:05 pve kernel: cpuidle: using governor menu
Aug 16 22:28:05 pve kernel: ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
Aug 16 22:28:05 pve kernel: ACPI: bus type PCI registered
Aug 16 22:28:05 pve kernel: acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
Aug 16 22:28:05 pve kernel: PCI: MMCONFIG for domain 0000 [bus 00-ff] at [mem 0xe0000000-0xefffffff] (base 0xe0000000)
Aug 16 22:28:05 pve kernel: PCI: MMCONFIG at [mem 0xe0000000-0xefffffff] reserved in E820
Aug 16 22:28:05 pve kernel: PCI: Using configuration type 1 for base access
Aug 16 22:28:05 pve kernel: Kprobes globally optimized
Aug 16 22:28:05 pve kernel: HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
Aug 16 22:28:05 pve kernel: HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
Aug 16 22:28:05 pve kernel: fbcon: Taking over console
Aug 16 22:28:05 pve kernel: ACPI: Added _OSI(Module Device)
Aug 16 22:28:05 pve kernel: ACPI: Added _OSI(Processor Device)
Aug 16 22:28:05 pve kernel: ACPI: Added _OSI(3.0 _SCP Extensions)
Aug 16 22:28:05 pve kernel: ACPI: Added _OSI(Processor Aggregator Device)
Aug 16 22:28:05 pve kernel: ACPI: Added _OSI(Linux-Dell-Video)
Aug 16 22:26:36 pve kernel:  __purge_vmap_area_lazy+0xb9/0x700
Aug 16 22:26:36 pve kernel:  ? __cond_resched+0x1a/0x50
Aug 16 22:26:36 pve kernel:  free_vmap_area_noflush+0x2ef/0x330
Aug 16 22:26:36 pve kernel:  remove_vm_area+0x9e/0xb0
Aug 16 22:26:36 pve kernel:  __vunmap+0x93/0x2a0
Aug 16 22:26:36 pve kernel:  free_work+0x25/0x40
Aug 16 22:26:36 pve kernel:  process_one_work+0x228/0x3d0
Aug 16 22:26:36 pve kernel:  worker_thread+0x53/0x420
Aug 16 22:26:36 pve kernel:  ? process_one_work+0x3d0/0x3d0
Aug 16 22:26:36 pve kernel:  kthread+0x127/0x150
Aug 16 22:26:36 pve kernel:  ? set_kthread_struct+0x50/0x50
Aug 16 22:26:36 pve kernel:  ret_from_fork+0x1f/0x30
Aug 16 22:26:36 pve kernel:  </TASK>
 
journalctl part2
Code:
Aug 16 22:27:04 pve kernel: watchdog: BUG: soft lockup - CPU#6 stuck for 152s! [kworker/6:3:7716]
Aug 16 22:27:04 pve kernel: Modules linked in: tcp_diag inet_diag nf_conntrack_netlink xt_MASQUERADE xfrm_user xfrm_algo xt_addrtype iptable_nat overlay xt_state xt_conntrack xt_t>
Aug 16 22:27:04 pve kernel:  rapl intel_cstate intel_wmi_thunderbolt gigabyte_wmi wmi_bmof pcspkr efi_pstore snd ee1004 mei_me soundcore mei parport_pc parport mac_hid acpi_pad ac>
Aug 16 22:27:04 pve kernel: CPU: 6 PID: 7716 Comm: kworker/6:3 Tainted: P        W  O L    5.15.39-3-pve #2
Aug 16 22:27:04 pve kernel: Hardware name: Gigabyte Technology Co., Ltd. B560M DS3H V2/B560M DS3H V2, BIOS F7 03/25/2022
Aug 16 22:27:04 pve kernel: Workqueue: events free_work
Aug 16 22:27:04 pve kernel: RIP: 0010:smp_call_function_many_cond+0x13c/0x360
Aug 16 22:27:04 pve kernel: Code: 01 41 89 c4 73 2d 4d 63 ec 48 8b 13 49 81 fd ff 1f 00 00 0f 87 e3 01 00 00 4a 03 14 ed e0 ca 4b a3 8b 42 08 a8 01 74 09 f3 90 <8b> 42 08 a8 01 75>
Aug 16 22:27:04 pve kernel: RSP: 0018:ffffad61ce417cb0 EFLAGS: 00000202
Aug 16 22:27:04 pve kernel: RAX: 0000000000000011 RBX: ffff8d2f1c3b1b80 RCX: 0000000000000004
Aug 16 22:27:04 pve kernel: RDX: ffff8d2f1c337a60 RSI: 0000000000000000 RDI: ffff8d27800679f8
Aug 16 22:27:04 pve kernel: RBP: ffffad61ce417d18 R08: 0000000000000000 R09: 0000000000000000
Aug 16 22:27:04 pve kernel: R10: 0000000000000004 R11: fffffffffffffff0 R12: 0000000000000004
Aug 16 22:27:04 pve kernel: R13: 0000000000000004 R14: 0000000000000001 R15: 000000000000000c
Aug 16 22:27:04 pve kernel: FS:  0000000000000000(0000) GS:ffff8d2f1c380000(0000) knlGS:0000000000000000
Aug 16 22:27:04 pve kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 16 22:27:04 pve kernel: CR2: 00007fd590806670 CR3: 00000007eae10005 CR4: 00000000003726e0
Aug 16 22:27:04 pve kernel: Call Trace:
Aug 16 22:27:04 pve kernel:  <TASK>
Aug 16 22:27:04 pve kernel:  ? __flush_tlb_all+0x30/0x30
Aug 16 22:27:04 pve kernel:  on_each_cpu_cond_mask+0x22/0x30
Aug 16 22:27:04 pve kernel:  flush_tlb_kernel_range+0x41/0xa0
Aug 16 22:27:04 pve kernel:  __purge_vmap_area_lazy+0xb9/0x700
Aug 16 22:27:04 pve kernel:  ? __cond_resched+0x1a/0x50
Aug 16 22:27:04 pve kernel:  free_vmap_area_noflush+0x2ef/0x330
Aug 16 22:27:04 pve kernel:  remove_vm_area+0x9e/0xb0
Aug 16 22:27:04 pve kernel:  __vunmap+0x93/0x2a0
Aug 16 22:27:04 pve kernel:  free_work+0x25/0x40
Aug 16 22:27:04 pve kernel:  process_one_work+0x228/0x3d0
Aug 16 22:27:04 pve kernel:  worker_thread+0x53/0x420
Aug 16 22:27:04 pve kernel:  ? process_one_work+0x3d0/0x3d0
Aug 16 22:27:04 pve kernel:  kthread+0x127/0x150
Aug 16 22:27:04 pve kernel:  ? set_kthread_struct+0x50/0x50
Aug 16 22:27:04 pve kernel:  ret_from_fork+0x1f/0x30
Aug 16 22:27:04 pve kernel:  </TASK>
Aug 16 22:27:20 pve kernel: watchdog: BUG: soft lockup - CPU#7 stuck for 205s! [kworker/7:1:173]
Aug 16 22:27:20 pve kernel: Modules linked in: tcp_diag inet_diag nf_conntrack_netlink xt_MASQUERADE xfrm_user xfrm_algo xt_addrtype iptable_nat overlay xt_state xt_conntrack xt_t>
Aug 16 22:27:20 pve kernel:  rapl intel_cstate intel_wmi_thunderbolt gigabyte_wmi wmi_bmof pcspkr efi_pstore snd ee1004 mei_me soundcore mei parport_pc parport mac_hid acpi_pad ac>
Aug 16 22:27:20 pve kernel: CPU: 7 PID: 173 Comm: kworker/7:1 Tainted: P        W  O L    5.15.39-3-pve #2
Aug 16 22:27:20 pve kernel: Hardware name: Gigabyte Technology Co., Ltd. B560M DS3H V2/B560M DS3H V2, BIOS F7 03/25/2022
Aug 16 22:27:20 pve kernel: Workqueue: events netstamp_clear
Aug 16 22:27:20 pve kernel: RIP: 0010:smp_call_function_many_cond+0x13c/0x360
Aug 16 22:27:20 pve kernel: Code: 01 41 89 c4 73 2d 4d 63 ec 48 8b 13 49 81 fd ff 1f 00 00 0f 87 e3 01 00 00 4a 03 14 ed e0 ca 4b a3 8b 42 08 a8 01 74 09 f3 90 <8b> 42 08 a8 01 75>
Aug 16 22:27:20 pve kernel: RSP: 0018:ffffad61c0753cd8 EFLAGS: 00000202
Aug 16 22:27:20 pve kernel: RAX: 0000000000000011 RBX: ffff8d2f1c3f1b80 RCX: 0000000000000004
Aug 16 22:27:20 pve kernel: RDX: ffff8d2f1c337a80 RSI: 0000000000000000 RDI: ffff8d2780067378
Aug 16 22:27:20 pve kernel: RBP: ffffad61c0753d40 R08: 0000000000000000 R09: 0000000000000000
Aug 16 22:27:20 pve kernel: R10: 0000000000000004 R11: fffffffffffffff0 R12: 0000000000000004
Aug 16 22:27:20 pve kernel: R13: 0000000000000004 R14: 0000000000000001 R15: 000000000000000c
Aug 16 22:27:20 pve kernel: FS:  0000000000000000(0000) GS:ffff8d2f1c3c0000(0000) knlGS:0000000000000000
Aug 16 22:27:20 pve kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 16 22:27:20 pve kernel: CR2: 00007f64ac207e2c CR3: 00000007eae10005 CR4: 00000000003726e0
Aug 16 22:27:20 pve kernel: Call Trace:
Aug 16 22:27:20 pve kernel:  <TASK>
Aug 16 22:27:20 pve kernel:  ? text_poke_loc_init+0x190/0x190
Aug 16 22:27:20 pve kernel:  on_each_cpu_cond_mask+0x22/0x30
Aug 16 22:27:20 pve kernel:  text_poke_bp_batch+0xb4/0x270
Aug 16 22:27:20 pve kernel:  text_poke_finish+0x1f/0x40
Aug 16 22:27:20 pve kernel:  arch_jump_label_transform_apply+0x1a/0x30
Aug 16 22:27:20 pve kernel:  __jump_label_update+0x126/0x140
Aug 16 22:27:20 pve kernel:  jump_label_update+0xba/0xe0
Aug 16 22:27:20 pve kernel:  static_key_enable_cpuslocked+0x77/0xa0
Aug 16 22:27:20 pve kernel:  static_key_enable+0x1b/0x30
Aug 16 22:27:20 pve kernel:  netstamp_clear+0x2d/0x40
Aug 16 22:27:20 pve kernel:  process_one_work+0x228/0x3d0
Aug 16 22:27:20 pve kernel:  worker_thread+0x53/0x420
Aug 16 22:27:20 pve kernel:  ? process_one_work+0x3d0/0x3d0
Aug 16 22:27:20 pve kernel:  kthread+0x127/0x150
Aug 16 22:27:20 pve kernel:  ? set_kthread_struct+0x50/0x50
Aug 16 22:27:20 pve kernel:  ret_from_fork+0x1f/0x30
Aug 16 22:27:20 pve kernel:  </TASK>
Aug 16 22:27:22 pve kernel: rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
Aug 16 22:27:22 pve kernel: rcu:         4-...0: (1 GPs behind) idle=ab3/1/0x4000000000000000 softirq=15910/15911 fqs=29142
Aug 16 22:27:22 pve kernel:         (detected by 0, t=60007 jiffies, g=49825, q=37754)
Aug 16 22:27:22 pve kernel: Sending NMI from CPU 0 to CPUs 4:
Aug 16 22:27:22 pve kernel: NMI backtrace for cpu 4
Aug 16 22:27:22 pve kernel: CPU: 4 PID: 20417 Comm: kvm Tainted: P        W  O L    5.15.39-3-pve #2
Aug 16 22:27:22 pve kernel: Hardware name: Gigabyte Technology Co., Ltd. B560M DS3H V2/B560M DS3H V2, BIOS F7 03/25/2022
Aug 16 22:27:22 pve kernel: RIP: 0010:qi_submit_sync+0x323/0x5c0
Aug 16 22:27:22 pve kernel: Code: fa 39 55 a0 0f 84 5f 02 00 00 41 f6 c5 40 0f 85 98 01 00 00 41 83 e5 20 0f 85 72 01 00 00 4c 89 ff c6 07 00 0f 1f 40 00 f3 90 <e8> 08 21 54 00 49>
Aug 16 22:27:22 pve kernel: RSP: 0018:ffffad61cfb67a68 EFLAGS: 00000046
Aug 16 22:27:22 pve kernel: RAX: ffffad61c0067000 RBX: 0000000000000004 RCX: ffff8d2780051c00
Aug 16 22:27:22 pve kernel: RDX: ffff8d2780051c00 RSI: 00000000000000ab RDI: ffff8d278021b880
Aug 16 22:27:22 pve kernel: RBP: ffffad61cfb67b10 R08: 00000000000002ac R09: ffff8d278021b880
Aug 16 22:27:22 pve kernel: R10: 0000000000000010 R11: 0000000000000004 R12: 00000000000002ac
Aug 16 22:27:22 pve kernel: R13: 0000000000000000 R14: ffff8d2780183000 R15: ffff8d278021b880
Aug 16 22:27:22 pve kernel: FS:  00007fd7528be1c0(0000) GS:ffff8d2f1c300000(0000) knlGS:0000000000000000
Aug 16 22:27:22 pve kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 16 22:27:22 pve kernel: CR2: 0000563a15e065e8 CR3: 0000000409178005 CR4: 00000000003726e0
Aug 16 22:27:22 pve kernel: Call Trace:
Aug 16 22:27:22 pve kernel:  <TASK>
Aug 16 22:27:22 pve kernel:  qi_flush_iotlb+0x84/0xa0
Aug 16 22:27:22 pve kernel:  iommu_flush_iotlb_psi+0xca/0x1d0
Aug 16 22:27:22 pve kernel:  intel_iommu_tlb_sync+0xbd/0x130
Aug 16 22:27:22 pve kernel:  vfio_sync_unpin.isra.0+0x30/0xe0 [vfio_iommu_type1]
Aug 16 22:27:22 pve kernel:  vfio_unmap_unpin+0x202/0x350 [vfio_iommu_type1]
Aug 16 22:27:22 pve kernel:  vfio_remove_dma+0x31/0xe0 [vfio_iommu_type1]
Aug 16 22:27:22 pve kernel:  vfio_iommu_type1_ioctl+0xf91/0x1730 [vfio_iommu_type1]
Aug 16 22:27:22 pve kernel:  ? __check_object_size+0x14f/0x160
Aug 16 22:27:22 pve kernel:  ? _copy_to_user+0x20/0x30
Aug 16 22:27:22 pve kernel:  ? kvm_vm_ioctl+0x304/0xf70 [kvm]
Aug 16 22:27:22 pve kernel:  vfio_fops_unl_ioctl+0x68/0x2a0 [vfio]
Aug 16 22:27:22 pve kernel:  __x64_sys_ioctl+0x92/0xd0
Aug 16 22:27:22 pve kernel:  do_syscall_64+0x59/0xc0
Aug 16 22:27:22 pve kernel:  ? exit_to_user_mode_prepare+0x90/0x1b0
Aug 16 22:27:22 pve kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Aug 16 22:27:22 pve kernel:  ? do_syscall_64+0x69/0xc0
Aug 16 22:27:22 pve kernel:  ? syscall_exit_to_user_mode+0x27/0x50
Aug 16 22:27:22 pve kernel:  ? do_syscall_64+0x69/0xc0
Aug 16 22:27:22 pve kernel:  entry_SYSCALL_64_after_hwframe+0x61/0xcb
Aug 16 22:27:22 pve kernel: RIP: 0033:0x7fd75d28dcc7
Aug 16 22:27:22 pve kernel: Code: 00 00 00 48 8b 05 c9 91 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff>
Aug 16 22:27:22 pve kernel: RSP: 002b:00007ffebf80f408 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Aug 16 22:27:22 pve kernel: RAX: ffffffffffffffda RBX: 0000563a1670f4c0 RCX: 00007fd75d28dcc7
Aug 16 22:27:22 pve kernel: RDX: 00007ffebf80f420 RSI: 0000000000003b72 RDI: 000000000000001d
Aug 16 22:27:22 pve kernel: RBP: 00007ffebf80f420 R08: ffffffffffffffff R09: 00000000f0000000
Aug 16 22:27:22 pve kernel: R10: 00000000f0000000 R11: 0000000000000246 R12: 00000000f0000000
Aug 16 22:27:22 pve kernel: R13: 00007ffebf80f410 R14: 0000563a1670f4c0 R15: 0000000000000000
Aug 16 22:27:22 pve kernel:  </TASK>
Aug 16 22:27:32 pve kernel: watchdog: BUG: soft lockup - CPU#6 stuck for 179s! [kworker/6:3:7716]
Aug 16 22:27:32 pve kernel: Modules linked in: tcp_diag inet_diag nf_conntrack_netlink xt_MASQUERADE xfrm_user xfrm_algo xt_addrtype iptable_nat overlay xt_state xt_conntrack xt_t>
Aug 16 22:27:32 pve kernel:  rapl intel_cstate intel_wmi_thunderbolt gigabyte_wmi wmi_bmof pcspkr efi_pstore snd ee1004 mei_me soundcore mei parport_pc parport mac_hid acpi_pad ac>
Aug 16 22:27:32 pve kernel: CPU: 6 PID: 7716 Comm: kworker/6:3 Tainted: P        W  O L    5.15.39-3-pve #2
Aug 16 22:27:32 pve kernel: Hardware name: Gigabyte Technology Co., Ltd. B560M DS3H V2/B560M DS3H V2, BIOS F7 03/25/2022
Aug 16 22:27:32 pve kernel: Workqueue: events free_work
Aug 16 22:27:32 pve kernel: RIP: 0010:smp_call_function_many_cond+0x13c/0x360
Aug 16 22:27:32 pve kernel: Code: 01 41 89 c4 73 2d 4d 63 ec 48 8b 13 49 81 fd ff 1f 00 00 0f 87 e3 01 00 00 4a 03 14 ed e0 ca 4b a3 8b 42 08 a8 01 74 09 f3 90 <8b> 42 08 a8 01 75>
Aug 16 22:27:32 pve kernel: RSP: 0018:ffffad61ce417cb0 EFLAGS: 00000202
Aug 16 22:27:32 pve kernel: RAX: 0000000000000011 RBX: ffff8d2f1c3b1b80 RCX: 0000000000000004
Aug 16 22:27:32 pve kernel: RDX: ffff8d2f1c337a60 RSI: 0000000000000000 RDI: ffff8d27800679f8
Aug 16 22:27:32 pve kernel: RBP: ffffad61ce417d18 R08: 0000000000000000 R09: 0000000000000000
Aug 16 22:27:32 pve kernel: R10: 0000000000000004 R11: fffffffffffffff0 R12: 0000000000000004
Aug 16 22:27:32 pve kernel: R13: 0000000000000004 R14: 0000000000000001 R15: 000000000000000c
Aug 16 22:27:32 pve kernel: FS:  0000000000000000(0000) GS:ffff8d2f1c380000(0000) knlGS:0000000000000000
Aug 16 22:27:32 pve kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 16 22:27:32 pve kernel: CR2: 00007fd590806670 CR3: 00000007eae10005 CR4: 00000000003726e0
Aug 16 22:27:32 pve kernel: Call Trace:
Aug 16 22:27:32 pve kernel:  <TASK>
Aug 16 22:27:32 pve kernel:  ? __flush_tlb_all+0x30/0x30
Aug 16 22:27:32 pve kernel:  on_each_cpu_cond_mask+0x22/0x30
Aug 16 22:27:32 pve kernel:  flush_tlb_kernel_range+0x41/0xa0
Aug 16 22:27:32 pve kernel:  __purge_vmap_area_lazy+0xb9/0x700
Aug 16 22:27:32 pve kernel:  ? __cond_resched+0x1a/0x50
Aug 16 22:27:32 pve kernel:  free_vmap_area_noflush+0x2ef/0x330
Aug 16 22:27:32 pve kernel:  remove_vm_area+0x9e/0xb0
Aug 16 22:27:32 pve kernel:  __vunmap+0x93/0x2a0
Aug 16 22:27:32 pve kernel:  free_work+0x25/0x40
Aug 16 22:27:32 pve kernel:  process_one_work+0x228/0x3d0
Aug 16 22:27:32 pve kernel:  worker_thread+0x53/0x420
Aug 16 22:27:32 pve kernel:  ? process_one_work+0x3d0/0x3d0
Aug 16 22:27:32 pve kernel:  kthread+0x127/0x150
Aug 16 22:27:32 pve kernel:  ? set_kthread_struct+0x50/0x50
Aug 16 22:27:32 pve kernel:  ret_from_fork+0x1f/0x30
Aug 16 22:27:32 pve kernel:  </TASK>
Aug 16 22:27:35 pve systemd-logind[1751]: Power key pressed.
Aug 16 22:27:35 pve systemd-logind[1751]: Powering Off...
Aug 16 22:27:35 pve systemd-logind[1751]: System is powering down.
 
Your MB is a consumer board... try updating your BIOS and disabling all power-saving/C-state/etc. features in the BIOS.
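If the BIOS doesn't expose all of those options, you can also cap C-states from the kernel command line as a test, something like this (a sketch: edit /etc/default/grub, then run update-grub and reboot):

Code:
# /etc/default/grub - limit C-states as a test, then: update-grub && reboot
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_idle.max_cstate=1 processor.max_cstate=1"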
 
Your MB is a consumer board... try updating your BIOS and disabling all power-saving/C-state/etc. features in the BIOS.
Hi. I updated everything.

It looked a bit more stable for some time, but the win10 machine still misbehaves.

I tried to install a new one, but it ends up in the same state.

The VM looks like it has started and is using RAM and CPU, but I can't ping, SSH, or RDP into it. I can't stop the VM either, and I have to manually reset my Proxmox server to be able to start the VM again.

[screenshot attachment]

Any ideas? It's just the win10 VM that isn't working; all the rest are perfect and fast.

EDIT: I could stop the VM, but when starting it, this error came up:


Task viewer: VM 102 - Start

Code:
mdev instance '00000000-0000-0000-0000-000000000102' already existed, using it.
TASK ERROR: timeout waiting on systemd

[screenshot attachment]
It says that both mdevs are in use, but I don't know where. I have an LXC with the card passed through, but it shouldn't be the one using them.
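From what I can tell, active mdev instances should be visible in sysfs, so maybe something like this can find and clear the stale one (a sketch; the UUID is the one from the error message):

Code:
# list active mediated devices on the host
ls /sys/bus/mdev/devices/
# remove the stale instance reported in the error
echo 1 > /sys/bus/mdev/devices/00000000-0000-0000-0000-000000000102/remove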
 
It says that both mdevs are in use, but I don't know where. I have an LXC with the card passed through, but it shouldn't be the one using them.
seems the cleanup did not / could not run...

does the VM work when you remove the passthrough?
 
seems the cleanup did not / could not run...

does the VM work when you remove the passthrough?
I can't remove the passthrough while it's "locked". If I restart the machine, it works. I changed the mdev and I could boot again.
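For context, the mdev assignment in the VM config is a hostpci line, something like this (the PCI address and mdev type here are just illustrative examples, not my exact config):

Code:
# /etc/pve/qemu-server/102.conf - iGPU mdev assignment (address/type are examples)
hostpci0: 0000:00:02.0,mdev=i915-GVTg_V5_4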

After 2 days, it's locked up again in this state: RAM and CPU usage, but no SSH, RDP, or VNC connection to the VM.
[screenshot attachments]
I think the cause might be that I am passing my iGPU to a docker container too, but I don't know how to pass just one of the mdevs, or how to do it without breaking the win10 VM.

Then again, not being able to access the win10 VM over SSH makes me wonder if that's really the cause.

Is there any command I could run to send you some status/logs that would be useful here?
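For example, would a dump of the previous boot's journal and the kernel log help? Something like:

Code:
journalctl -b -1 > journal_prev_boot.txt   # journal from the boot that crashed
dmesg -T > dmesg.txt                       # kernel messages with readable timestamps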
 
UPDATE:

I removed the PCI mdev from the iGPU (the win10 one).

The machine is working correctly now, but I wanted to have 1 mdev there and 1 mdev in the docker LXC.

Is it possible to pass only 1 mdev to the lxc and not the whole pci card?

I will wait 24 more hours and then try to bring back the mdev and remove the PCI passthrough from the docker LXC.
 
Is it possible to pass only 1 mdev to the lxc and not the whole pci card?
well, you don't exactly 'pass through the whole PCI card' to the container (that's not really possible), but only the device nodes (/dev/dri, etc.), which are part of the host driver
and that should work (1 mdev for the VM, and the host/container also doing things with the device)

I am not aware of a way for a container to use a single mdev, but you could move your docker container into a VM?
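for reference, passing the DRM device nodes to a container usually looks something like this in the container config (a sketch; the ctid placeholder and paths depend on your setup):

Code:
# /etc/pve/lxc/<ctid>.conf - bind the host's DRM nodes into the container (sketch)
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir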
 
well, you don't exactly 'pass through the whole PCI card' to the container (that's not really possible), but only the device nodes (/dev/dri, etc.), which are part of the host driver
and that should work (1 mdev for the VM, and the host/container also doing things with the device)

I am not aware of a way for a container to use a single mdev, but you could move your docker container into a VM?
That would be cool, but I don't think I have the skills to do it. I've checked some guides but I'm not sure about the process...

Any suggestions? I would love to move my docker LXC to a Debian or Ubuntu VM.

In the end, that LXC is a Debian 10 with docker+portainer. I only made some mounts in fstab and created some users. I don't know if I can easily move the docker+portainer config to a new VM or if I need to do something special.
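From the guides I've seen, moving docker to a VM would mostly be reinstalling docker there and copying the volumes over, e.g. (a sketch; 'portainer_data' is just the usual Portainer volume name, not necessarily mine):

Code:
# on the old LXC: archive a named volume into the current directory
docker run --rm -v portainer_data:/data -v "$PWD":/backup alpine \
  tar czf /backup/portainer_data.tar.gz -C /data .
# on the new VM: restore it into a fresh volume of the same name
docker run --rm -v portainer_data:/data -v "$PWD":/backup alpine \
  tar xzf /backup/portainer_data.tar.gz -C /data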
 