Hello fine Proxmox users!
First and foremost let me thank anyone involved in this project for their hard work.
I'm a very new user and have so far set up a small home NAS with Proxmox and installed a few VMs.
One VM is running TrueNAS SCALE, with the physical SCSI controllers passed through to the VM to allow it to natively manage the attached disks.
This seems to have worked very well.
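For reference, the passthrough part of the VM config (/etc/pve/qemu-server/100.conf) looks roughly like this; the PCI addresses below are placeholders rather than my exact values:

hostpci0: 0000:01:00.0
hostpci1: 0000:02:00.0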
During a large copy to a TrueNAS share, the VM seemed to go down.
I thought this was to be expected: some instability until I've found the right configuration for the VM.
In Proxmox I saw these logs:
Jan 25 01:16:23 vault2-proxmox pvestatd[1821]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - got timeout
Jan 25 01:16:23 vault2-proxmox pvestatd[1821]: status update time (8.044 seconds)
Jan 25 01:16:32 vault2-proxmox kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kvm:190820]
Jan 25 01:16:32 vault2-proxmox kernel: Modules linked in: tcp_diag inet_diag veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip>
Jan 25 01:16:32 vault2-proxmox kernel: processor_thermal_device mei_me processor_thermal_rfim processor_thermal_mbox processor_thermal_rapl mei drm_kms_hel>
Jan 25 01:16:32 vault2-proxmox kernel: CPU: 1 PID: 190820 Comm: kvm Tainted: P O 6.5.11-7-pve #1
Jan 25 01:16:32 vault2-proxmox kernel: Hardware name: retsamarret 000-T6423-NT140-0001/Default string, BIOS 5.19 02/06/2023
Jan 25 01:16:32 vault2-proxmox kernel: RIP: 0010:_raw_spin_unlock_irqrestore+0x21/0x60
Jan 25 01:16:32 vault2-proxmox kernel: Code: 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 49 89 f0 48 89 e5 c6 07 00 0f 1f 00 41 f7 c0 00 02 00 00 74 06 fb >
Jan 25 01:16:32 vault2-proxmox kernel: RSP: 0018:ffffa9d5e183fba0 EFLAGS: 00000206
Jan 25 01:16:32 vault2-proxmox kernel: RAX: 0000000000000000 RBX: ffff9d880138cca4 RCX: 0000000000000000
Jan 25 01:16:32 vault2-proxmox kernel: RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff9d880138cca4
Jan 25 01:16:32 vault2-proxmox kernel: RBP: ffffa9d5e183fba0 R08: 0000000000000246 R09: 0000000000000000
Jan 25 01:16:32 vault2-proxmox kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
Jan 25 01:16:32 vault2-proxmox kernel: R13: ffff9d880138cc28 R14: ffff9d880138cc00 R15: 0000000000000246
Jan 25 01:16:32 vault2-proxmox kernel: FS: 00007fbacad874c0(0000) GS:ffff9d8f5fe80000(0000) knlGS:0000000000000000
Jan 25 01:16:32 vault2-proxmox kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 25 01:16:32 vault2-proxmox kernel: CR2: 0000561b1923d498 CR3: 00000001075b4000 CR4: 0000000000352ee0
Jan 25 01:16:32 vault2-proxmox kernel: Call Trace:
Jan 25 01:16:32 vault2-proxmox kernel: <IRQ>
Jan 25 01:16:32 vault2-proxmox kernel: ? show_regs+0x6d/0x80
Jan 25 01:16:32 vault2-proxmox kernel: ? watchdog_timer_fn+0x1d8/0x240
Jan 25 01:16:32 vault2-proxmox kernel: ? __pfx_watchdog_timer_fn+0x10/0x10
Jan 25 01:16:32 vault2-proxmox kernel: ? __hrtimer_run_queues+0x105/0x280
Jan 25 01:16:32 vault2-proxmox kernel: ? hrtimer_interrupt+0xf6/0x250
Jan 25 01:16:32 vault2-proxmox kernel: ? __sysvec_apic_timer_interrupt+0x5f/0x140
Jan 25 01:16:32 vault2-proxmox kernel: ? sysvec_apic_timer_interrupt+0x8d/0xd0
Jan 25 01:16:32 vault2-proxmox kernel: </IRQ>
Jan 25 01:16:32 vault2-proxmox kernel: <TASK>
Jan 25 01:16:32 vault2-proxmox kernel: ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
Jan 25 01:16:32 vault2-proxmox kernel: ? _raw_spin_unlock_irqrestore+0x21/0x60
Jan 25 01:16:32 vault2-proxmox kernel: __synchronize_hardirq+0x8a/0xd0
Jan 25 01:16:32 vault2-proxmox kernel: free_irq+0x11d/0x350
Jan 25 01:16:32 vault2-proxmox kernel: vfio_intx_set_signal+0x4f/0x1f0 [vfio_pci_core]
Jan 25 01:16:32 vault2-proxmox kernel: vfio_intx_disable+0x49/0x80 [vfio_pci_core]
Jan 25 01:16:32 vault2-proxmox kernel: vfio_pci_set_intx_trigger+0x128/0x190 [vfio_pci_core]
Jan 25 01:16:32 vault2-proxmox kernel: vfio_pci_set_irqs_ioctl+0x3b/0x130 [vfio_pci_core]
Jan 25 01:16:32 vault2-proxmox kernel: vfio_pci_core_ioctl+0x9f2/0x11a0 [vfio_pci_core]
Jan 25 01:16:32 vault2-proxmox kernel: ? vfio_pci_set_intx_unmask+0x6b/0x100 [vfio_pci_core]
Jan 25 01:16:32 vault2-proxmox kernel: ? vfio_pci_core_ioctl+0xa05/0x11a0 [vfio_pci_core]
Jan 25 01:16:32 vault2-proxmox kernel: vfio_device_fops_unl_ioctl+0x7f/0x7a0 [vfio]
Jan 25 01:16:32 vault2-proxmox kernel: ? __pm_runtime_idle+0x7b/0xd0
Jan 25 01:16:32 vault2-proxmox kernel: ? __fget_light+0xa5/0x120
Jan 25 01:16:32 vault2-proxmox kernel: __x64_sys_ioctl+0xa0/0xf0
Jan 25 01:16:32 vault2-proxmox kernel: do_syscall_64+0x58/0x90
Jan 25 01:16:32 vault2-proxmox kernel: ? exit_to_user_mode_prepare+0x39/0x190
Jan 25 01:16:32 vault2-proxmox kernel: ? syscall_exit_to_user_mode+0x37/0x60
Jan 25 01:16:32 vault2-proxmox kernel: ? do_syscall_64+0x67/0x90
Jan 25 01:16:32 vault2-proxmox kernel: ? exit_to_user_mode_prepare+0x39/0x190
Jan 25 01:16:32 vault2-proxmox kernel: ? irqentry_exit_to_user_mode+0x17/0x20
Jan 25 01:16:32 vault2-proxmox kernel: ? irqentry_exit+0x43/0x50
Jan 25 01:16:32 vault2-proxmox kernel: ? common_interrupt+0x54/0xb0
Jan 25 01:16:32 vault2-proxmox kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Jan 25 01:16:32 vault2-proxmox kernel: RIP: 0033:0x7fbacdb41b5b
Jan 25 01:16:32 vault2-proxmox kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 >
Jan 25 01:16:32 vault2-proxmox kernel: RSP: 002b:00007ffc6722d420 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jan 25 01:16:32 vault2-proxmox kernel: RAX: ffffffffffffffda RBX: 00005574c0ad80d0 RCX: 00007fbacdb41b5b
Jan 25 01:16:32 vault2-proxmox kernel: RDX: 00007ffc6722d480 RSI: 0000000000003b6e RDI: 000000000000002c
Jan 25 01:16:32 vault2-proxmox kernel: RBP: 00005574c0ad75b0 R08: 0000000000000000 R09: 00007fbacdc16d80
Jan 25 01:16:32 vault2-proxmox kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00005574beb9e8f8
Jan 25 01:16:32 vault2-proxmox kernel: R13: 00005574c0ad75b0 R14: 00005574beb9e908 R15: 0000000000000007
Jan 25 01:16:32 vault2-proxmox kernel: </TASK>
Jan 25 01:16:33 vault2-proxmox pvestatd[1821]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - unable to connect to VM 100 q>
Jan 25 01:16:33 vault2-proxmox pvestatd[1821]: status update time (8.066 seconds)
Jan 25 01:16:43 vault2-proxmox pvestatd[1821]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - unable to connect to VM 100 q>
Jan 25 01:16:43 vault2-proxmox pvestatd[1821]: status update time (8.060 seconds)
Jan 25 01:16:53 vault2-proxmox pvestatd[1821]: VM 100 qmp command failed - VM 100 qmp command 'query-proxmox-support' failed - unable to connect to VM 100 q>
Jan 25 01:16:53 vault2-proxmox pvestatd[1821]: status update time (8.066 seconds)
From this I concluded that the VM had hit some sort of panic, and thought I'd reset it and check its logs.
However, when attempting to reset the VM, I got this error message in Proxmox:
TASK ERROR: VM 100 qmp command 'system_reset' failed - unable to connect to VM 100 qmp socket - timeout after 51 retries
After this, I attempted a 'stop' command, which seemed to work:
VM quit/powerdown failed - terminating now with SIGTERM
VM still running - terminating now with SIGKILL
TASK OK
However, when attempting to start the VM afterwards, I got this error:
TASK ERROR: timeout waiting on systemd
This led me to conclude that the hang/error wasn't exclusively inside the VM, but that something had gone wrong in Proxmox itself.
I attempted to reboot the Proxmox machine from the web console, but it never ended up rebooting and I had to power-cycle it manually.
After starting up again, everything worked as normal, and I couldn't find any errors on the TrueNAS machine.
To anyone who read my post, thank you! I'm not asking anyone to solve this problem for me, but could you perhaps give me some pointers on things to look for?
Also, does my conclusion seem accurate, that the instability might not have been isolated to the VM but that something in Proxmox might have gone awry?
From searching forums for similar errors, I saw a suggestion to disable "IO Thread"; is this something known to affect stability?
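In case it matters, my understanding is that this refers to the iothread flag on the virtual disk line of the VM config, something like the following (values illustrative, not my exact config); please correct me if I'm looking at the wrong setting:

scsihw: virtio-scsi-single
scsi0: local-lvm:vm-100-disk-0,iothread=1,size=32G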
Thanks,
Alex