Hi all, I woke up to a VM in an indeterminate state causing various black holes on my internal network.
Once I figured out which VM it was, I could ping it, but I could not SSH into it and could not get the PVE console to connect to the VM's output.
I tried stopping the VM and couldn't; it transitioned into an indeterminate HA state. Other VMs on the same PVE node were 100% fine.
In my bleary, just-woke-up state I rebooted the PVE node; it would not reboot, so I had to do a hard power-off.
After the reboot I disabled and re-enabled the failed VM, and it started perfectly.
Looking at the historical dmesg via journalctl inside the VM (retrieval command shown after the trace below), I saw the following and wondered what on earth could have caused it.
Anyone have suggestions on where I can look?
Multiple different processes logged this at 2am, and then dmesg stopped logging entirely (the tasks all had various names; it was not just this one task that hung).
Code:
Mar 10 02:05:21 Docker01 kernel: INFO: task glfs_posixhc:16203 blocked for more than 120 seconds.
Mar 10 02:05:21 Docker01 kernel: Not tainted 5.10.0-28-amd64 #1 Debian 5.10.209-2
Mar 10 02:05:21 Docker01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 10 02:05:21 Docker01 kernel: task:glfs_posixhc state:D stack: 0 pid:16203 ppid: 1 flags:0x00000000
Mar 10 02:05:21 Docker01 kernel: Call Trace:
Mar 10 02:05:21 Docker01 kernel: __schedule+0x282/0x870
Mar 10 02:05:21 Docker01 kernel: schedule+0x46/0xb0
Mar 10 02:05:21 Docker01 kernel: percpu_rwsem_wait+0xaf/0x190
Mar 10 02:05:21 Docker01 kernel: ? percpu_down_write+0xe0/0xe0
Mar 10 02:05:21 Docker01 kernel: __percpu_down_read+0x4d/0x60
Mar 10 02:05:21 Docker01 kernel: mnt_want_write+0x77/0xa0
Mar 10 02:05:21 Docker01 kernel: path_openat+0x1098/0x10c0
Mar 10 02:05:21 Docker01 kernel: ? finish_task_switch+0x72/0x260
Mar 10 02:05:21 Docker01 kernel: do_filp_open+0x99/0x140
Mar 10 02:05:21 Docker01 kernel: ? getname_flags.part.0+0x29/0x1b0
Mar 10 02:05:21 Docker01 kernel: ? __check_object_size+0x146/0x160
Mar 10 02:05:21 Docker01 kernel: do_sys_openat2+0x9b/0x150
Mar 10 02:05:21 Docker01 kernel: __x64_sys_openat+0x54/0xa0
Mar 10 02:05:21 Docker01 kernel: do_syscall_64+0x30/0x80
Mar 10 02:05:21 Docker01 kernel: entry_SYSCALL_64_after_hwframe+0x62/0xc7
Mar 10 02:05:21 Docker01 kernel: RIP: 0033:0x7f46fdc617e4
Mar 10 02:05:21 Docker01 kernel: RSP: 002b:00007f46eaffad70 EFLAGS: 00000297 ORIG_RAX: 0000000000000101
Mar 10 02:05:21 Docker01 kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f46fdc617e4
Mar 10 02:05:21 Docker01 kernel: RDX: 0000000000000241 RSI: 00007f46eaffb130 RDI: 00000000ffffff9c
Mar 10 02:05:21 Docker01 kernel: RBP: 00007f46eaffb130 R08: 0000000000000001 R09: 00007f46f80274c0
Mar 10 02:05:21 Docker01 kernel: R10: 00000000000001a4 R11: 0000000000000297 R12: 0000000000000241
Mar 10 02:05:21 Docker01 kernel: R13: 0000000000000134 R14: 0000000000000019 R15: 00007f46eaffaf30
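For anyone wanting to pull the same history, this is roughly the journalctl invocation I used inside the guest to read the kernel (dmesg) messages from the boot before the restart. Note this assumes the guest's journald storage is persistent; with volatile storage the previous boot's log is gone.

Code:
# kernel ring buffer from the previous boot
# (needs Storage=persistent in /etc/systemd/journald.conf)
journalctl -k -b -1 --no-pager

# the "120 seconds" threshold in the message comes from this sysctl,
# which the log line itself references via /proc/sys/kernel/hung_task_timeout_secs
sysctl kernel.hung_task_timeout_secs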