Okay, so I managed to move things around a little earlier than expected. New information from me:
- I stopped the only VM on this machine (my storage VM providing backup space for the Proxmox cluster), so no workloads are running on the node
- I ran "echo 1 > /proc/sys/kernel/sysrq" and "echo t > /proc/sysrq-trigger" with remote syslog enabled
I ran one "echo t" initially, then started the zpool scrub, then managed to run "echo t" several more times before everything on the box started falling apart. I then had to power cycle it; it returned to service just fine and shows nothing about the zpool scrub at all, so I guess the scrub never actually started. Once the box is in that state, trying to run any process that requires disk I/O just stalls and never returns. Anything already running in memory that doesn't need disk I/O seems to keep running just fine.
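For reference, the sequence described above can be sketched roughly as the script below. The pool name `tank`, the `DRY_RUN` switch, and the `run` and `capture_sequence` helpers are placeholders for illustration and are not from the original post. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the capture sequence described in the post.
# POOL is a placeholder: the real pool name is not given in the post.
POOL="${POOL:-tank}"
# Default to dry-run so nothing is executed unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command; execute it only when DRY_RUN=0 (requires root).
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        sh -c "$*"
    fi
}

capture_sequence() {
    run "echo 1 > /proc/sys/kernel/sysrq"   # enable all sysrq functions
    run "echo t > /proc/sysrq-trigger"      # baseline task dump before the scrub
    run "zpool scrub $POOL"                 # start the scrub
    run "echo t > /proc/sysrq-trigger"      # repeat while the box still responds
}

capture_sequence
```

Running it as root with `DRY_RUN=0` and the real pool name in `POOL` would execute the commands instead of printing them. With remote syslog enabled, the task dumps should end up on the log host even after local disk I/O stalls.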
The log is attached. I annotated each section with ">>>" to show where the sysrq-triggers were fired, so hopefully this provides some useful info.
The last "section" of the log shows processes getting stuck because of no I/O, I think. It's almost exactly as @sandor mentioned: it's like the box lost all its disks.
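One way to check that impression against the log is to count, per annotated section, how many tasks the dump reports in uninterruptible sleep (state D, which usually means waiting on I/O). The sketch below makes two assumptions about the log: the annotation lines start with ">>>", and the task lines contain `state:D` as printed by recent kernels (older kernels use a different column layout). The function name `count_blocked` is hypothetical.

```shell
# Count tasks in state D (uninterruptible sleep) per ">>>" section of a log.
# Assumptions: marker lines start with ">>>"; task header lines contain
# "state:D" (format used by recent kernels, may differ on older ones).
count_blocked() {
    awk '
        # A marker line closes the previous section and opens a new one.
        /^>>>/         { if (seen) print label ": " n; label = $0; n = 0; seen = 1; next }
        # Count task header lines that report state D.
        /state:D( |$)/ { n++ }
        # Print the count for the final section.
        END            { if (seen) print label ": " n }
    ' "$1"
}
```

Called as `count_blocked path/to/log`, it prints one line per marker followed by the count. A count that jumps after the scrub starts would be consistent with processes piling up on stalled disk I/O.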
Edit: forgot to say, this server is fully up to date (with a community subscription).