Symptoms:
When I upload something to the CephFS volume in my cluster, the node I am uploading from may hard hang at any point in the process. For example, when uploading a 4 GB ISO, it hung just after the copy operation seemed to have completed.
I see no stderr/stdout on the screen connected via KVM.
The node cannot be reached on any network.
I have to hard reboot.
When did this start happening?
Only after I did two things:
- rolled my own 6.5.2 Linux kernel to use the Thunderbolt patches
- moved to using IPv6 on the Ceph public/private networks
I am surprised there is nothing on the console to indicate the hang.
I would like to understand if the issue is:
1. the move to IPv6 (this is easy for me to test)
2. a general 6.5.2 kernel / Ceph issue
3. specific to the code fixes I patched in
Question
Beyond hooking up a debugger to a USB serial port, is there any way for me to capture dmesg/journalctl output from the moment of the crash, or is there a dump file on the system?
(I am used to some basic WinDbg debugging on Windows, i.e. just using !analyze... lol, but nothing beyond that.)
If not, we will do it the old-fashioned way and just revert what I did.