Hello,
I recently ran into a new issue and I'm not sure what the cause is or where to start looking. I have a Proxmox 7.4-18 cluster that communicates with a separate Ceph Reef cluster. Normally there are no issues, but recently some drives started going bad on the Ceph side, which resulted in slow ops warnings. That's not usually a big deal; it's only been 1 or 2 slow ops and they went away quickly.
Now onto the issue. It locked up all I/O on the RBD mount in the VM (Ubuntu Server, if it matters). I stopped the VM, but the slice was still there and I had to reboot the Proxmox server before I could start the VM again. The status on qemu just shows #.scope - kvm. pvesm status shows running, the Ceph cluster shows healthy, and I/O resumes fine after restarting the server and VM. dmesg didn't show anything on the VM or the Proxmox box.
I've had slow ops in the past, but they never locked up a VM, and now it's happened twice. I'm using KRBD and it's an EC4-2 pool. I'm sure I'm missing something, but I wanted to see if anyone had ideas or suggestions on the cause, or additional troubleshooting steps to try if it happens again.
Thanks