VM using ISCSI dies randomly

tymscar

New Member
Jan 5, 2023
tymscar.com
Hey there. I have a very peculiar problem that I can find only a little information about online.

I run a Proxmox server on which I host half a dozen or so VMs. All but one of them use the built-in SSD in the Proxmox server, but this one VM runs a service that needs a lot of storage, so I opted to use an iSCSI drive from my NAS. The VM is the only consumer of that drive, and it works great until, once every 6-12 hours or so, it kernel panics. It does not reboot, it just hangs there.

I have tried everything I can think of. I messed about with the iSCSI settings, and nothing really helped there. I also tried writing a watchdog in Python that runs 24/7 in a container, pings this server every couple of minutes, and reboots it if it sees it offline. That doesn't really work either: I have 2FA on my Proxmox instance, and even though the library I use says it stays logged in using a key instead of the code, the session still seems to die a couple of hours after the first connection. With the watchdog dead, there's nothing to reboot the VM with the iSCSI drive that I am concerned about.
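For reference, a watchdog along those lines can be sketched without any login session at all, assuming a Proxmox API token is used instead of a password plus 2FA code. The token is sent on every request, so there is no ticket to expire. Every value below (host, node name, VM ID, guest IP, token) is a placeholder, and the token is assumed to have the VM.PowerMgmt privilege on the VM. This is a minimal sketch, not the library the post refers to.

```python
import subprocess
import time
import urllib.request

# All values below are placeholders (assumptions), not taken from the post.
PVE_HOST = "pve.example.lan"
NODE = "pve"
VMID = 101
GUEST_IP = "192.168.1.50"
# Token format: USER@REALM!TOKENID=SECRET
API_TOKEN = "watchdog@pve!reset=00000000-0000-0000-0000-000000000000"


def reset_request(host, node, vmid, token):
    """Build the POST request that hard-resets a VM via the Proxmox API."""
    url = f"https://{host}:8006/api2/json/nodes/{node}/qemu/{vmid}/status/reset"
    req = urllib.request.Request(url, data=b"", method="POST")
    # The token travels with each request, so no session has to be kept alive.
    req.add_header("Authorization", f"PVEAPIToken={token}")
    return req


def guest_is_up(ip):
    """Return True if a single ICMP ping gets a reply within 5 seconds."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "5", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch(interval=120, failures_before_reset=3, context=None):
    """Ping the guest forever; reset it after several consecutive failures.

    Pass an ssl.SSLContext as `context` if the host uses a self-signed
    certificate that the default trust store does not accept.
    """
    failures = 0
    while True:
        failures = 0 if guest_is_up(GUEST_IP) else failures + 1
        if failures >= failures_before_reset:
            req = reset_request(PVE_HOST, NODE, VMID, API_TOKEN)
            with urllib.request.urlopen(req, context=context, timeout=30) as resp:
                print("reset requested, HTTP status:", resp.status)
            failures = 0
        time.sleep(interval)
```

Calling `watch()` starts the loop. Requiring several failed pings in a row avoids resetting the VM on a single dropped packet.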

This is an example of what the console looks like when the VM is fully locked up.

To be honest, I am a bit lost. If you have any ideas about what else to try, or perhaps an alternative to iSCSI that would offer me the same benefits, I'm all ears.

Thank you in advance!
 
It's extremely unlikely that iSCSI in particular "makes your VM crash". The VM needs much more space because it's doing a task that is different from the other VMs. That task involves resources that somehow lead to an in-VM kernel crash. Since you have not posted your PVE or VM software versions, I am going to assume they are up to date. If they are not - upgrade everything.

You have also not indicated how you use iSCSI - is the VM a direct consumer, or does PVE act as the initiator and then pass the disk to the VM? Not that this changes the fact that in all likelihood the issue is not caused by iSCSI.

Next steps would involve trying to reduce the number of variables. You can stop services, remove unnecessary hardware (floppy?), etc. Further, there are many guides online on how to capture a kernel dump in full. One way is to redirect the console to a file.
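As a rough illustration of the console-to-file idea, the sketch below runs on the PVE host and appends whatever the guest prints on its serial console to a log file. It rests on several assumptions that are not stated in this thread: the VM has a serial port exposed as a socket (for example via `qm set <vmid> -serial0 socket`), the guest kernel boots with `console=ttyS0`, and the socket lives at `/var/run/qemu-server/<vmid>.serial0`. Check those on your own system before relying on it.

```python
import socket


def capture_console(socket_path, log_path, max_bytes=None):
    """Append everything read from a Unix socket to log_path.

    Stops when the peer closes the socket or, if max_bytes is set,
    once at least that many bytes have been received.
    Returns the number of bytes written.
    """
    received = 0
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock, \
            open(log_path, "ab") as log:
        sock.connect(socket_path)
        while max_bytes is None or received < max_bytes:
            chunk = sock.recv(4096)
            if not chunk:  # peer closed the connection
                break
            log.write(chunk)
            log.flush()  # keep the file current in case the guest dies next
            received += len(chunk)
    return received
```

A call such as `capture_console("/var/run/qemu-server/101.serial0", "/root/vm101-console.log")` (VM ID 101 is a placeholder) would then keep the full panic text, header included, on the host rather than inside the crashing guest.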


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Thanks for answering!

I can confirm I am on the latest version of Proxmox, and the packages and OSes within are up to date as well!

Sorry for not being more specific. If there is anything else that might help, just ask and I'll try to provide it!

So the iSCSI drive is in fact connected straight to Proxmox, and the VM is just using it from there. I did it this way because at first I thought I might want to have multiple VMs accessing it. I ended up not doing that.

I did have this same setup running before without iSCSI and it was fine, but that was not on Proxmox but rather in a QEMU/KVM VM on Arch Linux. That's why I thought iSCSI was the issue. That, and googling some of my errors would show others blaming iSCSI.
I have removed almost everything that's not necessary otherwise.

I checked the dmesg logs after rebooting. There are always 6-7 of them saved, but there is no error to be found in any of them. Perhaps it panics before it has time to write one?
 
"I checked the dmesg logs after rebooting. There are always 6-7 of them saved, but there is no error to be found in any of them. Perhaps it panics before it has time to write one?"
Yes, the chances of a log message being written before a kernel crash are 50/50, which is why you should research enabling kernel dumps, kernel console debugging, etc. You need more information, at the very least the full panic message (yours had the header cut off).
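Before changing anything, it can help to see how the guest is currently configured to react to a panic. The sketch below, meant to run inside the VM, reads a handful of standard Linux kernel settings from `/proc` and `/sys`. The selection of settings is my own, not something taken from this thread, and some files may be absent depending on the kernel build.

```python
from pathlib import Path

# Settings that influence what a Linux guest does on a panic or lockup.
SETTINGS = {
    "proc/sys/kernel/panic": "seconds before auto-reboot after a panic (0 = stay hung)",
    "proc/sys/kernel/panic_on_oops": "1 = treat an oops as a full panic",
    "proc/sys/kernel/softlockup_panic": "1 = panic when a soft lockup is detected",
    "proc/sys/kernel/hung_task_panic": "1 = panic when a hung task is detected",
    "sys/kernel/kexec_crash_loaded": "1 = a kdump crash kernel is loaded",
}


def read_panic_settings(root="/"):
    """Return {relative_path: value or None} for each setting under root."""
    values = {}
    for rel in SETTINGS:
        try:
            values[rel] = (Path(root) / rel).read_text().strip()
        except OSError:
            values[rel] = None  # file missing or unreadable on this kernel
    return values


def report(root="/"):
    """Print each setting with its current value and meaning."""
    for rel, value in read_panic_settings(root).items():
        shown = value if value is not None else "n/a"
        print(f"/{rel} = {shown}  ({SETTINGS[rel]})")
```

If `kernel.panic` reads 0, the guest stays frozen after a panic, which matches the "it just hangs there" behaviour. Setting it to a positive number of seconds makes the guest reboot on its own, and a value of 1 in `kexec_crash_loaded` suggests a crash dump would actually be captured.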

A useful Google search would be "how to debug kernel panic".

Since this VM is the only storage consumer, moving the iSCSI initiator inside the VM should be relatively easy and would cut out multiple extra layers, reducing the debug surface.
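To make that concrete, the sketch below lists the usual open-iscsi steps for attaching a target from inside the guest: discover, log in, and mark the node for automatic login at boot. It assumes open-iscsi is installed in the VM. The portal address and IQN in the usage note are placeholders, and the target would first have to be removed from the PVE storage configuration so that only one initiator uses it.

```python
import subprocess


def iscsi_login_commands(portal, target_iqn):
    """Return the iscsiadm invocations to discover, log in, and persist a target."""
    node = ["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal]
    return [
        # Ask the portal which targets it offers.
        ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", portal],
        # Log in to the chosen target.
        node + ["--login"],
        # Log in again automatically when the iSCSI service starts at boot.
        node + ["--op", "update", "-n", "node.startup", "-v", "automatic"],
    ]


def run_all(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

For example, `run_all(iscsi_login_commands("192.168.1.10:3260", "iqn.2000-01.com.example:nas.target0"))` only prints the three commands, so they can be reviewed before being run for real with `dry_run=False`.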


