How to restart VM when no IP address can be obtained

bmatt

Dec 9, 2024
Hi all :)

I am pretty new to Proxmox. I am running a test network with PfSense as the gateway in a Proxmox VM. Everything is running well, but yesterday, for some reason, the PfSense DHCP server did not issue any IP addresses anymore. After a hard reset and reboot of the machine, everything went back to normal.

Now I was wondering if there is any way to detect on the Proxmox host machine if the IP address/network connection is not available anymore and use this to trigger an automatic restart of the PfSense VM?

If you need any further information, I'll be happy to provide it. Thanks a lot in advance for your help.

Best,
Matthias
 
PfSense as Gateway in a Proxmox VM
After a hard reset of the machine and reboot
If you run a router VM as the gateway on the PVE host itself, you need one of two things (or both): some sort of direct KVM or BMC access to the host (not through the router VM), or a physical console and keyboard on the host. That way you can still manage the host even if something's up with the router VM.

Now I was wondering if there is any way to detect on the Proxmox host machine if the IP address/network connection is not available anymore and use this to trigger an automatic restart of the PfSense VM?
You could write a simple script on the host that could check the router VM status/ping & act accordingly to restart/reboot that VM. You'll need to carefully adjust the script so that it gives the VM enough time on startup etc.
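A minimal sketch of such a host-side script in Python, assuming the PfSense VM has the hypothetical VMID 100 and answers ping from the Proxmox host on the hypothetical LAN address 192.168.1.1. All thresholds are placeholders, not recommendations; the point is that a single lost ping never triggers a reset, and that the script waits after a reset before judging the VM again.

```python
import subprocess
import sys
import time

# Hypothetical values: replace with your own VMID, PfSense LAN IP and timings.
VMID = "100"
TARGET = "192.168.1.1"
CHECK_INTERVAL = 30         # seconds between checks
FAILURES_BEFORE_RESET = 4   # consecutive failed checks before acting
BOOT_GRACE = 300            # seconds to let the VM boot before checking again


def ping_ok(host: str, count: int = 3, timeout: int = 2) -> bool:
    """True if at least one echo reply arrives (iputils ping exits 0 in that case)."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def should_reset(consecutive_failures: int, threshold: int = FAILURES_BEFORE_RESET) -> bool:
    """Act only after several consecutive failures, never on a single lost ping."""
    return consecutive_failures >= threshold


def main() -> None:
    failures = 0
    while True:
        failures = 0 if ping_ok(TARGET) else failures + 1
        if should_reset(failures):
            # `qm reset` is a hard reset; `qm reboot` would shut the guest down
            # cleanly and start it again instead.
            subprocess.run(["qm", "reset", VMID], check=False)
            failures = 0
            time.sleep(BOOT_GRACE)  # give the VM time to boot before judging it
        else:
            time.sleep(CHECK_INTERVAL)


# The --run flag is an opt-in so the file can be imported or tested safely.
if __name__ == "__main__" and "--run" in sys.argv:
    main()
```

Run it as root on the host, e.g. `python3 pfsense_watchdog.py --run` (file name hypothetical), ideally from a systemd service. Keep in mind it only proves the VM answers ping, not that DHCP still works.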
 
Hi Matthias,
did you investigate the cause of the malfunction of the PfSense VM? It would be better to solve the problem there instead of relying on a workaround like a reboot. Is the entire VM stuck or is it just some network issue?
If you have the QEMU Guest Agent enabled for the VM and it is also running inside the VM, you can retrieve a set of information from the guest (qm agent <vmid> <command>). The output is mostly JSON, so you may need to filter it. Maybe you will find a detail that changes between the working and non-working system, which you can use as a trigger in a script. If the entire machine is not working, it is a bit easier: just use qm agent <vmid> ping, or install a watchdog as described here: https://forum.proxmox.com/threads/hardware-watchdog-at-a-per-vm-level.104051/
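As an illustration of filtering that JSON, a small Python sketch. It assumes `qm agent <vmid> network-get-interfaces` prints a JSON array shaped like the QEMU guest agent's `guest-network-get-interfaces` result (objects with `name` and `ip-addresses`, each address carrying `ip-address` and `ip-address-type`), and that `qm` exits non-zero when the agent does not answer. The function names are hypothetical.

```python
import json
import subprocess


def agent_alive(vmid: str) -> bool:
    """True if `qm agent <vmid> ping` succeeds (assumes a non-zero exit on failure)."""
    result = subprocess.run(
        ["qm", "agent", vmid, "ping"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def ipv4_by_interface(agent_json: str) -> dict:
    """Map interface name -> list of IPv4 addresses from network-get-interfaces JSON."""
    result = {}
    for iface in json.loads(agent_json):
        result[iface.get("name", "?")] = [
            addr["ip-address"]
            for addr in iface.get("ip-addresses", [])
            if addr.get("ip-address-type") == "ipv4"
        ]
    return result


def get_ipv4_addresses(vmid: str) -> dict:
    """Query the guest agent and return its IPv4 addresses per interface."""
    out = subprocess.run(
        ["qm", "agent", vmid, "network-get-interfaces"],
        capture_output=True, text=True, check=True,
    ).stdout
    return ipv4_by_interface(out)
```

Comparing `get_ipv4_addresses("100")` (hypothetical VMID) between a working and a broken state might show a difference worth triggering on; `agent_alive` covers the "entire VM stuck" case.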
 
Hi,

thanks a lot for your responses. I did not have much time to look further into this last year, which is why I have not responded until now.
If you run a router VM as the gateway on the PVE host itself, you need one of two things (or both): some sort of direct KVM or BMC access to the host (not through the router VM), or a physical console and keyboard on the host. That way you can still manage the host even if something's up with the router VM.
I have one LAN port reserved to access the local machine directly. I didn't think about connecting it to my tablet for direct access at the time :D, but will do so the next time it happens.

You could write a simple script on the host that could check the router VM status/ping & act accordingly to restart/reboot that VM. You'll need to carefully adjust the script so that it gives the VM enough time on startup etc.
Is there a simple way to add/write scripts from the Proxmox UI? Or do I need to add them manually in the system?

did you investigate the cause of the malfunction of the PfSense VM?
I did not find any cause so far. When it happens again I will also check the logs as you described.

Is the entire VM stuck or is it just some network issue?
Since I did a hard reset, I cannot tell whether the VM was stuck or whether it was a network issue. However, restarting the host machine did resolve the issue without rebooting any other network-related equipment. So I assume either the host (Proxmox) or the VM (PfSense) was stuck, or only the DHCP service on the VM.
Also thanks for the suggestion with the watchdog. I'll check whether the entire VM was stuck, but I also think using ping can help, as it should fail when the DHCP service is not working anymore.

In any case, thanks a lot already for your help, gfngfn256 and fba.

Best,
Matthias
 
Or do I need to add them manually in the system?
Yes. Be careful with the script/service you create, as you may end up in a situation where the script constantly "thinks" the VM is unavailable and keeps rebooting it. This would be a kind of catch-22. Timing will be crucial, and so will network availability. You are going to have to think through every potential scenario before you give it a try. To be honest, with that VM being an integral part of your network gateway, there is a lot that can go wrong with such a script.
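One way to avoid the endless-reboot situation is to rate-limit the script's own actions. The class below is a hypothetical sketch (name and limits are made up): it allows a restart only if the previous one was at least `cooldown` seconds ago and there have been fewer than `max_restarts` within the last `window` seconds. Beyond that, the script should stop and notify a human rather than keep resetting the gateway.

```python
import time
from collections import deque
from typing import Optional


class RestartGuard:
    """Rate-limits automatic VM restarts so a faulty check cannot cause a reboot loop.

    Hypothetical policy: at most `max_restarts` within `window` seconds, and
    never within `cooldown` seconds of the previous restart.
    """

    def __init__(self, max_restarts: int = 3, window: float = 3600.0,
                 cooldown: float = 600.0) -> None:
        self.max_restarts = max_restarts
        self.window = window
        self.cooldown = cooldown
        self._history: deque = deque()  # timestamps of recent restarts

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True (and record the restart) if a restart is permitted now."""
        now = time.monotonic() if now is None else now
        # Forget restarts that have fallen out of the window.
        while self._history and now - self._history[0] >= self.window:
            self._history.popleft()
        if self._history and now - self._history[-1] < self.cooldown:
            return False  # still inside the boot/cooldown period
        if len(self._history) >= self.max_restarts:
            return False  # too many restarts recently: alert a human instead
        self._history.append(now)
        return True
```

In a ping loop you would call `guard.allow()` right before running `qm reset` and skip the reset whenever it returns False.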

However, you will also have to investigate why the "PfSense DHCP server did not issue any IP addresses anymore". It may be that the VM itself was still perfectly "available" but for some reason stopped issuing IP addresses; your script is going to have to detect that scenario.
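Detecting a VM that is up but no longer hands out leases needs a check of the DHCP service itself, not just ping. A rough Python sketch: broadcast a single DHCPDISCOVER (RFC 2131 message layout) and wait for a matching DHCPOFFER. Assumptions: it runs as root on the Proxmox host, nothing else on the host is bound to UDP port 68, the broadcast leaves through the bridge the PfSense LAN is on (with several bridges you may need to bind the socket to that interface), and the client MAC is a made-up, locally administered address. The probe never sends a DHCPREQUEST, so it should not complete a lease, although the server may hold the offered address for a short while.

```python
import os
import socket
import struct
import time
from typing import Optional

MAGIC_COOKIE = 0x63825363
DHCPDISCOVER, DHCPOFFER = 1, 2


def build_discover(xid: int, mac: bytes) -> bytes:
    """Minimal DHCPDISCOVER (RFC 2131 layout) with the broadcast flag set."""
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128sI",
        1, 1, 6, 0,                # op=BOOTREQUEST, htype=Ethernet, hlen=6, hops=0
        xid, 0, 0x8000,            # transaction id, secs, flags (broadcast reply)
        bytes(4), bytes(4), bytes(4), bytes(4),  # ciaddr, yiaddr, siaddr, giaddr
        mac.ljust(16, b"\0"),      # chaddr, padded to 16 bytes
        bytes(64), bytes(128),     # sname, file
        MAGIC_COOKIE,
    )
    return header + bytes([53, 1, DHCPDISCOVER, 255])  # option 53 (type), then end


def offered_ip(packet: bytes, xid: int) -> Optional[str]:
    """Return the offered address if `packet` is a DHCPOFFER for `xid`, else None."""
    if len(packet) < 240 or packet[0] != 2:  # must be a BOOTREPLY
        return None
    if struct.unpack("!I", packet[4:8])[0] != xid:
        return None
    if struct.unpack("!I", packet[236:240])[0] != MAGIC_COOKIE:
        return None
    i = 240
    while i + 1 < len(packet) and packet[i] != 255:  # walk options until "end"
        if packet[i] == 0:  # pad option has no length byte
            i += 1
            continue
        code, length = packet[i], packet[i + 1]
        if code == 53 and length >= 1 and i + 2 < len(packet):
            if packet[i + 2] == DHCPOFFER:
                return socket.inet_ntoa(packet[16:20])  # yiaddr
            return None
        i += 2 + length
    return None


def dhcp_answers(mac: bytes, timeout: float = 5.0) -> bool:
    """Broadcast one DISCOVER and wait up to `timeout` seconds for a matching OFFER."""
    xid = struct.unpack("!I", os.urandom(4))[0]
    deadline = time.monotonic() + timeout
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("0.0.0.0", 68))  # DHCP client port; binding it needs root
        sock.sendto(build_discover(xid, mac), ("255.255.255.255", 67))
        while (remaining := deadline - time.monotonic()) > 0:
            sock.settimeout(remaining)
            try:
                data, _ = sock.recvfrom(4096)
            except socket.timeout:
                return False
            if offered_ip(data, xid) is not None:
                return True
    return False
```

For example, `dhcp_answers(bytes.fromhex("020000000001"))` returning False several times in a row would be a more meaningful reset trigger than ping alone.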

You may want to look at having a second DHCP server on the network. This can be done (basically by splitting the scope of addresses between the servers), but that is a whole other tutorial. Do your research, maybe starting here.
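The address arithmetic behind a split scope is simple. A Python sketch, assuming an example pool of 192.168.1.100-192.168.1.199 and an 80/20 split (both just example values, not a recommendation). Each server would then be configured with only its own range, so the two never hand out the same address.

```python
import ipaddress


def _ip(n: int) -> str:
    """Convert an integer back to dotted-quad notation."""
    return str(ipaddress.IPv4Address(n))


def split_scope(first: str, last: str, share: float = 0.8):
    """Split the inclusive range first..last into two non-overlapping pools.

    `share` is the fraction for the primary server (0.8 = an 80/20 split).
    Returns ((primary_start, primary_end), (secondary_start, secondary_end)).
    """
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    if end <= start:
        raise ValueError("the range must contain at least two addresses")
    total = end - start + 1
    primary_size = max(1, min(total - 1, round(total * share)))  # both pools non-empty
    boundary = start + primary_size  # first address of the secondary pool
    return (_ip(start), _ip(boundary - 1)), (_ip(boundary), _ip(end))
```

With the example pool, `split_scope("192.168.1.100", "192.168.1.199")` gives the primary server .100 to .179 and the secondary .180 to .199.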

Another possibility may be a failover/redundant HA PfSense setup, but again, this is a whole other system requiring a lot of research. Maybe start here. Then again, whether or not this helps with your original problem will depend on why it stopped working in the first place.

To be honest, I don't use PfSense, so I can't help you much more.
 
