VM in HA that is normally off starts automatically at PVE reboot

p-user

Member
Jan 26, 2024
I have a VM that backs up an SMB share to the backup server, but since that runs only once a week, the VM is normally off.
A crontab entry on one of the PVE nodes starts it, and the crontab inside the VM itself runs the backup job.
After the job has finished (plus some extra time margin), the crontab on the PVE node stops the VM again.

Since I've put the VM in HA, when the PVE node becomes unavailable, the VM will continue to run on another PVE node and finish the backup there, which is what I wanted.

I know it is not the most beautiful solution, but it works. A cluster-wide crontab would have been better, but as far as I know that's not available.

This morning I updated the PVE node where the VM normally resides and rebooted it because of a new kernel. After the reboot, HA had automatically started the VM, which is not what I wanted: when it is switched off, it should remain off. Can I change this behaviour with the Request State setting in the HA resource settings? So far, neither Started nor Ignored does the trick.

What is the best way to actually achieve this? So far I've set it up in a practical way (with all limitations), but maybe there's a better way?

Thanks in advance,

Albert
 
Hi!

How does the crontab on the PVE node schedule the starting and stopping of the HA resource?

If an HA resource is in the stopped state before the node is rebooted or shut down, the HA resource should not start on its own again until it receives a request_start request, e.g. through qm start or ha-manager set <sid> --state started.
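For reference, the request state can be inspected and set from the CLI. A minimal sketch, assuming vm:118 is the HA resource in question:

```shell
# Show the HA manager's view of the cluster, including each
# resource's current and requested state
ha-manager status

# Show the configured HA resources
ha-manager config

# Explicitly request the stopped state for the resource, so it
# should stay off across node reboots until started again
ha-manager set vm:118 --state stopped
```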
 
Hi Daniel,

here's my crontab on the pve node:

# m h dom mon dow command
55 18 * * 6 /usr/sbin/qm start 118 > /dev/null 2>&1
35 20 * * 6 /usr/sbin/qm stop 118 > /dev/null 2>&1
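A side note worth checking (this is an assumption, not something confirmed in this thread): since vm:118 is HA-managed, the crontab could drive the HA request state via ha-manager instead of qm, for example:

```shell
# m h dom mon dow command
55 18 * * 6 /usr/sbin/ha-manager set vm:118 --state started
35 20 * * 6 /usr/sbin/ha-manager set vm:118 --state stopped
```

Because the HA state lives in the cluster filesystem, this also records the intended state explicitly, so the HA stack has no ambiguity about whether the VM should be running after a reboot.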

So I assume that it is in the stopped state.

Albert
 
Thanks! Can you also post the syslog in the time period where the HA resource was unexpectedly started when the node was rebooted? The syslog should contain at least the task starts and ends of starting vm:118 and the pve-ha-lrm's output, ideally also the decisions from pve-ha-crm, which will be on another node since the rebooting node cannot be the HA Manager.
 
Here's part of the syslog after the reboot (note: I had set the HA request state to ignored). It is VM 118 that gets started, although it was set to stop just before the reboot of the PVE node; note that VM 117 is always on and should indeed be started.

2026-02-03T14:52:36.278369+01:00 pve1 pve-ha-lrm[97347]: starting service vm:118
2026-02-03T14:52:36.287605+01:00 pve1 pve-ha-lrm[97348]: start VM 118: UPID:pve1:00017C44:0014C108:6981FDA4:qmstart:118:root@pam:
2026-02-03T14:52:36.287717+01:00 pve1 pve-ha-lrm[97347]: <root@pam> starting task UPID:pve1:00017C44:0014C108:6981FDA4:qmstart:118:root@pam:
2026-02-03T14:52:37.173405+01:00 pve1 pve-ha-lrm[97348]: VM 118 started with PID 97361.
2026-02-03T14:52:37.183682+01:00 pve1 pve-ha-lrm[97347]: <root@pam> end task UPID:pve1:00017C44:0014C108:6981FDA4:qmstart:118:root@pam: OK
2026-02-03T14:52:37.186577+01:00 pve1 pve-ha-lrm[97347]: service status vm:118 started
2026-02-03T14:54:28.375823+01:00 pve1 systemd[1]: Stopping pve-ha-lrm.service - PVE Local HA Resource Manager Daemon...
2026-02-03T14:54:29.201089+01:00 pve1 pve-ha-lrm[1229]: received signal TERM
2026-02-03T14:54:29.210177+01:00 pve1 pve-ha-lrm[1229]: got shutdown request with shutdown policy 'conditional'
2026-02-03T14:54:29.210259+01:00 pve1 pve-ha-lrm[1229]: reboot LRM, stop and freeze all services
2026-02-03T14:54:36.224242+01:00 pve1 pve-ha-lrm[98351]: stopping service vm:117
2026-02-03T14:54:36.232178+01:00 pve1 pve-ha-lrm[98351]: <root@pam> starting task UPID:pve1:00018030:0014EFE3:6981FE1C:qmshutdown:117:root@pam:
2026-02-03T14:54:36.232501+01:00 pve1 pve-ha-lrm[98352]: shutdown VM 117: UPID:pve1:00018030:0014EFE3:6981FE1C:qmshutdown:117:root@pam:
2026-02-03T14:54:38.278320+01:00 pve1 pve-ha-lrm[98351]: <root@pam> end task UPID:pve1:00018030:0014EFE3:6981FE1C:qmshutdown:117:root@pam: OK
2026-02-03T14:54:38.278467+01:00 pve1 pve-ha-lrm[98351]: service status vm:117 stopped
2026-02-03T14:54:38.281610+01:00 pve1 pve-ha-lrm[1229]: watchdog closed (disabled)
2026-02-03T14:54:38.283118+01:00 pve1 pve-ha-lrm[1229]: server stopped
2026-02-03T14:54:39.216058+01:00 pve1 systemd[1]: pve-ha-lrm.service: Deactivated successfully.
2026-02-03T14:54:39.216287+01:00 pve1 systemd[1]: Stopped pve-ha-lrm.service - PVE Local HA Resource Manager Daemon.
2026-02-03T14:54:39.216364+01:00 pve1 systemd[1]: pve-ha-lrm.service: Consumed 13.955s CPU time, 244.2M memory peak.
2026-02-03T14:54:39.218186+01:00 pve1 systemd[1]: Stopping pve-ha-crm.service - PVE Cluster HA Resource Manager Daemon...
2026-02-03T14:54:39.940042+01:00 pve1 pve-ha-crm[1204]: received signal TERM
2026-02-03T14:54:39.940284+01:00 pve1 pve-ha-crm[1204]: server received shutdown request
2026-02-03T14:54:41.944698+01:00 pve1 pve-ha-crm[1204]: server stopped
2026-02-03T14:54:42.953639+01:00 pve1 systemd[1]: pve-ha-crm.service: Deactivated successfully.
2026-02-03T14:54:42.953739+01:00 pve1 systemd[1]: Stopped pve-ha-crm.service - PVE Cluster HA Resource Manager Daemon.
2026-02-03T14:54:42.953784+01:00 pve1 systemd[1]: pve-ha-crm.service: Consumed 4.670s CPU time, 231.1M memory peak.
2026-02-03T14:56:46.362744+01:00 pve1 systemd[1]: Starting pve-ha-crm.service - PVE Cluster HA Resource Manager Daemon...
2026-02-03T14:56:47.058432+01:00 pve1 pve-ha-crm[1204]: starting server
2026-02-03T14:56:47.058563+01:00 pve1 pve-ha-crm[1204]: status change startup => wait_for_quorum
2026-02-03T14:56:47.073923+01:00 pve1 systemd[1]: Started pve-ha-crm.service - PVE Cluster HA Resource Manager Daemon.
2026-02-03T14:56:54.921857+01:00 pve1 systemd[1]: Starting pve-ha-lrm.service - PVE Local HA Resource Manager Daemon...
2026-02-03T14:56:55.608865+01:00 pve1 pve-ha-lrm[1232]: starting server
2026-02-03T14:56:55.608974+01:00 pve1 pve-ha-lrm[1232]: status change startup => wait_for_agent_lock
2026-02-03T14:56:55.626617+01:00 pve1 systemd[1]: Started pve-ha-lrm.service - PVE Local HA Resource Manager Daemon.
2026-02-03T14:57:02.071125+01:00 pve1 pve-ha-crm[1204]: status change wait_for_quorum => slave
2026-02-03T14:57:05.618426+01:00 pve1 pve-ha-lrm[1232]: successfully acquired lock 'ha_agent_pve1_lock'
2026-02-03T14:57:05.618501+01:00 pve1 pve-ha-lrm[1232]: watchdog active
2026-02-03T14:57:05.618559+01:00 pve1 pve-ha-lrm[1232]: status change wait_for_agent_lock => active
2026-02-03T14:57:05.628065+01:00 pve1 pve-ha-lrm[1434]: starting service vm:117
2026-02-03T14:57:05.635843+01:00 pve1 pve-ha-lrm[1435]: start VM 117: UPID:pve1:0000059B:00000AD3:6981FEB1:qmstart:117:root@pam:
2026-02-03T14:57:05.636703+01:00 pve1 pve-ha-lrm[1434]: <root@pam> starting task UPID:pve1:0000059B:00000AD3:6981FEB1:qmstart:117:root@pam:
2026-02-03T14:57:06.803283+01:00 pve1 pve-ha-lrm[1435]: VM 117 started with PID 1457.
2026-02-03T14:57:06.814192+01:00 pve1 pve-ha-lrm[1434]: <root@pam> end task UPID:pve1:0000059B:00000AD3:6981FEB1:qmstart:117:root@pam: OK
2026-02-03T14:57:06.817243+01:00 pve1 pve-ha-lrm[1434]: service status vm:117 started
 
I'm not sure when the reboot happened here: did it happen before the syslog excerpt (before 2026-02-03T14:52:36.278369+01:00) or in the middle, at 2026-02-03T14:56:46.362744+01:00?

If it's the latter, I don't see vm:118 being started here after the reboot. Can you also post the HA configuration with cat /etc/pve/ha/resources.cfg?
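For reference, the file typically contains one stanza per HA resource; a sketch with hypothetical values (not taken from this thread) might look like:

```shell
cat /etc/pve/ha/resources.cfg
# vm: 118
#         state stopped
#         max_restart 1
#         max_relocate 1
```

The state line there is the requested state that matters for the behaviour after a reboot.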