Hello
I'm facing a recurring issue with High Availability on my Proxmox cluster. If I stop a VM and later try to start it again, the VM fails to boot and its HA status shows as "failed." This happens several times a week. Before the stop, HA is in State: started; it switches to failed as soon as I stop and then start the VM, which I can't explain.
Here’s the scenario:
1. I stop the VM without any issues.
2. When I try to start it again, it won’t boot because HA status shows as "failed."
3. To get the VM running again, I have to delete its HA configuration. Once HA is removed, the VM boots without any problems.
I’ve attached relevant logs below to help diagnose the problem. Any insights or suggestions on how to resolve this would be greatly appreciated.
Thank you!
Code:
Nov 18 21:47:39 localhost pvedaemon[3790683]: <root@pam> starting task UPID:localhost:00119DE9:11334410:673BA7EB:hastop:1162:root@pam:
Nov 18 21:47:39 localhost pvedaemon[3790683]: <root@pam> end task UPID:localhost:00119DE9:11334410:673BA7EB:hastop:1162:root@pam: OK
Nov 18 21:47:44 localhost pve-ha-lrm[1154629]: stopping service vm:1162 (timeout=0)
Nov 18 21:47:44 localhost pve-ha-lrm[1154629]: <root@pam> starting task UPID:localhost:00119E49:1133464A:673BA7F0:qmstop:1162:root@pam:
Nov 18 21:47:44 localhost pve-ha-lrm[1154633]: stop VM 1162: UPID:localhost:00119E49:1133464A:673BA7F0:qmstop:1162:root@pam:
Nov 18 21:47:49 localhost pve-ha-lrm[1154629]: Task 'UPID:localhost:00119E49:1133464A:673BA7F0:qmstop:1162:root@pam:' still active, waiting
Nov 18 21:47:49 localhost pve-ha-lrm[1154633]: VM 1162 qmp command failed - VM 1162 qmp command 'quit' failed - got timeout
Nov 18 21:47:49 localhost pve-ha-lrm[1154633]: VM quit/powerdown failed - terminating now with SIGTERM
Nov 18 21:47:54 localhost pve-ha-lrm[1154629]: Task 'UPID:localhost:00119E49:1133464A:673BA7F0:qmstop:1162:root@pam:' still active, waiting
Nov 18 21:47:56 localhost pvestatd[2695]: VM 1162 qmp command failed - VM 1162 qmp command 'query-proxmox-support' failed - got timeout
Nov 18 21:47:57 localhost pvestatd[2695]: status update time (9.566 seconds)
Nov 18 21:47:57 localhost pvestatd[2695]: restarting server after 29 cycles to reduce memory usage (free 156628 (15556) KB)
Nov 18 21:47:57 localhost pvestatd[2695]: server shutdown (restart)
Nov 18 21:47:58 localhost pvestatd[2695]: restarting server
Nov 18 21:47:59 localhost pve-ha-lrm[1154629]: Task 'UPID:localhost:00119E49:1133464A:673BA7F0:qmstop:1162:root@pam:' still active, waiting
Nov 18 21:47:59 localhost pve-ha-lrm[1154633]: VM still running - terminating now with SIGKILL
Nov 18 21:48:01 localhost pve-ha-lrm[1154633]: can't unmap rbd device /dev/rbd-pve/6f2f3b31-bcba-47a5-ac2b-e08a7aed1b63/block-storage-metadata/vm-1162-disk-0: rbd: sysfs write failed
Nov 18 21:48:01 localhost pve-ha-lrm[1154633]: volume deactivation failed: block-storage:vm-1162-disk-0 at /usr/share/perl5/PVE/Storage.pm line 1258.
Nov 18 21:48:01 localhost pve-ha-lrm[1154629]: <root@pam> end task UPID:localhost:00119E49:1133464A:673BA7F0:qmstop:1162:root@pam: OK
Nov 18 21:48:01 localhost pve-ha-lrm[1154629]: unable to stop stop service vm:1162 (still running)
Nov 18 21:48:04 localhost pve-ha-lrm[1155163]: service vm:1162 is in an error state and needs manual intervention. Look up 'ERROR RECOVERY' in the documentation.
Nov 18 21:48:16 localhost pvestatd[2695]: VM 1162 qmp command failed - VM 1162 qmp command 'query-proxmox-support' failed - unable to connect to VM 1162 qmp socket - timeout after 51 retries
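To make the sequence in the log easier to follow, here is a rough Python sketch that pulls out the lines that seem to matter (the qmp timeout, the SIGTERM/SIGKILL escalation, the failed rbd unmap, and the HA error state). The marker strings are copied from the log above; the choice of which lines count as "key events", the labels, and the `summarize` helper are my own assumptions, not anything provided by Proxmox.

```python
import re

# Substrings copied from the log above. Which messages matter, and the
# labels given to them, are assumptions made for this sketch.
MARKERS = [
    ("qmp_timeout", "qmp command 'quit' failed"),
    ("sigterm", "terminating now with SIGTERM"),
    ("sigkill", "terminating now with SIGKILL"),
    ("rbd_unmap_failed", "can't unmap rbd device"),
    ("deactivation_failed", "volume deactivation failed"),
    ("stop_failed", "unable to stop"),
    ("ha_error_state", "is in an error state"),
]

# Matches lines shaped like:
#   "Nov 18 21:47:49 localhost pve-ha-lrm[1154633]: message"
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+([\w-]+)\[\d+\]:\s+(.*)$"
)


def summarize(log_text):
    """Return (timestamp, daemon, label) for each line matching a marker."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not in the expected syslog shape
        timestamp, daemon, message = match.groups()
        for label, needle in MARKERS:
            if needle in message:
                events.append((timestamp, daemon, label))
                break  # one label per line is enough
    return events


# Three lines taken verbatim from the log above.
SAMPLE = """\
Nov 18 21:47:49 localhost pve-ha-lrm[1154633]: VM 1162 qmp command failed - VM 1162 qmp command 'quit' failed - got timeout
Nov 18 21:47:59 localhost pve-ha-lrm[1154633]: VM still running - terminating now with SIGKILL
Nov 18 21:48:04 localhost pve-ha-lrm[1155163]: service vm:1162 is in an error state and needs manual intervention. Look up 'ERROR RECOVERY' in the documentation.
"""

if __name__ == "__main__":
    for event in summarize(SAMPLE):
        print(*event)
```

Run against the full log, this should list the events in order: the 'quit' timeout, SIGTERM, SIGKILL, the failed rbd unmap and volume deactivation, the "unable to stop" message, and finally the HA error state.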