Hi,
I just updated a three-node cluster to the latest version (4.1-22 from the repo). I did that one by one; the first node went fine, but the second and third nodes crashed during the update, and I had to run dpkg --configure -a to finish it.
I did that over the course of today, and those two crashes caused me a lot of trouble, since they forced two heals of the GlusterFS installed on the machines, one after another, locking all the VMs. But that's not the point.
For each update, I live migrated all the VMs to the other two nodes and that worked fine.
Now that all three are up to date, I wanted to move some machines back to the third node and discovered that I can't actually do anything: when I try to live migrate, stop, or start a VM, I just get an "OK" in the task list a few seconds later, and nothing happens.
In the log I just get:
pvedaemon[11735]: <root@pam> starting task UPID:ns3624511:00003668:005A2E0E:57118C87:hastart:101:root@pam:
And absolutely nothing else after that. I tried moving VMs from each node to the others, both online and offline. I ended up powering off a VM through SSH to test, and now I can't start it up again; I just get an "OK" again.
Any idea what I can do to unblock this? I can't just reboot a node, since that would cause another Gluster heal.
It does look like I can move a disk between storages, though, but of course the VM gets rebooted at the end of the operation and then just never starts again.
Thanks
EDIT: To be sure, I tried hooking a disk from a different storage back up to a powered-down VM (in case the GlusterFS is broken somehow), but it still won't start. This seems to be a legitimate Proxmox problem this time.
EDIT 2: When I use qm start 101, I get only this output: "Executing HA start for VM 101". In the logs I see the starting task and, one second later, the end task, with no errors.
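Since qm start is just handing the request off to HA, the HA stack itself looks like the place to check. This is only a sketch of the diagnostics one might try on a stock Proxmox 4.x node (the exact service names and output can vary by version), not something from the original report:

```shell
# Is the cluster quorate? HA refuses to act without quorum.
pvecm status

# Overall HA manager view: which node is master, state of each resource.
ha-manager status

# Are the HA services (local resource manager / cluster resource manager)
# actually running on this node?
systemctl status pve-ha-lrm pve-ha-crm

# Recent HA log entries, which may show why the start request is ignored.
journalctl -u pve-ha-lrm -u pve-ha-crm --since "1 hour ago"
```

If pve-ha-lrm is stuck or the node lost quorum during the crashed upgrade, that would explain tasks ending "OK" with nothing happening.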