Migration to another node's local storage: QMP command got timeout

trent--

Member
Mar 19, 2021
Hello,

We have a three-node Proxmox HA cluster. Each node keeps its VMs on local storage.
We know this isn't the ideal setup, but we are stuck with it for now due to our hosting provider's limitations.

Anyway, when we try to live-migrate a VM from one node's local storage to another node's local storage, the migration fails with a timeout error.

Here is the full migration task log:
Code:
2022-09-02 16:19:16 use dedicated network address for sending migration traffic (REDACTED_IP)
2022-09-02 16:19:17 starting migration of VM 114 to node 'pve3' (REDACTED_IP)
2022-09-02 16:19:17 found local disk 'local:114/vm-114-disk-0.qcow2' (in current VM config)
2022-09-02 16:19:17 starting VM 114 on remote node 'pve3'
2022-09-02 16:20:22 [pve3] VM 114 qmp command 'nbd-server-add' failed - got timeout
2022-09-02 16:20:22 ERROR: online migrate failure - remote command failed with exit code 255
2022-09-02 16:20:22 aborting phase 2 - cleanup resources
2022-09-02 16:20:22 migrate_cancel
2022-09-02 16:20:23 ERROR: migration finished with problems (duration 00:01:07)
TASK ERROR: migration problems

This also spawns a VM start task with the following log:
Code:
Formatting '/var/lib/vz/images/114/vm-114-disk-0.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=full compression_type=zlib size=53687091200 lazy_refcounts=off refcount_bits=16
migration listens on tcp:REDACTED_IP:60000

TASK ERROR: VM 114 qmp command 'nbd-server-add' failed - got timeout
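Not from the original post, but possibly relevant: the start-task log shows the target disk being created with `preallocation=full`, which writes out the whole 50 GB image up front and can easily take longer than the default 3-second QMP timeout. If the directory storage sets the `preallocation` option (available on dir storages in recent PVE versions), that would explain the delay. A hedged sketch for checking this; the storage name `local` is taken from the log, and the demo runs on a temporary sample instead of the real `/etc/pve/storage.cfg`:

```shell
# Demo on a sample storage.cfg snippet (real file: /etc/pve/storage.cfg).
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
dir: local
	path /var/lib/vz
	content images,iso
	preallocation full
EOF
# Check whether the storage forces full preallocation
grep -q 'preallocation full' "$cfg" && echo "full preallocation enabled"
rm -f "$cfg"
```

If it is enabled, switching the storage to metadata preallocation (an assumption, not something the original poster tried) may avoid the long disk-creation delay in the first place.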

This seems similar to an error some people hit with PBS in this thread: https://forum.proxmox.com/threads/p...ut-qmp-command-cont-failed-got-timeout.95212/

The fix proposed there:
https://forum.proxmox.com/threads/p...t-failed-got-timeout.95212/page-2#post-426261

In /usr/share/perl5/PVE/QMPClient.pm (around line 134), I changed
Code:
      } else {
            $timeout = 3; # default
to
Code:
      } else {
            $timeout = 8; # default

and restarted the PVE daemons:
Code:
for service in pvedaemon.service pveproxy.service pvestatd.service ;do
     echo "systemctl restart $service"
     systemctl restart $service
done
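The same edit can also be scripted instead of made by hand. This is just a sketch, assuming the default line still reads `$timeout = 3; # default` (it may differ between PVE versions); the demo operates on a temporary copy rather than the live QMPClient.pm:

```shell
# Work on a temp file; on a real node the target would be
# /usr/share/perl5/PVE/QMPClient.pm (back it up first).
f=$(mktemp)
printf '%s\n' '      } else {' '            $timeout = 3; # default' > "$f"
# Raise the default QMP timeout from 3 to 120 seconds
sed -i 's/\$timeout = 3; # default/$timeout = 120; # default/' "$f"
grep 'timeout' "$f"
rm -f "$f"
```

Note that this file is shipped by a PVE package, so an upgrade will overwrite the change and it has to be reapplied afterwards.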


So I wonder: could this fix also solve my problem?
I also wonder about the consequences of restarting the pvedaemon, pveproxy and pvestatd services on a node with running VMs. Wouldn't that disrupt those VMs?
 
Replying to myself after trying this out:
  • restarting these services doesn't affect running VMs
  • 8 seconds wasn't enough, so I increased the timeout to 120 seconds; after that, the migration worked
 
