Migration to another node's local storage: QMP command got timeout

trent--

Member
Mar 19, 2021
Hello,

We have a three-node Proxmox HA cluster. Each node keeps its VMs on local storage.
We know this isn't the ideal setup, but we are stuck with it for now due to our hosting provider's limitations.

Anyway, when we try to live-migrate a VM from one node's local storage to another node's local storage, the migration fails with a timeout error.

Here is the full migration task log:
Code:
2022-09-02 16:19:16 use dedicated network address for sending migration traffic (REDACTED_IP)
2022-09-02 16:19:17 starting migration of VM 114 to node 'pve3' (REDACTED_IP)
2022-09-02 16:19:17 found local disk 'local:114/vm-114-disk-0.qcow2' (in current VM config)
2022-09-02 16:19:17 starting VM 114 on remote node 'pve3'
2022-09-02 16:20:22 [pve3] VM 114 qmp command 'nbd-server-add' failed - got timeout
2022-09-02 16:20:22 ERROR: online migrate failure - remote command failed with exit code 255
2022-09-02 16:20:22 aborting phase 2 - cleanup resources
2022-09-02 16:20:22 migrate_cancel
2022-09-02 16:20:23 ERROR: migration finished with problems (duration 00:01:07)
TASK ERROR: migration problems

This also spawns a VM start task with the following log:
Code:
Formatting '/var/lib/vz/images/114/vm-114-disk-0.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off preallocation=full compression_type=zlib size=53687091200 lazy_refcounts=off refcount_bits=16
migration listens on tcp:REDACTED_IP:60000

TASK ERROR: VM 114 qmp command 'nbd-server-add' failed - got timeout
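Not from the original post, but possibly relevant: the start-task log shows the target disk being created with `preallocation=full`, which writes out the whole 50 GB image up front and can easily take longer than the default 3-second QMP timeout. If the directory storage sets the `preallocation` option (available on dir storages in recent PVE versions), that would explain the delay. A hedged sketch for checking this; the storage name `local` is taken from the log, and the demo runs on a temporary sample instead of the real `/etc/pve/storage.cfg`:

```shell
# Demo on a sample storage.cfg snippet (real file: /etc/pve/storage.cfg).
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
dir: local
	path /var/lib/vz
	content images,iso
	preallocation full
EOF
# Check whether the storage forces full preallocation
grep -q 'preallocation full' "$cfg" && echo "full preallocation enabled"
rm -f "$cfg"
```

If it is enabled, switching the storage to metadata preallocation (an assumption, not something the original poster tried) may avoid the long disk-creation delay in the first place.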

This seems similar to an error some people hit with PBS in this thread: https://forum.proxmox.com/threads/p...ut-qmp-command-cont-failed-got-timeout.95212/

The fix proposed there:
https://forum.proxmox.com/threads/p...t-failed-got-timeout.95212/page-2#post-426261

In /usr/share/perl5/PVE/QMPClient.pm (around line 134), I changed
Code:
      } else {
            $timeout = 3; # default
to
Code:
      } else {
            $timeout = 8; # default

and restarted the PVE daemons:
Code:
for service in pvedaemon.service pveproxy.service pvestatd.service ;do
     echo "systemctl restart $service"
     systemctl restart $service
done
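The same edit can also be scripted instead of made by hand. This is just a sketch, assuming the default line still reads `$timeout = 3; # default` (it may differ between PVE versions); the demo operates on a temporary copy rather than the live QMPClient.pm:

```shell
# Work on a temp file; on a real node the target would be
# /usr/share/perl5/PVE/QMPClient.pm (back it up first).
f=$(mktemp)
printf '%s\n' '      } else {' '            $timeout = 3; # default' > "$f"
# Raise the default QMP timeout from 3 to 120 seconds
sed -i 's/\$timeout = 3; # default/$timeout = 120; # default/' "$f"
grep 'timeout' "$f"
rm -f "$f"
```

Note that this file is shipped by a PVE package, so an upgrade will overwrite the change and it has to be reapplied afterwards.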


So I wonder: could this fix also solve my problem?
I also wonder about the consequences of restarting the pvedaemon, pveproxy and pvestatd services on a node with running VMs. Wouldn't that disrupt those VMs?
 
Replying to myself after trying this out:
  • restarting these services doesn't affect running VMs
  • 8 seconds wasn't enough, so I increased the timeout to 120 seconds; after that, the migration worked
 
