storage migration virtio failed

liska_

Hi,
I am now in the process of migrating some machines from Proxmox 3.2 to Proxmox 3.3. With one machine I get the following error after reaching 100%:
qmp command 'block-job-complete' failed - The active block job for device 'drive-virtio0' cannot be completed
I have successfully done live migration to second server but I am not able to finish moving image from nfs storage using virtio to another nfs storage.
With another VM I solved it by using the command qm rescan --vmid 108, but that was a slightly different case.
Do you have any hint, how to solve this?
Thanks a lot in advance
 
"move disk" does not always works live. power off your VM and then you should be able to move the disks.
 
Hi,

RBD --> RBD


I get around a 60% success rate.

They usually end with "TASK ERROR: storage migration failed: mirroring error: VM 101 qmp command 'block-job-complete' failed - The active block job for device 'drive-scsi1' cannot be completed" or similar for virtio devices.
 
Thanks,

It is Ceph Firefly 0.80.5

And yes, this is with cache=writeback
 
Hi spirit,

It may be relevant: I have tested this on around 20 VMs on our test cluster and could not repeat the problem. But... VMs in our test cluster are much less busy than VMs on our production cluster.

The guests that seem to fail migration are all high-IO machines, e.g. file servers, Exchange servers, SQL servers. They are running Windows Server 2008 with either virtio-blk or virtio-scsi disks.
 
Hi,
I am now in the process of migrating some machines from Proxmox 3.2 to Proxmox 3.3. With one machine I get the following error after reaching 100%:
qmp command 'block-job-complete' failed - The active block job for device 'drive-virtio0' cannot be completed


This happens to us at least 50% of the time as well. There seem to be 2-3 different failure modes that vary in the details, but the general pattern is always the same: it goes through the entire process and then fails at the very end. Here is one example:


transferred: 8589934592 bytes remaining: 0 bytes total: 8589934592 bytes progression: 100.00 %
2014-10-01 11:20:57.270423 7f89cc03a760 -1 did not load config file, using default settings.
Removing all snapshots: 100% complete...done.
2014-10-01 11:20:57.311938 7fa55d8be760 -1 did not load config file, using default settings.
image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
TASK ERROR: storage migration failed: mirroring error: VM 101 qmp command 'block-job-complete' failed - The active block job for device 'drive-virtio0' cannot be completed

Sometimes it also fails but with much less detail, only:


TASK ERROR: storage migration failed: mirroring error: VM 140 qmp command 'block-job-complete' failed - The active block job for device 'drive-virtio0' cannot be completed

(Exactly as reported by the original poster.)


This is also with Ceph Firefly. It is particularly frustrating for large images, as they take a very long time to move and only fail at the very end. This is definitely something that should be addressed, as it is a terrible waste of time and network bandwidth.

That message about "did not load config file, using default settings" often appears at the beginning of the storage migration as well, but that does not appear to be a predictor of success/failure.


It does seem like the busier a virtual disk is, the more likely the problem is to occur. But since we value our data, we do not use writeback caching on any volume that cannot be trivially rebuilt in the event of a crash, so it definitely also occurs with cache set to "none." (There are also some pretty dire warnings floating around out there to the effect of "live migration + writeback cache = data corruption.")
 
Speaking only for myself, we do not.


My understanding of the problem is that the command "block-job-complete" hangs, or doesn't return an ACK,
but has correctly switched the disks.

So Proxmox sees that as an error and tries to delete the new disk.

But as the new disk is open in qemu, Ceph throws the error "image has watchers - not removing".


I'll try to reproduce, maybe it's a timeout problem with block-job-complete.
 
I'll try to reproduce, maybe it's a timeout problem with block-job-complete.

That could be. Timeouts and UI errors do often pop up when deleting large, unused Ceph volumes via the UI. (But they do eventually get deleted behind the scenes.)

However, we have tried separating the migration and deleting the old volume, and while the success rate seems slightly higher, it definitely does not prevent the migration-only step from failing.

Is there a way to do storage migration from the command line?
 
>>Is there a way to do storage migration from the command line?
yes (# qm move), but it uses the same APIs. It's not a GUI timeout problem.


if you can reproduce the problem,

could you try to edit
/usr/share/perl5/PVE/QemuServer.pm

search:

vm_mon_cmd($vmid, "block-job-complete", device => "drive-$drive");

and replace it by

vm_mon_cmd($vmid, "block-job-complete", timeout => 10, device => "drive-$drive");


then restart

/etc/init.d/pvedaemon restart
/etc/init.d/pveproxy restart


and try the storage migration again.
 
could you try to edit
/usr/share/perl5/PVE/QemuServer.pm

On which Proxmox server shall I perform these steps? The one I'm connected to for the UI, the one where the VM is located, or all Proxmox servers in the cluster?
 
On which Proxmox server shall I perform these steps? The one I'm connected to for the UI, the one where the VM is located, or all Proxmox servers in the cluster?

the node where the VM with the RBD disk is running.

Note that I'm not sure it'll help.

I have checked the qemu code.

It's hanging here:
Code:
static void mirror_complete(BlockJob *job, Error **errp)
{
    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
    Error *local_err = NULL;
    int ret;


    ret = bdrv_open_backing_file(s->target, NULL, &local_err);
    if (ret < 0) {
        error_propagate(errp, local_err);
        return;
    }


    if (!s->synced) {

       ------>>>>>>>>>>HERE

        error_set(errp, QERR_BLOCK_JOB_NOT_READY, job->bs->device_name);
        return;
    }


    /* check the target bs is not blocked and block all operations on it */
    if (s->replaces) {
        s->to_replace = check_to_replace_node(s->replaces, &local_err);
        if (!s->to_replace) {
            error_propagate(errp, local_err);
            return;
        }


        error_setg(&s->replace_blocker,
                   "block device is in use by block-job-complete");
        bdrv_op_block_all(s->to_replace, s->replace_blocker);
        bdrv_ref(s->to_replace);
    }


    s->should_complete = true;
    block_job_resume(job);
}

According to the qemu docs

http://wiki.qemu.org/Features/BlockJob

We can only call block-job-complete when we receive the event "MIRROR_STATE_CHANGE" with state (synced: true/false).

This is to be sure that the disks are synced before doing the switch.

Currently in Proxmox we don't check events, so after drive-mirror is completed we assume that we can finish the job.

Do you use writeback with your rbd volume?

I'll see if we can improve that.
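
Just to illustrate the idea (a rough sketch only, not an actual patch): instead of calling block-job-complete blindly, we could poll the job state first and only complete once the mirror reports that everything has been transferred. This assumes the existing vm_mon_cmd wrapper and QEMU's query-block-jobs QMP command; the loop bound and error handling are made up for illustration.

Code:
# sketch only: wait until the mirror job has transferred everything
# (offset == len) before asking qemu to complete it.
# loop bound and error message are arbitrary, this is not the real Proxmox code.
my $synced = 0;
for (my $i = 0; $i < 30 && !$synced; $i++) {
    my $jobs = vm_mon_cmd($vmid, "query-block-jobs");
    foreach my $job (@$jobs) {
        next if $job->{device} ne "drive-$drive";
        $synced = 1 if $job->{len} && $job->{offset} == $job->{len};
    }
    sleep(1) if !$synced;
}
die "mirror job for drive-$drive never reached the synced state\n" if !$synced;
vm_mon_cmd($vmid, "block-job-complete", device => "drive-$drive");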
 
I experienced this when moving a disk from NFS to another NFS storage. Turning off the machine, moving the disk, and turning it on again solved the problem. Luckily this was not an important VM.
 
Another thing to try:

edit

/usr/share/perl5/PVE/QemuServer.pm

search

Code:
sub qemu_drive_mirror {
    my ($vmid, $drive, $dst_volid, $vmiddst, $maxwait) = @_;


    my $count = 1;
    my $old_len = 0;
    my $frozen = undef;

and add

Code:
sub qemu_drive_mirror {
    my ($vmid, $drive, $dst_volid, $vmiddst, $maxwait) = @_;


    my $count = 1;
    my $old_len = 0;
    my $frozen = undef;
    $maxwait = 10;   ####ADD THIS



This should force a VM freeze/resume if too many writes occur at the end of the migration.
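
For what it's worth, here is a rough sketch of the freeze/resume idea, assuming the vm_suspend/vm_resume helpers from QemuServer.pm. This is only an illustration of the mechanism, not the exact code in qemu_drive_mirror.

Code:
# illustration only: if the mirror has not converged after $maxwait iterations,
# pause the guest so no new writes arrive, complete the job, then resume.
if ($count > $maxwait && !$frozen) {
    vm_suspend($vmid, 1);    # stop the guest CPUs so the last blocks can drain
    $frozen = 1;
}
# ... once the transfer has caught up ...
vm_mon_cmd($vmid, "block-job-complete", device => "drive-$drive");
vm_resume($vmid, 1) if $frozen;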

(restart pvedaemon and pveproxy after)
 
Added both of those changes. It actually made things worse. The transfer bailed at 11.37% done.

Code:
transferred: 4884463616 bytes remaining: 38065209344 bytes total: 42949672960 bytes progression: 11.37 %
trying to aquire lock...Removing all snapshots: 100% complete...done.
image has watchers - not removing
Removing image: 0% complete...failed.
rbd: error: image still has watchers
rbd rm 'vm-1343-disk-2' error: rbd: error: image still has watchers
storage migration failed: mirroring error: can't lock file '/var/lock/qemu-server/lock-1343.conf' - got timeout
 
