Can't unmap images of cloned running VMs on KRBD

hybrid512

Hi,

I think I found a corner case.
We have VMs running on Ceph in KRBD mode.
One of our users cloned such a VM while it was running, i.e. neither from a stopped VM nor from a snapshot.
I don't know why, but it works, except that when you later want to remove the cloned VM, removing the image fails with a sysfs error because the image can't be unmapped.
I tried many ways to force the unmap, but it fails every time, and the only way to get rid of the mapping is to reboot the host (which is quite complicated when you are on a production cluster).
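For reference, the failing sequence looks roughly like this (the device name /dev/rbd5 is just a placeholder for whatever 'rbd showmapped' reports for the clone's disk):

rbd showmapped                  # find which /dev/rbdX backs the clone's disk
rbd unmap /dev/rbd5             # typically fails with "rbd: sysfs write failed"
                                # and "rbd: unmap failed: (16) Device or resource busy"
rbd unmap -o force /dev/rbd5    # force option on recent kernels; in our case this fails as well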

I don't know the root cause, but one way to avoid this would be to disable the ability to clone a running VM on KRBD storage unless the clone is based on a snapshot.

Regards
 
I don't know why, but it works, except that when you later want to remove the cloned VM, removing the image fails with a sysfs error because the image can't be unmapped.
Which VM do you want to remove, the origin or the clone?

I don't know the root cause, but one way to avoid this would be to disable the ability to clone a running VM on KRBD storage unless the clone is based on a snapshot.
The clone is done through 'rbd clone'; can you try doing the clone directly with that command and see if it makes any difference (and the removal with 'rbd rm')?
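For reference, a manual test along those lines could look like the following; the pool and image names are made up, and 'rbd clone' always needs a protected snapshot as its parent:

rbd snap create rbd/vm-101-disk-0@clonetest
rbd snap protect rbd/vm-101-disk-0@clonetest
rbd clone rbd/vm-101-disk-0@clonetest rbd/vm-999-disk-0
# ... map and boot the clone as in the failing scenario, then try the removal:
rbd rm rbd/vm-999-disk-0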
 
Which VM do you want to remove, the origin or the clone?


The clone is done through 'rbd clone'; can you try doing the clone directly with that command and see if it makes any difference (and the removal with 'rbd rm')?

The clone.
I think the problem mainly occurs because we clone a "living" image with blocks still changing, rather than a snapshot or a "cold" image.

One solution would be to automatically create a temporary snapshot before the clone, do the clone from that snapshot, and then remove the temporary snapshot right afterwards (see the sketch below).

And by the way, this method could be applied to every underlying storage type that supports snapshotting.
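In RBD terms that would roughly be the manual sequence sketched above, plus a flatten so the temporary snapshot can be removed immediately after the clone (same hypothetical image names):

rbd flatten rbd/vm-999-disk-0                   # copy all parent data into the clone so it no longer depends on the snapshot
rbd snap unprotect rbd/vm-101-disk-0@clonetest
rbd snap rm rbd/vm-101-disk-0@clonetest

A protected snapshot can't be deleted while a clone still references it, hence the flatten step.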
 
I think the problem mainly occurs because we clone a "living" image with blocks still changing, rather than a snapshot or a "cold" image.
Live or not, this shouldn't matter. The clone is done through qemu's drive-mirror.

One solution would be to automatically create a temporary snapshot before the clone, do the clone from that snapshot, and then remove the temporary snapshot right afterwards.
The drive mirror takes care of duplicating all blocks and keeps running until everything has been copied, up to the point where the last pass is small enough to finally switch over to the new image.

And by the way, this method could be applied to every underlying storage type that supports snapshotting.
That's the point of qemu's drive-mirror: it is meant to be independent of the capabilities of the underlying storage.

EDIT: Of course, cloning a live VM/CT needs more resources.
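If it helps to see what the mirror is doing while a live clone runs, one way is to look at the block job through the VM's monitor; the VMID 101 below is just an example:

qm monitor 101
# then, at the qm> prompt:
info block-jobs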
 
Live or not, this shouldn't matter. The clone is done through qemu's drive-mirror.


The drive mirror takes care of duplicating all blocks and keeps running until everything has been copied, up to the point where the last pass is small enough to finally switch over to the new image.


That's the point of qemu's drive-mirror: it is meant to be independent of the capabilities of the underlying storage.

EDIT: Of course, cloning a live VM/CT needs more resources.

Then maybe this is a bug in qemu's live mirror with the KRBD backend (it works well with librados).
 
Then maybe this is a bug in qemu's live mirror with the KRBD backend (it works well with librados).
I hardly think so, otherwise more people would have these problems. Please post the logs from the clone.
 
I hardly think so, otherwise more people would have these problems. Please post the logs from the clone.

I don't have logs; the clone itself works without any issue. The VM is cloned properly and you can start it, but if you then want to restart it, you can't, and you get a sysfs write error.

I found many reports on the Ceph mailing list when searching for these errors on Google ... so it does not seem to be that uncommon.
e.g. https://www.spinics.net/lists/ceph-devel/msg27435.html
It mostly appears under high I/O load or on degraded clusters.
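Before falling back to a reboot, it may also be worth checking what still references the stuck image; a few checks along these lines (image and device names are placeholders):

rbd status rbd/vm-999-disk-0      # lists clients that still have a watch on the image
ls /sys/block/rbd5/holders        # any device-mapper/LVM holders keeping the device busy
dmesg | grep -i rbd               # kernel messages around the failed unmap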
 
I don't have logs; the clone itself works without any issue. The VM is cloned properly and you can start it, but if you then want to restart it, you can't, and you get a sysfs write error.
Is there anything visible in the logs (ceph/syslog/journal) at this point?

I found many reports on the Ceph mailing list when searching for these errors on Google ... so it does not seem to be that uncommon.
e.g. https://www.spinics.net/lists/ceph-devel/msg27435.html
It mostly appears under high I/O load or on degraded clusters.
Well, high I/O load is a very broad topic with many symptoms, but do you see high load in your case?
 
Nope, but this bug happens either after a live VM clone (with more than 100 GB of data to clone for some of them) or after a failed OSD (not always, though).
 
