Can't unmap images of cloned running VMs on KRBD

hybrid512

Hi,

I think I found a corner case.
We have VMs running on Ceph in KRBD mode.
One of our users cloned such a VM while it was running, i.e. neither from a stopped VM nor from a snapshot.
I don't know why, but it works, except that when you later want to remove the cloned VM, removing the image fails with a sysfs error because the image can't be unmapped.
I tried many ways to force the unmap, but it fails every time, and the only way to get rid of the mapping is to reboot the host (which is quite complicated when you are on a production cluster).
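For reference, the failing sequence looks roughly like this (the device name /dev/rbd5 is just a placeholder for whatever 'rbd showmapped' reports for the clone's disk):

rbd showmapped                  # find which /dev/rbdX backs the clone's disk
rbd unmap /dev/rbd5             # typically fails with "rbd: sysfs write failed"
                                # and "rbd: unmap failed: (16) Device or resource busy"
rbd unmap -o force /dev/rbd5    # force option on recent kernels; in our case this fails as well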

I don't know the root cause, but one way to avoid this would be to disable the ability to clone a running VM on KRBD storage unless the clone is based on a snapshot.

Regards
 
I don't know why, but it works, except that when you later want to remove the cloned VM, removing the image fails with a sysfs error because the image can't be unmapped.
Which VM do you want to remove, the origin or the clone?

I don't know the root cause, but one way to avoid this would be to disable the ability to clone a running VM on KRBD storage unless the clone is based on a snapshot.
The clone is done through 'rbd clone'; can you try doing the clone directly with that command and see if it makes any difference (and the removal with 'rbd rm')?
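For reference, a manual test along those lines could look like the following; the pool and image names are made up, and 'rbd clone' always needs a protected snapshot as its parent:

rbd snap create rbd/vm-101-disk-0@clonetest
rbd snap protect rbd/vm-101-disk-0@clonetest
rbd clone rbd/vm-101-disk-0@clonetest rbd/vm-999-disk-0
# ... map and boot the clone as in the failing scenario, then try the removal:
rbd rm rbd/vm-999-disk-0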
 
Which VM do you want to remove, the origin or the clone?


The clone is done through 'rbd clone'; can you try doing the clone directly with that command and see if it makes any difference (and the removal with 'rbd rm')?

The clone.
I think the problem mainly occurs because we clone a "living" image with blocks still changing, rather than a snapshot or a "cold" image.

One solution would be to automatically create a temporary snapshot before the clone, do the clone from that snapshot, and then remove the temporary snapshot right afterwards (see the sketch below).

And by the way, this method could be applied to every underlying storage type that supports snapshotting.
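In RBD terms that would roughly be the manual sequence sketched above, plus a flatten so the temporary snapshot can be removed immediately after the clone (same hypothetical image names):

rbd flatten rbd/vm-999-disk-0                   # copy all parent data into the clone so it no longer depends on the snapshot
rbd snap unprotect rbd/vm-101-disk-0@clonetest
rbd snap rm rbd/vm-101-disk-0@clonetest

A protected snapshot can't be deleted while a clone still references it, hence the flatten step.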
 
I think the problem mainly occurs because we clone a "living" image with blocks still changing, rather than a snapshot or a "cold" image.
Live or not, this shouldn't matter. The clone is done through qemu's drive-mirror.

One solution would be to automatically create a temporary snapshot before the clone, do the clone from that snapshot, and then remove the temporary snapshot right afterwards.
The drive mirror takes care of duplicating all blocks and keeps running until everything has been copied, up to the point where the last pass is small enough to finally switch over to the new image.

And by the way, this method could be applied to every underlying storage type that supports snapshotting.
That's the point of qemu's drive-mirror: it is meant to be independent of the capabilities of the underlying storage.

EDIT: Of course, cloning a live VM/CT needs more resources.
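If it helps to see what the mirror is doing while a live clone runs, one way is to look at the block job through the VM's monitor; the VMID 101 below is just an example:

qm monitor 101
# then, at the qm> prompt:
info block-jobs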
 
Live or not, this shouldn't matter. The clone is done through qemu's drive-mirror.


The drive mirror takes care of duplicating all blocks and keeps running until everything has been copied, up to the point where the last pass is small enough to finally switch over to the new image.


That's the point of qemu's drive-mirror: it is meant to be independent of the capabilities of the underlying storage.

EDIT: Of course, cloning a live VM/CT needs more resources.

Then maybe this is a bug in qemu's live mirror with the KRBD backend (it works well with librados).
 
Then maybe this is a bug in qemu's live mirror with the KRBD backend (it works well with librados).
I hardly think so, otherwise more people would have these problems. Please post the logs from the clone.
 
I hardly think so, otherwise more people would have these problems. Please post the logs from the clone.

I don't have logs; the clone itself works without any issue. The VM is cloned properly and you can start it, but if you then want to restart it, you can't, and you get a sysfs write error.

I found many reports on the Ceph mailing list when searching for these errors on Google ... so it does not seem to be that uncommon.
e.g. https://www.spinics.net/lists/ceph-devel/msg27435.html
It mostly appears under high I/O load or on degraded clusters.
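Before falling back to a reboot, it may also be worth checking what still references the stuck image; a few checks along these lines (image and device names are placeholders):

rbd status rbd/vm-999-disk-0      # lists clients that still have a watch on the image
ls /sys/block/rbd5/holders        # any device-mapper/LVM holders keeping the device busy
dmesg | grep -i rbd               # kernel messages around the failed unmap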
 
I don't have logs; the clone itself works without any issue. The VM is cloned properly and you can start it, but if you then want to restart it, you can't, and you get a sysfs write error.
Is there anything visible in the logs (ceph/syslog/journal) at this point?

I found many reports on the Ceph mailing list when searching for these errors on Google ... so it does not seem to be that uncommon.
e.g. https://www.spinics.net/lists/ceph-devel/msg27435.html
It mostly appears under high I/O load or on degraded clusters.
Well, high I/O load is a very broad topic with many symptoms, but do you see high load in your case?
 
Nope, but this bug happens either after a live VM clone (with more than 100 GB of data to clone for some of them) or after a failed OSD (not always, though).
 
