Error while moving disks to Ceph but not vice versa

ntnll
Jul 8, 2024
Hello everyone,

I would like to ask for help with an issue I'm having moving a virtual machine disk from local-lvm storage to a newly created Ceph pool. Despite extensive research and troubleshooting, I keep encountering the following error:


Code:
create full clone of drive ide0 (local-lvm:vm-100-disk-0)
drive mirror is starting for drive-ide0
drive-ide0: Cancelling block job
drive-ide0: Done.
Removing image: 1% complete...
[...]
Removing image: 100% complete...done.
TASK ERROR: storage migration failed: mirroring error: VM 100 qmp command 'drive-mirror' failed - Could not open 'rbd:pool_vm/vm-100-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/pool_vm.keyring': No such file or directory
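
For completeness, the Proxmox-side storage definition can be double-checked as well (just a sketch; the RBD storage here is assumed to be the pool_vm one from the error above):

Code:
# list all storages and their status as Proxmox sees them
pvesm status
# show how the RBD storage is defined (pool, monitors, keyring handling)
cat /etc/pve/storage.cfg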


Interestingly, running the following command from the command line works fine, using the same parameters that fail during the move:

Code:
root@proxmox1:~# rbd info pool_vm/test-disk --conf /etc/pve/ceph.conf --id admin --keyring /etc/pve/priv/ceph/pool_vm.keyring
rbd image 'test-disk':
size 1 GiB in 256 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 2478f7de3df2d
block_name_prefix: rbd_data.2478f7de3df2d
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
op_features:
flags:
create_timestamp: Sat Jul  6 10:52:32 2024
access_timestamp: Sat Jul  6 10:52:32 2024
modify_timestamp: Sat Jul  6 10:52:32 2024
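
Since the failing call comes from QEMU's drive-mirror rather than from the rbd CLI, another test worth trying is whether QEMU itself can open the image with the same connection string as in the error (a sketch using the test-disk image above; adjust the image name as needed):

Code:
# ask QEMU's rbd driver to open the image with the same conf/id/keyring options
qemu-img info 'rbd:pool_vm/test-disk:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/pool_vm.keyring'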


Permissions and ceph conf:
Code:
root@proxmox1:~# ls -l /etc/pve/ceph.conf /etc/pve/priv/ceph/
-rw-r----- 1 root www-data 628 Jul  5 23:17 /etc/pve/ceph.conf


/etc/pve/priv/ceph/:
total 1
-rw------- 1 root www-data 151 Jul  5 23:18 pool_k8s.keyring
-rw------- 1 root www-data 151 Jul  6 10:55 pool_vm.keyring
root@proxmox1:~# cat /etc/pve/ceph.conf
[global]
    auth_client_required = cephx
    auth_cluster_required = cephx
    auth_service_required = cephx
    cluster_network = 10.0.0.11/16
    fsid = 0e6f5a25-287d-41f7-a995-cf63f6a386fb
    mon_allow_pool_delete = true
    mon_host = 10.0.0.11 10.0.0.13 10.0.0.12
    ms_bind_ipv4 = true
    ms_bind_ipv6 = false
    osd_pool_default_min_size = 2
    osd_pool_default_size = 3
    public_network = 10.0.0.11/16


[client]
    keyring = /etc/pve/priv/$cluster.$name.keyring


[client.crash]
    keyring = /etc/pve/ceph/$cluster.$name.keyring


[mon.proxmox1]
    public_addr = 10.0.0.11


[mon.proxmox2]
    public_addr = 10.0.0.12


[mon.proxmox3]
    public_addr = 10.0.0.13
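
As an extra check on the config above: the [client] section points the keyring at /etc/pve/priv/$cluster.$name.keyring, which (assuming $cluster expands to ceph and $name to client.admin) should resolve to the standard Proxmox admin keyring; it can be confirmed to exist with:

Code:
# admin keyring referenced by the [client] section of ceph.conf
ls -l /etc/pve/priv/ceph.client.admin.keyring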


Here's the background:
  • Fresh installation with three new SSDs.
  • Ceph works perfectly for creating VMs directly on the pool.
  • Moving disks from the Ceph pool to local LVM works without issues.
  • I managed to move a disk to Ceph successfully a couple of times initially; somehow I broke it.
  • I can restore VMs directly onto the Ceph pool; only the disk move seems to fail.
I'm not sure, but the issue could have started when I played with users in Ceph to allow my k8s cluster to connect to the second pool I created, pool_k8s. Since then I haven't been able to move disks to Ceph, but that could be a coincidence. Anyway, after many failed attempts, I decided to follow these tips to completely uninstall Ceph and start fresh: Removing Ceph Completely.
The issue persists after a couple of reinstallation attempts, many forum searches, and a couple of nights spent on it.
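Since the user changes are my main suspect, one more check worth doing (just a sketch; client.admin and the pool_vm keyring path are taken from the error message above) is whether the key Proxmox stores still matches what the cluster has for client.admin:

Code:
# key stored by Proxmox for the pool_vm storage
cat /etc/pve/priv/ceph/pool_vm.keyring
# key and caps the cluster currently has for client.admin
ceph auth get client.admin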

Can anybody please help?
Thanks in advance.
 
Thanks @Neobin for your prompt response. I'll try the downgrade as suggested and let you know.

My pveversion:

Code:
root@proxmox1:~# pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.8-2-pve)
pve-manager: 8.2.4 (running version: 8.2.4/faa83925c9641325)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-14
pve-kernel-5.19: 7.2-15
proxmox-kernel-6.8: 6.8.8-2
proxmox-kernel-6.8.8-2-pve-signed: 6.8.8-2
pve-kernel-5.19.17-2-pve: 5.19.17-2
pve-kernel-5.15.158-1-pve: 5.15.158-1
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph: 18.2.2-pve1
ceph-fuse: 18.2.2-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.3
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.9
libpve-storage-perl: 8.2.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.4.2
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.1.12
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.12-1
pve-ha-manager: 4.0.5
pve-i18n: 3.2.2
pve-qemu-kvm: 9.0.0-5
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1
 
The following attempts did not fix the issue for me, unfortunately:

Code:
apt-get install pve-qemu-kvm=8.2.2-1
apt-get install pve-qemu-kvm=8.1.5-6
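
In case it helps anyone picking a version to try, the versions available from the configured repositories can be listed first (output omitted here):

Code:
# show installed, candidate and available versions of the package
apt-cache policy pve-qemu-kvm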

I've also restarted some services, even though I don't think it was necessary after the update:

Code:
systemctl restart pvedaemon
systemctl restart pveproxy
systemctl restart pve-cluster
 
Running apt install pve-qemu-kvm:amd64=8.2.2-1 downgraded the package and solved this issue for me.
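If the downgrade is what fixes it, it may also be worth holding the package so a routine upgrade does not pull the affected version back in (just a suggestion; remove the hold once a fixed build is out):

Code:
# prevent pve-qemu-kvm from being upgraded automatically
apt-mark hold pve-qemu-kvm
# later, once a fixed version is available:
# apt-mark unhold pve-qemu-kvm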
 
@entith Did this issue only occur for you when migrating towards Ceph and not the other way around? For me, it only happened in this case; all other migrations worked fine, and the downgrade didn't help.
 
Keep in mind that for the "new" QEMU version to actually be used by an already running VM, the VM either needs to be fully stopped and started again, or migrated to a node that already has the "new" version, if possible (in this case here, possibly to a non-Ceph storage?); but I do not know whether a live migration from a newer to an older version is possible at all.

So, my suggestion, if possible and to be on the safe side, would be to simply stop the VM(s) in question fully, start them again, and give it another try.
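
To verify which QEMU build a VM process is actually using before and after the stop/start, something like this should work (a sketch; proxmox1 and VMID 100 are just the examples from this thread, and the running-qemu field is what the status API reports for a running VM):

Code:
# show the current status of the VM, including the running-qemu version field
pvesh get /nodes/proxmox1/qemu/100/status/current --output-format json
# pick up the new/downgraded QEMU with a full stop/start (a reboot from inside the guest is not enough)
qm stop 100
qm start 100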
 
