Error while moving disks to Ceph but not vice versa

ntnll
Jul 8, 2024
Hello everyone,

I would like to ask for help with an issue I'm having moving a virtual machine disk from local-lvm storage to a newly created Ceph pool. Despite extensive research and troubleshooting, I keep encountering the following error:


Code:
create full clone of drive ide0 (local-lvm:vm-100-disk-0)
drive mirror is starting for drive-ide0
drive-ide0: Cancelling block job
drive-ide0: Done.
Removing image: 1% complete...
[...]
Removing image: 100% complete...done.
TASK ERROR: storage migration failed: mirroring error: VM 100 qmp command 'drive-mirror' failed - Could not open 'rbd:pool_vm/vm-100-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/pool_vm.keyring': No such file or directory
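
For completeness, the Proxmox-side storage definition can be double-checked as well (just a sketch; the RBD storage here is assumed to be the pool_vm one from the error above):

Code:
# list all storages and their status as Proxmox sees them
pvesm status
# show how the RBD storage is defined (pool, monitors, keyring handling)
cat /etc/pve/storage.cfg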


Interestingly, running the following command from the command line works fine, using the same parameters that fail during the move:

Code:
root@proxmox1:~# rbd info pool_vm/test-disk --conf /etc/pve/ceph.conf --id admin --keyring /etc/pve/priv/ceph/pool_vm.keyring
rbd image 'test-disk':
size 1 GiB in 256 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 2478f7de3df2d
block_name_prefix: rbd_data.2478f7de3df2d
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
op_features:
flags:
create_timestamp: Sat Jul  6 10:52:32 2024
access_timestamp: Sat Jul  6 10:52:32 2024
modify_timestamp: Sat Jul  6 10:52:32 2024
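
Since the failing call comes from QEMU's drive-mirror rather than from the rbd CLI, another test worth trying is whether QEMU itself can open the image with the same connection string as in the error (a sketch using the test-disk image above; adjust the image name as needed):

Code:
# ask QEMU's rbd driver to open the image with the same conf/id/keyring options
qemu-img info 'rbd:pool_vm/test-disk:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/pool_vm.keyring'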


Permissions and ceph conf:
Code:
root@proxmox1:~# ls -l /etc/pve/ceph.conf /etc/pve/priv/ceph/
-rw-r----- 1 root www-data 628 Jul  5 23:17 /etc/pve/ceph.conf


/etc/pve/priv/ceph/:
total 1
-rw------- 1 root www-data 151 Jul  5 23:18 pool_k8s.keyring
-rw------- 1 root www-data 151 Jul  6 10:55 pool_vm.keyring
root@proxmox1:~# cat /etc/pve/ceph.conf
[global]
    auth_client_required = cephx
    auth_cluster_required = cephx
    auth_service_required = cephx
    cluster_network = 10.0.0.11/16
    fsid = 0e6f5a25-287d-41f7-a995-cf63f6a386fb
    mon_allow_pool_delete = true
    mon_host = 10.0.0.11 10.0.0.13 10.0.0.12
    ms_bind_ipv4 = true
    ms_bind_ipv6 = false
    osd_pool_default_min_size = 2
    osd_pool_default_size = 3
    public_network = 10.0.0.11/16


[client]
    keyring = /etc/pve/priv/$cluster.$name.keyring


[client.crash]
    keyring = /etc/pve/ceph/$cluster.$name.keyring


[mon.proxmox1]
    public_addr = 10.0.0.11


[mon.proxmox2]
    public_addr = 10.0.0.12


[mon.proxmox3]
    public_addr = 10.0.0.13
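
As an extra check on the config above: the [client] section points the keyring at /etc/pve/priv/$cluster.$name.keyring, which (assuming $cluster expands to ceph and $name to client.admin) should resolve to the standard Proxmox admin keyring; it can be confirmed to exist with:

Code:
# admin keyring referenced by the [client] section of ceph.conf
ls -l /etc/pve/priv/ceph.client.admin.keyring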


Here's the background:
  • Fresh installation with three new SSDs.
  • Ceph works perfectly for creating VMs directly on the pool.
  • Moving disks from the Ceph pool to local LVM works without issues.
  • I managed to move a disk to Ceph successfully a couple of times initially; somehow I broke it.
  • I can restore VMs directly onto the Ceph pool; only the disk move seems to fail.
I'm not sure, but the issue could have started when I played with users in Ceph to allow my k8s cluster to connect to the second pool I created, pool_k8s. Since then I haven't been able to move disks to Ceph, but that could be a coincidence. Anyway, after many failed attempts, I decided to follow these tips to completely uninstall Ceph and start fresh: Removing Ceph Completely.
The issue persists after a couple of reinstallation attempts, many forum searches, and a couple of nights spent on it.
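Since the user changes are my main suspect, one more check worth doing (just a sketch; client.admin and the pool_vm keyring path are taken from the error message above) is whether the key Proxmox stores still matches what the cluster has for client.admin:

Code:
# key stored by Proxmox for the pool_vm storage
cat /etc/pve/priv/ceph/pool_vm.keyring
# key and caps the cluster currently has for client.admin
ceph auth get client.admin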

Can anybody please help?
Thanks in advance.
 
Thanks @Neobin for your prompt response. I'll try the downgrade as suggested and let you know.

My pveversion:

Code:
root@proxmox1:~# pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.8-2-pve)
pve-manager: 8.2.4 (running version: 8.2.4/faa83925c9641325)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-14
pve-kernel-5.19: 7.2-15
proxmox-kernel-6.8: 6.8.8-2
proxmox-kernel-6.8.8-2-pve-signed: 6.8.8-2
pve-kernel-5.19.17-2-pve: 5.19.17-2
pve-kernel-5.15.158-1-pve: 5.15.158-1
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph: 18.2.2-pve1
ceph-fuse: 18.2.2-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.3
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.9
libpve-storage-perl: 8.2.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.4.2
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.1.12
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.12-1
pve-ha-manager: 4.0.5
pve-i18n: 3.2.2
pve-qemu-kvm: 9.0.0-5
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1
 
The following attempts did not fix the issue for me, unfortunately:

Code:
apt-get install pve-qemu-kvm=8.2.2-1
apt-get install pve-qemu-kvm=8.1.5-6
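
In case it helps anyone picking a version to try, the versions available from the configured repositories can be listed first (output omitted here):

Code:
# show installed, candidate and available versions of the package
apt-cache policy pve-qemu-kvm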

I've also restarted some services, even though I don't think it was necessary after the update:

Code:
systemctl restart pvedaemon
systemctl restart pveproxy
systemctl restart pve-cluster
 
Running apt install pve-qemu-kvm:amd64=8.2.2-1 downgraded the package and solved this issue for me.
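If the downgrade is what fixes it, it may also be worth holding the package so a routine upgrade does not pull the affected version back in (just a suggestion; remove the hold once a fixed build is out):

Code:
# prevent pve-qemu-kvm from being upgraded automatically
apt-mark hold pve-qemu-kvm
# later, once a fixed version is available:
# apt-mark unhold pve-qemu-kvm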
 
@entith Did this issue only occur for you when migrating towards Ceph and not the other way around? For me, it only happened in this case; all other migrations worked fine, and the downgrade didn't help.
 
Keep in mind that for the "new" QEMU version to actually be used by an already running VM, the VM either needs to be fully stopped and started again, or migrated to a node that already has the "new" version, if possible (in this case here, possibly to a non-Ceph storage?); but I do not know whether a live migration from a newer to an older version is possible at all.

So, my suggestion, if possible and to be on the safe side, would be to simply stop the VM(s) in question fully, start them again, and give it another try.
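
To verify which QEMU build a VM process is actually using before and after the stop/start, something like this should work (a sketch; proxmox1 and VMID 100 are just the examples from this thread, and the running-qemu field is what the status API reports for a running VM):

Code:
# show the current status of the VM, including the running-qemu version field
pvesh get /nodes/proxmox1/qemu/100/status/current --output-format json
# pick up the new/downgraded QEMU with a full stop/start (a reboot from inside the guest is not enough)
qm stop 100
qm start 100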
 
