[SOLVED] "Cannot move disk - output file is smaller than input file" and "rbd error: rbd: listing images failed: (2) No such file or directory (500)"

elterminatore

Active Member
Jun 18, 2018
47
3
28
49
Hey guys,

Because of a datacenter move, I migrated a lot of VMs to local storage (SSD) and moved these disks to a new Proxmox cluster in the new datacenter. Then I moved the VM disks onto the new Ceph (same Proxmox cluster).

Only one VM refused to move its storage from local to Ceph; the error was "Cannot move disk - output file is smaller than input file".
After a few tries I used the workaround of a vzdump backup and restore onto the Ceph storage, as described here:
https://bugzilla.proxmox.com/show_bug.cgi?id=963
It worked.
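
In case it helps others, the workaround boiled down to roughly the following (a sketch only; the backup storage name, dump path and file name are placeholders, not the exact values used):

Code:
# back up the VM to a backup-capable storage (names/paths are examples)
vzdump 9072 --storage backupstore --mode stop --compress lzo
# restore it with the disks placed on the Ceph pool, overwriting the existing VM
qmrestore /mnt/pve/backupstore/dump/vzdump-qemu-9072-<timestamp>.vma.lzo 9072 --storage ssdpool1 --force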

After that I wanted to take a look at the RBD storage via the GUI, but it failed with the same error as described here:
https://forum.proxmox.com/threads/r...failed-2-no-such-file-or-directory-500.56577/
"rbd error: rbd: listing images failed: (2) No such file or directory (500)"
This is the same error I get when I use the command "rbd ls -l ssdpool1": it shows me all disks on RBD and this error at the end.

Yes, I have the keyring for the pool in place (/etc/pve/priv/ceph/ssdpool1.keyring).
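
For completeness, the matching RBD entry in /etc/pve/storage.cfg looks roughly like this (a sketch; the monitor addresses and username are placeholders, and monhost may not even be needed for a pool managed by the same cluster):

Code:
rbd: ssdpool1
        pool ssdpool1
        content images
        krbd 0
        monhost 192.168.10.11 192.168.10.12 192.168.10.13
        username admin

The keyring file name has to match the storage ID, i.e. /etc/pve/priv/ceph/ssdpool1.keyring.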

Any ideas?
Which information can I provide?

Regards,
Stefan
 
Is there anything in the Ceph logs or the journal/syslog?

Which pveversion -v are you on?
 
Oh... because the list output from "rbd ls" is long, I hadn't seen the messages at the beginning of the output...

Code:
~# rbd ls -l ssdpool1
2019-11-27 14:37:44.533 7f6dbe7fc700 -1 librbd::io::AioCompletion: 0x565279493400 fail: (2) No such file or directory
rbd: error opening vm-9072-disk-2: (2) No such file or directory
2019-11-27 14:37:44.541 7f6dbe7fc700 -1 librbd::io::AioCompletion: 0x5652794fbda0 fail: (2) No such file or directory
rbd: error opening vm-9072-disk-1: (2) No such file or directory
2019-11-27 14:37:44.549 7f6dbe7fc700 -1 librbd::io::AioCompletion: 0x565279179e40 fail: (2) No such file or directory
rbd: error opening vm-9072-disk-3: (2) No such file or directory
NAME           SIZE    PARENT FMT PROT LOCK
vm-1000-disk-0  20 GiB          2
vm-9003-disk-0 200 GiB          2      excl
vm-9004-disk-0  10 GiB          2      excl
vm-9005-disk-0  32 GiB          2      excl
[...]
vm-9084-disk-1 100 GiB          2      excl
vm-9085-disk-0  10 GiB          2      excl
vm-9085-disk-1 100 GiB          2
rbd: listing images failed: (2) No such file or directory

... but I can't find anything in the log files (at this point in time and for this command; I can try to find something from the failed disk move a few days ago).
I think the images referenced in the "No such file or directory" errors are the leftovers from the failed disk move to Ceph. But how can I fix this?
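
For diagnosing, the images named in the errors can be queried directly (a sketch; vm-9072-disk-1 is one of the names from the error output above, and rbd_directory is the pool object in which format-2 image names are registered):

Code:
# query one of the images named in the error output
rbd info ssdpool1/vm-9072-disk-1
# check whether its name is still registered in the pool's rbd_directory
rados -p ssdpool1 listomapvals rbd_directory | grep -a 9072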

Code:
~# pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-5-pve)
pve-manager: 6.0-15 (running version: 6.0-15/52b91481)
pve-kernel-helper: 6.0-12
pve-kernel-5.0: 6.0-11
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.21-4-pve: 5.0.21-9
ceph: 14.2.4-pve1
ceph-fuse: 14.2.4-pve1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-4
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-8
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-11
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-9
pve-cluster: 6.0-9
pve-container: 3.0-13
pve-docs: 6.0-9
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-8
pve-firmware: 3.0-4
pve-ha-manager: 3.0-5
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.1-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-16
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
 
2019-11-27 14:37:44.533 7f6dbe7fc700 -1 librbd::io::AioCompletion: 0x565279493400 fail: (2) No such file or directory
rbd: error opening vm-9072-disk-2: (2) No such file or directory
Is dmesg showing anything that might give a clue? Did you run updates recently?
 
Then there might be something in the Ceph logs, /var/log/ceph/. When you run the command, does this message show up on all nodes?
 
No... nothing in /var/log/ceph/*.log.
Yes... the message is shown on all nodes when I execute the command (also on compute nodes without OSD/MON/MGR).
 
I found the solution here:
https://forum.proxmox.com/threads/r...-such-file-or-directory-500.56577/post-263056

The error is gone after I deleted the (not visible) disk images left over from the failed "qm move_disk" command.

Code:
rbd rm vm-9072-disk-1 -p ssdpool1
rbd rm vm-9072-disk-2 -p ssdpool1
rbd rm vm-9072-disk-3 -p ssdpool1

It took a while (1.5 TB) and it didn't free any space in the pool, but now "rbd ls -l ssdpool1" works again, as does the image listing in the GUI.
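
For anyone checking afterwards, the listing and the pool usage can be verified like this (sketch):

Code:
# listing should now finish without "No such file or directory"
rbd ls -l ssdpool1
# per-image and pool usage
rbd du -p ssdpool1
ceph df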

EDIT:
I am wondering about myself, because I had already linked the same URL in my first post. Maybe I'm a little confused by the fact that the disk image was neither visible nor found ("No such file or directory"), yet I had to delete it.
 
I am wondering about myself, because I had already linked the same URL in my first post. Maybe I'm a little confused by the fact that the disk image was neither visible nor found ("No such file or directory"), yet I had to delete it.
Fascinating. My inner Spock tells me: that should not have happened. Do you still have the log/output of the move disk command?
 
Only an unspectacular task log, see below (I've shortened it).
After 99.99% comes the "Cancelling block job" ... and I don't know why.
But this was the reason for the not-visible disks on RBD as described above. The "Removing image" step after that cancellation probably did not work.

Code:
~# cat /var/log/pve/tasks/2/UPID\:node0217\:000E1D62\:012ADE32\:5DDB5E12\:qmmove\:9072\:root@pam\:
create full clone of drive scsi1 (localmigrate:9072/vm-9072-disk-1.raw)
drive mirror is starting for drive-scsi1
drive-scsi1: transferred: 0 bytes remaining: 536870912000 bytes total: 536870912000 bytes progression: 0.00 % busy: 1 ready: 0
drive-scsi1: transferred: 75497472 bytes remaining: 536795414528 bytes total: 536870912000 bytes progression: 0.01 % busy: 1 ready: 0
drive-scsi1: transferred: 130023424 bytes remaining: 536740888576 bytes total: 536870912000 bytes progression: 0.02 % busy: 1 ready: 0
drive-scsi1: transferred: 180355072 bytes remaining: 536690556928 bytes total: 536870912000 bytes progression: 0.03 % busy: 1 ready: 0
drive-scsi1: transferred: 213909504 bytes remaining: 536657002496 bytes total: 536870912000 bytes progression: 0.04 % busy: 1 ready: 0
drive-scsi1: transferred: 268435456 bytes remaining: 536602476544 bytes total: 536870912000 bytes progression: 0.05 % busy: 1 ready: 0
[...]
drive-scsi1: transferred: 536574164992 bytes remaining: 300548096 bytes total: 536874713088 bytes progression: 99.94 % busy: 1 ready: 0
drive-scsi1: transferred: 536640225280 bytes remaining: 234487808 bytes total: 536874713088 bytes progression: 99.96 % busy: 1 ready: 0
drive-scsi1: transferred: 536711528448 bytes remaining: 163184640 bytes total: 536874713088 bytes progression: 99.97 % busy: 1 ready: 0
drive-scsi1: transferred: 536787025920 bytes remaining: 87687168 bytes total: 536874713088 bytes progression: 99.98 % busy: 1 ready: 0
drive-scsi1: transferred: 536833163264 bytes remaining: 41549824 bytes total: 536874713088 bytes progression: 99.99 % busy: 1 ready: 0
drive-scsi1: Cancelling block job
drive-scsi1: Done.
Removing image: 1% complete...
Removing image: 2% complete...
Removing image: 3% complete...
[...]
Removing image: 97% complete...
Removing image: 98% complete...
Removing image: 99% complete...
Removing image: 100% complete...done.
TASK ERROR: storage migration failed: mirroring error: drive-scsi1: mirroring has been cancelled


I moved over 80 disks from local SSD storage to Ceph and only one disk failed to migrate. ¯\_(ツ)_/¯
 
I moved over 80 disks from local SSD storage to Ceph and only one disk failed to migrate. ¯\_(ツ)_/¯
Hm... then nothing to reproduce. Glad that you found a solution though.
 
I have a copy of the raw disk on the local storage and can try to reproduce it. Give me some time. Are there any debug options available for qm move_disk? I have found nothing.
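
The reproduction attempt would look roughly like this (a sketch; the storage name and disk IDs are taken from the task log above and may differ for the kept copy):

Code:
# attach the kept raw copy as an additional test disk (IDs are examples)
qm set 9072 --scsi2 localmigrate:9072/vm-9072-disk-1.raw
# retry the live move to the Ceph pool
qm move_disk 9072 scsi2 ssdpool1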
 
I have a copy of the raw disk on the local storage and can try to reproduce it.
With 80 disks moved, there would have been more than one incident.

Give me some time. Are there any debug options available for qm move_disk? I have found nothing.
There are none. The tools being run are Ceph and QEMU; debugging would start there. But only if this was reproducible.
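
If it does reproduce, raising the Ceph client log level for the next attempt might at least show what librbd is doing (a sketch; these are standard Ceph client debug options and the log path is an example):

Code:
# one-off: verbose librbd/messenger output for a manual rbd command
rbd ls -l ssdpool1 --debug-rbd 20 --debug-ms 1
# or persistently for all clients (including QEMU) via /etc/ceph/ceph.conf:
# [client]
#     debug rbd = 20
#     log file = /var/log/ceph/$name.$pid.log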
 
