[SOLVED] "Cannot move disk - output file is smaller than input file" and "rbd error: rbd: listing images failed: (2) No such file or directory (500)"

elterminatore

Active Member
Jun 18, 2018
47
3
28
49
Hey guys,

Because of a datacenter move, I migrated a lot of VMs to local storage (SSD) and moved these disks to a new Proxmox cluster in the new datacenter. Then I moved the VM disks onto the new Ceph (same Proxmox cluster).

Only one VM refused to move its storage from local to Ceph; the error was "Cannot move disk - output file is smaller than input file".
After a few tries I used the workaround of a vzdump backup and restore onto the Ceph storage, as described here:
https://bugzilla.proxmox.com/show_bug.cgi?id=963
It worked.
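
In case it helps others, the workaround boiled down to roughly the following (a sketch only; the backup storage name, dump path and file name are placeholders, not the exact values used):

Code:
# back up the VM to a backup-capable storage (names/paths are examples)
vzdump 9072 --storage backupstore --mode stop --compress lzo
# restore it with the disks placed on the Ceph pool, overwriting the existing VM
qmrestore /mnt/pve/backupstore/dump/vzdump-qemu-9072-<timestamp>.vma.lzo 9072 --storage ssdpool1 --force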

After that I wanted to take a look at the RBD storage via the GUI, but it failed with the same error as described here:
https://forum.proxmox.com/threads/r...failed-2-no-such-file-or-directory-500.56577/
"rbd error: rbd: listing images failed: (2) No such file or directory (500)"
This is the same error I get when I use the command "rbd ls -l ssdpool1": it shows me all disks on RBD and this error at the end.

Yes, I have the keyring for the pool in place (/etc/pve/priv/ceph/ssdpool1.keyring).
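
For completeness, the matching RBD entry in /etc/pve/storage.cfg looks roughly like this (a sketch; the monitor addresses and username are placeholders, and monhost may not even be needed for a pool managed by the same cluster):

Code:
rbd: ssdpool1
        pool ssdpool1
        content images
        krbd 0
        monhost 192.168.10.11 192.168.10.12 192.168.10.13
        username admin

The keyring file name has to match the storage ID, i.e. /etc/pve/priv/ceph/ssdpool1.keyring.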

Any ideas?
Which information can I provide?

Regards,
Stefan
 
Is there anything in the Ceph logs or the journal/syslog?

Which pveversion -v are you on?
 
Oh... because the list output from "rbd ls" is long, I hadn't seen the messages at the beginning of the output...

Code:
~# rbd ls -l ssdpool1
2019-11-27 14:37:44.533 7f6dbe7fc700 -1 librbd::io::AioCompletion: 0x565279493400 fail: (2) No such file or directory
rbd: error opening vm-9072-disk-2: (2) No such file or directory
2019-11-27 14:37:44.541 7f6dbe7fc700 -1 librbd::io::AioCompletion: 0x5652794fbda0 fail: (2) No such file or directory
rbd: error opening vm-9072-disk-1: (2) No such file or directory
2019-11-27 14:37:44.549 7f6dbe7fc700 -1 librbd::io::AioCompletion: 0x565279179e40 fail: (2) No such file or directory
rbd: error opening vm-9072-disk-3: (2) No such file or directory
NAME           SIZE    PARENT FMT PROT LOCK
vm-1000-disk-0  20 GiB          2
vm-9003-disk-0 200 GiB          2      excl
vm-9004-disk-0  10 GiB          2      excl
vm-9005-disk-0  32 GiB          2      excl
[...]
vm-9084-disk-1 100 GiB          2      excl
vm-9085-disk-0  10 GiB          2      excl
vm-9085-disk-1 100 GiB          2
rbd: listing images failed: (2) No such file or directory

... but I can't find anything in the log files (at this point in time and for this command; I can try to find something from the failed disk move a few days ago).
I think the images referenced in the "No such file or directory" errors are the leftovers from the failed disk move to Ceph. But how can I fix this?
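
For diagnosing, the images named in the errors can be queried directly (a sketch; vm-9072-disk-1 is one of the names from the error output above, and rbd_directory is the pool object in which format-2 image names are registered):

Code:
# query one of the images named in the error output
rbd info ssdpool1/vm-9072-disk-1
# check whether its name is still registered in the pool's rbd_directory
rados -p ssdpool1 listomapvals rbd_directory | grep -a 9072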

Code:
~# pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-5-pve)
pve-manager: 6.0-15 (running version: 6.0-15/52b91481)
pve-kernel-helper: 6.0-12
pve-kernel-5.0: 6.0-11
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.21-4-pve: 5.0.21-9
ceph: 14.2.4-pve1
ceph-fuse: 14.2.4-pve1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-4
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-8
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-11
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-9
pve-cluster: 6.0-9
pve-container: 3.0-13
pve-docs: 6.0-9
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-8
pve-firmware: 3.0-4
pve-ha-manager: 3.0-5
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.1-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-16
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
 
2019-11-27 14:37:44.533 7f6dbe7fc700 -1 librbd::io::AioCompletion: 0x565279493400 fail: (2) No such file or directory
rbd: error opening vm-9072-disk-2: (2) No such file or directory
Is dmesg showing anything that might give a clue? Did you run updates recently?
 
Then there might be something in the Ceph logs, /var/log/ceph/. When you run the command, does this message show up on all nodes?
 
No... nothing in /var/log/ceph/*.log.
Yes... the message is shown on all nodes when I execute the command (also on compute nodes without OSD/MON/MGR).
 
I found the solution here:
https://forum.proxmox.com/threads/r...-such-file-or-directory-500.56577/post-263056

The error is gone after I deleted the (not visible) disk images left over from the failed "qm move_disk" command.

Code:
rbd rm vm-9072-disk-1 -p ssdpool1
rbd rm vm-9072-disk-2 -p ssdpool1
rbd rm vm-9072-disk-3 -p ssdpool1

It took a while (1.5 TB) and it didn't free any space in the pool, but now "rbd ls -l ssdpool1" works again, as does the image listing in the GUI.
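
For anyone checking afterwards, the listing and the pool usage can be verified like this (sketch):

Code:
# listing should now finish without "No such file or directory"
rbd ls -l ssdpool1
# per-image and pool usage
rbd du -p ssdpool1
ceph df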

EDIT:
I am wondering about myself, because I had already linked the same URL in my first post. Maybe I'm a little confused by the fact that the disk image was neither visible nor found ("No such file or directory"), yet I had to delete it.
 
I am wondering about myself, because I had already linked the same URL in my first post. Maybe I'm a little confused by the fact that the disk image was neither visible nor found ("No such file or directory"), yet I had to delete it.
Fascinating. My inner Spock tells me: that should not have happened. Do you still have the log/output of the move disk command?
 
Only an unspectacular task log, see below (I've shortened it).
After 99.99% comes the "Cancelling block job" ... and I don't know why.
But this was the reason for the not-visible disks on RBD as described above. The "Removing image" step after that cancellation probably did not work.

Code:
~# cat /var/log/pve/tasks/2/UPID\:node0217\:000E1D62\:012ADE32\:5DDB5E12\:qmmove\:9072\:root@pam\:
create full clone of drive scsi1 (localmigrate:9072/vm-9072-disk-1.raw)
drive mirror is starting for drive-scsi1
drive-scsi1: transferred: 0 bytes remaining: 536870912000 bytes total: 536870912000 bytes progression: 0.00 % busy: 1 ready: 0
drive-scsi1: transferred: 75497472 bytes remaining: 536795414528 bytes total: 536870912000 bytes progression: 0.01 % busy: 1 ready: 0
drive-scsi1: transferred: 130023424 bytes remaining: 536740888576 bytes total: 536870912000 bytes progression: 0.02 % busy: 1 ready: 0
drive-scsi1: transferred: 180355072 bytes remaining: 536690556928 bytes total: 536870912000 bytes progression: 0.03 % busy: 1 ready: 0
drive-scsi1: transferred: 213909504 bytes remaining: 536657002496 bytes total: 536870912000 bytes progression: 0.04 % busy: 1 ready: 0
drive-scsi1: transferred: 268435456 bytes remaining: 536602476544 bytes total: 536870912000 bytes progression: 0.05 % busy: 1 ready: 0
[...]
drive-scsi1: transferred: 536574164992 bytes remaining: 300548096 bytes total: 536874713088 bytes progression: 99.94 % busy: 1 ready: 0
drive-scsi1: transferred: 536640225280 bytes remaining: 234487808 bytes total: 536874713088 bytes progression: 99.96 % busy: 1 ready: 0
drive-scsi1: transferred: 536711528448 bytes remaining: 163184640 bytes total: 536874713088 bytes progression: 99.97 % busy: 1 ready: 0
drive-scsi1: transferred: 536787025920 bytes remaining: 87687168 bytes total: 536874713088 bytes progression: 99.98 % busy: 1 ready: 0
drive-scsi1: transferred: 536833163264 bytes remaining: 41549824 bytes total: 536874713088 bytes progression: 99.99 % busy: 1 ready: 0
drive-scsi1: Cancelling block job
drive-scsi1: Done.
Removing image: 1% complete...
Removing image: 2% complete...
Removing image: 3% complete...
[...]
Removing image: 97% complete...
Removing image: 98% complete...
Removing image: 99% complete...
Removing image: 100% complete...done.
TASK ERROR: storage migration failed: mirroring error: drive-scsi1: mirroring has been cancelled


I moved over 80 disks from local SSD storage to Ceph and only one disk failed to migrate. ¯\_(ツ)_/¯
 
I moved over 80 disks from local SSD storage to Ceph and only one disk failed to migrate. ¯\_(ツ)_/¯
Hm... then nothing to reproduce. Glad that you found a solution though.
 
I have a copy of the raw disk on the local storage and can try to reproduce it. Give me some time. Are there any debug options available for qm move_disk? I have found nothing.
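
The reproduction attempt would look roughly like this (a sketch; the storage name and disk IDs are taken from the task log above and may differ for the kept copy):

Code:
# attach the kept raw copy as an additional test disk (IDs are examples)
qm set 9072 --scsi2 localmigrate:9072/vm-9072-disk-1.raw
# retry the live move to the Ceph pool
qm move_disk 9072 scsi2 ssdpool1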
 
I have a copy of the raw disk on the local storage and can try to reproduce it.
With 80 disks moved, there would have been more than one incident.

Give me some time. Are there any debug options available for qm move_disk? I have found nothing.
There are none. The tools being run are Ceph and QEMU; debugging would start there. But only if this was reproducible.
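
If it does reproduce, raising the Ceph client log level for the next attempt might at least show what librbd is doing (a sketch; these are standard Ceph client debug options and the log path is an example):

Code:
# one-off: verbose librbd/messenger output for a manual rbd command
rbd ls -l ssdpool1 --debug-rbd 20 --debug-ms 1
# or persistently for all clients (including QEMU) via /etc/ceph/ceph.conf:
# [client]
#     debug rbd = 20
#     log file = /var/log/ceph/$name.$pid.log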
 
