[SOLVED] Ceph Migrations Are Failing

MrPaul

Active Member
Apr 27, 2019
My ceph cluster shows all is healthy and my VMs on separate servers that are running on ceph mounts are up, but I'm unable to migrate from one host to another.

When I attempt to migrate I get this message.

2020-03-27 09:10:32 starting migration of VM 110 to node 'proxmox-ceph-1' (10.237.195.4)
Job for mnt-pve-cephfs.mount failed.
2020-03-27 09:10:32 ERROR: Failed to sync data - mount error: See "systemctl status mnt-pve-cephfs.mount" and "journalctl -xe" for details.
2020-03-27 09:10:32 aborting phase 1 - cleanup resources
2020-03-27 09:10:32 ERROR: migration aborted (duration 00:00:00): Failed to sync data - mount error: See "systemctl status mnt-pve-cephfs.mount" and "journalctl -xe" for details.
TASK ERROR: migration aborted

Looking at the logs I'm seeing indications of a config issue.

root@proxmox-compute-1:~# systemctl status mnt-pve-cephfs.mount
● mnt-pve-cephfs.mount - /mnt/pve/cephfs
Loaded: loaded (/run/systemd/system/mnt-pve-cephfs.mount; static; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2020-03-27 09:19:07 CDT; 1s ago
Where: /mnt/pve/cephfs
What: 192.168.0.4,192.168.0.5,192.168.0.6:/

Mar 27 09:19:07 proxmox-compute-1 systemd[1]: Mounting /mnt/pve/cephfs...
Mar 27 09:19:07 proxmox-compute-1 mount[13149]: mount error 22 = Invalid argument
Mar 27 09:19:07 proxmox-compute-1 systemd[1]: mnt-pve-cephfs.mount: Mount process exited, code=exited, status=22/n/a
Mar 27 09:19:07 proxmox-compute-1 systemd[1]: mnt-pve-cephfs.mount: Failed with result 'exit-code'.
Mar 27 09:19:07 proxmox-compute-1 systemd[1]: Failed to mount /mnt/pve/cephfs.
root@proxmox-compute-1:~# journalctl -xe
-- The job identifier is 4168 and the job result is failed.
Mar 27 09:19:17 proxmox-compute-1 pvestatd[1817]: mount error: See "systemctl status mnt-pve-cephfs.mount" and "journalctl -xe" for details.
Mar 27 09:19:17 proxmox-compute-1 kernel: libceph: bad option at 'conf=/etc/pve/ceph.conf'
Mar 27 09:19:27 proxmox-compute-1 systemd[1]: Reloading.
Mar 27 09:19:27 proxmox-compute-1 systemd[1]: Mounting /mnt/pve/cephfs...
-- Subject: A start job for unit mnt-pve-cephfs.mount has begun execution
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit mnt-pve-cephfs.mount has begun execution.
--
-- The job identifier is 4181.
Mar 27 09:19:27 proxmox-compute-1 mount[13299]: mount error 22 = Invalid argument
Mar 27 09:19:27 proxmox-compute-1 systemd[1]: mnt-pve-cephfs.mount: Mount process exited, code=exited, status=22/n/a
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- An n/a= process belonging to unit mnt-pve-cephfs.mount has exited.
--
-- The process' exit code is 'exited' and its exit status is 22.
Mar 27 09:19:27 proxmox-compute-1 systemd[1]: mnt-pve-cephfs.mount: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit mnt-pve-cephfs.mount has entered the 'failed' state with result 'exit-code'.
Mar 27 09:19:27 proxmox-compute-1 systemd[1]: Failed to mount /mnt/pve/cephfs.
-- Subject: A start job for unit mnt-pve-cephfs.mount has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit mnt-pve-cephfs.mount has finished with a failure.
--
-- The job identifier is 4181 and the job result is failed.
Mar 27 09:19:27 proxmox-compute-1 kernel: libceph: bad option at 'conf=/etc/pve/ceph.conf'
Mar 27 09:19:27 proxmox-compute-1 pvestatd[1817]: mount error: See "systemctl status mnt-pve-cephfs.mount" and "journalctl -xe" for details.

Here are the contents of the file mentioned in the journal, but I don't see anything that stands out.

root@proxmox-compute-1:~# cat /etc/pve/ceph.conf
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 192.168.0.4/24
fsid = f8d6430f-0df8-4ec5-b78a-d8956832b0de
mon_allow_pool_delete = true
mon_host = 192.168.0.4 192.168.0.5 192.168.0.6
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 192.168.0.4/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.proxmox-ceph-2]
host = proxmox-ceph-2
mds_standby_for_name = pve

[mds.proxmox-ceph-1]
host = proxmox-ceph-1
mds_standby_for_name = pve

[mds.proxmox-ceph-3]
host = proxmox-ceph-3
mds standby for name = pve
 
My ceph cluster shows all is healthy and my VMs on separate servers that are running on ceph mounts are up, but I'm unable to migrate from one host to another.
Are you using CephFS as storage? Possibly dmesg has more output from libceph.
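Something like this should pull the relevant lines out of the kernel log:

Code:
dmesg -T | grep -i libceph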
 
Yes, I'm using cephfs on my "compute" hosts.

dmesg doesn't have any output that seems useful either.

[ 2254.215075] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2264.629411] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2273.844205] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2284.299461] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2293.917627] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2304.381354] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2314.841525] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2324.253654] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2334.544215] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2344.187184] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2354.604291] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2363.901426] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2374.575711] libceph: bad option at 'conf=/etc/pve/ceph.conf'
 
Yes, I'm using cephfs on my "compute" hosts.
But as VM/CT disk storage? Or do you have the installation ISOs there? If they are still mounted on the VM, then it needs them on the target side as well. Migration should still work for those VM/CT that do not use the cephfs storage.

[ 2254.215075] libceph: bad option at 'conf=/etc/pve/ceph.conf'
Can you post the storage.cfg?
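If one of the VMs does still have an ISO from the cephfs storage attached, you could also drop that reference before migrating. Roughly like this (ide2 is just an example, use whatever drive the ISO is attached to):

Code:
# check whether the VM references anything on the cephfs storage
qm config 110 | grep -i cephfs
# replace the ISO with an empty CD drive
qm set 110 --ide2 none,media=cdrom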
 
Although there are ISOs on that storage as well, the VM that I'm attempting to migrate doesn't have one mounted. Below is a screenshot showing that VM's setup.

[Screenshot: the VM's hardware configuration]

root@proxmox-compute-1:~# cat /etc/pve/storage.cfg
dir: local
path /var/lib/vz
content iso,backup,vztmpl

lvmthin: local-lvm
thinpool data
vgname pve
content images,rootdir

rbd: ceph
content images,rootdir
krbd 0
pool ceph

cephfs: cephfs
path /mnt/pve/cephfs
content vztmpl,backup,iso
 
Here is my version info.

root@proxmox-compute-1:~# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.18-3-pve)
pve-manager: 6.1-8 (running version: 6.1-8/806edfe1)
pve-kernel-helper: 6.1-7
pve-kernel-5.3: 6.1-6
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-5.3.18-1-pve: 5.3.18-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-17
libpve-guest-common-perl: 3.0-5
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 4.0.1-pve1
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-23
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-6
pve-ha-manager: 3.0-9
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-7
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1

I've tried quite a few different mechanisms to mount this, with no luck.

root@proxmox-compute-1:~# mount -t ceph 192.168.0.4:/ tmpmnt/
mount error 22 = Invalid argument
root@proxmox-compute-1:~# mount -t ceph 192.168.0.4:/ /root/tmpmnt/
mount error 22 = Invalid argument
root@proxmox-compute-1:~# # The following is exactly how the mount shows on the ceph nodes (where this works)
root@proxmox-compute-1:~# mount -t ceph 192.168.0.4,192.168.0.5,192.168.0.6:/ /root/tmpmnt/
mount error 22 = Invalid argument

Looking through the dmesg output after doing this a few times I saw this.

[11724.593731] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[11733.892814] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[11744.800401] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[11754.009752] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[11763.693132] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[11772.208864] libceph: no secret set (for auth_x protocol)
[11772.209309] libceph: error -22 on auth protocol 2 init
[11773.911437] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[11784.519757] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[11787.344037] libceph: no secret set (for auth_x protocol)
[11787.344468] libceph: error -22 on auth protocol 2 init
[11793.802249] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[11804.421218] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[11808.078088] libceph: no secret set (for auth_x protocol)
[11808.078542] libceph: error -22 on auth protocol 2 init
[11813.671256] libceph: bad option at 'conf=/etc/pve/ceph.conf'

I don't recall setting up authentication, but maybe I did; it's been some time. Anyway, I logged into one of the ceph nodes and grabbed this.

root@proxmox-ceph-1:~# ceph auth ls
{% REDACTED %}
client.admin
key: {% REDACTED %}
caps: [mds] allow *
caps: [mgr] allow *
caps: [mon] allow *
caps: [osd] allow *
{% REDACTED %}

Using that admin user, I gave it a few more tries with success.

root@proxmox-compute-1:~# mount -t ceph 192.168.0.4:/ /root/tmpmnt/ -o name=admin,secret={% REDACTED %}
root@proxmox-compute-1:~# df -h
Filesystem Size Used Avail Use% Mounted on
udev 126G 0 126G 0% /dev
tmpfs 26G 27M 26G 1% /run
/dev/mapper/pve-root 15G 5.7G 7.9G 42% /
tmpfs 126G 63M 126G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 126G 0 126G 0% /sys/fs/cgroup
/dev/fuse 30M 68K 30M 1% /etc/pve
tmpfs 26G 0 26G 0% /run/user/0
192.168.0.4:/ 14T 67G 14T 1% /root/tmpmnt
root@proxmox-compute-1:~# umount tmpmnt
root@proxmox-compute-1:~#

Also with the whole cluster.

root@proxmox-compute-1:~# mount -t ceph 192.168.0.4,192.168.0.5,192.168.0.6:/ /root/tmpmnt/ -o name=admin,secret={% REDACTED %}
root@proxmox-compute-1:~# df -h
Filesystem Size Used Avail Use% Mounted on
udev 126G 0 126G 0% /dev
tmpfs 26G 27M 26G 1% /run
/dev/mapper/pve-root 15G 5.7G 7.9G 42% /
tmpfs 126G 63M 126G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 126G 0 126G 0% /sys/fs/cgroup
/dev/fuse 30M 68K 30M 1% /etc/pve
tmpfs 26G 0 26G 0% /run/user/0
192.168.0.4,192.168.0.5,192.168.0.6:/ 14T 67G 14T 1% /root/tmpmnt
root@proxmox-compute-1:~# umount tmpmnt

Now that we know what the problem is, can you help me with the solution?
  • Should I be mounting this with the admin account? I don't really have any security concerns as this is a lab environment that's under lock & key.
  • How do I fix the mounts so they auth properly? I don't see anything in /etc/fstab so I'm not sure where this is being called from.
 
Should I be mounting this with the admin account? I don't really have any security concerns as this is a lab environment that's under lock & key.
The secret needs to match the user it belongs to. So if the secret is from the admin user, then you need to log in as admin.
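As for where the mount comes from: it is not /etc/fstab. Judging by your status output (Loaded: /run/systemd/system/mnt-pve-cephfs.mount), the storage layer creates a runtime systemd mount unit when pvestatd activates the cephfs storage. You can see exactly what that unit runs with:

Code:
systemctl cat mnt-pve-cephfs.mount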
 
My cephfs clients are configured as follows.

Code:
/etc/pve/priv/ceph/ceph.keyring
[client.admin]
        key = {{ REDACTED }}
        caps mds = "allow *"
        caps mgr = "allow *"
        caps mon = "allow *"
        caps osd = "allow *"

Code:
/etc/pve/priv/ceph/cephfs.secret
{{ REDACTED }}

The {{ REDACTED }} key is the exact same one that works successfully in this mount command.

Code:
mount -t ceph 192.168.0.4,192.168.0.5,192.168.0.6:/ /root/tmpmnt/ -o name=admin,secret={{ REDACTED }}
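For reference, the same mount should also work by pointing at that secret file directly instead of pasting the key (secretfile= is handled by the mount.ceph helper), something like:

Code:
mount -t ceph 192.168.0.4,192.168.0.5,192.168.0.6:/ /root/tmpmnt/ -o name=admin,secretfile=/etc/pve/priv/ceph/cephfs.secret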

I'm still unsure what the configuration issue is that's causing this output in dmesg.

Code:
[273663.098327] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[273672.327422] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[273682.826444] libceph: bad option at 'conf=/etc/pve/ceph.conf'


Code:
cat /etc/pve/ceph.conf
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 192.168.0.4/24
         fsid = f8d6430f-0df8-4ec5-b78a-d8956832b0de
         mon_allow_pool_delete = true
         mon_host = 192.168.0.4 192.168.0.5 192.168.0.6
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public_network = 192.168.0.4/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
         keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.proxmox-ceph-2]
         host = proxmox-ceph-2
         mds_standby_for_name = pve

[mds.proxmox-ceph-1]
         host = proxmox-ceph-1
         mds_standby_for_name = pve

[mds.proxmox-ceph-3]
         host = proxmox-ceph-3
         mds standby for name = pve
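In case it matters, this is roughly how I'd check which ceph client packages are installed on the compute node. My understanding (which may be wrong) is that the conf= mount option is meant to be consumed by the userspace mount.ceph helper from ceph-common rather than by the kernel itself:

Code:
dpkg -l | grep -E 'ceph-common|ceph-fuse'
ls -l /sbin/mount.ceph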
 
The Ceph cluster is external to the Proxmox VE nodes? What Ceph version is it running?
 
Ceph is 14.2.8
Proxmox VE is 6.1-8

Yes, I've got a 3-node ceph cluster with 7 compute hosts (10 servers total).
 
Thanks a lot for your time here. I was under the impression that the cephfs clients didn't need the ceph packages installed, since it was working without them until a few weeks ago; I'm not sure exactly which upgrade broke this. I do see the documentation clearly states that ceph clients also need them.

After adding that repo and updating, all is well with the clients again now.
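For anyone who lands on this thread later: since the cluster is on Ceph 14.2.8 (Nautilus) and the nodes run PVE 6 on Debian Buster, the repo in question is presumably the Proxmox Ceph Nautilus repository. On each compute node the steps would look roughly like this (the .list file name is arbitrary):

Code:
echo "deb http://download.proxmox.com/debian/ceph-nautilus buster main" > /etc/apt/sources.list.d/ceph-nautilus.list
apt update
apt full-upgrade
# make sure the userspace client bits are on the Nautilus versions
apt install ceph-common ceph-fuse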
 
After adding that repo and updating, all is well with the clients again now.
Nice to hear that.

Thanks a lot for your time here. I was under the impression that the cephfs clients didn't need the ceph packages installed, since it was working without them until a few weeks ago; I'm not sure exactly which upgrade broke this. I do see the documentation clearly states that ceph clients also need them.
The stock packages are usually enough to connect to other Ceph clusters, but not all features and fixes may be in the older version. It is always recommended to run the same Ceph version on the client as on the cluster.
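A quick way to compare the two:

Code:
# on the compute node (client)
ceph --version

# on one of the ceph nodes (what the cluster daemons are running)
ceph versions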
 
