[SOLVED] Ceph Migrations Are Failing

MrPaul

Active Member
Apr 27, 2019
My ceph cluster shows all is healthy and my VMs on separate servers that are running on ceph mounts are up, but I'm unable to migrate from one host to another.

When I attempt to migrate I get this message.

2020-03-27 09:10:32 starting migration of VM 110 to node 'proxmox-ceph-1' (10.237.195.4)
Job for mnt-pve-cephfs.mount failed.
2020-03-27 09:10:32 ERROR: Failed to sync data - mount error: See "systemctl status mnt-pve-cephfs.mount" and "journalctl -xe" for details.
2020-03-27 09:10:32 aborting phase 1 - cleanup resources
2020-03-27 09:10:32 ERROR: migration aborted (duration 00:00:00): Failed to sync data - mount error: See "systemctl status mnt-pve-cephfs.mount" and "journalctl -xe" for details.
TASK ERROR: migration aborted

Looking at the logs I'm seeing indications of a config issue.

root@proxmox-compute-1:~# systemctl status mnt-pve-cephfs.mount
● mnt-pve-cephfs.mount - /mnt/pve/cephfs
Loaded: loaded (/run/systemd/system/mnt-pve-cephfs.mount; static; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2020-03-27 09:19:07 CDT; 1s ago
Where: /mnt/pve/cephfs
What: 192.168.0.4,192.168.0.5,192.168.0.6:/

Mar 27 09:19:07 proxmox-compute-1 systemd[1]: Mounting /mnt/pve/cephfs...
Mar 27 09:19:07 proxmox-compute-1 mount[13149]: mount error 22 = Invalid argument
Mar 27 09:19:07 proxmox-compute-1 systemd[1]: mnt-pve-cephfs.mount: Mount process exited, code=exited, status=22/n/a
Mar 27 09:19:07 proxmox-compute-1 systemd[1]: mnt-pve-cephfs.mount: Failed with result 'exit-code'.
Mar 27 09:19:07 proxmox-compute-1 systemd[1]: Failed to mount /mnt/pve/cephfs.
root@proxmox-compute-1:~# journalctl -xe
-- The job identifier is 4168 and the job result is failed.
Mar 27 09:19:17 proxmox-compute-1 pvestatd[1817]: mount error: See "systemctl status mnt-pve-cephfs.mount" and "journalctl -xe" for details.
Mar 27 09:19:17 proxmox-compute-1 kernel: libceph: bad option at 'conf=/etc/pve/ceph.conf'
Mar 27 09:19:27 proxmox-compute-1 systemd[1]: Reloading.
Mar 27 09:19:27 proxmox-compute-1 systemd[1]: Mounting /mnt/pve/cephfs...
-- Subject: A start job for unit mnt-pve-cephfs.mount has begun execution
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit mnt-pve-cephfs.mount has begun execution.
--
-- The job identifier is 4181.
Mar 27 09:19:27 proxmox-compute-1 mount[13299]: mount error 22 = Invalid argument
Mar 27 09:19:27 proxmox-compute-1 systemd[1]: mnt-pve-cephfs.mount: Mount process exited, code=exited, status=22/n/a
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- An n/a= process belonging to unit mnt-pve-cephfs.mount has exited.
--
-- The process' exit code is 'exited' and its exit status is 22.
Mar 27 09:19:27 proxmox-compute-1 systemd[1]: mnt-pve-cephfs.mount: Failed with result 'exit-code'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit mnt-pve-cephfs.mount has entered the 'failed' state with result 'exit-code'.
Mar 27 09:19:27 proxmox-compute-1 systemd[1]: Failed to mount /mnt/pve/cephfs.
-- Subject: A start job for unit mnt-pve-cephfs.mount has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit mnt-pve-cephfs.mount has finished with a failure.
--
-- The job identifier is 4181 and the job result is failed.
Mar 27 09:19:27 proxmox-compute-1 kernel: libceph: bad option at 'conf=/etc/pve/ceph.conf'
Mar 27 09:19:27 proxmox-compute-1 pvestatd[1817]: mount error: See "systemctl status mnt-pve-cephfs.mount" and "journalctl -xe" for details.

Here are the contents of the file mentioned in the journal, but I don't see anything that stands out.

root@proxmox-compute-1:~# cat /etc/pve/ceph.conf
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 192.168.0.4/24
fsid = f8d6430f-0df8-4ec5-b78a-d8956832b0de
mon_allow_pool_delete = true
mon_host = 192.168.0.4 192.168.0.5 192.168.0.6
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 192.168.0.4/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.proxmox-ceph-2]
host = proxmox-ceph-2
mds_standby_for_name = pve

[mds.proxmox-ceph-1]
host = proxmox-ceph-1
mds_standby_for_name = pve

[mds.proxmox-ceph-3]
host = proxmox-ceph-3
mds standby for name = pve
 
My ceph cluster shows all is healthy and my VMs on separate servers that are running on ceph mounts are up, but I'm unable to migrate from one host to another.
Are you using CephFS as storage? Possibly dmesg has more output from libceph.
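Something like this should pull the relevant lines out of the kernel log:

Code:
dmesg -T | grep -i libceph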
 
Yes, I'm using cephfs on my "compute" hosts.

dmesg doesn't have any output that seems useful either.

[ 2254.215075] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2264.629411] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2273.844205] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2284.299461] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2293.917627] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2304.381354] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2314.841525] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2324.253654] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2334.544215] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2344.187184] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2354.604291] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2363.901426] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[ 2374.575711] libceph: bad option at 'conf=/etc/pve/ceph.conf'
 
Yes, I'm using cephfs on my "compute" hosts.
But as VM/CT disk storage? Or do you have the installation ISOs there? If they are still mounted on the VM, then it needs them on the target side as well. Migration should still work for those VM/CT that do not use the cephfs storage.

[ 2254.215075] libceph: bad option at 'conf=/etc/pve/ceph.conf'
Can you post the storage.cfg?
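If one of the VMs does still have an ISO from the cephfs storage attached, you could also drop that reference before migrating. Roughly like this (ide2 is just an example, use whatever drive the ISO is attached to):

Code:
# check whether the VM references anything on the cephfs storage
qm config 110 | grep -i cephfs
# replace the ISO with an empty CD drive
qm set 110 --ide2 none,media=cdrom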
 
Although there are ISOs on that storage as well, the VM that I'm attempting to migrate doesn't have one mounted. Below is a screenshot showing that VM's setup.

[Screenshot: the VM's hardware configuration]

root@proxmox-compute-1:~# cat /etc/pve/storage.cfg
dir: local
path /var/lib/vz
content iso,backup,vztmpl

lvmthin: local-lvm
thinpool data
vgname pve
content images,rootdir

rbd: ceph
content images,rootdir
krbd 0
pool ceph

cephfs: cephfs
path /mnt/pve/cephfs
content vztmpl,backup,iso
 
Here is my version info.

root@proxmox-compute-1:~# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.18-3-pve)
pve-manager: 6.1-8 (running version: 6.1-8/806edfe1)
pve-kernel-helper: 6.1-7
pve-kernel-5.3: 6.1-6
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-5.3.18-1-pve: 5.3.18-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-17
libpve-guest-common-perl: 3.0-5
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 4.0.1-pve1
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-23
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-6
pve-ha-manager: 3.0-9
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-7
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1

I've tried quite a few different mechanisms to mount this, with no luck.

root@proxmox-compute-1:~# mount -t ceph 192.168.0.4:/ tmpmnt/
mount error 22 = Invalid argument
root@proxmox-compute-1:~# mount -t ceph 192.168.0.4:/ /root/tmpmnt/
mount error 22 = Invalid argument
root@proxmox-compute-1:~# # The following is exactly how the mount shows on the ceph nodes (where this works)
root@proxmox-compute-1:~# mount -t ceph 192.168.0.4,192.168.0.5,192.168.0.6:/ /root/tmpmnt/
mount error 22 = Invalid argument

Looking through the dmesg output after doing this a few times I saw this.

[11724.593731] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[11733.892814] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[11744.800401] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[11754.009752] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[11763.693132] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[11772.208864] libceph: no secret set (for auth_x protocol)
[11772.209309] libceph: error -22 on auth protocol 2 init
[11773.911437] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[11784.519757] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[11787.344037] libceph: no secret set (for auth_x protocol)
[11787.344468] libceph: error -22 on auth protocol 2 init
[11793.802249] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[11804.421218] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[11808.078088] libceph: no secret set (for auth_x protocol)
[11808.078542] libceph: error -22 on auth protocol 2 init
[11813.671256] libceph: bad option at 'conf=/etc/pve/ceph.conf'

I don't recall setting up authentication, but maybe I did; it's been some time. Anyway, I logged into one of the ceph nodes and grabbed this.

root@proxmox-ceph-1:~# ceph auth ls
{% REDACTED %}
client.admin
key: {% REDACTED %}
caps: [mds] allow *
caps: [mgr] allow *
caps: [mon] allow *
caps: [osd] allow *
{% REDACTED %}

Using that admin user, I gave it a few more tries with success.

root@proxmox-compute-1:~# mount -t ceph 192.168.0.4:/ /root/tmpmnt/ -o name=admin,secret={% REDACTED %}
root@proxmox-compute-1:~# df -h
Filesystem Size Used Avail Use% Mounted on
udev 126G 0 126G 0% /dev
tmpfs 26G 27M 26G 1% /run
/dev/mapper/pve-root 15G 5.7G 7.9G 42% /
tmpfs 126G 63M 126G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 126G 0 126G 0% /sys/fs/cgroup
/dev/fuse 30M 68K 30M 1% /etc/pve
tmpfs 26G 0 26G 0% /run/user/0
192.168.0.4:/ 14T 67G 14T 1% /root/tmpmnt
root@proxmox-compute-1:~# umount tmpmnt
root@proxmox-compute-1:~#

Also with the whole cluster.

root@proxmox-compute-1:~# mount -t ceph 192.168.0.4,192.168.0.5,192.168.0.6:/ /root/tmpmnt/ -o name=admin,secret={% REDACTED %}
root@proxmox-compute-1:~# df -h
Filesystem Size Used Avail Use% Mounted on
udev 126G 0 126G 0% /dev
tmpfs 26G 27M 26G 1% /run
/dev/mapper/pve-root 15G 5.7G 7.9G 42% /
tmpfs 126G 63M 126G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 126G 0 126G 0% /sys/fs/cgroup
/dev/fuse 30M 68K 30M 1% /etc/pve
tmpfs 26G 0 26G 0% /run/user/0
192.168.0.4,192.168.0.5,192.168.0.6:/ 14T 67G 14T 1% /root/tmpmnt
root@proxmox-compute-1:~# umount tmpmnt

Now that we know what the problem is, can you help me with the solution?
  • Should I be mounting this with the admin account? I don't really have any security concerns as this is a lab environment that's under lock & key.
  • How do I fix the mounts so they auth properly? I don't see anything in /etc/fstab so I'm not sure where this is being called from.
 
Should I be mounting this with the admin account? I don't really have any security concerns as this is a lab environment that's under lock & key.
The secret needs to match the user it belongs to. So if the secret is from the admin user, then you need to log in as admin.
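As for where the mount comes from: it is not /etc/fstab. Judging by your status output (Loaded: /run/systemd/system/mnt-pve-cephfs.mount), the storage layer creates a runtime systemd mount unit when pvestatd activates the cephfs storage. You can see exactly what that unit runs with:

Code:
systemctl cat mnt-pve-cephfs.mount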
 
My cephfs clients are configured as follows.

Code:
/etc/pve/priv/ceph/ceph.keyring
[client.admin]
        key = {{ REDACTED }}
        caps mds = "allow *"
        caps mgr = "allow *"
        caps mon = "allow *"
        caps osd = "allow *"

Code:
/etc/pve/priv/ceph/cephfs.secret
{{ REDACTED }}

The {{ REDACTED }} key is the exact same one that works successfully in this mount command.

Code:
mount -t ceph 192.168.0.4,192.168.0.5,192.168.0.6:/ /root/tmpmnt/ -o name=admin,secret={{ REDACTED }}
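For reference, the same mount should also work by pointing at that secret file directly instead of pasting the key (secretfile= is handled by the mount.ceph helper), something like:

Code:
mount -t ceph 192.168.0.4,192.168.0.5,192.168.0.6:/ /root/tmpmnt/ -o name=admin,secretfile=/etc/pve/priv/ceph/cephfs.secret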

I'm still unsure what the configuration issue is that's causing this output in dmesg.

Code:
[273663.098327] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[273672.327422] libceph: bad option at 'conf=/etc/pve/ceph.conf'
[273682.826444] libceph: bad option at 'conf=/etc/pve/ceph.conf'


Code:
cat /etc/pve/ceph.conf
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 192.168.0.4/24
         fsid = f8d6430f-0df8-4ec5-b78a-d8956832b0de
         mon_allow_pool_delete = true
         mon_host = 192.168.0.4 192.168.0.5 192.168.0.6
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public_network = 192.168.0.4/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
         keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.proxmox-ceph-2]
         host = proxmox-ceph-2
         mds_standby_for_name = pve

[mds.proxmox-ceph-1]
         host = proxmox-ceph-1
         mds_standby_for_name = pve

[mds.proxmox-ceph-3]
         host = proxmox-ceph-3
         mds standby for name = pve
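In case it matters, this is roughly how I'd check which ceph client packages are installed on the compute node. My understanding (which may be wrong) is that the conf= mount option is meant to be consumed by the userspace mount.ceph helper from ceph-common rather than by the kernel itself:

Code:
dpkg -l | grep -E 'ceph-common|ceph-fuse'
ls -l /sbin/mount.ceph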
 
The Ceph cluster is external to the Proxmox VE nodes? What Ceph version is it running?
 
Ceph is 14.2.8
Proxmox VE is 6.1-8

Yes, I've got a 3-node ceph cluster with 7 compute hosts (10 servers total).
 
Thanks a lot for your time here. I was under the impression that the cephfs clients didn't need the ceph packages installed, since it was working without them until a few weeks ago; I'm not sure exactly which upgrade broke this. I do see the documentation clearly states that ceph clients also need them.

After adding that repo and updating, all is well with the clients again now.
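For anyone who lands on this thread later: since the cluster is on Ceph 14.2.8 (Nautilus) and the nodes run PVE 6 on Debian Buster, the repo in question is presumably the Proxmox Ceph Nautilus repository. On each compute node the steps would look roughly like this (the .list file name is arbitrary):

Code:
echo "deb http://download.proxmox.com/debian/ceph-nautilus buster main" > /etc/apt/sources.list.d/ceph-nautilus.list
apt update
apt full-upgrade
# make sure the userspace client bits are on the Nautilus versions
apt install ceph-common ceph-fuse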
 
After adding that repo and updating, all is well with the clients again now.
Nice to hear that.

Thanks a lot for your time here. I was under the impression that the cephfs clients didn't need the ceph packages installed, since it was working without them until a few weeks ago; I'm not sure exactly which upgrade broke this. I do see the documentation clearly states that ceph clients also need them.
The stock packages are usually enough to connect to other Ceph clusters, but not all features and fixes may be in the older version. It is always recommended to run the same Ceph version on the client as on the cluster.
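A quick way to compare the two:

Code:
# on the compute node (client)
ceph --version

# on one of the ceph nodes (what the cluster daemons are running)
ceph versions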
 
