Error adding (existing) CephFS

Max P

Hi,

We have a 4-node Proxmox cluster that I just updated to Proxmox 5.3 (from 5.2) without any problems.
Now I want to test the new CephFS support in Proxmox 5.3, but after I add it via the storage menu in the web interface, the cephfs storage entry only has a grey question mark on it.
The syslog contains the following errors:

Code:
Dec 27 16:02:29 pve1 pvestatd[16591]: A filesystem is already mounted on /mnt/pve/cephfs
Dec 27 16:02:29 pve1 pvestatd[16591]: Use of uninitialized value in sort at /usr/share/perl5/PVE/Storage/CephTools.pm line 61.
Dec 27 16:02:29 pve1 pvestatd[16591]: Use of uninitialized value in sort at /usr/share/perl5/PVE/Storage/CephTools.pm line 61.
Dec 27 16:02:29 pve1 pvestatd[16591]: Use of uninitialized value in sort at /usr/share/perl5/PVE/Storage/CephTools.pm line 61.
Dec 27 16:02:29 pve1 pvestatd[16591]: Use of uninitialized value in join or string at /usr/share/perl5/PVE/Storage/CephTools.pm line 63.
Dec 27 16:02:29 pve1 pvestatd[16591]: mount error: exit code 16

After these errors show up, the CephFS is mounted in /mnt/pve/..., but nothing was mounted on that path before I tried to add the cephfs storage via the web interface.
We already had CephFS running before the update to 5.3 (but not mounted locally).
Maybe this is the problem? But how can I get my existing CephFS to show up as a storage location?

Here is the output of "ceph -s" and "ceph fs status".
(I know that there is a health warning; I wanted to get rid of it by moving the locally stored ISOs on pve1 to CephFS, now that it's supported as an ISO storage location.)

Code:
root@pve1:~# ceph -s
  cluster:
    id:     e9f42f14-bed0-4839-894b-0ca3e598320e
    health: HEALTH_WARN
            mon pve1 is low on available space

  services:
    mon: 3 daemons, quorum pve1,pve2,pve3
    mgr: pve1(active), standbys: pve3, pve2
    mds: cephfs-1/1/1 up  {0=pve1=up:active}
    osd: 48 osds: 48 up, 48 in

  data:
    pools:   10 pools, 3128 pgs
    objects: 5.67M objects, 17.3TiB
    usage:   52.1TiB used, 297TiB / 349TiB avail
    pgs:     3128 active+clean

  io:
    client:   676B/s rd, 21.0KiB/s wr, 0op/s rd, 2op/s wr
Code:
root@pve1:~# ceph fs status
cephfs - 0 clients
======
+------+--------+------+---------------+-------+-------+
| Rank | State  | MDS  |    Activity   |  dns  |  inos |
+------+--------+------+---------------+-------+-------+
|  0   | active | pve1 | Reqs:    0 /s | 41.0k | 41.0k |
+------+--------+------+---------------+-------+-------+
+-------------+----------+-------+-------+
|     Pool    |   type   |  used | avail |
+-------------+----------+-------+-------+
| cephfs_meta | metadata |  189M | 89.2T |
| cephfs_data |   data   | 2059G | 89.2T |
+-------------+----------+-------+-------+

+-------------+
| Standby MDS |
+-------------+
+-------------+
MDS version: ceph version 12.2.10 (fc2b1783e3727b66315cc667af9d663d30fe7ed4) luminous (stable)

regards
max
 
Hi Max,

Dec 27 16:02:29 pve1 pvestatd[16591]: A filesystem is already mounted on /mnt/pve/cephfs

Yes, I'm seeing this behaviour here, too. When something changes, e.g. adding a MON/MGR,
I have to manually umount /mnt/pve/cephfs on all nodes. After that I can disable/enable the
storage entry and the storage is mounted correctly again - green, without the question mark. Maybe a reboot
of the corresponding node would solve it, too.
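
In CLI terms the workaround would be roughly this (untested sketch; 'cephfs' is the storage ID here, and I'm assuming 'pvesm set --disable' does the same as the enable/disable toggle in the GUI):

Code:
# on every node: unmount the stale mount point
umount /mnt/pve/cephfs
# then disable and re-enable the storage entry once;
# pvestatd mounts it again on its next status cycle
pvesm set cephfs --disable 1
pvesm set cephfs --disable 0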

Best regards,
Falko
 
Well, in the meantime I haven't even disabled/enabled it; just unmounting and then clicking in the GUI on the
'cephfs with question mark' entry -> Summary -> Content revived the mount.
 
Nope. Filesystem gets mounted but GUI still reports error.

Code:
Dez 31 14:04:32 ramsey pvestatd[2440]: mount error: exit code 16
Dez 31 14:04:42 ramsey pvestatd[2440]: mount error: exit code 16
Dez 31 14:04:52 ramsey pvestatd[2440]: mount error: exit code 16
Dez 31 14:05:03 ramsey pvestatd[2440]: mount error: exit code 16
Dez 31 14:05:12 ramsey pvestatd[2440]: mount error: exit code 16
Dez 31 14:05:22 ramsey pvestatd[2440]: mount error: exit code 16
Dez 31 14:05:32 ramsey pvestatd[2440]: mount error: exit code 16
Dez 31 14:05:42 ramsey pvestatd[2440]: mount error: exit code 16
Dez 31 14:05:53 ramsey pvestatd[2440]: mount error: exit code 16
 
mount error: exit code 16
Does the mount folder '/mnt/pve/<cephfs_storage>' show any subfolders or files? With 'pvesm status' you can trigger the mount directly, before pvestatd tries to mount it. This may give some extra information, as the exit code comes from the ceph mount command that is used.
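
For example (assuming the storage is named 'cephfs'):

Code:
umount /mnt/pve/cephfs    # clear whatever is mounted there at the moment
ls -la /mnt/pve/cephfs    # check whether the directory itself contains leftover files
pvesm status              # activates all enabled storages and retries the cephfs mount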
 
Does the mount folder '/mnt/pve/<cephfs_storage>' show any subfolders or files?

Yes, after adding cephfs via the web interface the mount folder was created and contained the data that is on the CephFS, so the mount was successful. But pvestatd still spammed the syslog with the mount errors and the web interface showed the grey question mark.
 
Does not work here:
Code:
root@dehmelt:~# umount /mnt/pve/cephfs
root@dehmelt:~# grep cephfs /proc/mounts
root@dehmelt:~# pvesm status
mount error 16 = Device or resource busy
mount error: exit code 16
Name             Type     Status           Total            Used       Available        %
backup            nfs   disabled               0               0               0      N/A
ceph              rbd     active     12826510317      5517448685      7309061632   43.02%
cephfs         cephfs   inactive               0               0               0    0.00%
local             dir     active        34571888         2400620        30385412    6.94%
local-lvm     lvmthin     active        81920000               0        81920000    0.00%
lxc               rbd     active      7324964460        15902828      7309061632    0.22%
root@dehmelt:~# grep cephfs /proc/mounts
192.168.44.65,192.168.44.66,192.168.44.67:/proxmox /mnt/pve/cephfs ceph rw,relatime,name=admin,secret=<hidden>,acl,wsize=16777216 0 0
root@dehmelt:~# pvesm list cephfs
mount error 16 = Device or resource busy
mount error: exit code 16
root@dehmelt:~# umount /mnt/pve/cephfs
root@dehmelt:~# pvesm list cephfs
mount error 16 = Device or resource busy
mount error: exit code 16
root@dehmelt:~# grep cephfs /proc/mounts
192.168.44.65,192.168.44.66,192.168.44.67:/proxmox /mnt/pve/cephfs ceph rw,relatime,name=admin,secret=<hidden>,acl,wsize=16777216 0 0
 
I just saw that the path in storage.cfg ended in a slash, which I removed.

Now an additional error message is printed:

Code:
root@dehmelt:~# pvesm status
A filesystem is already mounted on /mnt/pve/cephfs
mount error 16 = Device or resource busy
mount error: exit code 16
Name             Type     Status           Total            Used       Available        %
backup            nfs   disabled               0               0               0      N/A
ceph              rbd     active     12835492340      5517466612      7318025728   42.99%
cephfs         cephfs   inactive               0               0               0    0.00%
local             dir     active        34571888         2400960        30385072    6.94%
local-lvm     lvmthin     active        81920000               0        81920000    0.00%
lxc               rbd     active      7333928556        15902828      7318025728    0.22%

and syslog contains:

Code:
Jan  2 14:39:06 dehmelt pvestatd[2417]: A filesystem is already mounted on /mnt/pve/cephfs
Jan  2 14:39:06 dehmelt pvestatd[2417]: mount error: exit code 16
Jan  2 14:39:16 dehmelt pvestatd[2417]: A filesystem is already mounted on /mnt/pve/cephfs
Jan  2 14:39:17 dehmelt pvestatd[2417]: mount error: exit code 16
Jan  2 14:39:26 dehmelt pvestatd[2417]: A filesystem is already mounted on /mnt/pve/cephfs
Jan  2 14:39:26 dehmelt pvestatd[2417]: mount error: exit code 16
 
@gurubert, what 'pveversion -v' are you on? And what does the storage.cfg entry for cephfs look like?

A filesystem is already mounted on /mnt/pve/cephfs
mount error 16 = Device or resource busy
This may also occur if you run the list at the same moment as pvestatd tries to mount the storage. pvestatd tries to auto-mount all activated storages, and 'pvesm list' tries to do the same; this may trigger the above message.
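
To rule that race out while testing by hand, you could temporarily stop pvestatd (sketch; 'cephfs' is the storage ID):

Code:
systemctl stop pvestatd            # pauses the periodic status/auto-mount loop
umount /mnt/pve/cephfs 2>/dev/null
pvesm list cephfs                  # now only this command attempts the mount
systemctl start pvestatd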

@Max P, what does your storage.cfg entry for the cephfs look like?
Yes, after adding cephfs via the web interface the mount folder was created and contained the data that is on the CephFS, so the mount was successful. But pvestatd still spammed the syslog with the mount errors and the web interface showed the grey question mark.
Is the cephfs mounted on the node with the error message? Can all MONs & MDS (at least one standby is recommended) be reached from that client?
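
A few quick client-side checks, for example (mount point /mnt/pve/cephfs assumed):

Code:
grep cephfs /proc/mounts    # is the CephFS mounted on this node?
ceph -s                     # MON quorum and MDS state as seen from this client
ceph mds stat               # short MDS map summary (active/standby daemons)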
 
@Max P, what does your storage.cfg entry for the cephfs look like?
Code:
root@pve1:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content vztmpl,iso,images,backup,rootdir
        maxfiles 1
        shared 0

lvmthin: local-lvm
        thinpool data
        vgname pve
        content rootdir,images

rbd: rbd_hdd_vm
        content images
        krbd 0
        pool rbd_hdd

rbd: rbd_hdd_ct
        content rootdir
        krbd 1
        pool rbd_hdd

cephfs: cephfs
        path /mnt/pve/cephfs
        content vztmpl,iso,backup

Is the cephfs mounted on the node with the error message?

After adding it via the web interface, the CephFS is mounted (successfully, the files are all there) on all nodes (on /mnt/pve/cephfs/), and the same error messages also appear on all nodes.

Can all MONs & MDS (at least one standby is recommended) be reached from that client?

All my nodes (4) are also ceph nodes with all OSDs up and in so I assume everything is reachable. We currently only have 1 MDS.
The web interface under pve -> Ceph -> CephFS shows the cephfs with its data and metadata pool and the one MDS.

So for me it looks like everything worked, but pve wrongly thinks it didn't.
 
@gurubert, what 'pveversion -v' are you on?

Code:
root@dehmelt:~# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
pve-manager: 5.3-6 (running version: 5.3-6/37b3c8df)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-8-pve: 4.15.18-28
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-34
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1

And what does the storage.cfg entry for cephfs look like?

Code:
cephfs: cephfs
   path /mnt/pve/cephfs
   content vztmpl,iso,backup
   monhost ceph01 ceph02 ceph03
   subdir /proxmox
   username admin

Is the cephfs mounted on the node with the error message? Can all MONs & MDS (at least one standby is recommended) be reached from that client?

It is mounted and all MONs and the MDS are reachable.

Code:
root@dehmelt:~# ls -l /mnt/pve/cephfs/
total 0
drwxr-xr-x 1 root root 0 Dez 31 12:08 dump
drwxr-xr-x 1 root root 2 Dez 31 11:06 template
 
@Max P, does the ceph.conf list the three MONs with their mon address?

@gurubert, did you install the ceph luminous packages on the hosts to connect to the ceph cluster? These are needed to make the client work.
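
(If they are missing, installing them should be roughly this, assuming the Proxmox ceph-luminous repository is already configured:)

Code:
apt update
apt install ceph-common ceph-fuse    # CephFS client tools (mount.ceph) and the FUSE client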
 
@gurubert, did you install the ceph luminous packages on the hosts to connect to the ceph cluster? These are needed to make the client work.

Yes:

Code:
root@dehmelt:~# dpkg -l|grep ceph
ii  ceph-base                            12.2.10-pve1                   amd64        common ceph daemon libraries and management tools
ii  ceph-common                          12.2.10-pve1                   amd64        common utilities to mount and interact with a ceph storage cluster
ii  ceph-fuse                            12.2.10-pve1                   amd64        FUSE-based client for the Ceph distributed file system
ii  libcephfs2                           12.2.10-pve1                   amd64        Ceph distributed file system client library
ii  python-cephfs                        12.2.10-pve1                   amd64        Python 2 libraries for the Ceph libcephfs library
 
@Max P, does the ceph.conf list the three MONs with their mon address?
Yes:
Code:
root@pve1:~# cat /etc/pve/ceph.conf
[global]
         auth client required = cephx
         auth cluster required = cephx
         auth service required = cephx
         bluestore_block_db_size = 21474836480
         cluster network = 10.10.1.0/24
         fsid = e9f42f14-bed0-4839-894b-0ca3e598320e
         keyring = /etc/pve/priv/$cluster.$name.keyring
         mon allow pool delete = true
         osd journal size = 5120
         osd pool default min size = 2
         osd pool default size = 3
         public network = 10.10.1.0/24

[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mds]
         keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mon]
         mon clock drift allowed = 0.06

[mon.pve3]
         host = pve3
         mon addr = 10.10.1.3:6789

[mon.pve1]
         host = pve1
         mon addr = 10.10.1.1:6789

[mon.pve2]
         host = pve2
         mon addr = 10.10.1.2:6789

[client.radosgw.pve1]
         host = pve1
         keyring = /var/lib/ceph/radosgw/ceph-pve1/keyring
         log file = /var/log/ceph/client.radosgw.$host.log
         rgw_dns_name = s3.local

[mds.pve1]
         host = pve1

[client]
         mon host = 10.10.1.1:6789, 10.10.1.2:6789, 10.10.1.3:6789
 
@Alwin, I found a bug in the PVE/Storage/CephTools.pm Perl script. It still doesn't work, but at least I get a different error now.

This is the fix:
Code:
--- /usr/share/perl5/PVE/Storage/CephTools.pm   2019-01-07 16:31:05.170790597 +0100
+++ /usr/share/perl5/PVE/Storage/CephTools.pm   2019-01-07 16:23:08.110136566 +0100
@@ -58,7 +58,7 @@
     }

     my $config = $parse_ceph_file->($configfile);
-    @$server = sort map { $config->{$_}->{'mon addr'} } grep {/mon/} %{$config};
+    @$server = sort map { $config->{$_}->{'mon addr'} } grep {/mon./} %{$config};

     return join(',', @$server);
 };

The problem is that I have a general [mon] section in my config (for the clock drift setting),
and this general mon section is also matched by the /mon/ grep, but it doesn't have a 'mon addr' field, so the value is undefined.
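
To illustrate, here is a stripped-down stand-in (not the real parser; the real code greps the flattened hash, but the effect on the section names is the same):

Code:
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the parsed ceph.conf, keyed by section name as in my config above.
my $config = {
    'mon'      => { 'mon clock drift allowed' => '0.06' },   # no 'mon addr' here
    'mon.pve1' => { 'mon addr' => '10.10.1.1:6789' },
    'mon.pve2' => { 'mon addr' => '10.10.1.2:6789' },
    'mon.pve3' => { 'mon addr' => '10.10.1.3:6789' },
};

# Old pattern: /mon/ also matches the plain 'mon' section, whose 'mon addr' is
# undef -> this is where the "Use of uninitialized value" warnings come from.
my @servers = sort map { $config->{$_}->{'mon addr'} } grep { /mon/ } keys %{$config};

# Patched pattern: /mon./ requires a character after 'mon', so only the
# mon.pveX sections (which actually carry a 'mon addr') are picked up.
@servers = sort map { $config->{$_}->{'mon addr'} } grep { /mon./ } keys %{$config};
print join(',', @servers), "\n";   # 10.10.1.1:6789,10.10.1.2:6789,10.10.1.3:6789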

I patched this file on one of my nodes, and now I get this error in the web interface when trying to add the cephfs:
Code:
create storage failed: error with cfs lock 'file-storage_cfg': mount error: exit code 16 (500)
but no errors in syslog
 
@Max P, please update to the latest version; AFAIR, this has already been fixed. If it still persists, can you please post 'pveversion -v'?
 
pveversion -v
Code:
root@pve1:~# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-5-pve: 4.15.18-24
pve-kernel-4.15.18-1-pve: 4.15.18-19
pve-kernel-4.15.17-3-pve: 4.15.17-14
pve-kernel-4.15.17-1-pve: 4.15.17-9
ceph: 12.2.10-pve1
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1

Code:
root@pve1:~# apt update
Ign:1 http://ftp.at.debian.org/debian stretch InRelease
Hit:2 http://ftp.at.debian.org/debian stretch-updates InRelease
Hit:3 http://ftp.at.debian.org/debian stretch Release
Hit:4 http://security.debian.org stretch/updates InRelease
Hit:5 http://deb.debian.org/debian stretch-backports InRelease
Hit:6 http://download.proxmox.com/debian/ceph-luminous stretch InRelease
Hit:7 https://enterprise.proxmox.com/debian/pve stretch InRelease
Reading package lists... Done
Building dependency tree
Reading state information... Done
2 packages can be upgraded. Run 'apt list --upgradable' to see them.
root@pve1:~# apt list --upgradable
Listing... Done
libarchive13/stable 3.2.2-2+deb9u1 amd64 [upgradable from: 3.2.2-2]
tzdata/stable-updates 2018i-0+deb9u1 all [upgradable from: 2018g-0+deb9u1]
Does one of those packages contain the fix? Looks unrelated.
 
I checked the repository and it seems the fix is not included yet. I will set up my test cluster to reproduce this. But this is separate from the mount error: if the storage isn't mounted, is the directory empty?
 
if the storage isn't mounted, is the directory empty?
I got rid of the mount error by patching the Perl script on all nodes (on my first try I had limited the storage to pve1 only).
The mount succeeded and I can see the content of the CephFS in /mnt/pve/cephfs, but the web interface and syslog errors are the same as before.
The error in the syslog is the same (exactly the same...).
Are those perl scripts cached/precompiled (sorry, not really familiar with perl) or something like that? Do I have to reboot the nodes when I change something in the perl scripts?
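I guess restarting the long-running PVE daemons that have the module loaded might be enough instead of a full reboot, something like:

Code:
# reload the patched Perl module without rebooting (my assumption, untested)
systemctl restart pvestatd pvedaemon pveproxy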
Maybe I am just too tired today and overlooked something. Will try again tomorrow.
 
