[SOLVED] cephfs shows "?" on all 4 nodes - mount error: See "systemctl status mnt-pve-cephfs.mount" and "journalctl -xe" for details. (500)

GCS_IT

Sep 13, 2020

To anyone who knows this stuff well,

I've been having numerous issues, including a node restarting due to bad RAM. Here's my current status:
1. A few VMs won't start due to "Job for mnt-pve-cephfs.mount failed. TASK ERROR: mount error: See "systemctl status mnt-pve-cephfs.mount" and "journalctl -xe" for details."
2. Under each node in the GUI, the cephfs storage shows a question mark.
3. The output of systemctl status mnt-pve-cephfs.mount is:
mnt-pve-cephfs.mount - /mnt/pve/cephfs
Loaded: loaded (/run/systemd/system/mnt-pve-cephfs.mount; static; vendor preset: enabled)
Active: failed (Result: exit-code) since Sun 2020-09-13 02:05:45 PDT; 2s ago
Where: /mnt/pve/cephfs
What: 10.0.1.1,10.0.1.2,10.0.1.3,10.0.1.5,10.0.1.6:/

Sep 13 02:05:45 pve06 systemd[1]: Mounting /mnt/pve/cephfs...
Sep 13 02:05:45 pve06 mount[51782]: mount error: no mds server is up or the cluster is laggy
Sep 13 02:05:45 pve06 systemd[1]: mnt-pve-cephfs.mount: Mount process exited, code=exited, status=32/n/a
Sep 13 02:05:45 pve06 systemd[1]: mnt-pve-cephfs.mount: Failed with result 'exit-code'.
Sep 13 02:05:45 pve06 systemd[1]: Failed to mount /mnt/pve/cephfs.

4. Ceph Health in the GUI shows "1 filesystem degraded"
5. /mnt/pve/cephfs appears to be empty
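Item 5 is what you would expect when the mount fails: with nothing mounted, /mnt/pve/cephfs is just the (usually empty) directory on the node's own disk. A minimal Python sketch for telling "not mounted" apart from "mounted but empty", assuming the mount path from the systemctl output above; the function name describe_mountpoint is hypothetical. On a hung CephFS mount, even these stat calls can block.

```python
import os
import sys

# Mount path taken from the systemctl output in this thread.
MOUNTPOINT = "/mnt/pve/cephfs"


def describe_mountpoint(path: str) -> str:
    """Return "missing", "mounted", or "not mounted" for a storage directory."""
    if not os.path.isdir(path):
        return "missing"
    if os.path.ismount(path):
        return "mounted"
    # The directory exists but nothing is mounted on it, so listing it shows
    # the directory on the node's own root filesystem, not CephFS.
    return "not mounted"


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else MOUNTPOINT
    print(f"{target}: {describe_mountpoint(target)}")
```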

I am very new to this and have no one else to ask. Some production servers are down as a result of this, but others are still up, which confuses me.
Any advice is welcome, and I'm willing to run any diagnostics that might help. Thanks.
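One way to gather the diagnostics mentioned in this thread in a single pass is a small Python sketch that runs each command and captures its output. systemctl status and ceph -s come from the thread; journalctl -u (used instead of -xe to limit output to this unit), ceph health detail and ceph fs status are standard commands added here as an assumption about what is useful. The per-command timeout is there because the ceph CLI can hang when the monitors are unreachable; collect and DIAGNOSTICS are hypothetical names.

```python
import subprocess

# Commands from the thread plus a few standard Ceph queries (an assumption
# about what is useful; adjust the list as needed).
DIAGNOSTICS = [
    ["systemctl", "status", "mnt-pve-cephfs.mount", "--no-pager"],
    ["journalctl", "-u", "mnt-pve-cephfs.mount", "-n", "50", "--no-pager"],
    ["ceph", "-s"],
    ["ceph", "health", "detail"],
    ["ceph", "fs", "status"],
]


def collect(commands, timeout=30):
    """Run each command and return {command string: stdout + stderr}."""
    results = {}
    for cmd in commands:
        key = " ".join(cmd)
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
            # Non-zero exit codes are expected here (e.g. a failed unit),
            # so the output is kept regardless of the return code.
            results[key] = proc.stdout + proc.stderr
        except FileNotFoundError:
            results[key] = f"{cmd[0]}: command not found"
        except subprocess.TimeoutExpired:
            results[key] = f"timed out after {timeout}s (cluster may be unresponsive)"
    return results


if __name__ == "__main__":
    for command, output in collect(DIAGNOSTICS).items():
        print(f"### {command}\n{output}")
```

Running it on one affected node and pasting the output into the thread gives helpers most of what they would ask for.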
 
Sep 13 02:05:45 pve06 mount[51782]: mount error: no mds server is up or the cluster is laggy
Is the Ceph cluster healthy (check with ceph -s)? The MDS will need some time to replay its journal and sync up with any other MDS in the cluster.
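A rough Python sketch of what to look for in that output: it reads ceph status --format json and reports each MDS rank's state plus any filesystem or MDS health checks. A rank sitting in a state such as up:replay instead of up:active would be consistent with the "no mds server is up" mount error. The JSON field names (fsmap, by_rank, health.checks) are assumed from recent Ceph releases and may differ on yours; mds_summary is a hypothetical name.

```python
import json
import subprocess


def mds_summary(status: dict) -> dict:
    """Summarize CephFS/MDS state from `ceph status --format json` output.

    Assumes the fsmap layout of recent Ceph releases ("by_rank" entries with a
    "status" such as "up:active" or "up:replay"); verify against your version.
    """
    fsmap = status.get("fsmap", {})
    # Map MDS rank number -> state string, e.g. {0: "up:replay"}.
    ranks = {r.get("rank"): r.get("status", "unknown") for r in fsmap.get("by_rank", [])}
    health = status.get("health", {})
    checks = health.get("checks", {})
    return {
        "health": health.get("status", "UNKNOWN"),
        "ranks": ranks,
        # False when no rank is listed at all or any rank is not yet active.
        "all_active": bool(ranks) and all(s == "up:active" for s in ranks.values()),
        # Health check codes such as FS_DEGRADED or MDS_ALL_DOWN.
        "fs_checks": sorted(c for c in checks if c.startswith(("FS_", "MDS_"))),
    }


def main() -> None:
    try:
        raw = subprocess.run(["ceph", "status", "--format", "json"],
                             capture_output=True, text=True, check=True, timeout=60).stdout
    except (OSError, subprocess.SubprocessError) as exc:
        print(f"could not run ceph: {exc}")
        return
    print(json.dumps(mds_summary(json.loads(raw)), indent=2))


if __name__ == "__main__":
    main()
```

Re-running it every minute or so shows whether the rank is progressing toward up:active or stuck.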
 
I had one placement group (PG) that was broken. I finally got it fixed a few hours later, and everything healed.
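The thread doesn't say how the PG was fixed. As a starting point for finding which PG is broken, here is a Python sketch that pulls PG ids out of the PG_* checks in ceph health detail --format json. The "checks" / "detail" / "message" layout and the "pg <id> is ..." message wording are assumptions based on recent Ceph releases; problem_pgs is a hypothetical name. A PG found this way can then be inspected with ceph pg <pgid> query.

```python
import json
import re
import subprocess

# PG ids look like "<pool>.<hex seed>", e.g. "2.1f".
PG_ID = re.compile(r"\bpg (\d+\.[0-9a-f]+)\b")


def problem_pgs(health: dict) -> dict:
    """Map each PG-related health check code to the PG ids in its detail lines.

    Expects `ceph health detail --format json` output; the layout is assumed
    from recent Ceph releases.
    """
    found = {}
    for code, check in health.get("checks", {}).items():
        if not code.startswith("PG_"):
            continue  # skip non-PG checks such as FS_DEGRADED
        ids = []
        for entry in check.get("detail", []):
            ids.extend(PG_ID.findall(entry.get("message", "")))
        found[code] = sorted(set(ids))
    return found


def main() -> None:
    try:
        raw = subprocess.run(["ceph", "health", "detail", "--format", "json"],
                             capture_output=True, text=True, check=True, timeout=60).stdout
    except (OSError, subprocess.SubprocessError) as exc:
        print(f"could not run ceph: {exc}")
        return
    for code, pgs in problem_pgs(json.loads(raw)).items():
        # Inspect any listed PG further with: ceph pg <pgid> query
        print(code, " ".join(pgs))


if __name__ == "__main__":
    main()
```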