[SOLVED] pvestatd[1873]: unable to activate storage 'cephfs' - directory '/mnt/pve/cephfs' does not exist or is unreachable

lazypaul

Member
Aug 20, 2020
My cluster is throwing the errors below. Can anyone help?

Code:
Sep 08 18:18:12 g8kvm13 pvestatd[1873]: unable to activate storage 'cephfs' - directory '/mnt/pve/cephfs' does not exist or is unreachable
Sep 08 18:18:22 g8kvm13 pvestatd[1873]: got timeout
Sep 08 18:18:22 g8kvm13 pvestatd[1873]: unable to activate storage 'cephfs' - directory '/mnt/pve/cephfs' does not exist or is unreachable
Sep 08 18:18:32 g8kvm13 pvestatd[1873]: got timeout
Sep 08 18:18:32 g8kvm13 pvestatd[1873]: unable to activate storage 'cephfs' - directory '/mnt/pve/cephfs' does not exist or is unreachable
Sep 08 18:18:42 g8kvm13 pvestatd[1873]: got timeout
Sep 08 18:18:42 g8kvm13 pvestatd[1873]: unable to activate storage 'cephfs' - directory '/mnt/pve/cephfs' does not exist or is unreachable
Sep 08 18:18:52 g8kvm13 pvestatd[1873]: got timeout
Sep 08 18:18:52 g8kvm13 pvestatd[1873]: unable to activate storage 'cephfs' - directory '/mnt/pve/cephfs' does not exist or is unreachable
Sep 08 18:19:00 g8kvm13 systemd[1]: Starting Proxmox VE replication runner...
Sep 08 18:19:01 g8kvm13 systemd[1]: pvesr.service: Succeeded.
Sep 08 18:19:01 g8kvm13 systemd[1]: Started Proxmox VE replication runner.
Sep 08 18:19:02 g8kvm13 kernel: mce: [Hardware Error]: Machine check events logged
Sep 08 18:19:02 g8kvm13 kernel: mce: [Hardware Error]: Machine check events logged
Sep 08 18:19:02 g8kvm13 pvestatd[1873]: got timeout
Sep 08 18:19:02 g8kvm13 pvestatd[1873]: unable to activate storage 'cephfs' - directory '/mnt/pve/cephfs' does not exist or is unreachable
Sep 08 18:19:13 g8kvm13 pvestatd[1873]: got timeout
Sep 08 18:27:22 g8kvm13 pvestatd[1873]: unable to activate storage 'cephfs' - directory '/mnt/pve/cephfs' does not exist or is unreachable
Sep 08 18:27:27 g8kvm13 pmxcfs[1591]: [status] notice: received log
Sep 08 18:27:32 g8kvm13 pvestatd[1873]: got timeout
Sep 08 18:27:32 g8kvm13 pvestatd[1873]: unable to activate storage 'cephfs' - directory '/mnt/pve/cephfs' does not exist or is unreachable
Sep 08 18:27:43 g8kvm13 pvestatd[1873]: got timeout
Sep 08 18:27:43 g8kvm13 pvestatd[1873]: unable to activate storage 'cephfs' - directory '/mnt/pve/cephfs' does not exist or is unreachable
Sep 08 18:27:51 g8kvm13 pmxcfs[1591]: [status] notice: received log
Sep 08 18:27:52 g8kvm13 pvestatd[1873]: got timeout
Sep 08 18:27:52 g8kvm13 pvestatd[1873]: unable to activate storage 'cephfs' - directory '/mnt/pve/cephfs' does not exist or is unreachable
Sep 08 18:28:00 g8kvm13 systemd[1]: Starting Proxmox VE replication runner...
Sep 08 18:30:02 g8kvm13 systemd[1]: pvesr.service: Succeeded.
Sep 08 18:30:02 g8kvm13 systemd[1]: Started Proxmox VE replication runner.
Sep 08 18:30:02 g8kvm13 pvestatd[1873]: got timeout
Sep 08 18:30:02 g8kvm13 pvestatd[1873]: unable to activate storage 'cephfs' - directory '/mnt/pve/cephfs' does not exist or is unreachable
Sep 08 18:30:12 g8kvm13 pvestatd[1873]: got timeout
Sep 08 18:30:12 g8kvm13 pvestatd[1873]: unable to activate storage 'cephfs' - directory '/mnt/pve/cephfs' does not exist or is unreachable
Sep 08 18:30:22 g8kvm13 pvestatd[1873]: got timeout
Sep 08 18:30:22 g8kvm13 pvestatd[1873]: unable to activate storage 'cephfs' - directory '/mnt/pve/cephfs' does not exist or is unreachable
Sep 08 18:31:02 g8kvm13 pvestatd[1873]: unable to activate storage 'cephfs' - directory '/mnt/pve/cephfs' does not exist or is unreachable
Sep 08 18:31:36 g8kvm13 kernel: mce_notify_irq: 13 callbacks suppressed
Sep 08 18:31:36 g8kvm13 kernel: mce: [Hardware Error]: Machine check events logged
Sep 08 18:31:43 g8kvm13 pvestatd[1873]: got timeout
 
Hi,

do you see problems when you call

Code:
ras-mc-ctl --summary

You may have to install "rasdaemon" first.
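In case it is missing, installing and enabling it usually looks like this (a minimal sketch using the standard Debian package and service names):

Code:
apt install rasdaemon
systemctl enable --now rasdaemon
ras-mc-ctl --summary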
 
It seems there are no errors. The log keeps showing these messages, and Ceph reports that its health is OK, but running "df -h" hangs there.


Code:
root@g8kvm37:~# ras-mc-ctl --summary
No Memory errors.
No PCIe AER errors.
No Extlog errors.
No MCE errors.
root@g8kvm37:~#
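As a side note, a dead network mount can be inspected without hanging another shell the way "df -h" does; for example (a sketch, not from the original post):

Code:
# findmnt only reads /proc/self/mountinfo, so it never blocks on a dead mount
findmnt /mnt/pve/cephfs
# check the kernel log for ceph client complaints (lost MON sessions, hung requests, ...)
dmesg | grep -i ceph | tail -n 20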
 
Can you please send the /etc/pve/storage.cfg and the output of "mount"?
 
Code:
root@g8kvm37:~# cat /etc/pve/storage.cfg
dir: local
    path /var/lib/vz
    content iso,vztmpl,backup

lvmthin: local-lvm
    thinpool data
    vgname pve
    content images,rootdir

rbd: G8KvmData
    content rootdir,images
    krbd 0
    pool G8KvmData

cephfs: cephfs
    path /mnt/pve/cephfs
    content backup,iso,vztmpl

lvm: test-nvme
    vgname test-nvme
    content images,rootdir
    nodes g8kvm06
    shared 0

pbs: pbs-data-g9
    datastore pbs-data
    server 10.0.142.0
    content backup
    fingerprint 67:1f:7c:39:ce:1b:f7:40:64:ac:c2:cd:16:57:e5:cd:3a:ae:72:2b:00:af:1a:16:59:05:ea:1b:87:a7:43:3a
    maxfiles 100
    username root@pam

root@g8kvm37:~# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=49449688k,nr_inodes=12362422,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=9894896k,mode=755)
/dev/mapper/pve-root on / type ext4 (rw,relatime,errors=remount-ro)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
none on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=33,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=46821)
mqueue on /dev/mqueue type mqueue (rw,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
sunrpc on /run/rpc_pipefs type rpc_pipefs (rw,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)
lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
/dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)
10.0.141.1,10.0.141.2,10.0.141.3,10.0.141.4,10.0.141.5:/ on /mnt/pve/cephfs type ceph (rw,relatime,name=admin,secret=<hidden>,acl)
tmpfs on /var/lib/ceph/osd/ceph-305 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-306 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-307 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-308 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-309 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-310 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-311 type tmpfs (rw,relatime)
tmpfs on /var/lib/ceph/osd/ceph-312 type tmpfs (rw,relatime)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=9894892k,mode=700)
tracefs on /sys/kernel/debug/tracing type tracefs (rw,relatime)
root@g8kvm37:~#
 
Can you write or read to/from /mnt/pve/cephfs?
 
I cannot enter the folder; even running "chmod 755 cephfs" makes no difference.

Code:
root@g8kvm37:/mnt/pve#
root@g8kvm37:/mnt/pve# cd cephfs
-bash: cd: cephfs: Permission denied
root@g8kvm37:/mnt/pve# ls -al
ls: cannot access 'cephfs': Permission denied
total 8
drwxr-xr-x 3 root root 4096 Sep  4 17:27 .
drwxr-xr-x 4 root root 4096 Sep  4 17:27 ..
d????????? ? ?    ?       ?            ? cephfs
root@g8kvm37:/mnt/pve#
root@g8kvm37:/mnt/pve#
root@g8kvm37:/mnt/pve#
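The question marks in the "ls" output are typical of a mount point whose connection has gone stale. One way to see whether processes are stuck on it (a sketch, not from the thread):

Code:
# processes in uninterruptible sleep (state D) are usually the ones blocked on the dead mount
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'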
 
Is the secret correct?
Compare these two secrets:

Code:
ceph auth get-key client.admin
cat /etc/pve/priv/ceph/cephfs.secret
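A quick way to compare the two without eyeballing long keys (a sketch; the $() substitutions strip trailing newlines on both sides, so only the key material is compared):

Code:
[ "$(ceph auth get-key client.admin)" = "$(cat /etc/pve/priv/ceph/cephfs.secret)" ] \
    && echo "secrets match" || echo "secrets differ"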
 
Rebooting the node fixes it, but I still have many nodes (more than 10) with this problem. Is rebooting the only option?
 
Try to unmount the dir.
It gets mounted automatically again if there are no kernel problems.
If there are hanging processes in the kernel, you have to reboot all nodes.
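On an affected node that would roughly be the following (a sketch; the lazy variant is only needed when the plain umount hangs or reports the target as busy):

Code:
umount /mnt/pve/cephfs
# if the mount is hung, detach it lazily instead:
umount -l /mnt/pve/cephfs
# pvestatd should then remount the storage on its next activation cycle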
 
Unmounting the folder worked. Thank you very much!



Another question: when Ceph has a package update, does the node need to be rebooted?
 
No, there is no need to reboot the node.

You have to restart the ceph services to get the latest version running.
This can be done over the GUI.
A node reboot is only necessary if a new kernel is available.
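For reference, restarting the Ceph daemons from the CLI roughly corresponds to the following (a sketch; the mon/mgr instance names are assumed to match the hostname, which is the usual Proxmox default):

Code:
systemctl restart ceph-mon@$(hostname).service   # only on nodes that run a monitor
systemctl restart ceph-mgr@$(hostname).service   # only on nodes that run a manager
systemctl restart ceph-osd.target                # restarts all OSD services on this node
systemctl restart ceph-mds.target                # only if the node runs an MDS for CephFS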
 
One more question: do you know why, even with no HA enabled (no groups, no resources), restarting corosync could make the cluster reboot?
 
There is a discussion on the developer list about this.
I expect the problem will be resolved soon with an update.
 
