Mounting CephFS in VM works briefly then gets blocklisted

Aug 1, 2024
Hi all,

I'm working through setting up CephFS as a shared mount point for a set of VMs.

- Created a new CephFS called 'docker' using the Proxmox UI
- Added a Ceph user called 'docker' with rw access to the 'docker' filesystem
- Set up the VM with /etc/ceph/ containing a basic ceph.conf and the keyring for the 'docker' user
- Created the /mnt/ceph folder for the new mount
- Added the mount details to fstab using fuse.ceph and ran mount -a
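
For anyone following along, the steps above can be sketched roughly as below. This is a minimal sketch, not the exact commands used: `ceph fs authorize` is one way to create such a user (the post does not say how the user was created), the fstab entry follows the `fuse.ceph` form with default options, and the helper function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only. 'docker' and /mnt/ceph come from the post; the function
# names are hypothetical helpers, not part of Ceph.

# On a Ceph/Proxmox node: create client.docker with rw access to the root
# of the 'docker' filesystem. Assumption: the user was created this way.
# The command prints a keyring to copy into /etc/ceph/ on the VM.
create_client() {
    ceph fs authorize docker client.docker / rw
}

# Build an fstab entry for a ceph-fuse mount.
# $1 = Ceph user id (without the "client." prefix), $2 = mount point.
fstab_line() {
    printf 'none %s fuse.ceph ceph.id=%s,_netdev,defaults 0 0\n' "$2" "$1"
}

# On the VM, as root, with ceph.conf and the keyring already in /etc/ceph/.
setup_mount() {
    mkdir -p /mnt/ceph
    fstab_line docker /mnt/ceph >> /etc/fstab
    mount -a
}
```

Nothing runs until a function is called, so `fstab_line docker /mnt/ceph` can be used on its own to preview the entry before touching /etc/fstab.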

The mount point is listed successfully, and I can go to the mount, create a file, and ls the contents. Everything is great!

Then, some unpredictable time later, I try to use the client mount point and:
- it takes a really long time and then throws the error 'Cannot send after transport endpoint shutdown'
- running 'ceph osd blocklist ls' on one of the hosts shows the client as blocked
- if I remove it from the blocklist, the client immediately comes back
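
The check-and-unblock cycle described above can be sketched as a small script. It assumes that `ceph osd blocklist ls` prints one `ADDR:PORT/NONCE EXPIRY` entry per line on stdout and that the client uses an IPv4 address; both are assumptions about the output format, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch only. Function names are hypothetical helpers, not part of Ceph.

# Read blocklist entries on stdin and print the addresses belonging to
# the given client IP. Assumes each line starts with "IP:PORT/NONCE".
blocklisted_addrs() {
    awk -v ip="$1" 'index($1, ip ":") == 1 { print $1 }'
}

# Remove every blocklist entry for the given client IP.
# Run on a node with an admin keyring.
unblock_client() {
    ceph osd blocklist ls 2>/dev/null | blocklisted_addrs "$1" |
    while read -r addr; do
        ceph osd blocklist rm "$addr"
    done
}
```

Usage would be something like `unblock_client 10.0.0.5`, where the IP is a placeholder for the VM's address. This only clears the symptom; it does not explain why the client was evicted.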

The time before it fails can be 1-2 days or as short as 30 seconds.

I've checked a few different logs to try to find a cause, and everything looks like normal operation.

Can anyone suggest a next step for me to look at?
 
And it has gone back to blocklisting my client again, whether I use the kernel or the FUSE client, with:

0 log_channel(cluster) log [WRN] : evicting unresponsive client <client name> (21545209), after 303.302 seconds

Do I have to increase the timeout, or turn off blocklisting, to make this work?
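
For reference, here is a hedged sketch of where those two knobs appear to live. The 303 seconds in the log is just over 300, which is the documented default for the filesystem's `session_autoclose` setting, so that looks like the timeout involved, though this is not confirmed here. The setting and option names are as I understand them from the CephFS eviction documentation; the function names are hypothetical, and none of this is a recommendation.

```shell
#!/bin/sh
# Sketch only. Function names are hypothetical helpers, not part of Ceph.

# Read MDS log lines on stdin and print "<client id> <seconds>" for each
# "evicting unresponsive client" warning.
eviction_events() {
    sed -n 's/.*evicting unresponsive client .* (\([0-9][0-9]*\)), after \([0-9.][0-9.]*\) seconds.*/\1 \2/p'
}

# Show the current session timeouts for the 'docker' filesystem.
# Assumes `ceph fs get` lists session_timeout and session_autoclose.
show_session_settings() {
    ceph fs get docker | grep -E 'session_(timeout|autoclose)'
}

# Candidate change: raise the auto-close timeout. $1 = seconds.
raise_autoclose() {
    ceph fs set docker session_autoclose "$1"
}

# Candidate change: stop blocklisting clients whose session timed out.
# The Ceph docs caution that disabling blocklisting can be unsafe.
disable_timeout_blocklist() {
    ceph config set mds mds_session_blocklist_on_timeout false
}
```

Either change would only hide the underlying question of why the client stops responding to the MDS for five minutes, so checking the network path between the VM and the MDS hosts is probably the more useful next step.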