I had a weird CephFS issue I couldn't find the root cause of.
I only noticed it on the two nodes that had the opt-in 6.14 kernel, but that could just be coincidence.
I have reverted to the 6.8 kernel on those two nodes.
I am back up and working, but I want to do some log archaeology to see if I can figure out what went wrong.
This all happened while I was playing with bridging two Thunderbolt ports from my Ceph mesh network on just a single Proxmox/Ceph node.
Symptoms:
- on any PVE host I could ls the /mnt/pve/cephfs-name mount and read files
- I could touch a new file and see the 0-byte file be created
- I could start nano to create a file, but when I went to write the file out the write hung (and hung the SSH session running nano too)
- I tried restarting the MDS
- I tried restarting the OSDs
- I tried rebooting nodes 1 and 3 without the opt-in kernel (node 2 always had the 6.8.x kernel, so it didn't get rebooted as part of that test)
- I tried removing the PWL cache settings and the RBD plugin setting (I had introduced these a few days ago)
- in the end I shut down the cluster gracefully (roughly the sequence sketched below) and then brought the nodes back up one by one, waiting for things to converge - this seems to have done the trick
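For reference, by "gracefully" I mean roughly the sequence below - this is reconstructed from memory, so the exact flags and order may not match exactly what I ran:
Code:
# stop/migrate all guests first, then pause Ceph data movement
ceph osd set noout
ceph osd set norebalance
ceph osd set norecover
# shut the nodes down one at a time, then power them back on
# once all three are back up, clear the flags and wait for HEALTH_OK
ceph osd unset noout
ceph osd unset norebalance
ceph osd unset norecover
ceph -s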
I am now in a stable position. Before I try again to get Thunderbolt bridging to work and risk tanking the system, what can I search for in the journal that might help?
I have already tried:
Code:
journalctl | grep -i "docker-cephFS"
journalctl -u 'ceph-mds@*' | grep -i "read only"
I have seen a bunch of these from when I was having issues, but they only confirm that I was seeing issues:
Code:
Apr 25 15:06:00 pve3 pvestatd[1835]: unable to activate storage 'docker-cephFS' - directory '/mnt/pve/docker-cephFS' does not exist or is unreachable
Apr 25 15:06:08 pve3 pvestatd[1835]: unable to activate storage 'docker-cephFS' - directory '/mnt/pve/docker-cephFS' does not exist or is unreachable
Apr 25 15:06:45 pve3 systemd[1]: mnt-pve-docker\x2dcephFS.mount: Directory /mnt/pve/docker-cephFS to mount over is not empty, mounting anyway.
Apr 25 15:06:45 pve3 systemd[1]: Mounting mnt-pve-docker\x2dcephFS.mount - /mnt/pve/docker-cephFS...
Apr 25 15:06:45 pve3 systemd[1]: Mounted mnt-pve-docker\x2dcephFS.mount - /mnt/pve/docker-cephFS.
Apr 25 15:09:44 pve3 pvestatd[1835]: unable to activate storage 'docker-cephFS' - directory '/mnt/pve/docker-cephFS' does not exist or is unreachable
Apr 25 15:10:20 pve3 pvestatd[1835]: unable to activate storage 'docker-cephFS' - directory '/mnt/pve/docker-cephFS' does not exist or is unreachable
Apr 25 15:10:22 pve3 pvestatd[1835]: unable to activate storage 'docker-cephFS' - directory '/mnt/pve/docker-cephFS' does not exist or is unreachable
Apr 25 15:38:42 pve3 pvestatd[1835]: unable to activate storage 'docker-cephFS' - directory '/mnt/pve/docker-cephFS' does not exist or is unreachable
Apr 25 15:38:59 pve3 pvestatd[1835]: unable to activate storage 'docker-cephFS' - directory '/mnt/pve/docker-cephFS' does not exist or is unreachable
Apr 25 15:39:01 pve3 pvestatd[1835]: unable to activate storage 'docker-cephFS' - directory '/mnt/pve/docker-cephFS' does not exist or is unreachable
Apr 25 15:39:18 pve3 pvestatd[1835]: unable to activate storage 'docker-cephFS' - directory '/mnt/pve/docker-cephFS' does not exist or is unreachable
Apr 25 15:39:32 pve3 pvestatd[1835]: unable to activate storage 'docker-cephFS' - directory '/mnt/pve/docker-cephFS' does not exist or is unreachable
Apr 25 15:40:00 pve3 pvedaemon[197243]: unable to activate storage 'docker-cephFS' - directory '/mnt/pve/docker-cephFS' does not exist or is unreachable
Apr 25 15:40:02 pve3 pvedaemon[187221]: unable to activate storage 'docker-cephFS' - directory '/mnt/pve/docker-cephFS' does not exist or is unreachable
Apr 25 15:40:03 pve3 pvestatd[1835]: unable to activate storage 'docker-cephFS' - directory '/mnt/pve/docker-cephFS' does not exist or is unreachable
Apr 25 15:40:20 pve3 pvestatd[1835]: unable to activate storage 'docker-cephFS' - directory '/mnt/pve/docker-cephFS' does not exist or is unreachable
Apr 25 15:40:35 pve3 systemd[1]: mnt-pve-docker\x2dcephFS.mount: Directory /mnt/pve/docker-cephFS to mount over is not empty, mounting anyway.
Apr 25 15:40:35 pve3 systemd[1]: Mounting mnt-pve-docker\x2dcephFS.mount - /mnt/pve/docker-cephFS...
Apr 25 15:42:05 pve3 systemd[1]: Failed to mount mnt-pve-docker\x2dcephFS.mount - /mnt/pve/docker-cephFS.
Apr 25 15:42:05 pve3 systemd[1]: mnt-pve-docker\x2dcephFS.mount: Directory /mnt/pve/docker-cephFS to mount over is not empty, mounting anyway.
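These are the extra journal searches I'm planning to run next - the grep strings are guesses based on what the kernel CephFS client, the MDS and the mons typically log, so they may need tweaking (and I'll probably need -b -1 or --since to reach back before the reboots):
Code:
# kernel CephFS client messages (libceph session resets, mon/osd reconnects)
journalctl -k | grep -iE "libceph|ceph:"
# hung task warnings from the kernel around the time the writes stalled
journalctl -k | grep -i "blocked for more than"
# MDS side: slow requests, clients failing to respond to capability release
journalctl -u 'ceph-mds@*' | grep -iE "slow request|failing to respond"
# mon side: MDS/OSD state changes and health warnings
journalctl -u 'ceph-mon@*' | grep -iE "mds|health"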
I have updated from Reef to Squid this morning as it was on my todo list, and everything seems great so far - this is just about trying to figure out what happened and why.