Search results

  1.

    [SOLVED] How to remove old mds from ceph? (actually slow mds message)

    Good news. I fixed the "pg incomplete" error in ceph with the help of this post and now that the cluster is healthy, the slow mds message has gone away too!
  2.

    [SOLVED] How to remove old mds from ceph? (actually slow mds message)

    It has now become clearer to me what is happening. I removed all the mds's and then the message changed. It seems that the active mds generates this message, although I have trouble finding the message from the console. The message pertains to the active mds. Previously it was mdssm1, but now...
  3.

    [SOLVED] How to remove old mds from ceph? (actually slow mds message)

    This cluster is primarily used as a backup. We run Proxmox Backup Server on it, replicate some databases to it and use it for testing, so it's not primary production. We have had old drives fail a couple of times though, but I hear what you're saying about too many mds's.
  4.

    [SOLVED] How to remove old mds from ceph? (actually slow mds message)

    I will do this. However, I have created many mds daemons because these machines are old. Any of them could go down at some point, and if the two that host the mds daemons go down at the same time I'd be screwed. Is there a downside to having many mds daemons?
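
    As a quick aside to the standby question above: extra MDS daemons normally just sit in standby, and the layout can be checked at any time. A minimal sketch using standard Ceph commands (not taken from the thread):

      ceph mds stat     # one-line summary: active rank(s) plus the number of standbys
      ceph fs status    # per-filesystem table of active and standby MDS daemons

    Standby daemons are mostly idle, so the overhead of having several is typically small.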
  5.

    [SOLVED] How to remove old mds from ceph? (actually slow mds message)

    Yes, the node was first removed and then rebuilt. The node was completely removed before I added the rebuilt one. # ceph -s cluster: id: a6092407-216f-41ff-bccb-9bed78587ac3 health: HEALTH_ERR 1 MDSs report slow metadata IOs 1 MDSs report slow requests...
  6.

    [SOLVED] How to remove old mds from ceph? (actually slow mds message)

    <bump>. Someone must have run into this issue before? Maybe it's just an annoyance and doesn't affect the cluster, but then again, maybe it does. I'd really like to remove that.
  7.

    [SOLVED] How to remove old mds from ceph? (actually slow mds message)

    I had a failed node, which I replaced, but the MDS (for cephfs) that was on that node is still reported in the GUI as slow. How can I remove that? It's not in ceph.conf or storage.conf. MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs mdssm1(mds.0): 6 slow metadata IOs are blocked > 30 secs...
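
    The snippets above don't show the eventual commands, but on a Proxmox-managed Ceph cluster an MDS instance is usually retired with pveceph. A hedged sketch; the daemon name sm1 is an assumption based on the mdssm1 prefix in the warning:

      pveceph mds destroy sm1   # run on the node that hosts the MDS instance
      ceph fs status            # confirm the remaining active/standby layout
      ceph health detail        # check whether the MDS_SLOW_* warnings have cleared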
  8.

    How to force quorum if the 3rd monitor is down

    I have a situation where a node failed (due to the boot drive failing) and then another node failed (due to RAM failure). There are 7 nodes in the cluster, so things kept running, but eventually there were many writes that could not be redundantly stored and the whole thing ground to a halt...
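
    For the monitor-quorum question, the generic Ceph procedure (from the upstream docs, not this thread) is to edit the monmap on a surviving monitor so the dead monitors no longer count; the host names below are placeholders:

      systemctl stop ceph-mon@nodeA                     # stop one surviving monitor
      ceph-mon -i nodeA --extract-monmap /tmp/monmap    # dump its monitor map
      monmaptool /tmp/monmap --rm nodeB                 # strip the first dead monitor
      monmaptool /tmp/monmap --rm nodeC                 # and the second
      ceph-mon -i nodeA --inject-monmap /tmp/monmap     # write the edited map back
      systemctl start ceph-mon@nodeA                    # the surviving monitors can now form quorum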
  9.

    Replace failed boot drive without trashing ceph OSD's?

    I have a failed boot drive in a 7 node proxmox cluster with ceph. If I replace the drive and do a fresh install, I would need to trash the OSD's attached to that node. If I could somehow recover the OSD's instead it would be great and probably save time too. Is that possible?
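
    Since OSD data and metadata live on the OSD disks themselves, a hedged sketch of the usual recovery after a fresh boot-drive install (once the node has rejoined the cluster and has its ceph.conf and keyrings back):

      ceph-volume lvm list            # show the OSDs ceph-volume finds on the local disks
      ceph-volume lvm activate --all  # start the existing OSDs instead of recreating them
      ceph osd tree                   # verify the OSDs report in under the rebuilt node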
  10.

    Fileserver with backup

    PBS as a VM is definitely a good idea. However, PBS generates a fingerprint that you need to save, otherwise another instance won't be able to read the backups. You can attach storage to PBS in many ways. On the underlying OS (Debian) you can mount the storage and simply link to it in the...
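
    Two commands related to the points above, sketched under the assumption of a standard PBS install (the datastore name and path are placeholders):

      proxmox-backup-manager cert info                             # prints the fingerprint clients must trust
      proxmox-backup-manager datastore create store1 /mnt/backup   # register a mounted path as a datastore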
  11.

    Ceph RBD features for Proxmox

    I have found that fast-diff is very useful, which requires exclusive-lock and object-map to be enabled as well. While the selection of features at RBD image create time is nicely documented, how to modify an existing volume is not easy to find. I wanted to enable fast-diff on images so I can...
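
    The snippet cuts off before the answer, but enabling fast-diff on an existing image is normally done with rbd feature enable, followed by an object-map rebuild so the new map is valid; pool and image names below are made up:

      rbd feature enable mypool/vm-100-disk-0 exclusive-lock object-map fast-diff
      rbd object-map rebuild mypool/vm-100-disk-0   # rebuild so fast-diff reports correct usage
      rbd info mypool/vm-100-disk-0                 # confirm the feature list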
  12.

    [SOLVED] How can i install the ceph dashboard in proxmox 6

    Ah, I found the solution. https://forum.proxmox.com/threads/ceph-dashboard-not-working-after-update-to-proxmox-7-from-6-4.104911/post-498277
  13.

    [SOLVED] How can i install the ceph dashboard in proxmox 6

    It's been a while, but it seems that all is not well with newer versions of ceph mgr and this... I get this: FT1-NodeA:~# apt-get install ceph-mgr-dashboard Reading package lists... Done Building dependency tree... Done Reading state information... Done ceph-mgr-dashboard is already the newest...
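
    With the package already installed, the dashboard still has to be switched on in ceph-mgr. A hedged outline based on the upstream Ceph docs; the user name and password file are placeholders:

      ceph mgr module enable dashboard
      ceph dashboard create-self-signed-cert
      echo -n 'secret' > /root/dashboard_pw
      ceph dashboard ac-user-create admin -i /root/dashboard_pw administrator
      ceph mgr services                       # shows the URL the dashboard is serving on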
  14.

    Fileserver with backup

    You should really look into Proxmox Backup Server. It will take the pain out of backups. I configured some older metal into a proxmox ceph cluster, run PBS in a VM and it works really well. If you want to just protect yourself against user errors or similar, use snapshots. They're much...
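
    For the snapshot suggestion, the Proxmox CLI equivalents are short; the IDs and snapshot name here are examples only:

      qm snapshot 100 pre-change     # snapshot a VM before a risky change
      pct snapshot 192 pre-change    # the same for a container
      qm rollback 100 pre-change     # roll the VM back if needed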
  15.

    [BUG] Network is only working selectively, can't see why

    This morning a restart of a node that had not been restarted for quite some time caused the same symptoms as those reported here. It dawned on me that this swapping of ports might be due to the newly running kernel. On further investigation, here's what I found. NodeA was running...
  16.

    ceph thin provisioning for lxc's not working as expected?

    :redface: Of course, the command has to run on the node on which the container is running...! ~# pct fstrim 192 /var/lib/lxc/192/rootfs/: 88.9 GiB (95446147072 bytes) trimmed /var/lib/lxc/192/rootfs/home/user-data/owncloud: 1.6 TiB (1795599138816 bytes) trimmed However, when I ask rbd for the...
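
    The quote ends just before the rbd query; a typical way to compare provisioned versus actually used space after a trim looks like this (pool and image names are placeholders):

      rbd du mypool/vm-192-disk-0    # provisioned vs. used size for the container's volume
      rbd du -p mypool               # the same report for every image in the pool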
  17.

    ceph thin provisioning for lxc's not working as expected?

    I don't think it's a good idea to run privileged containers for clients, is it? If a UID matches one of the host's UIDs that has rights to locations a client should not have access to, it may create a big problem...
  18.

    ceph thin provisioning for lxc's not working as expected?

    Does it mean that if you have a mountpoint (over and above the boot drive), thin-provisioning doesn't work? ~# cat /etc/pve/lxc/192.conf arch: amd64 cores: 4 features: nesting=1 hostname: productive memory: 8192 nameserver: 8.8.8.8 net0...
  19.

    ceph thin provisioning for lxc's not working as expected?

    Of course that gives the same result. For some reason the container believes that the storage doesn't support trimming, i.e. it's not thin provisioned. However, some other volumes on the same ceph storage pool are completely ok with trimming. Could there be something that's set in the...
  20.

    ceph thin provisioning for lxc's not working as expected?

    The response is: fstrim: /: FITRIM ioctl failed: Operation not permitted. This is Ubuntu 22.04 running on a ceph storage cluster. Why is this?