Search results

  1. What happens during VM migration?

    I have a FreeBSD 12.3 guest running a poller node, and when it gets installed everything runs just fine. We can stop and start the guest too, no problem. The guest uses VirtIO SCSI and a Ceph RBD image of 120GB. The FreeBSD qemu-guest-agent is installed. If for some reason the VM is...
  2. Change WAL and DB location for running (slow) OSD's

    However, when I attempt to do this I get an error which is not documented anywhere afaict # lsblk /dev/sdb NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sdb...
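    For context, a rough sketch of one common way to attach a faster DB/WAL device to an existing BlueStore OSD (the OSD id and target device below are made up, and the thread may settle on a different procedure):

      systemctl stop ceph-osd@3                          # the OSD must be stopped first
      ceph-bluestore-tool bluefs-bdev-new-db \
          --path /var/lib/ceph/osd/ceph-3 \
          --dev-target /dev/nvme0n1p1                    # attach a new, faster DB device
      systemctl start ceph-osd@3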
  3. Change WAL and DB location for running (slow) OSD's

    That's excellent, I wasn't aware of it! It will save a lot of time, since rebalancing HDD-based OSD's is time-consuming, to put it mildly!
  4. Change WAL and DB location for running (slow) OSD's

    I got advice from a seasoned ceph expert to do the following: split the NVMe drive into 3 OSDs (with LVM) to really make use of the speed the NVMe offers. So I created 2 additional volumes (5% of the NVMe, 47GB each) on the NVMe to hold the RocksDB and WAL for 2 HDD drives. I'm in the...
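    A sketch of what that kind of layout can look like (the volume group, LV names and devices are made up; this is not necessarily how the poster did it):

      # carve two small LVs out of the NVMe volume group for RocksDB/WAL
      lvcreate -L 47G -n db-sdb nvme-vg
      lvcreate -L 47G -n db-sdc nvme-vg
      # create the HDD-backed OSDs with their DB (and WAL) on the NVMe LVs
      ceph-volume lvm create --bluestore --data /dev/sdb --block.db nvme-vg/db-sdb
      ceph-volume lvm create --bluestore --data /dev/sdc --block.db nvme-vg/db-sdc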
  5. Change WAL and DB location for running (slow) OSD's

    I need to do something about the horrible performance I get from the HDD pool on a production cluster (I get around 500KB/s benchmark speeds!). As the disk usage has been increasing, the performance has been dropping. I'm not sure why this is, since I have a test cluster, which higher...
  6. Diagnosing slow ceph performance

    Did any of you ever find out what the cause of the poor performance was?
  7. [SOLVED] How to remove old mds from ceph? (actually slow mds message)

    Good news. I fixed the "pg incomplete" error in ceph with the help of this post, and now that the cluster is healthy, the slow mds message has gone away too!
  8. [SOLVED] How to remove old mds from ceph? (actually slow mds message)

    It has now become clearer to me what happens. I removed all the mds's and then the message changed. It seems that the active mds generates this message, although I have trouble finding the message from the console. The message pertains to the active mds. Previously it was mdssm1, but now...
  9. [SOLVED] How to remove old mds from ceph? (actually slow mds message)

    This cluster is primarily used as a backup. We run Proxmox Backup Server on it, replicate some databases to it and use it for testing, so it's not primary production. We have had old drives fail a couple of times though, but I hear what you're saying about too many mds's.
  10. [SOLVED] How to remove old mds from ceph? (actually slow mds message)

    I will do this. However, I have created many mds daemons because these machines are old. Any of them could go down at some point and if the two that host the mds daemons go down at the same time I'd be screwed. Is there a downside to having many mds daemons?
  11. [SOLVED] How to remove old mds from ceph? (actually slow mds message)

    Yes, the node was first removed and then rebuilt. The node was completely removed before I added the rebuilt one. # ceph -s cluster: id: a6092407-216f-41ff-bccb-9bed78587ac3 health: HEALTH_ERR 1 MDSs report slow metadata IOs 1 MDSs report slow requests...
  12. [SOLVED] How to remove old mds from ceph? (actually slow mds message)

    <bump>. Someone must have run into this issue before? Maybe it's just an annoyance and doesn't affect the cluster, but then again, maybe it does. I'd really like to remove that.
  13. [SOLVED] How to remove old mds from ceph? (actually slow mds message)

    I had a failed node, which I replaced, but the MDS (for cephfs) that was on that node is still reported in the GUI as slow. How can I remove that? It's not in ceph.conf or storage.conf MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs mdssm1(mds.0): 6 slow metadata IOs are blocked > 30 secs...
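    For reference, a minimal sketch of how an unwanted MDS daemon is normally removed in a Proxmox/Ceph setup (the daemon name "oldnode" is made up, and this may not be what actually resolved the thread):

      ceph fs status                        # check which MDS daemons the cluster still knows about
      pveceph mds destroy oldnode           # run on the node that hosts (or hosted) the stale MDS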
  14. How to force quorum if the 3rd monitor is down

    I have a situation where a node failed (due to the boot drive failing) and then another node failed (due to RAM failure). There are 7 nodes in the cluster, so things kept running, but eventually there were many writes that could not be redundantly stored and the whole thing ground to a halt...
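    For context, the usual way to restore monitor quorum when too many monitors are permanently gone is to edit the monmap on a surviving monitor (the monitor names below are made up; see the thread for the exact situation):

      systemctl stop ceph-mon@nodeA                         # work on a surviving monitor
      ceph-mon -i nodeA --extract-monmap /tmp/monmap
      monmaptool /tmp/monmap --rm nodeC                     # remove the dead monitor(s)
      ceph-mon -i nodeA --inject-monmap /tmp/monmap
      systemctl start ceph-mon@nodeA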
  15. Replace failed boot drive without trashing ceph OSD's?

    I have a failed boot drive in a 7 node proxmox cluster with ceph. If I replace the drive and do a fresh install, I would need to trash the OSD's attached to that node. If I could somehow recover the OSD's instead it would be great and probably save time too. Is that possible?
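    A minimal sketch of the usual recovery path, assuming the OSDs are standard ceph-volume/LVM OSDs and the node has been reinstalled and re-joined to the cluster (not necessarily the exact steps from the thread):

      pveceph install                      # put the ceph packages back on the reinstalled node
      ceph-volume lvm activate --all       # scan the data disks and bring the existing OSDs back up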
  16. Fileserver with backup

    PBS as a VM is definitely a good idea. However, PBS generates a fingerprint that you need to save, otherwise another instance won't be able to read the backups. You can attach storage to PBS in many ways. On the underlying OS (Debian) you can mount the storage and simply link to it in the...
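    As a sketch of that last point (the device, mount point and datastore name are made up):

      mount /dev/sdb1 /mnt/backup                                 # mount the storage on the PBS host
      proxmox-backup-manager datastore create store1 /mnt/backup  # register it as a PBS datastore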
  17. Ceph RBD features for Proxmox

    I have found that fast-diff is very useful; it requires exclusive-lock and object-map to be enabled as well. While the selection of features at RBD image create time is nicely documented, how to modify an existing volume is not easy to find. I wanted to enable fast-diff on images so I can...
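    A short example of enabling these features on an existing image (pool and image names are made up):

      rbd feature enable rbd/vm-101-disk-0 exclusive-lock object-map fast-diff
      rbd object-map rebuild rbd/vm-101-disk-0     # rebuild the object map so fast-diff data is valid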
  18. [SOLVED] How can i install the ceph dashboard in proxmox 6

    Ah, I found the solution. https://forum.proxmox.com/threads/ceph-dashboard-not-working-after-update-to-proxmox-7-from-6-4.104911/post-498277
  19. [SOLVED] How can i install the ceph dashboard in proxmox 6

    It's been a while, but it seems that all is not well with newer versions of ceph mgr and this... I get this: FT1-NodeA:~# apt-get install ceph-mgr-dashboard Reading package lists... Done Building dependency tree... Done Reading state information... Done ceph-mgr-dashboard is already the newest...
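    For reference, the usual install/enable sequence looks roughly like this (the user name and password file are made up):

      apt-get install ceph-mgr-dashboard
      ceph mgr module enable dashboard
      ceph dashboard create-self-signed-cert
      ceph dashboard ac-user-create admin -i /root/dashboard-pw administrator
      ceph mgr services                            # prints the dashboard URL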
  20. Fileserver with backup

    You should really look into Proxmox Backup Server. It will take the pain out of backups. I configured some older metal into a proxmox ceph cluster, run PBS in a VM, and it works really well. If you want to just protect yourself against user errors or similar, use snapshots. They're much...