Search results

  1. Hyper-converged PVE and CEPH, Single PVE cluster with multiple CEPH clusters?

    You might want to take a look at https://github.com/lephisto/crossover. I use this to keep incremental DR cold-standby copies in separate clusters, do (near) live migration with minimum downtime between different clusters, and so on (a sketch of the underlying incremental replication follows after this list).
  2. Doing crazy things with Proxmox+Ceph reloaded: Cross pool replication/migration

    What do you mean by 'protection snapshot'? I don't get it right now.
  3. Doing crazy things with Proxmox+Ceph reloaded: Cross pool replication/migration

    Hi, since I have to maintain services at some geographically disjunct locations, I was looking for a way to (nearly) live-migrate VMs across different pools with the least downtime possible. Sure, you can back up to PBS or export and import (see the export/import sketch after this list), but depending on the size of the images you will...
  4. Ceph OSD crash

    There is progress on this: https://tracker.ceph.com/issues/48276#note-32 The PR isn't merged upstream yet, so I guess we will see this (important) fix only in 14.2.16 or later.
  5. Ceph OSD crash

    Just an update: I filed an issue on the Ceph Redmine. There's a patch proposed that enables verbose logging in case of this specific failure, but it's unclear as of now when it will be backported to 14.x: https://tracker.ceph.com/issues/48276 So long..
  6. Ceph OSD crash

    *bump* It has now happened a second time, on another node, within 24 hours.
  7. Ceph OSD crash

    Hello, I guess I have the same issue here: an OSD crash with no obvious hardware problems (a short crash-inspection sketch follows after this list): root@X# ceph crash info 2020-11-18_02:24:35.429967Z_800333e3-630a-406b-9a0e-c7c345336087 { "os_version_id": "10", "utsname_machine": "x86_64", "entity_name": "osd.29", "backtrace": [...
  8. Compatible with amd epyc

    I have been running several clusters with EPYC Rome CPUs in production for half a year without hiccups.
  9. Ceph Update: clarification on possible snaptrim() regression on ceph-14.2.10

    *bump* Since I follow Ceph development very closely, I can tell that there are a few additional regressions in Ceph 14.2.10; I advise you not to upgrade to 14.2.10 at the moment.
  10. Ceph Update: clarification on possible snaptrim() regression on ceph-14.2.10

    Hi, has anyone already run into this issue: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/PJO4WEZEOJUKF2FDXMWHD7O7WRL6H3JO/#PJO4WEZEOJUKF2FDXMWHD7O7WRL6H3JO It might be related to: https://tracker.ceph.com/issues/44818 According to the Ceph channel this seems to...
  11. Clarification on build_incremental_map_msg

    Yes, I opened that issue :) Thanks for caring. Regards.
  12. Proxmox host and client: backup and restore steps

    Hi Bruce, I understand the brevity of the Proxmox guys. Your initial post was pretty confusing and more a case for an IT consultant who can give you a basic understanding of how virtualisation stacks work and what backup concepts there are (a minimal guest-backup example follows after this list). Basically, there are two ways in which you should back up...
  13. Clarification on build_incremental_map_msg

    Hello, I'm running some HC clusters with Proxmox Enterprise + Ceph, all on the latest versions (6.2-4, Ceph 14.2.9). On one of them I get syslog messages that appear now and then, mostly when backfill or snaptrim is in progress: May 20 06:46:23 px2 ceph-osd[3802]: 2020-05-20 06:46:23.080...
  14. Changing Ceph IP Range For Data Sync

    Hi, a solution might be: temporarily change cluster_network in ceph.conf to the same range as public_network, so you don't need to shut everything down and can do it on a live cluster (it's fine for Ceph to have the public and cluster networks on the same L2/L3 segment); a sketch follows after this list. After that, restart all OSDs...
  15. rbd error: rbd: listing images failed: (2) No such file or directory (500)

    Out of curiosity: how can an RBD image become "faulty" if there was no power- or hardware-related crash?
  16. CEPH: outdated OSDs after minor upgrade

    I ran into the same issue some time ago on a lab system. Only use the dist-upgrade procedure (see the example after this list), because it checks dependencies in a smarter way; with a plain upgrade, the Ceph components in particular can end up in a pretty broken state.
  17. AMD EPYC 7502P 32 Cores "Rome" With proxmox

    I am running quite a number of Romes on Supermicro boards without any hiccups.
  18. [SOLVED] CEPH resilience: self-heal flawed?

    Just tested it, and now it makes perfect sense. The mon_osd_min_in_ratio default is 0.75. As asked above, what are the implications of setting this value to 0.6 in a 5-node setup, to let it self-heal down to the smallest quorum size (see the sketch after this list)? I guess there is a good reason this value is 0.75 by default.
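
Regarding result 1: crossover's own CLI isn't quoted above, so the following is only a minimal sketch of the incremental RBD replication that cross-cluster cold-standby copies typically build on; the pool, image, and snapshot names and the `remote` cluster (expected as /etc/ceph/remote.conf plus a matching keyring) are assumptions, not taken from the tool.

```bash
# Assumed placeholders: pool "rbd", image "vm-100-disk-0", cluster "remote".
POOL=rbd
IMG=vm-100-disk-0

# Initial full copy up to a first snapshot, then mirror that snapshot remotely.
rbd snap create $POOL/$IMG@sync1
rbd export $POOL/$IMG@sync1 - | rbd --cluster remote import - $POOL/$IMG
rbd --cluster remote snap create $POOL/$IMG@sync1

# Later runs ship only the delta between the last and the newest snapshot;
# import-diff also creates the end snapshot on the destination image.
rbd snap create $POOL/$IMG@sync2
rbd export-diff --from-snap sync1 $POOL/$IMG@sync2 - | \
    rbd --cluster remote import-diff - $POOL/$IMG
```

Only changed extents travel between the clusters on the incremental runs, which is what keeps the final cut-over window small.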
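
Regarding result 3: the plain export/import route mentioned there, sketched as a cross-pool copy inside one cluster with made-up pool and image names; it ships the whole image every time, which is exactly the size/downtime problem the incremental sketch above avoids.

```bash
# Full copy of an RBD image from one pool to another (placeholder names);
# stop or snapshot the VM first so the copy is consistent.
rbd export pool-a/vm-100-disk-0 - | rbd import - pool-b/vm-100-disk-0
```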
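
Regarding result 7: the quoted report comes from Ceph's mgr crash module; a short sketch of listing, inspecting, and acknowledging such reports (the crash ID is a placeholder).

```bash
# List crash reports that have not been acknowledged yet
ceph crash ls-new

# Show the full report (backtrace, versions, entity) for one crash
ceph crash info <crash-id>

# After reviewing/reporting, archive everything so the health warning clears
ceph crash archive-all
```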
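
Regarding result 12: as one illustration of guest-level backup (not necessarily one of the two ways the truncated post goes on to describe), a hedged vzdump/qmrestore example; the VMID, storage name, and archive path are placeholders.

```bash
# Snapshot-mode backup of VM 100 to a storage named "backup"
vzdump 100 --mode snapshot --storage backup --compress zstd

# Restore the resulting archive as a new VM with ID 101
qmrestore /path/to/vzdump-qemu-100-<timestamp>.vma.zst 101
```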
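
Regarding result 14: a minimal sketch of the temporary change described there, with example subnets; on Proxmox the file is managed as /etc/pve/ceph.conf, and setting noout is a common precaution rather than something the post prescribes.

```bash
# /etc/pve/ceph.conf (excerpt) -- example subnets, adjust to your environment:
#   [global]
#   public_network  = 10.10.10.0/24
#   cluster_network = 10.10.10.0/24   # temporarily identical to public_network

# Avoid rebalancing while OSDs restart, then restart them one node at a time
ceph osd set noout
systemctl restart ceph-osd.target
ceph osd unset noout
```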
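
Regarding result 16: the dependency-aware upgrade path recommended there boils down to the following.

```bash
apt update
apt dist-upgrade   # or: apt full-upgrade; unlike a plain "upgrade", this may
                   # add or remove packages to satisfy changed dependencies
```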
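
Regarding result 18: a sketch of checking and lowering the value discussed in that post; whether 0.6 is safe for a given cluster is exactly the trade-off being asked about.

```bash
# Current value (0.75 by default)
ceph config get mon mon_osd_min_in_ratio

# Lower it so a 5-node cluster can still mark OSDs "out" after losing more nodes
ceph config set mon mon_osd_min_in_ratio 0.6
```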