Search results

  1. [SOLVED] High io delay after losing a node

     Thanks to the community for all this relevant information. Everything is clear.
  2. [SOLVED] High io delay after losing a node

     Thanks @SteveITS. If I change this parameter, will Ceph create all the "missing PGs" and use 50% additional storage? Regards
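     If the parameter in question is the pool's replica count, the extra space such a change would need can be estimated up front; a minimal sketch, assuming a hypothetical pool named vm-pool:

         # Show the current replica count of the pool ('vm-pool' is a placeholder)
         ceph osd pool get vm-pool size
         # Compare stored vs. raw usage per pool to estimate the additional space
         ceph df detail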
  3. [SOLVED] High io delay after losing a node

     Hmm, I think I understood what happened. The Ceph conf is as below: AFAICT, if an OSD is down, there's a high probability that the missing OSD is treated as an emergency, so Ceph reacts by rebalancing the missing PGs. Am I correct?
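     That matches Ceph's default behaviour: a down OSD is marked "out" after mon_osd_down_out_interval (600 s by default), and its PGs are then backfilled onto the remaining OSDs. For planned maintenance this is commonly suppressed like so (a sketch, not taken from the thread itself):

         # Tell the monitors not to mark down OSDs "out", so no rebalancing starts
         ceph osd set noout
         # ... perform the node maintenance ...
         # Restore the normal down -> out handling afterwards
         ceph osd unset noout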
  4. [SOLVED] High io delay after losing a node

     Hmm, I think this is a priority issue in the operations. Once the recovery ended, all the IO delays dropped and everything is running smoothly on 3 nodes. What can I set up to give priority to client usage when there is no "emergency"? Regards
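     One common way to keep client I/O responsive during recovery is to throttle backfill and recovery per OSD; a sketch with illustrative values, not a recommendation from the thread:

         # Limit concurrent backfills and recovery ops per OSD
         ceph config set osd osd_max_backfills 1
         ceph config set osd osd_recovery_max_active 1
         # Add a small pause between recovery ops on SSD/NVMe OSDs
         ceph config set osd osd_recovery_sleep_ssd 0.1

     On newer releases that use the mClock scheduler, the equivalent knob is osd_mclock_profile (e.g. high_client_ops).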
  5. [SOLVED] High io delay after losing a node

     The Ceph status indicates only warnings: yet the IO delay is still very high
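     For reference, the exact warning behind a bare HEALTH_WARN is usually visible with:

         # Cluster summary, then the detailed reason for each warning
         ceph -s
         ceph health detail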
  6. [SOLVED] High io delay after losing a node

     Additional information: we haven't activated HA. When the server comes back to life, everything runs smoothly. We're looking for the reason why the other nodes overreact to this.
  7. [SOLVED] High io delay after losing a node

     Hi, we have a 4-node cluster with Ceph installed on 4 disks per node, so 16 OSDs, and only 50% is used. All disks are NVMe; the scheduler is set to mq-deadline in /sys/block/nvmeXXX/queue/scheduler. Our nodes have enough RAM and CPU. Today, a node has been shut down and, as we're expecting...
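     For reference, checking and switching the scheduler per device looks roughly like this (nvme0n1 and the udev file name are placeholders):

         # Show available schedulers; the active one is shown in brackets
         cat /sys/block/nvme0n1/queue/scheduler
         # Switch at runtime (not persistent across reboots)
         echo mq-deadline > /sys/block/nvme0n1/queue/scheduler
         # Persist via a udev rule, e.g. in /etc/udev/rules.d/60-scheduler.rules:
         # ACTION=="add|change", KERNEL=="nvme[0-9]n[0-9]", ATTR{queue/scheduler}="mq-deadline"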
  8. [SOLVED] PVE6.4 pmxcfs fails to initialize and ceph failed on a one-node cluster

     So for the record, I've succeeded in making Ceph work. First, there were some ghost monitors that I succeeded in deleting from the monmap. Then, I had some ACL issues on the directory structure: rocksdb: IO error: While opening a file for sequentially reading...
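     The usual offline monmap surgery follows this pattern (the mon ID 'server' and the monitor name 'ghostmon' are placeholders):

         systemctl stop ceph-mon@server
         # Extract and inspect the current monmap from the stopped monitor
         ceph-mon -i server --extract-monmap /tmp/monmap
         monmaptool --print /tmp/monmap
         # Remove the stale entry and write the map back
         monmaptool --rm ghostmon /tmp/monmap
         ceph-mon -i server --inject-monmap /tmp/monmap
         # Fix ownership if rocksdb reports IO/permission errors on the mon store
         chown -R ceph:ceph /var/lib/ceph/mon/ceph-server
         systemctl start ceph-mon@server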
  9. [SOLVED] PVE6.4 pmxcfs fails to initialize and ceph failed on a one-node cluster

     Hi @fabian, FMI, do you think that official Proxmox support could cover this case? I mean, changing the Ceph conf to make it work as it did until recently. Regards
  10. [SOLVED] PVE6.4 pmxcfs fails to initialize and ceph failed on a one-node cluster

     Thanks for your advice; I'll try to restore the behaviour that was working and upgrade ASAP.
  11. [SOLVED] PVE6.4 pmxcfs fails to initialize and ceph failed on a one-node cluster

     Additional info: the one-node Ceph was working perfectly; it just failed recently and I'm looking for the reason. A recent update changed the NIC name, and maybe that's the reason, but I can't find any clue supporting this hypothesis.
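     The NIC-rename hypothesis can be checked by comparing the current interface names and addresses against what the configs still reference; a quick sketch:

         # Current interface names, states and addresses
         ip -br addr
         # Check that the networks Ceph binds to are still reachable
         grep -E 'public_network|cluster_network|mon_host' /etc/ceph/ceph.conf
         cat /etc/network/interfaces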
  12. [SOLVED] PVE6.4 pmxcfs fails to initialize and ceph failed on a one-node cluster

     Understood. Sadly, I have some VMs on this server that haven't been backed up for a long time. Not so critical, but with a long setup process. If I reinstall everything, will I be able to recover the OSDs?
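     Assuming the OSD disks are intact, the keyrings and /etc/ceph/ceph.conf are restored, and the OSDs were created with ceph-volume/LVM, existing BlueStore OSDs can usually be re-detected after a reinstall; a sketch:

         # Scan the LVM-based OSDs on the disks and start them again
         ceph-volume lvm activate --all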
  13. [SOLVED] PVE6.4 pmxcfs fails to initialize and ceph failed on a one-node cluster

     Hmm, actually it won't be possible to upgrade without making Ceph work: pve6to7 fails.
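     The checklist can be re-run after each fix to see what still blocks the upgrade:

         # Full PVE 6 -> 7 pre-upgrade checklist; it reports failures while Ceph is unhealthy
         pve6to7 --full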
  14. [SOLVED] PVE6.4 pmxcfs fails to initialize and ceph failed on a one-node cluster

     My underlying question is: if I upgrade the server, will that allow Ceph to run on this single node? Regards
  15. [SOLVED] PVE6.4 pmxcfs fails to initialize and ceph failed on a one-node cluster

     Hi @fabian, thanks for this clarification, you're right. So, to summarize: Ceph can't run on a single node? Or do we need to adapt the quorum for Ceph as well? Thanks for the time spent reading and answering.
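     Ceph can form a quorum on a single node if the monmap contains only that one monitor, and pools must then tolerate a single copy; a sketch of the relevant ceph.conf defaults (lab use only, this removes all redundancy):

         [global]
             # Single replica -- acceptable only when data loss is acceptable
             osd_pool_default_size = 1
             osd_pool_default_min_size = 1

     Existing pools would additionally need ceph osd pool set <pool> size 1, and newer Ceph releases also require mon_allow_pool_size_one = true before accepting it.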
  16. [SOLVED] PVE6.4 pmxcfs fails to initialize and ceph failed on a one-node cluster

     So maybe the issue is not corosync but Ceph. The Ceph logs are throwing: e13 handle_auth_request failed to assign global_id 2024-12-11T14:55:53.720+0100 7f815b4c5700 -1 mon.server@1(probing) e13 get_health_metrics reporting 4 slow ops, oldest is auth(proto 0 29 bytes epoch 0)...
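     "(probing)" means the monitor is still searching for peers and has no quorum, which is why auth requests cannot be assigned a global_id. The admin socket works even without quorum and shows what the mon believes the cluster looks like ('server' is the mon ID from the log line):

         # Monitor state, monmap epoch and known peers
         ceph daemon mon.server mon_status
         # The slow/blocked ops the log is complaining about
         ceph daemon mon.server ops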
  17. [SOLVED] PVE6.4 pmxcfs fails to initialize and ceph failed on a one-node cluster

     I've also tried to turn this into a standalone server: https://forum.proxmox.com/threads/proxmox-ve-6-removing-cluster-configuration.56259/#post-259203 Yet Ceph is not starting, but I no longer have pmxcfs issues at boot.
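     For context, the documented way to separate a node without reinstalling (which the linked post describes a variant of) is roughly:

         systemctl stop pve-cluster corosync
         # Start pmxcfs in local mode so /etc/pve is writable without quorum
         pmxcfs -l
         rm /etc/pve/corosync.conf
         rm -rf /etc/corosync/*
         killall pmxcfs
         systemctl start pve-cluster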
  18. [SOLVED] PVE6.4 pmxcfs fails to initialize and ceph failed on a one-node cluster

     Hi everyone, I have a one-node server that used to be part of a 4-node cluster. The current server has 2 disks with OSDs, plus VMs + CTs using Ceph. A few days ago, Ceph turned unresponsive with a question mark and timeouts (500) in the web UI. We updated PVE 6.4 to the latest release, all the...
  19. Server crash when backing up to PBS

     Could you add the SOLVED prefix to your issue?