Search results

  1. [SOLVED] Ceph snaptrim causing performance impact on whole Cluster since update

    Hmm, even with these parameters a snapshot delete brings I/O for client operations to nearly 0: ceph tell osd.* injectargs "--osd_mclock_profile=custom" ceph tell osd.* injectargs "--osd_mclock_scheduler_client_wgt=4" ceph tell osd.* injectargs "--osd_mclock_scheduler_background_recovery_lim=100"...
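
    For readability, the three overrides quoted above, one per command (a sketch based only on what the snippet shows; the trailing "..." suggests further settings that are not included here). Note that injectargs only changes the running OSDs, so the values are lost on restart unless also persisted, e.g. with ceph config set.

      # switch the mClock scheduler to the custom profile so manual tuning takes effect
      ceph tell osd.* injectargs "--osd_mclock_profile=custom"
      # raise the weight of client operations relative to background work
      ceph tell osd.* injectargs "--osd_mclock_scheduler_client_wgt=4"
      # cap the IOPS allotted to background recovery operations
      ceph tell osd.* injectargs "--osd_mclock_scheduler_background_recovery_lim=100"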
  2. [SOLVED] Ceph snaptrim causing performance impact on whole Cluster since update

    An interesting fact: this issue does not occur on clusters that were "born" as 7.x / Pacific or Quincy. The I/O impact is so severe that workloads can barely run; I/O is very laggy. I will look into the mclock tuning thing.
  3. [SOLVED] Ceph snaptrim causing performance impact on whole Cluster since update

    Hi, I upgraded a cluster all the way from Proxmox 6.2/Ceph 14.x to Proxmox 8.0/Ceph 17.x (latest). Hardware is EPYC servers, all flash / NVMe. I can rule out hardware issues, and I can reproduce the problem. Everything runs fine so far, except that my whole system gets slowed down when I...
  4. pverados segfault

    Thanks for the quick reply @fiona. Can you explain why I only see this on clusters that have been upgraded all the way from 6.x to 8, but not on clusters that were born as 7.x? I am just curious. //edit: sorry, I have to correct myself, I also see this on clusters that came from 7.x. Regards.
  5. pverados segfault

    fyi, still segfaults on 6.2.16-6-pve. [ 277.638986] pverados[19363]: segfault at 5588cfa55010 ip 00005588cb7fc09d sp 00007ffe2d1e6360 error 7 in perl[5588cb721000+195000] likely on CPU 28 (core 5, socket 0) [ 277.638999] Code: 0f 95 c2 c1 e2 05 08 55 00 41 83 47 08 01 48 8b 53 08 22 42 23 0f...
  6. PBS starting up stopped VMs on Backup

    Well, it applies to all backups that rely on QEMU snapshot mechanisms. There are backup/replication tools that utilize Ceph or ZFS snapshotting and bypass the whole QEMU mechanism. Thanks again, I will implement additional checks. :)
  7. PBS starting up stopped VMs on Backup

    Ok, forget about it. It was not a deadman alert but a qemu-ga alert, which I added to our observability setup to get informed when guest agents crash somewhere ( https://github.com/lephisto/check-ga ). It basically issues a QMP guest-ping every 5 minutes. If that happens while the...
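
    Just for illustration (this is not the check-ga code itself), the same liveness probe can be run by hand on a PVE node with the qm CLI; the VMID 100 below is a placeholder.

      # ask the QEMU guest agent of VM 100 to answer a ping;
      # the command fails if the agent is not running or not responding
      qm agent 100 ping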
  8. PBS starting up stopped VMs on Backup

    I noticed this behaviour because the Telegraf agent inside the VM started transmitting telemetry to InfluxDB, and when it was shut down at the end of the backup a deadman alert was raised. I guess this is not intended behaviour?
  9. PBS starting up stopped VMs on Backup

    Hi, I noticed one weird behaviour: I have a few VMs in my cluster that are stopped on purpose; HA is set to request state = stopped. Now when the PBS backup is running I see this: INFO: starting kvm to execute backup task The VM is being booted up, which can cause some trouble. Proxmox /...
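
    For reference, a sketch of the HA configuration described above (VMID 100 is a placeholder); the requested state of an HA-managed VM is set via ha-manager:

      # keep VM 100 registered with HA but request the stopped state
      ha-manager set vm:100 --state stopped
      # show the requested and current state of all HA resources
      ha-manager status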
  10. 'All' Backup Jobs run in parallel

    Is there some solution for this already? In its current form, PBS does not scale unless you have all-flash backup storage.
  11. Hyper-converged PVE and CEPH, Single PVE cluster with multiple CEPH clusters?

    You might want to take a look at: https://github.com/lephisto/crossover I use this to keep incremental DR cold-standby copies in separate clusters, to do (near) live migration with minimal downtime between different clusters, and so on.
  12. Doing crazy things with Proxmox+Ceph reloaded: Cross pool replication/migration

    What do you mean by 'protection snapshot'? I don't get it right now.
  13. Doing crazy things with Proxmox+Ceph reloaded: Cross pool replication/migration

    Hi, since I have to maintain some geographically disjunct locations for services, I was looking for a way to migrate VMs across different pools with the least downtime possible. Sure, you can back up to PBS or export and import, but depending on the size of the images you will...
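
    As a rough sketch of the plain "export and import" route the post contrasts against (pool, image and host names are placeholders), an RBD image can be streamed into another pool or cluster like this:

      # stream the image over SSH and import it on the other side;
      # the VM should be powered off (or a snapshot used) to get a consistent copy
      rbd export rbd/vm-100-disk-0 - | ssh other-cluster-node "rbd import - rbd/vm-100-disk-0"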
  14. Ceph OSD crash

    There is progress on this: https://tracker.ceph.com/issues/48276#note-32 The PR isn't merged upstream yet, so I guess we will see this (important) fix only in 14.2.16 or later.
  15. Ceph OSD crash

    Just an update: I filed an issue on the Ceph Redmine. There's a proposed patch which enables verbose logging for this specific failure, but it's unclear as of now when it will be backported to 14.x. https://tracker.ceph.com/issues/48276 So long..
  16. Ceph OSD crash

    *bump* This happened a second time, on another node, within 24 hours now.
  17. Ceph OSD crash

    Hello, I guess I have the same issue here: an OSD crash with no obvious hardware issues: root@X# ceph crash info 2020-11-18_02:24:35.429967Z_800333e3-630a-406b-9a0e-c7c345336087 { "os_version_id": "10", "utsname_machine": "x86_64", "entity_name": "osd.29", "backtrace": [...
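
    For context, the ID above comes from Ceph's crash module; a quick reminder of the related commands (<crash-id> is a placeholder):

      # list recorded crashes and show the details of one of them
      ceph crash ls
      ceph crash info <crash-id>
      # acknowledge a crash so it stops raising a health warning
      ceph crash archive <crash-id>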
  18. Compatible with AMD EPYC

    I have been running several clusters with EPYC Rome CPUs in production for half a year without hiccups.
  19. Ceph Update: clarification on possible snaptrim() regression on ceph-14.2.10

    *bump* Since I follow Ceph development very closely, I can tell that there are a few additional regressions in Ceph 14.2.10; I advise you not to upgrade to 14.2.10 at the moment.
