Search results

  1.

    [PSA] ZFS Silent Data Corruption

    What's not totally clear to me: does this also apply to zvols, or only if I use ZFS as a POSIX filesystem?
  2.

    Useless vzdump hook --script

    Hi, a question to the devs: since the inception of PBS, the hook script for vzdump has become a bit useless. I want to use it to extract some backup performance and success metrics and push them into my time-series DB. How can I retrieve things like success, size and so on when using a Proxmox Backup...
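
    As a rough illustration of what such a hook could do today, here is a minimal sketch. It assumes the documented vzdump hookscript interface (the phase as first argument, mode and VMID following for the backup-* phases, plus environment variables such as HOSTNAME and STOREID); how much size/duration information is actually available depends on the PVE version and on whether the target is a PBS datastore, so this only logs what is reliably passed in. The metric lines written to stdout are placeholders for whatever the time-series DB expects.

    ```bash
    #!/bin/bash
    # Hypothetical hook, enabled with `vzdump ... --script /usr/local/bin/backup-hook.sh`
    # or via the `script:` option in /etc/vzdump.conf.
    phase="$1"   # job-start, backup-start, backup-end, backup-abort, job-end, ...
    mode="$2"    # stop | suspend | snapshot (only set for backup-* phases)
    vmid="$3"    # VMID (only set for backup-* phases)

    case "$phase" in
        backup-start)
            echo "backup_start vmid=$vmid mode=$mode host=${HOSTNAME:-unknown} ts=$(date +%s)"
            ;;
        backup-end)
            # Size and duration are not handed to the hook for PBS targets; they would
            # have to be queried from the PBS side (task log or API) afterwards.
            echo "backup_ok vmid=$vmid store=${STOREID:-n/a} ts=$(date +%s)"
            ;;
        backup-abort)
            echo "backup_failed vmid=$vmid store=${STOREID:-n/a} ts=$(date +%s)"
            ;;
    esac
    exit 0
    ```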
  3.

    How to stop syncjob flooding syslog

    Is there any news on that? My logs get flooded on every sync. :(
  4.

    GA Monitoring tool

    Thought I'd share this, it might be helpful for someone: in case you want to be alerted if you have a guest running without the guest agent: https://github.com/lephisto/check-ga This comes in handy if you want to make sure that backups are consistent. Sometimes a GA crashes or someone...
  5.

    PBS breaking customer SQL backups. Backups without FS-Freeze?

    Hi, it is still unclear to me which versions of qemu-ga have this option. Neither the installer of the latest(?) version asks me about this, nor do I have the registry key. Can someone clarify, please? From my understanding, for snapshot backups of the underlying block device, vss-copy is...
  6.

    [SOLVED] Ceph snaptrim causing performance impact on whole Cluster since update

    I suspected this might be required, and indeed: recreating osd.3 and osd.10 solved my whole issue. As it turns out, here and there OSDs seem to get corrupted somehow when their internal structures are converted during the upgrade to Pacific. Still, I couldn't detect anything wrong in the logfiles...
  7.

    [SOLVED] Ceph snaptrim causing performance impact on whole Cluster since update

    I think I am narrowing the problem down. On a production cluster I see that, even with the old WPQ scheduler, there is a huge performance impact (not as bad, but still not funny) on a few OSDs in the cluster: osd commit_latency(ms) apply_latency(ms) 13 0...
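
    For anyone trying to reproduce this: the latency table quoted above is the output of the stock `ceph osd perf` command, so a minimal way to watch the affected OSDs while a snaptrim runs could be something like:

    ```bash
    # Show per-OSD commit/apply latencies (columns: osd, commit_latency(ms),
    # apply_latency(ms)), refreshed every 2 seconds, worst offenders first.
    watch -n 2 'ceph osd perf | sort -rn -k2 | head -20'
    ```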
  8.

    Backup Slowdown and Windows VM Corruption Issues after Recent Update

    Is this bug still an issue? Can you please point out which Proxmox versions are affected and describe the exact circumstances under which it can occur?
  9.

    [SOLVED] Ceph snaptrim causing performance impact on whole Cluster since update

    Okay, a report on my progress: after trying to fine-tune the ops weights and limits... it just does not work for me. Going back to the old scheduler solved it for me and everything works like a charm again. I know this will be deprecated in the future. I will continue trying to figure...
  10.

    [SOLVED] Ceph snaptrim causing performance impact on whole Cluster since update

    Hm... even with these params, a snapshot delete brings I/O for client operations down to nearly 0: ceph tell osd.* injectargs "--osd_mclock_profile=custom" ceph tell osd.* injectargs "--osd_mclock_scheduler_client_wgt=4" ceph tell osd.* injectargs "--osd_mclock_scheduler_background_recovery_lim=100"...
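
    As a sketch of an alternative approach (assuming Quincy's mClock scheduler and the same option names as quoted above), the overrides can also be persisted in the config database instead of injected, then verified per OSD, and compared against the sleep-based snaptrim throttles that only apply under the old WPQ scheduler:

    ```bash
    # Persist the mClock overrides cluster-wide (survives OSD restarts).
    ceph config set osd osd_mclock_profile custom
    ceph config set osd osd_mclock_scheduler_client_wgt 4

    # Verify what a given OSD is actually running with.
    ceph config show osd.0 | grep -E 'mclock|snap_trim'

    # Under WPQ, snaptrim pressure is usually throttled with these instead
    # (mClock ignores the sleep-based options):
    ceph config set osd osd_snap_trim_sleep 3
    ceph config set osd osd_pg_max_concurrent_snap_trims 1
    ```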
  11.

    [SOLVED] Ceph snaptrim causing performance impact on whole Cluster since update

    An interesting fact: this issue does not occur on clusters that were "born" as 7.x / Pacific or Quincy. The I/O impact is so hard that workloads can barely run; I/O is very laggy. I will look into the mclock tuning thing.
  12.

    [SOLVED] Ceph snaptrim causing performance impact on whole Cluster since update

    Hi, I upgraded a cluster all the way from Proxmox 6.2/Ceph 14.x to Proxmox 8.0/Ceph 17.x (latest). Hardware is Epyc servers, all-flash / NVMe; I can rule out hardware issues, and I can reproduce the issue as well. Everything runs fine so far, except that my whole system gets slowed down when I...
  13.

    pverados segfault

    Thanks for the quick reply, @fiona. Can you explain why I only see this on clusters that have been upgraded all the way up from 6.x to 8, but not on clusters that were born as 7.x? I am just curious. //edit: sorry, I have to correct myself, I also see this on clusters that came from 7.x. Regards.
  14.

    pverados segfault

    FYI, it still segfaults on 6.2.16-6-pve. [ 277.638986] pverados[19363]: segfault at 5588cfa55010 ip 00005588cb7fc09d sp 00007ffe2d1e6360 error 7 in perl[5588cb721000+195000] likely on CPU 28 (core 5, socket 0) [ 277.638999] Code: 0f 95 c2 c1 e2 05 08 55 00 41 83 47 08 01 48 8b 53 08 22 42 23 0f...
  15.

    PBS starting up stopped VMs on Backup

    Well, it applies to all backups that rely on QEMU snapshot mechanisms. There are backup/replication tools that utilize Ceph or ZFS snapshotting and bypass this whole QEMU thingie. Thanks again, I will implement additional checks. :)
  16.

    PBS starting up stopped VMs on Backup

    Ok, forget about it. It was not a deadman alert but a qemu-ga alert, which I implemented in our observability setup to get informed when guest agents crash somewhere ( https://github.com/lephisto/check-ga ). It basically issues a QMP guest-ping every 5 minutes. If that happens while the...
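
    The idea behind such a check can be sketched in a few lines of shell (this is not the actual check-ga implementation, just an illustration using the stock `qm` CLI on a PVE node):

    ```bash
    # Ping the QEMU guest agent of every running VM and report the ones that do not answer.
    for vmid in $(qm list | awk '$3 == "running" {print $1}'); do
        if ! qm agent "$vmid" ping >/dev/null 2>&1; then
            echo "VM $vmid: guest agent not responding"
        fi
    done
    ```

    In the situation described in this thread, such a ping fires exactly while the backup has started KVM for an otherwise stopped VM, where no guest agent is running to answer, hence the additional checks mentioned above.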
  17.

    PBS starting up stopped VMs on Backup

    I noticed this behaviour because the telegraf agent inside the VM started transmitting telemetry to InfluxDB, and when it was shut down at the end of the backup, a deadman alert was raised. I guess this is not intended behaviour?
  18.

    PBS starting up stopped VMs on Backup

    Hi, I noticed one weird behaviour: I have a few VMs in my cluster that are stopped on purpose; HA is set to request state = stopped. Now, when the PBS backup is running, I see this: INFO: starting kvm to execute backup task The VM is being booted up, which can cause some trouble. Proxmox /...
  19.

    'All' Backup Jobs run in parallel

    Is there a solution for this already? In its current form, PBS does not scale unless you have all-flash backup storage.