Search results

  1. [SOLVED] Ceph 14.2.8 - HEALTH_ERR - telemetry module

    Wasn't amused when someone brought my attention to a Ceph cluster with a HEALTH_ERR state:
      cluster:
        id:     97eac23e-10e4-4b53-aa2d-e013e50ff782
        health: HEALTH_ERR
                Module 'telemetry' has failed: cannot concatenate 'str' and 'UUID' objects
    This is apparently a minor...
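
    As a rough aside, not quoted from the post: a failed mgr module can usually be cleared by disabling and re-enabling it once the underlying issue is dealt with, using the standard mgr module commands. A minimal sketch:

      # show the full error behind the HEALTH_ERR state
      ceph health detail

      # disable and re-enable the telemetry mgr module to clear the failed state
      ceph mgr module disable telemetry
      ceph mgr module enable telemetry

      # confirm the cluster returns to HEALTH_OK
      ceph status
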
  2. Any Proxmox Ceph Users Interested in helping test Benji

    The following is another legacy script where we back up a statically maintained list of Proxmox VMs:
      #!/bin/bash
      # /etc/cron.d/proxmox-network-backup
      # 0 18 * * 1-5 root /root/proxmox-network-backup
      . /usr/local/benji/bin/activate;
      . /usr/local/bin/benji-backup.sh;
      get_disk () {
        # Limit to...
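
    To picture the rest of the truncated helper, a hypothetical sketch of what a get_disk function along these lines might do, assuming the VM disks live on Ceph RBD storage and that qm config prints lines in the usual "storage:vm-<id>-disk-<n>,..." form; the storage name and VM ID below are placeholders, not taken from the post:

      #!/bin/bash
      # Hypothetical sketch: list a VM's Ceph RBD disk images from its Proxmox config.
      # Assumes output such as "scsi0: ceph_pool:vm-100-disk-0,size=32G".
      get_disk () {
        local vmid="$1";
        qm config "$vmid" \
          | grep -E '^(scsi|virtio|sata|ide)[0-9]+:' \
          | grep -oE 'vm-[0-9]+-disk-[0-9]+';
      }
      get_disk 100;
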
  3. Any Proxmox Ceph Users Interested in helping test Benji

    Proxmox VMs using Ceph RBD provide writeback caching with flushing support. A standard Ceph snapshot at any point should be transactionally safe as, for example, Microsoft SQL will flush important transactions whenever it needs to. We have done one production and several test restores with no...
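
    For context, a crash-consistent snapshot of a single RBD image can be taken with the standard rbd snapshot commands; the pool and image names below are placeholders, not taken from the thread:

      # pool "rbd_hdd" and image "vm-100-disk-0" are example names
      SNAP="backup-$(date +%Y%m%d)";
      rbd snap create rbd_hdd/vm-100-disk-0@"$SNAP";
      rbd snap ls rbd_hdd/vm-100-disk-0;
      # drop the snapshot once the backup run has completed
      rbd snap rm rbd_hdd/vm-100-disk-0@"$SNAP";
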
  4. [SOLVED] kvm bug? - Problem with new Intel Xeon Gold 6248

    Okay, so this is detailed in the following article:
      https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/tsx_async_abort.html
    2nd generation Intel Scalable CPUs, essentially Cascade Lake, allow TSX to be disabled when no Intel microcode update provides mitigation for known security...
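
    For reference, the linked kernel document describes boot parameters for this; a sketch of applying them on a GRUB-booted host (the particular flags chosen here are an example, not from the post):

      # /etc/default/grub -- append to the existing kernel command line:
      #   GRUB_CMDLINE_LINUX_DEFAULT="quiet tsx=off tsx_async_abort=full"
      update-grub
      reboot
      # after the reboot, check the reported mitigation state
      cat /sys/devices/system/cpu/vulnerabilities/tsx_async_abort
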
  5. [SOLVED] kvm bug? - Problem with new Intel Xeon Gold 6248

    We added two new nodes to our cluster but are not able to migrate VMs to them, as qemu appears to incorrectly indicate that the CPU doesn't support the necessary 'hle' and 'rtm' features. Error:
      task started by HA resource agent
      2020-02-14 22:19:16 starting migration of VM 374 to node 'kvm5h'...
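
    A quick way to check what the error implies is to compare the CPU flags actually exposed on an old node and on the new node, and to look at the VM's configured CPU type; VM 374 and node kvm5h come from the log above, everything else is generic:

      # run on an existing node and on kvm5h: are hle/rtm exposed at all?
      grep -o -w -e hle -e rtm /proc/cpuinfo | sort | uniq -c

      # which CPU type is VM 374 configured with?
      qm config 374 | grep '^cpu'
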
  6. create pool of selective osd in proxmox ceph cluster

    Herewith a sample of what it would look like:
      [admin@kvm5d ~]# ceph df
      RAW STORAGE:
          CLASS     SIZE     AVAIL     USED...
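
    A common way to build such a pool, sketched here with placeholder names and PG counts, is to tag the chosen OSDs with a device class, create a replicated CRUSH rule restricted to that class, and then create the pool on that rule:

      # tag the selected OSDs with a device class ("ssd" is an example;
      # an already-assigned class may need "ceph osd crush rm-device-class" first)
      ceph osd crush set-device-class ssd osd.0 osd.1 osd.2;

      # replicated rule limited to that class, failure domain "host"
      ceph osd crush rule create-replicated ssd_rule default host ssd;

      # create the pool on the new rule and enable it for RBD
      ceph osd pool create ssd_pool 64 64 replicated ssd_rule;
      ceph osd pool application enable ssd_pool rbd;
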
  7. Any Proxmox Ceph Users Interested in helping test Benji

    Also running one at our offices for internal use VMs. Reconfigured a NAS by installing Proxmox and setting the Ceph failure domain to OSD. I've configured mine to automatically scrub 15% of all backup images each day after completing incremental backups.
  8. Any Proxmox Ceph Users Interested in helping test Benji

    I've been running this for about 4 months now and it really has been great. I set up the destination to use Ceph's rados gateway (S3 compatible) with 256-bit AES encryption and compression. Deduplication and backing up source RBD images using fast-diff (now possible with recent Ceph kernel module...
  9. Ceph OSD down intermittently since PVE6/Nautilus upgrade

    This is an old thread, but you may simply be running out of memory on your OSD processes. Memory limit defaults were changed in this release; perhaps try limiting the individual OSDs by setting the following in /etc/pve/ceph.conf. The following is from an old node with limited memory:
      [osd.50]...
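
    The truncated entry above presumably sets the per-OSD memory target; a sketch of such an entry, assuming the standard osd_memory_target option and using a placeholder value of 2 GiB (neither is quoted from the post):

      # /etc/pve/ceph.conf -- 2147483648 bytes (2 GiB) is an example value
      [osd.50]
              osd_memory_target = 2147483648

      # confirm the running value after restarting the OSD
      ceph daemon osd.50 config get osd_memory_target
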
  10. [SOLVED] Ceph - Unable to remove config entry

    Many thanks, that works perfectly!
      [admin@kvm1a ~]# ceph config rm global rbd_default_features;
        ceph config-key rm config/global/rbd_default_features;
        ceph config dump | grep -e WHO -e rbd_default_features;
        ceph config set global rbd_default_features 31;
        ceph config dump | grep -e WHO -e...
  11. [SOLVED] Ceph - Unable to remove config entry

    Commands sometimes land on other monitors; herewith simultaneous debug output from all three monitors. Commands:
      [root@kvm1a ~]# ceph config dump | grep -e WHO -e rbd_default_features
      WHO      MASK   LEVEL      OPTION                  VALUE   RO
      global          advanced   rbd_default_features...
  12. [SOLVED] Ceph - Unable to remove config entry

    Setting this to 20/20 or 5/5 only generates the following debug information:
    Enable debugging:
      [admin@kvm1a ~]# ceph daemon mon.kvm1a config show | grep -i debug_rbd
      # "debug_rbd": "0/5",
      # "debug_rbd_mirror": "0/5",
      # "debug_rbd_replay": "0/5",
      [admin@kvm1a ~]# ceph daemon...
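
    For completeness, debug levels like those shown can be raised on a running monitor either through its admin socket or via injectargs; a sketch using the mon name kvm1a from the posts above (the chosen levels are examples):

      # raise rbd debug output on one monitor via its admin socket
      ceph daemon mon.kvm1a config set debug_rbd 20/20;

      # or push a debug level to all monitors at once, and lower it again afterwards
      ceph tell mon.* injectargs '--debug_mon 20/20';
      ceph tell mon.* injectargs '--debug_mon 1/5';
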
  13. [SOLVED] Ceph - Unable to remove config entry

    I've been lost trying to figure out how to use the 'ceph config-key' command and unfortunately don't recall ever figuring out how to remove an entry... Logged a bug with Ceph: https://tracker.ceph.com/issues/43296
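
    For reference, the config-key side of this can be inspected and cleaned with the standard commands; the key path below is the one discussed elsewhere in this thread:

      # list stored keys and look for the leftover entry
      ceph config-key ls | grep rbd_default_features;

      # show and then remove the specific key
      ceph config-key get config/global/rbd_default_features;
      ceph config-key rm config/global/rbd_default_features;
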
  14. [SOLVED] Ceph - Unable to remove config entry

    Herewith the thread in the Ceph user mailing list. I'll log a bug report shortly... http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-December/037701.html
  15. [SOLVED] Ceph - Unable to remove config entry

    Hrm... Another user on the Ceph mailing list appears to have confirmed this as a bug since he upgraded to Mimic...
  16. [SOLVED] Ceph - Unable to remove config entry

    Apologies, yes the cluster is healthy:
      [admin@kvm1c ~]# ceph -s
        cluster:
          id:     31f6ea46-12cb-47e8-a6f3-60fb6bbd1782
          health: HEALTH_OK
        services:
          mon: 3 daemons, quorum kvm1a,kvm1b,kvm1c (age 20h)
          mgr: kvm1c(active, since 20h), standbys: kvm1b, kvm1a
          mds: cephfs:1...
  17. [SOLVED] Ceph - Unable to remove config entry

    Ceph Nautilus 14.2.4.1. Herewith our Ceph Nautilus upgrade notes, which we ran through on 7 separate clusters:
      cd /etc/pve;
      ceph config assimilate-conf -i ceph.conf -o ceph.conf.new;
      mv ceph.conf.new ceph.conf;
      pico /etc/ceph/ceph.conf
        # add back: cluster_network
        # public_network...
  18. [SOLVED] Ceph - Unable to remove config entry

    Ceph config file:
      [global]
      cluster_network = 10.248.1.0/24
      filestore_xattr_use_omap = true
      fsid = 31f6ea46-12cb-47e8-a6f3-60fb6bbd1782
      mon_host = 10.248.1.60 10.248.1.61 10.248.1.62
      public_network = 10.248.1.0/24

      [client]
      keyring = /etc/pve/priv/$cluster.$name.keyring
  19. [SOLVED] Ceph - Unable to remove config entry

    We assimilated Ceph's ceph.conf file during our Nautilus upgrade and subsequently have a minimal configuration file. We are, however, now unable to remove configuration entries.
      [admin@kvm1b ~]# ceph config dump | grep -e WHO -e rbd_default_features
      WHO      MASK   LEVEL      OPTION...
  20. download.proxmox.com unreachable

    The outages I've hit typically last about an hour, and yes, it is working again now... Any chance of automating DNS updates to avoid situations where we aren't able to complete updates during scheduled maintenance events? Or of Proxmox perhaps considering signing up with a forward caching service...