Search results

  1. J

    Can't Reshard OSDs under Ceph Pacific 16.2.4

    In unrelated news, don't be a smarty-pants like me and attempt to implement the PR noted above. If you're not careful, you'll upgrade yourself to Ceph version 17.0.0.XX (Quincy) and be unable to downgrade due to changes in the "Mon disk structure".
  2. J

    Can't Reshard OSDs under Ceph Pacific 16.2.4

    Confirmed: the command does indeed reshard successfully, and from there the OSD does start. Again, I recommend marking out, allowing data transfer, then removing and recreating any OSD that gets this fix, as I am unsure of the performance implications of removing the "Binned, and...
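
    The mark-out-then-recreate cycle suggested above can be sketched roughly as follows. This is a hedged sketch, not the poster's exact procedure: the OSD id (2) and the recreation command are assumptions.

    ```shell
    # Sketch only; OSD id 2 is assumed.
    ceph osd out 2                           # mark out; data migrates off the OSD
    ceph -s                                  # wait until recovery settles (active+clean)
    systemctl stop ceph-osd@2                # stop the daemon once drained
    ceph osd purge 2 --yes-i-really-mean-it  # remove the OSD from the cluster map
    # Recreate with your deployment tool, e.g.:
    # ceph-volume lvm create --data /dev/sdX
    ```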
  3. J

    Can't Reshard OSDs under Ceph Pacific 16.2.4

    Also hit by this bug. The timing was god-awful too, but I've managed to mostly recover. I also attempted to manually reshard OSDs with: ceph-bluestore-tool --path <data path> --sharding="m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P" reshard per Ceph's docs. Reading the bug...
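
    For context, the documented reshard flow around that command looks roughly like this sketch; the OSD id and data path are assumptions, and the sharding spec is the one quoted above.

    ```shell
    # Sketch only: reshard an offline OSD's RocksDB, then verify before restarting.
    systemctl stop ceph-osd@2                 # the OSD must be down to reshard
    ceph-bluestore-tool \
      --path /var/lib/ceph/osd/ceph-2 \
      --sharding="m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P" \
      reshard
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-2   # sanity check
    systemctl start ceph-osd@2
    ```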
  4. J

    2x 56GbE Optimization, slow ceph recovery, and MLNX-OS S.O.S.

    Actually, that was a really useful exercise. One node was showing super high throughput as above, but a second was not. On the second one, I brought down one of its two connections and saw no change in iperf3 speed. That's odd. Beginning to suspect internal routing issues
  5. J

    2x 56GbE Optimization, slow ceph recovery, and MLNX-OS S.O.S.

    While I know this counts for very little, iperf3 running server and client on the same node looks awesome, and confirms that bonding appears to be functioning properly [ ID] Interval Transfer Bitrate Retr [ 5] 0.00-10.00 sec 132 GBytes 113 Gbits/sec 3509...
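
    A same-node run like the one quoted only exercises the local stack; comparing it against a node-to-node run is what actually isolates the bond. A minimal sketch (the second node's address is assumed):

    ```shell
    # On node A: start a daemonized iperf3 server
    iperf3 -s -D
    # Loopback baseline (stays on-host, bypasses the bonded links):
    iperf3 -c 127.0.0.1 -P 4 -t 10
    # From node B, across the bond (node A's address assumed):
    iperf3 -c 10.0.0.1 -P 4 -t 10
    ```

    If the loopback number is high but the node-to-node number is not, the bottleneck is in the links, bond, or switch rather than the host stack.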
  6. J

    2x 56GbE Optimization, slow ceph recovery, and MLNX-OS S.O.S.

    Thanks Spirit, I've already been playing with Ceph's configuration, and while the new QoS is new to me, it doesn't appear to have affected recovery speed at all. This further leads me to believe the issue is likely in the network itself, not Ceph or IO limitations. Checking iostat for this...
  7. J

    2x 56GbE Optimization, slow ceph recovery, and MLNX-OS S.O.S.

    I had this as a comment in another thread, but moved it here to its own thread. I have a three-node Ceph cluster whose presumed slow throughput I am diagnosing. Each node has a ConnectX-3 Pro 2-port QSFP+. While I was running them via ib_ipoib, in an attempt to get past the low...
  8. J

    Guest-agent fs-freeze command breaks the system on backup

    Yup, haha. Disabling the qemu-agent does appear to work around this, but that's hardly a solution.
  9. J

    Couldn't Take Backup when QEMU Agent is Running

    I am experiencing this issue too, as are these few dozen individuals: https://forum.proxmox.com/threads/guest-agent-fs-freeze-command-breaks-the-system-on-backup.69605/ https://forum.proxmox.com/threads/problem-with-fsfreeze-freeze-and-qemu-guest-agent.65707/...
  10. J

    Follow-up: Multiple iothreads per disk?

    With regard to: https://forum.proxmox.com/threads/ceph-read-performance.25785/page-3 I don't see parallelization of disk IO on the roadmap, but recognize that it would substantially benefit small-to-medium-sized Ceph clusters, which tend to have low single-threaded IO. Currently, I've...
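
    For reference, what Proxmox does offer today is one iothread per disk via the virtio-scsi-single controller, not the multiple iothreads per disk asked about above. A sketch with an assumed VMID and storage name:

    ```shell
    # Assumed VMID 100 and storage "ceph-pool"; each disk gets its own iothread.
    qm set 100 --scsihw virtio-scsi-single
    qm set 100 --scsi0 ceph-pool:vm-100-disk-0,iothread=1
    qm set 100 --scsi1 ceph-pool:vm-100-disk-1,iothread=1
    ```

    Splitting a workload across several such disks is the usual workaround for low single-threaded Ceph IO.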
  11. J

    Ceph Health Warning: Module 'telemetry' has failed dependency: No module named 'requests';

    Hey, I've sort of ignored this health warning for a long while, but decided to take a crack at fixing it. I recognize the super-common Python error, but upon review, my Python installation(s) do indeed have requests installed, as shown below... I believe my use of anaconda3 for virtual...
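
    If the cause is indeed an anaconda Python shadowing the system one, the usual fix is to install requests where ceph-mgr's own interpreter looks, then reload the module. A hedged sketch:

    ```shell
    # ceph-mgr runs under the distro's python3, not conda environments.
    apt install python3-requests
    ceph mgr module disable telemetry   # disable/enable so the import is retried
    ceph mgr module enable telemetry
    ceph health detail                  # the failed-dependency warning should clear
    ```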
  12. J

    Ceph Deduplication (14.2.4)

    As this is a top performing page in Google: VDO appears to work on top of RBD volumes in my testing. I had some trouble with boot ordering, but seems fine otherwise. If you're working on this stuff for fun, a word of caution, be very very careful to keep duplicate copies of your VDO data for...
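
    A rough sketch of the VDO-on-RBD layering described above, with assumed pool, image, and device names. The boot-ordering trouble mentioned is real: the RBD image must be mapped before the VDO volume starts, so the service ordering needs care.

    ```shell
    # Assumed names throughout; "vdo" is the VDO manager CLI from the vdo package.
    rbd map ceph-pool/vdo-backing              # e.g. appears as /dev/rbd0
    vdo create --name=vdo0 --device=/dev/rbd0  # deduplicated volume on top of RBD
    mkfs.xfs -K /dev/mapper/vdo0               # -K skips discards at mkfs time
    ```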
  13. J

    Proxmox hyper-converged (ceph) cascading failure upon single node crash/power loss

    Update here: I believe that issue is tracked down to the opensm subnet manager used to run the InfiniBand network. When the active SM went offline, everything else went to hell before the new SM took over. I am not totally sure how to resolve that, but it's where I appear to be at.
  14. J

    Proxmox hyper-converged (ceph) cascading failure upon single node crash/power loss

    Absolutely, I am running an InfiniBand cluster network via ib_ipoib for cluster communication and corosync; otherwise primarily just the normal LAN. auto lo iface lo inet loopback auto eno1 iface eno1 inet manual #going to vmbr0 auto eno2 iface eno2 inet manual #going to vmbr0 auto eno3 iface...
  15. J

    Proxmox hyper-converged (ceph) cascading failure upon single node crash/power loss

    I have three mons; sometimes I've had four, but that didn't seem to impact anything. Similarly, three managers and three MDS. Dang, it seems the OSD logs were wiped when the OSDs restarted, but I do have ceph.log, which shows the timeline of the issue. (I've heavily filtered the attached to...
  16. J

    Proxmox hyper-converged (ceph) cascading failure upon single node crash/power loss

    Hello, I seem to be having trouble with failover/redundancy that I am hoping someone in the community might be able to help me understand. I have a four-node cluster, for which I am working to ensure high availability of the VMs and containers it manages. This is a hobby cluster in my...
  17. J

    Verifying and changing the Compression Level of Backups

    Sorry to dredge up this old post. Can anyone point out where gzip is called by Proxmox when processing daily backups? I have an (FPGA-based) hardware gzip accelerator which I am hoping to direct Proxmox to use, but am having trouble tracking down the call to gzip.
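
    One low-effort thing to try, if vzdump resolves gzip through PATH (worth verifying): drop a shim earlier in the PATH that execs the accelerator's CLI. The accelerator binary path below is hypothetical.

    ```shell
    # Hypothetical: /opt/accel/bin/hw-gzip stands in for the FPGA gzip CLI.
    cat > /usr/local/bin/gzip <<'EOF'
    #!/bin/sh
    exec /opt/accel/bin/hw-gzip "$@"
    EOF
    chmod +x /usr/local/bin/gzip   # /usr/local/bin precedes /bin in the default PATH
    ```

    Note that vzdump.conf also has a `pigz` setting that swaps in pigz for gzip; a similar substitution mechanism may be the cleaner route if the accelerator ships a gzip-compatible CLI.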
  18. J

    I/O error /dev/sda pve tainted IO

    Hey, did you ever track down the issue you were experiencing? I am seeing very similar warnings for an 8TB Ultrastar on a Proxmox node, which is not running ZFS but *is* under heavy write load (backfilling a Ceph OSD): --> Bold added to the line that brought me here. [152363.867593] mpt2sas_cm0...
  19. J

    Feature Request: Backup Staging Drive for SMR Capable Linear Writes

    Feature Request: SMR-Capable Linear Writes Loving what Proxmox is up to with PBS! One feature which would be immensely helpful would be the ability to safely use SMR drives without issue. To be transparent, I simply have some I would love to use, but the use case for enterprise is clear...