Recent content by anaxagoras

  1. MS-01 HCI HA Ceph Sanity Check

    I 3D-printed a fan adapter to mount a 140mm fan on the bottom of the case, removed the stock NVMe fan, and put heatsinks on all the NVMe drives: https://www.printables.com/model/1208615-minisforum-ms-01-140mm-fan-upgrade
  2. Ceph keeps crashing, but only on a single node

    I think you were correct. It hasn't crashed since I swapped out the entire system about 50 hours ago; before that I was getting daily Ceph crashes and hard lockups on the system...
  3. Ceph keeps crashing, but only on a single node

    I initially suspected a hardware issue, but after running memtest86 and various CPU benchmarks, nothing unusual showed up. I've also run various storage benchmarks to stress-test Ceph. I have another identical node without storage, with new RAM and a new CPU, so I'm going to try swapping the hardware...
  4. Ceph keeps crashing, but only on a single node

    May 20 20:17:06 pve2 ceph-mgr[2004]: *** Caught signal (Segmentation fault) **
    May 20 20:17:06 pve2 ceph-mgr[2004]: in thread 7cb342dad6c0 thread_name:msgr-worker-0
    May 20 20:17:06 pve2 ceph-mgr[2004]: ceph version 18.2.7 (4cac8341a72477c60a6f153f3ed344b49870c932) reef (stable)
    May 20 20:17:06...
  5. Ceph keeps crashing, but only on a single node

    I've been trying to figure this out for over a week and I'm getting nowhere. I have 3 machines with identical hardware, each with 3 enterprise NVMe drives: 2x 4TB Samsung M.2 PM983 and 1x 8TB Samsung U.2 PM983a (I think this is an OEM drive for Amazon). For some reason PVE2 keeps getting...
  6. Virtiofs - high usage of cached and shared memory

    I'm exploring virtiofs and find this concerning. Did you ever resolve the issue?
  7. Intel Nuc 13 Pro Thunderbolt Ring Network Ceph Cluster

    I've got a relatively recent issue. If one of my nodes reboots, the other 2 lock up and reboot within a few seconds. I'm wondering if this is due to pinning the IRQs to a specific CPU core, as this hasn't happened to me in the past and that's the most recent change I've made outside of...
  8. Intel Nuc 13 Pro Thunderbolt Ring Network Ceph Cluster

    I've got 3x Minisforum MS-01 with a 13900H. Even with the --bidir flag, you have no problem hitting 27 with iperf?
  9. VM soft-locking up with ceph during disk benchmark

    Hmm, I'm seeing errors in dmesg around the time the system locks up. I'm sure there's a relation.
    [65039.247828] x86/split lock detection: #AC: CPU 0/KVM/623152 took a split_lock trap at address: 0xfffff8052744bb6d
  10. VM soft-locking up with ceph during disk benchmark

    I made some progress on this. If I run a less aggressive benchmark using the "default" profile (SEQ1M Q8T1, SEQ1M Q1T1, RND4K Q32T1, RND4K Q1T1), I don't get a lockup and it runs fine. The more aggressive "ssd" profile is where it has a problem: SEQ1M Q8T1, SEQ128K Q32T1, RND4K Q32T16, RND4K Q1T1...
  11. VM soft-locking up with ceph during disk benchmark

    I changed the async IO mode to native and it died again. It seems to keep dying on the random read tests: it gets through SEQ1M Q8T1 and SEQ128K Q32T1, but keeps dying on either RND4K Q32T16 or RND4K Q1T1. I'm unsure if that has any bearing. It doesn't make...
  12. VM soft-locking up with ceph during disk benchmark

    I did run this test under ZFS before setting up Ceph, since I was curious to see the performance difference in a VM. It worked fine.
  13. VM soft-locking up with ceph during disk benchmark

    I have a Windows VM running CrystalDiskMark, and during the benchmark the VM soft-locks up. This is repeatable in my environment. I'm running the latest version of PVE (8.1.11), kernel 6.5.13-5-pve, and Ceph Reef 18.2.2. What I mean by a soft lockup is: If task...
  14. Opt-in Linux 6.8 Kernel for Proxmox VE 8 available on test & no-subscription

    As of today OpenZFS doesn't support kernels newer than 6.7, so would it be safe to assume that if you use ZFS, upgrading to 6.8 is at your own risk?
  15. Intel Nuc 13 Pro Thunderbolt Ring Network Ceph Cluster

    Cable pulls worked fine, but I'm having the same problem of frr restarting too early as a post-up command and not surviving a reboot. I tried your if-up script and I'm having the same issue of only 1 interface coming up on boot.