I've got a relatively recent issue. If one of my nodes reboots, the other two will lock up and reboot within a few seconds. I'm wondering if this is due to pinning the IRQs to a specific CPU core, as this hasn't happened to me in the past and that's the most recent change I've made outside of...
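For context, the pinning I'm talking about is the usual smp_affinity approach; a rough sketch (the IRQ number and core here are placeholders, not my actual values):

    # pin a hypothetical IRQ 130 (e.g. an NVMe or NIC queue) to CPU core 2
    echo 2 > /proc/irq/130/smp_affinity_list
    # confirm it took
    cat /proc/irq/130/smp_affinity_list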
Hmmm, I'm seeing errors in dmesg around the time the system locks up. I'm sure there's a relation.
[65039.247828] x86/split lock detection: #AC: CPU 0/KVM/623152 took a split_lock trap at address: 0xfffff8052744bb6d
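In case it's relevant, the split-lock behaviour can apparently be tuned with the split_lock_detect kernel parameter. I haven't verified that this changes anything in my case, but roughly:

    # see what the kernel was booted with
    cat /proc/cmdline
    # to turn detection off, add split_lock_detect=off to GRUB_CMDLINE_LINUX_DEFAULT
    # in /etc/default/grub, then:
    update-grub
    # (on systemd-boot installs it would go in /etc/kernel/cmdline instead,
    # followed by: proxmox-boot-tool refresh)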
I made some progress on this.
If I do a less aggressive benchmark using the "default" profile, I don't get a lockup. With SEQ1M Q8T1, SEQ1M Q1T1, RND4K Q32T1, and RND4K Q1T1 it runs fine.
The more aggressive "ssd" profile is where it has a problem: SEQ1M Q8T1, SEQ128K Q32T1, RND4K Q32T16, RND4K Q1T1...
I changed the IO scheduler to native and it died again. It seems to keep dying on the random read tests: it gets through the sequential SEQ1M Q8T1 and the SEQ128K Q32T1, but keeps dying on either RND4K Q32T16 or RND4K Q1T1. I'm unsure if that has any bearing.
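To be clear about the kind of change I mean: if "native" refers to the disk's async-IO mode rather than the host block scheduler, it's set on the VM's disk line, roughly like this (the VM ID, storage, and device names are just examples, not my exact config):

    # async-IO mode on the VM disk (aio can be io_uring, native, or threads)
    qm set 101 --scsi0 vmpool:vm-101-disk-0,aio=native
    # host-side block scheduler on an NVMe device, for comparison
    cat /sys/block/nvme0n1/queue/scheduler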
It doesn't make...
I have a Windows VM and I'm running CrystalDiskMark in it, and during the benchmark the VM does a soft lockup. This is repeatable in my environment. I'm running the latest version of PVE, 8.1.11, with kernel 6.5.13-5-pve and Ceph Reef version 18.2.2.
What I mean by a soft lockup is: If task...
So cable pulls worked fine. But I'm having the same problem of frr restarting too early as a post-up command and not surviving a reboot, so I tried your if-up script and I'm having the same issue of only one interface coming up on boot.
Putting a post-up script by itself is not working on boot, but it does work if I restart networking. Since I'm testing things remotely, I can't check whether it helps with physically plugging cables in.
I also tried @dovh's method with an if-up.d script, and this did work on a reboot! Next tonight I will...
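The if-up.d hook is basically this shape (a sketch; en05/en06 are placeholder interface names, and it assumes ifupdown2 is running the legacy if-up.d scripts):

    #!/bin/sh
    # /etc/network/if-up.d/99-restart-frr (sketch) - remember to chmod +x it
    # only restart frr once a thunderbolt mesh interface actually comes up
    case "$IFACE" in
        en05|en06)
            systemctl restart frr
            ;;
    esac
    exit 0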
Pretty close to what I did, but running this on boot only doesn't work. On my system I've seen that the interrupt addresses change when Thunderbolt cables are plugged and unplugged, so I'm thinking it might make more sense to make this part of the udev rule that scyto set up to bring up...
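What I'm picturing is something along these lines (a rough sketch, not scyto's exact rule; the interface names and script path are assumptions):

    # /etc/udev/rules.d/10-tb-en.rules
    ACTION=="add", SUBSYSTEM=="net", KERNEL=="en05", RUN+="/usr/local/bin/pve-en05.sh"
    ACTION=="add", SUBSYSTEM=="net", KERNEL=="en06", RUN+="/usr/local/bin/pve-en06.sh"

    #!/bin/sh
    # /usr/local/bin/pve-en05.sh (sketch): bring the link up when the device appears;
    # this is where the extra handling could be hooked in as well
    /usr/sbin/ifup en05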
OK, I've been messing with this for the better part of two hours, and I figured it out. I noticed that when I was getting a lot of retries, ksoftirqd was eating up nearly 100% of a CPU core, but when I told iperf3 to limit itself to 19 Gbps and got few retries, ksoftirqd was using hardly any CPU. I found...
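The bandwidth cap was just iperf3's -b flag, e.g. (the target address is a placeholder):

    # cap the sender at ~19 Gbit/s and watch whether the Retr column drops
    iperf3 -c 10.0.0.82 -b 19G -t 30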
The CPU has its own TB4 controller and dedicated lanes for both ports. I've also gotten 26 Gbps at other points in my testing. I had to send a unit back due to a bad RJ45 port, but that was also with high retries.
I also hit a weird wall where, if my MTU exceeded 35,000 or so, Ceph would...
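For reference, the MTU is just set in the interface stanza, something like this (the interface name and address are placeholders):

    auto en05
    iface en05 inet static
        address 10.0.0.81/24
        # example value kept under the ~35,000 point where things got weird
        mtu 32768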
I'm running different hardware than you guys: the Minisforum MS-01. I'm getting a lot of retransmits with iperf3. I started with some no-name Thunderbolt 4 cables ("Connbull"). I ordered and tried a Belkin TB4 cable and an OWC TB4 cable; no apparent change. I've followed @scyto's guide over on...
I considered that, but I couldn't find any A+E-key NVMe drives in the US. I found one in Europe, and I found a bunch of adapters. The USB storage has been working fine, though. I'm probably going to put a Google Coral in the A+E slot eventually, but I'm going to try OpenVINO first with the virtualized iGPU.
I...
My goal for this project was to decrease power utilization and gain redundancy compared to my single R730xd, which has dual E5-2690 v4 CPUs and 256 GB of RAM. The R730xd's power floor, even with power tuning, is around 280 W (that's with storage).
Each MS-01 node has 96 GB of RAM and 3 NVMe drives...
A lot of my VMs and CTs are for fun home-hosting stuff, but a few of them are important. My cluster has 3 nodes, and while it has enough resources if all the nodes are up, memory gets a little tight if one of them goes down. There's enough, but it depends on how the VMs wind up being...
I'm doing the homelab with a trio of MS-01s. Each host has 1x 7.68 TB U.2 and 2x 3.84 TB M.2 drives, booting off of a USB-to-NVMe adapter. The Ceph network is a Thunderbolt mesh with OpenFabric routing, the built-in SFP+ ports are LACP'd to my switch for Proxmox VMs, and 1 of the 2.5 GbE ports is dedicated for...
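For anyone curious, the SFP+ LACP piece of /etc/network/interfaces looks roughly like this (NIC names and addresses are placeholders, not my exact config):

    auto bond0
    iface bond0 inet manual
        bond-slaves enp2s0f0 enp2s0f1
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4
        bond-miimon 100

    auto vmbr0
    iface vmbr0 inet static
        address 192.168.1.11/24
        gateway 192.168.1.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0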