Search results

  1. Kernel panic, machine stuck, task khugepaged:796 blocked for more than 120 seconds

    PS: some more findings. ls -la /etc/pve/qemu-server/ gives you an idea which VMs are still running on this host, and ls -la /var/run/qemu-server/ still gives you access to the VNC, serial and QEMU sockets (also for debugging). Still working: qm migrate <id> <target server>, qm terminal <id>, qm...
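
    The commands quoted in this snippet, gathered as a minimal shell sketch (the <id> and <target server> placeholders are from the original post):

      # list VM configs and runtime sockets still visible on this host
      ls -la /etc/pve/qemu-server/
      ls -la /var/run/qemu-server/
      # these qm subcommands reportedly still work on the stuck node
      qm migrate <id> <target server>
      qm terminal <id>
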
  2. [SOLVED] kernel panic since update to Proxmox 7.1

    PS: some more findings. ls -la /etc/pve/qemu-server/ gives you an idea which VMs are still running on this host. Still working: qm migrate <id> <target server>. ps ax blocks just before the stale VM, but you can still loop over procfs to get process info, i.e. for i in `ls /proc |egrep...
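
    The procfs loop is truncated in the preview; a hedged sketch of the idea (which per-PID files to read is an assumption, not the original command):

      # walk /proc directly because `ps ax` hangs just before the stale VM;
      # printing the PID and process name is an illustrative choice
      for i in $(ls /proc | grep -E '^[0-9]+$'); do
          echo "$i $(cat /proc/$i/comm 2>/dev/null)"
      done
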
  3. Kernel panic, machine stuck, task khugepaged:796 blocked for more than 120 seconds

    Unfortunately, we are back in the same situation: the BIOS update did not solve the problem, but we can narrow it down to the blocking VM having data on the local RAID controller (PERC H755, 4 x NVMe RAID-10). Other VMs (running purely on Ceph storage) are not affected.
  4. Kernel panic, machine stuck, task khugepaged:796 blocked for more than 120 seconds

    We updated the BIOS on our Dell R6525 and the PERC firmware, which helped to solve the problem. I guess the firmware of the PERC controller was the main issue.
  5. Kernel panic, machine stuck, task khugepaged:796 blocked for more than 120 seconds

    [1286374.147087] INFO: task khugepaged:796 blocked for more than 120 seconds.
    [1286374.154005] Tainted: P O 5.15.35-1-pve #1
    [1286374.160027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [1286374.168042] task:khugepaged state:D stack: 0...
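
    The log itself names the relevant knob; a small sketch based only on that hint:

      # the watchdog behind this warning; 120 matches the "120 seconds" in the log
      cat /proc/sys/kernel/hung_task_timeout_secs
      # as the message says, writing 0 only silences the warning, it does not unblock the task
      echo 0 > /proc/sys/kernel/hung_task_timeout_secs
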
  6. Kernel panic, machine stuck, task khugepaged:796 blocked for more than 120 seconds

    # lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    Address sizes:         48 bits physical, 48 bits virtual
    CPU(s):                128
    On-line CPU(s) list:   0-127
    Thread(s)...
  7. Kernel panic, machine stuck, task khugepaged:796 blocked for more than 120 seconds

    Hello everybody, not sure if this thread (https://forum.proxmox.com/threads/kernel-panic-since-update-to-proxmox-7-1.101164/#post-437435) is related, but since we updated to PVE 7.2 we have had repeated crashes with the kernel message INFO: task khugepaged:796 blocked for more than 120 seconds...
  8. [SOLVED] CEPH IOPS dropped by more than 50% after upgrade from Nautilus 14.2.22 to Octopus 15.2.15

    Ah, perfect. This means that after waiting a few days, your performance came back to normal?
  9. [SOLVED] CEPH IOPS dropped by more than 50% after upgrade from Nautilus 14.2.22 to Octopus 15.2.15

    This means that your HW setup runs with NPS1 (see page 10 of https://developer.amd.com/wp-content/resources/56745_0.80.pdf for details). I'm not too much of an expert in the nasty details of the AMD Epyc NUMA architecture, but I would say that it is not your bottleneck (might give you some extra...
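
    Not quoted in the thread, but a quick hedged way to confirm the NPS setting the kernel actually sees (standard tools, not commands from the original post):

      # with NPS1 an Epyc socket is exposed as a single NUMA node
      lscpu | grep -i 'numa node'
      numactl --hardware    # needs the numactl package
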
  10. [SOLVED] CEPH IOPS dropped by more than 50% after upgrade from Nautilus 14.2.22 to Octopus 15.2.15

    For the Nexus, I don't know which exact model you have, but one of the faster ones, a 3172PQ, seems to be around 850 ns; other models might be (significantly) slower, but you will need to google this. The Lenovo switch I mentioned above is around 570 ns, while the Mellanox goes down to 270 ns for...
  11. [SOLVED] CEPH IOPS dropped by more than 50% after upgrade from Nautilus 14.2.22 to Octopus 15.2.15

    1.) We use 2 x Mellanox SX1012 with 40GE QSFP+ and MLAG (before that we used 2 x Lenovo GE8124E on MLAG with 10GE QSFP, similar performance). Note: both switches, the Lenovo and the SX1012, are cut-through switches.
    2.) Default Linux/PVE driver with Mellanox 40GE CX-354 QSFP+.
    3.) No Intel DPDK running...
  12. [SOLVED] CEPH IOPS dropped by more than 50% after upgrade from Nautilus 14.2.22 to Octopus 15.2.15

    Hi Alibek, I would say that after 3-4 weeks of pulling my hair out, the cluster came back to normal operating speed. We never really figured out what the problem was, but our feeling is that the reconstruction of the OMAP data structures took quite some time in the background. We also...
  13. [SOLVED] CEPH IOPS dropped by more than 50% after upgrade from Nautilus 14.2.22 to Octopus 15.2.15

    Some more investigation reveals that since the day(s) we split our 7TB SSDs into 4 OSDs (around Dec 8th), the latencies on these OSDs dropped significantly and never spiked again, so we can at least say that this solved our issue. What caused the high latencies on these drives after the...
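
    The thread preview does not show the exact commands used for the split; a hedged sketch with ceph-volume (the device path is a placeholder and the old OSD is assumed to have been removed first):

      # create four OSDs on one large SSD/NVMe device
      ceph-volume lvm batch --osds-per-device 4 /dev/nvme0n1
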
  14. [SOLVED] CEPH IOPS dropped by more than 50% after upgrade from Nautilus 14.2.22 to Octopus 15.2.15

    2 weeks later everything got "normal"; rados bench gives these values:
    Total time run:          60.0139
    Total writes made:       2301842
    Write size:              4096
    Object size:             4096
    Bandwidth (MB/sec):      149.825
    Stddev Bandwidth:        6.85404
    Max bandwidth (MB/sec):  163.844
    Min...
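
    For context, a rados bench invocation of the shape that produces 4 KiB write figures like these (pool name and thread count are assumptions, not taken from the post):

      # 60 s of 4 KiB object writes against an assumed pool named SSD
      rados bench -p SSD 60 write -b 4096 -t 16 --no-cleanup
      # remove the benchmark objects afterwards
      rados -p SSD cleanup
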
  15. [SOLVED] CEPH IOPS dropped by more than 50% after upgrade from Nautilus 14.2.22 to Octopus 15.2.15

    We doubled our RAM but saw no difference. As I am starting to hunt down the bug, I would like to see the PVE compile flags for Ceph, or even compile it myself. Is there a guide on how to rebuild the Proxmox packages on your own?
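
    No guide is quoted in the preview; a heavily hedged sketch, assuming the Ceph packaging lives in the usual Proxmox git repositories and follows their common make deb convention:

      # assumption: repository path and build target follow the usual Proxmox packaging layout
      git clone git://git.proxmox.com/git/ceph.git
      cd ceph
      make deb    # expect to install the build dependencies first; the build takes a long time
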
  16. [SOLVED] CEPH IOPS dropped by more than 50% after upgrade from Nautilus 14.2.22 to Octopus 15.2.15

    We have approx. 30GB cached, so I guess it's not this. Still, we will double the RAM soon. Meanwhile, I am hesitant to "escape" forward to Pacific unless I find some valid reasoning for the problem...
  17. [SOLVED] CEPH IOPS dropped by more than 50% after upgrade from Nautilus 14.2.22 to Octopus 15.2.15

    What makes me wonder, and what I cannot explain, is that the cluster used to be I/O bottlenecked (if I interpret it correctly) and since the update this has changed. See the two example Ceph nodes below ... Updated PVE to PVE 7.x two days ago, running the latest kernel, but no changes.
  18. [SOLVED] CEPH IOPS dropped by more than 50% after upgrade from Nautilus 14.2.22 to Octopus 15.2.15

    128 Queues ...
    root@xx-ceph01:~# fio -ioengine=rbd -direct=1 -name=test -bs=4k -iodepth=1 -rw=randwrite -pool=SSD -runtime=30 -rbdname=testimg -iodepth=128
    test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=128
    fio-3.12
    Starting 1 process...
  19. [SOLVED] CEPH IOPS dropped by more than 50% after upgrade from Nautilus 14.2.22 to Octopus 15.2.15

    Both, launched from a Ceph node or from a compute node, deliver the same result. I just got a warning that one of my Ceph nodes ran out of swap (while still having 30GB of Linux fs cache free); I don't know if this is related, but swapoff plus rados bench does not change a thing. Can it be that Octopus eats more RAM...
  20. [SOLVED] CEPH IOPS dropped by more than 50% after upgrade from Nautilus 14.2.22 to Octopus 15.2.15

    Our ceph.conf:
    [global]
         auth client required = cephx
         auth cluster required = cephx
         auth service required = cephx
         cluster network = xx
         fsid = aef995d0-0244-4a65-8b8a-2e75740b4cbb
         # keyring = /etc/pve/priv/$cluster.$name.keyring
         mon allow pool delete = true
         ...
