Recent content by FXKai

  1. [SOLVED] OSD's not starting after upgrade 18.2.4 -> 18.2.7

    We found a kind of solution ourselves. We had caching values set in our ceph.conf. Apparently, Ceph has become stricter about these since version 18.2.7 (we are now on 19.2.3), and the startup checks have become more picky about them. Example error message...
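    A sketch of how one could track such a rejected option down, assuming the OSDs run as systemd instance units; the OSD id 0 and the grep pattern are placeholders, not taken from the post:
      # show the most recent startup messages of a failing OSD since the last boot
      journalctl -b -u ceph-osd@0.service --no-pager | tail -n 50
      # list cache-related settings still present in the local ceph.conf
      grep -in cache /etc/ceph/ceph.conf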
  2. Incorrect file permission set on file `/etc/pve/ceph.conf` causes `ceph` user could not access the ceph config file in PVE cluster

    While it is ugly that this service has to be patched at all, the right, update-safe way of "fixing" the file would be to use override files:
    # mkdir /etc/systemd/system/ceph-mgr@.service.d
    # cat >/etc/systemd/system/ceph-mgr@.service.d/override.conf <<EOF
    [Service]
    ExecStart=...
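    For reference, a minimal sketch of what a complete drop-in could contain; the ExecStart line below is an assumption based on the stock ceph-mgr@.service shipped with the ceph packages, not the exact patch from the post, so copy the real line from /lib/systemd/system/ceph-mgr@.service and change only what you need:
      [Service]
      # an empty ExecStart= clears the command inherited from the packaged unit
      ExecStart=
      # hypothetical replacement command line
      ExecStart=/usr/bin/ceph-mgr -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph
    After systemctl daemon-reload and a restart of the ceph-mgr@<id> instance, the drop-in keeps working across package updates, unlike editing the shipped unit file in place.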
  3. [SOLVED] OSD's not starting after upgrade 18.2.4 -> 18.2.7

    Hi, after running the latest apt dist-upgrade and rebooting one of our storage servers, all OSD's on this node refuse to start. After downgrading all CEPH packages the OSD's start immediately. Any help appreciated :)
    Setup: 7 x CEPH Storage nodes, 3 x PVE Compute Nodes
    Log files of the...
  4. Kernel panic, machine stuck, task khugepaged:796 blocked for more than 120 seconds

    PS: some more findings
    ls -la /etc/pve/qemu-server/ gives you an idea which VMs are still running on this host
    ls -la /var/run/qemu-server/ still gives you access to the vnc, serial and qemu sockets (also for debugging)
    # still working
    qm migrate <id> <target server>
    qm terminal <id>
    qm...
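    A short sketch of how those still-working pieces can be used while the host is partially stuck; VM id 100, the target node name and the --online flag are illustrative assumptions:
      # config files visible here belong to VMs currently owned by this node
      ls -la /etc/pve/qemu-server/
      # the qemu control sockets (vnc, serial, qmp) remain usable for debugging
      ls -la /var/run/qemu-server/
      # move a still-responsive VM away from the affected host
      qm migrate 100 pve-node2 --online
      # attach to the VM's serial terminal (needs a serial port configured in the VM)
      qm terminal 100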
  5. [SOLVED] kernel panic since update to Proxmox 7.1

    PS: some more findings
    ls -la /etc/pve/qemu-server/ gives you an idea which VMs are still running on this host
    # still working
    qm migrate <id> <target server>
    ps ax blocks just before the stale VM, but you can still loop over the procfs to get process info, i.e.
    for i in `ls /proc |egrep...
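    The loop in the quoted post is cut off above; a sketch of one possible variant (the exact fields read from /proc are an assumption, not the original command):
      # walk the numeric /proc entries directly instead of calling ps,
      # which hangs as soon as it touches the stale VM's process
      for i in $(ls /proc | grep -E '^[0-9]+$'); do
          # print name, state and pid straight from the status file
          awk '/^Name|^State|^Pid/ {printf "%s ", $2} END {print ""}' /proc/$i/status 2>/dev/null
      done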
  6. Kernel panic, machine stuck, task khugepaged:796 blocked for more than 120 seconds

    Unfortunately, we are back in the same situation: the BIOS update did not solve the problem, but we can narrow it down to the fact that the blocking VM has its data on the local RAID controller (Perc H755, 4 x NVMe RAID-10). Other VMs (running purely on CEPH storage) are not affected.
  7. Kernel panic, machine stuck, task khugepaged:796 blocked for more than 120 seconds

    We updated the BIOS on our Dell R6525 and the Perc firmware, which helped to solve the problem. I guess the firmware of the Perc controller was the main issue.
  8. Kernel panic, machine stuck, task khugepaged:796 blocked for more than 120 seconds

    [1286374.147087] INFO: task khugepaged:796 blocked for more than 120 seconds.
    [1286374.154005] Tainted: P O 5.15.35-1-pve #1
    [1286374.160027] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    [1286374.168042] task:khugepaged state:D stack: 0...
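    For reference, the knob mentioned in the log can also be read and set via sysctl; note that silencing the warning does not fix the underlying blocked task:
      # current timeout in seconds (0 disables the warning entirely)
      sysctl kernel.hung_task_timeout_secs
      # equivalent of the echo shown in the log line above
      sysctl -w kernel.hung_task_timeout_secs=0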
  9. Kernel panic, machine stuck, task khugepaged:796 blocked for more than 120 seconds

    # lscpu
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    Address sizes:         48 bits physical, 48 bits virtual
    CPU(s):                128
    On-line CPU(s) list:   0-127
    Thread(s)...
  10. Kernel panic, machine stuck, task khugepaged:796 blocked for more than 120 seconds

    Hello Everybody, not sure if this thread (https://forum.proxmox.com/threads/kernel-panic-since-update-to-proxmox-7-1.101164/#post-437435) is related but since we updated to PVE7.2 we have repeated crashes with kernel message INFO: task khugepaged:796 blocked for more than 120 seconds...
  11. [SOLVED] CEPH IOPS dropped by more than 50% after upgrade from Nautilus 14.2.22 to Octopus 15.2.15

    Ahh perfect. This means after waiting a few days, your performance came back to normal?
  12. [SOLVED] CEPH IOPS dropped by more than 50% after upgrade from Nautilus 14.2.22 to Octopus 15.2.15

    This means that your HW setup runs with NPS1 (see page 10 of https://developer.amd.com/wp-content/resources/56745_0.80.pdf for details). I'm not too much of an expert in the nasty details of the AMD Epyc NUMA architecture, but I would say that it is not your bottleneck (might give you some extra...
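    A quick way to verify the NPS setting from inside Linux is to check how many NUMA nodes the kernel sees (NPS1 on a single-socket Epyc shows exactly one node); a sketch:
      # NUMA node count and per-node CPU ranges as seen by the kernel
      lscpu | grep -i numa
      # same information including per-node memory sizes, if numactl is installed
      numactl --hardware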
  13. [SOLVED] CEPH IOPS dropped by more than 50% after upgrade from Nautilus 14.2.22 to Octopus 15.2.15

    For the Nexus, I don't know which exact model you have, but one of the faster ones, a 3172PQ, seems to be around 850 ns; other models might be (significantly) slower, but you will need to google this. The Lenovo switch I mentioned above is around 570 ns, while the Mellanox goes down to 270 ns for...
  14. [SOLVED] CEPH IOPS dropped by more than 50% after upgrade from Nautilus 14.2.22 to Octopus 15.2.15

    1.) We use 2 x Mellanox SX1012 with 40GE QSFP+ and MLAG (before we used 2 x Lenovo GE8124E on MLAG with 10GE QSFP, similar performance). Note: both switches, the Lenovo and the SX1012, are cut-through switches
    2.) default Linux/PVE driver with Mellanox 40GE CX-354 QSFP+
    3.) no Intel DPDK running...
  15. [SOLVED] CEPH IOPS dropped by more than 50% after upgrade from Nautilus 14.2.22 to Octopus 15.2.15

    Hi Alibek, I would say that after 3-4 weeks of pulling my hair out, the cluster came back to normal operation speed. We actually never really figured out what the problem was, but our feeling is that the reconstruction of the OMAP data structures took quite some time in the background. We also...
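    If you want to check whether such background conversion or recovery is still running after an upgrade, the cluster status and per-OSD latencies give a rough indication; a sketch, not taken from the original post:
      # overall health plus any recovery/rebalance progress
      ceph -s
      # commit/apply latency per OSD; values should settle once background work is done
      ceph osd perf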