[SOLVED] Troubleshooting High Server Load on Node running LXC Containers

Donovan Hoare

Nov 16, 2017
Hi all,
I'm writing this in the hope that it helps others.
Scenario:
I run over 100 LXC containers on this node. As you can see in the server load graph, I was seeing an average load of over 70 on 24 CPUs.
That is not acceptable.

Well, like everyone in IT, I restarted the node first.
I also replaced 2 hard disks in my RAID array as they were giving media errors.
But still no luck.

It seems I was just an idiot and didn't start where I should have: I checked syslog and found LXC containers giving ext4 errors.
E.g.:
EXT4-fs warning (device dm-73): ext4_dirent_csum_verify:353:
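
If there are many such messages, it may help to reduce them to a list of affected devices. Here is a minimal sketch, assuming the messages have the same "(device dm-NN)" shape as the line above; the function name ext4_error_devices is just something I made up for illustration.

```shell
# Read kernel log lines on stdin and print each device that appears in an
# EXT4-fs warning or error message, once per device.
ext4_error_devices() {
    grep -E 'EXT4-fs (warning|error)' \
        | sed -n 's/.*(device \([^)]*\)).*/\1/p' \
        | sort -u
}

# Possible usage on a live node (reading the kernel log usually needs root):
#   dmesg | ext4_error_devices
```

Each device it prints can then be looked up under /dev/mapper.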

To get the container number I ran
ls -alh /dev/mapper/ | grep dm-73

That gave me
lrwxrwxrwx 1 root root 8 Dec 26 06:22 pve-vm--255--disk--1 -> ../dm-73

So container ID 255 was the problem.
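
The same lookup can be scripted if you have several devices to check. This is only a sketch: both function names are hypothetical, and it assumes the volumes follow the pve-vm--ID--disk--N naming seen in the output above.

```shell
# Print the /dev/mapper name whose symlink points at the given kernel
# device (for example dm-73). The second argument is the directory to
# search and defaults to /dev/mapper.
mapper_name_for_dm() {
    dir=${2:-/dev/mapper}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue   # skip entries that are not symlinks
        if [ "$(basename "$(readlink "$link")")" = "$1" ]; then
            basename "$link"
        fi
    done
}

# Extract the guest ID from a mapper name such as pve-vm--255--disk--1.
vmid_from_mapper_name() {
    printf '%s\n' "$1" | sed -n 's/^.*vm--\([0-9][0-9]*\)--disk--.*$/\1/p'
}

# Possible usage:
#   vmid_from_mapper_name "$(mapper_name_for_dm dm-73)"
```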
I shut down the container and ran

fsck -l /dev/pve/vm-255-disk-1

I had to fix a lot of items.
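
For anyone repeating this on several containers, here is a rough dry-run helper that only prints the steps and runs nothing. The function name is made up, the default volume group "pve" and disk index 1 are taken from the path above, and using pct shutdown and pct start for stopping and starting the container is my assumption, since the post does not say how the container was shut down.

```shell
# Print (do not run) the repair steps for one container volume.
# Arguments: container ID, volume group (default "pve"), disk index (default 1).
print_repair_steps() {
    vmid=$1
    vg=${2:-pve}
    disk=${3:-1}
    echo "pct shutdown ${vmid}"                          # assumed way to stop the container
    echo "fsck -l /dev/${vg}/vm-${vmid}-disk-${disk}"    # the command used in this post
    echo "pct start ${vmid}"                             # assumed way to start it again
}

# Possible usage: review the output before running anything by hand.
#   print_repair_steps 255
```

Only run fsck on a volume whose container is stopped.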
After that, as per the graph below, my load dropped dramatically to acceptable levels.
I hope this helps new users of Proxmox.
I also would not have thought 2 container file systems could cause so much havoc.

Selection_291.png
 
Thanks for sharing your problem and its solution. It will probably help others who run into the same issue!