[SOLVED] Troubleshooting High Server Load on Node running LXC Containers

Donovan Hoare

Active Member
Nov 16, 2017
Hi all,
I'm writing this in the hope that it helps others.
Scenario:
I run over 100 LXC containers on this node. As you can see in the server load graph, I was seeing an average load of over 70 on a machine with 24 CPUs.
That is not acceptable.

Like everyone in IT, I started by restarting the node.
I also replaced 2 hard disks in my RAID array, as they were giving media errors.
But still no luck.

It turns out I simply didn't start where I should have. When I checked syslog, I found LXC containers giving ext4 errors.
E.g.:
EXT4-fs warning (device dm-73): ext4_dirent_csum_verify:353:
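For anyone hunting for the same symptom, here is a minimal sketch of pulling the affected device names out of the kernel messages. The function name `ext4_devices` is made up for illustration, and it assumes the messages have the `EXT4-fs ... (device dm-N)` shape shown above.

```shell
# Hypothetical helper: read kernel log lines on stdin and print each
# device named in an EXT4-fs warning or error, with a hit count,
# most frequent first.
ext4_devices() {
    grep -E 'EXT4-fs (warning|error)' \
        | sed -n 's/.*(device \([^)]*\)).*/\1/p' \
        | sort | uniq -c | sort -rn
}

# Usage on a live node (either source of kernel messages should work):
#   dmesg | ext4_devices
#   grep -h 'EXT4-fs' /var/log/syslog | ext4_devices
```

A device that shows up many times is a good first candidate to look at.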

To get the container number I ran
ls -alh /dev/mapper/ | grep dm-73

That gave me
lrwxrwxrwx 1 root root 8 Dec 26 06:22 pve-vm--255--disk--1 -> ../dm-73

So container ID 255 was the problem.
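The lookup above can be wrapped in a small function if you have several devices to check. This is only a sketch: `dm_to_vmid` is a hypothetical name, and it assumes the default LVM naming `pve-vm--<ID>--disk--<N>` seen in the output above.

```shell
# Hypothetical helper: given a dm name (e.g. dm-73), find the
# /dev/mapper symlink that points at it and print the guest ID.
# Assumes volume names of the form pve-vm--<ID>--disk--<N>.
dm_to_vmid() {
    dm="$1"
    dir="${2:-/dev/mapper}"   # optional second argument, mainly for testing
    for link in "$dir"/*; do
        [ -L "$link" ] || continue          # only symlinks are of interest
        if [ "$(basename "$(readlink "$link")")" = "$dm" ]; then
            basename "$link" | sed -n 's/.*vm--\([0-9]*\)--disk.*/\1/p'
        fi
    done
}

# Usage:
#   dm_to_vmid dm-73
```

It prints nothing if no matching volume is found, for example when the device does not belong to a guest disk.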
I shut down the container and ran

fsck -l /dev/pve/vm-255-disk-1
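Put together, the steps look roughly like the sketch below. The `run` and `check_ct_disk` names are made up, the `/dev/pve/vm-<ID>-disk-<N>` path layout is assumed from my own setup, and setting `DRY_RUN=1` only prints the commands so you can review them first.

```shell
# Sketch only: with DRY_RUN set, print the command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical wrapper around the manual steps from this post.
# $1 = container ID, $2 = disk number (defaults to 1)
# Assumed volume path: /dev/pve/vm-<ID>-disk-<N>
check_ct_disk() {
    vmid="$1"
    disk="${2:-1}"
    # Stop the container first: never fsck a mounted filesystem.
    run pct shutdown "$vmid" || return 1
    # fsck exits non-zero when it corrected errors, so do not abort on it.
    run fsck -l "/dev/pve/vm-${vmid}-disk-${disk}"
    run pct start "$vmid"
}

# Usage:
#   DRY_RUN=1; check_ct_disk 255 1    # review the commands
#   unset DRY_RUN; check_ct_disk 255 1
```

As far as I know, Proxmox also ships `pct fsck <vmid>` for checking a container volume, which may be the simpler route; check `man pct` on your version before relying on it.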

I had to fix a lot of items.
After that, as the graph below shows, my load dropped dramatically to acceptable limits.
I hope this helps new Proxmox users.
I also would not have thought 2 container file systems could cause so much havoc.

Selection_291.png
 
Thanks for sharing your problem and its solution. It will probably help others who run into the same issue!
 
