High IO delay and temps

Airw0lf

Member
Apr 11, 2021
Team,

Since a few days ago, the behavior of the Proxmox server in my home lab has changed.
On boot, there is a high IO delay (10-15%) for up to 30 minutes (versus far less than 1% before).
In addition, the monitoring tool is reporting a "temp1" rise, going from 55-60-C to 70-plus-C, probably the VRMs. But I cannot find any conclusive answers here.

Does this ring any bells?
Any suggestions on where to start analyzing this change?
 
Prophets, AFAIK, do not frequent this forum. So I'm perplexed, as I am by numerous similar posts on this forum along the lines of "This morning my server is failing, what is wrong with it?": how can you expect help when you provide zero info? You probably think, as do those other posters, that many PVE systems got booted this morning and are showing the same behavior as yours, but in real life that is usually not the case.

You need to provide at least the basic HW (incl. storage), NW and guest usage (LXC & VM) so that we can get a picture of what you are facing.

Now let's try some prophecy:

"Since a few days"
Analyze what within your setup has changed. Updates? Infrastructure? NW? etc.

"On boot, there is a high IO delay (10-15%) for up to 30 minutes"
Check the logs for that period?
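
To narrow down "that period", it may help to know exactly when the IO delay starts and stops, so the timestamps can be matched against the logs. Below is a minimal sketch in Python. It assumes that the IO delay figure in the PVE GUI roughly corresponds to the iowait counter in /proc/stat (that mapping is an assumption, not something confirmed in this thread). The function names are made up for illustration.

```python
import time


def read_cpu_times(path="/proc/stat"):
    """Return the aggregate CPU counters (in clock ticks) from the 'cpu' line."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(value) for value in line.split()[1:]]
    raise RuntimeError("no aggregate 'cpu' line found in " + path)


def iowait_percent(before, after):
    """Share of elapsed CPU time spent in iowait between two samples.

    Field order in /proc/stat: user, nice, system, idle, iowait, irq,
    softirq, steal, guest, guest_nice. Guest time is already counted in
    user/nice, so only the first eight fields are summed.
    """
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and return the iowait share."""
    before = read_cpu_times(path)
    time.sleep(interval)
    return iowait_percent(before, read_cpu_times(path))
```

Printing `sample_iowait()` with a timestamp once a minute from boot onward would give a timeline to hold against the journal.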

"going from 55-60-C to 70-plus-C"
After that 30-min boot period, do the temps settle? During the initial 30-min period, what is accessible/inaccessible? Is anything lagging during that time?
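
To see whether the temps settle, and to pin down which chip "temp1" actually belongs to, the kernel's hwmon entries in sysfs can be read directly. This is a rough sketch, assuming a Linux host that exposes sensors under /sys/class/hwmon; `read_hwmon_temps` is a made-up helper name, and which chips show up (for example `k10temp` for an AMD CPU) depends on the loaded drivers.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return a list of (chip, sensor, label, degrees_c) tuples.

    Each hwmon directory has a 'name' file for the chip and tempN_input
    files holding millidegrees Celsius. A tempN_label file is optional.
    """
    readings = []
    for chip_dir in sorted(Path(root).glob("hwmon*")):
        name_file = chip_dir / "name"
        chip = name_file.read_text().strip() if name_file.exists() else chip_dir.name
        for temp_file in sorted(chip_dir.glob("temp*_input")):
            sensor = temp_file.name[: -len("_input")]
            label_file = chip_dir / (sensor + "_label")
            label = label_file.read_text().strip() if label_file.exists() else ""
            try:
                millidegrees = int(temp_file.read_text().strip())
            except (OSError, ValueError):
                continue  # some sensors are present but not readable
            readings.append((chip, sensor, label, millidegrees / 1000.0))
    return readings
```

Since several chips usually have their own "temp1", the chip name next to the reading says more than the bare sensor name from the monitoring tool.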

I'd check the following - in this order:

  • Check logs for more info.
  • Physical internal inspection - focusing on the ventilation system, heat sink/s, PSU, cabling & board connections.
  • Disk/Storage checks. SMART data etc.
  • RAM check.
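
For the disk/storage item, the SMART data can be pulled in a script-friendly form. This is only a sketch, assuming smartmontools 7.0 or newer (for the `--json` option) and root rights. The helper names are hypothetical, and the picked fields are just a small selection that may be missing on some drives; USB enclosures in particular may need an extra device-type option.

```python
import json
import subprocess


def smart_summary(report):
    """Pick a few fields out of a parsed `smartctl --json` report.

    Missing fields come back as None instead of raising an error.
    """
    return {
        "model": report.get("model_name"),
        "passed": report.get("smart_status", {}).get("passed"),
        "temperature_c": report.get("temperature", {}).get("current"),
        "power_on_hours": report.get("power_on_time", {}).get("hours"),
    }


def query_smart(device):
    """Run smartctl on one device, e.g. '/dev/sda', and summarize the result."""
    result = subprocess.run(
        ["smartctl", "--json", "-i", "-H", "-A", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses its exit status as a bitmask of findings
    )
    return smart_summary(json.loads(result.stdout))
```

Running `query_smart()` for each of the drives gives a quick side-by-side view of health status and drive temperature.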

Good luck.
 

I know that it was rather an open question with nothing to go on.

Everything is working as expected - it's just different behavior.
There are no problems in the logs - comparing was not possible as the old logs were already purged.

The change happened somewhere in the following series of changes over the last 7-10 days:
* bonding (mode 5) the on-board adapter with one port of a 4-port network adapter
* replacing the RAID-0 config of two 4-TB disks with a RAID-1 config of two 8-TB disks - both software RAID
* adding an internal drive as a kind of intermediate storage
* adding an external (USB) drive for backups

The system is based on an Asus TUF Gaming B550-Plus.
The CPU is an AMD Ryzen 7 5700X 8-Core Processor, with 64 GB of RAM (i.e. 2 modules of 32 GB).
One of the video card slots is equipped with a 2-port 10-Gbps network card, and the other with a 4-port 1-Gbps card.
Both of these network cards are PCI-e X8 models.
The video card is a PCI-e X1 model and is installed in the last PCI-e X1 slot (i.e. the one closest to the PSU).

Cooling is provided by 2 fans running at maximum speed - one on the CPU and one above the cards.
The build is based on a Q300L case and is open on all sides.

Normally I have 2 VMs and 3 LXC containers running. One of these VMs causes a 10% CPU load.
This load comes from analyzing packets coming in via one of the 10-Gbps ports.
 