High CPU on 2 nodes since Sunday 14/04, 17h?

Hi @winproof ,

- I would first check that the BIOS/UEFI and PCIe device firmware are up to date, especially for the `TP-link TX-201` NIC. If they are already up to date, consider testing with a different NIC model to see if the issue persists.

- Since the issue appears to be related to the backup involving the NFS storage, I would check the NFS server logs as well for any error or activity during the backup time.

- As @gfngfn256 said, the zimbra VM seems to be triggering the problem. I would check the zimbra VM logs again for any hidden issues, or isolate the zimbra VM, i.e. stop it to see whether the issue still occurs. This can narrow down whether the issue is really related to that VM.
 
OK, all firmware is up to date.
But the TX201 still uses the r8169 driver; I haven't yet tried the driver supplied by TP-Link.

This weekend I will try running the backup over the integrated network card. To remove the 3rd link (used only for migration; corosync is on another link), is it enough to delete the virtual bridge (and delete "migration: network=10.0.1.10/24,type=insecure" in datacenter.cfg)?
Or should I also delete the network card entry in the GUI (or physically remove the card)?
 
You can change the migration network in the Proxmox VE web UI under Datacenter -> Options -> Migration Settings.
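
Before changing anything, it may help to see what is currently configured from the command line. This is only a minimal sketch: it assumes the standard `/etc/pve/datacenter.cfg` location, and the function name `show_migration_setting` is made up for illustration.

```shell
#!/bin/sh
# Print the cluster-wide migration setting, if one is configured.
# Default path is the usual Proxmox VE location (assumption);
# pass another file as the first argument to check a copy.
show_migration_setting() {
    cfg="${1:-/etc/pve/datacenter.cfg}"
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg" >&2
        return 1
    fi
    # grep exits non-zero when there is no 'migration:' line
    grep -E '^migration:' "$cfg" || echo "no migration setting (defaults apply)"
}
```

Calling `show_migration_setting` with no argument on a node prints the `migration:` line if one exists, so you can check it before and after editing the option in the UI.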
 
Okay, I think I've found the root of the problem!
Really vicious: it has nothing to do with the zimbra VM, Proxmox, or the network.
The problem appears every time my UPS management software runs a self-test (every 2 weeks in my case).
It's obviously a hardware problem on the Dell R250 side (which explains why my T430 isn't affected even though all 3 servers are on the same UPS). I came across a post by someone who noticed the same problem on an R250, although in his case it's under Windows.

https://www.dell.com/community/en/c...clock-lock-at-20-ghz/65f836bc6f77eb5fca246c7d

Apparently, after the switch to UPS power, something happens on the R250 that limits the CPU to 0.2 GHz. o_O

I reported the problem to Dell, and it's in their hands now.

In any case, thanks for the help! (And sorry for wasting your time on something that ultimately has nothing to do with Proxmox! :) )
 
Having read the link you provided & reread this entire thread, it looks most likely that you've found the cause.

You could confirm that this is indeed the problem by running lscpu, simulating a UPS test/power failure, then running lscpu again once power is back to normal and comparing the results.
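
The before/after comparison could be sketched roughly as below. It reads the `cpu MHz` lines from `/proc/cpuinfo` instead of parsing lscpu output (a simplification on my part), and the helper names `cpu_mhz` and `avg_mhz` as well as the `/tmp` file names are just illustrative.

```shell
#!/bin/sh
# Print the per-core "cpu MHz" values, one per line.
# Reads /proc/cpuinfo by default; pass a saved copy as $1 if needed.
cpu_mhz() {
    awk -F': *' '/^cpu MHz/ { print $2 }' "${1:-/proc/cpuinfo}"
}

# Average of the values read from stdin, rounded to a whole MHz.
avg_mhz() {
    awk '{ sum += $1; n++ }
         END { if (n) printf "%.0f\n", sum / n; else print 0 }'
}

# Example session (run the middle step by hand):
#   cpu_mhz > /tmp/mhz_before.txt
#   ... trigger the UPS self-test, wait until mains power is back ...
#   cpu_mhz > /tmp/mhz_after.txt
#   echo "before: $(avg_mhz < /tmp/mhz_before.txt) MHz"
#   echo "after:  $(avg_mhz < /tmp/mhz_after.txt) MHz"
```

Idle cores normally clock down, so individual values will differ between runs anyway; what matters is whether every core ends up stuck at the same very low value afterwards.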

I hope Dell can/will actually deal with this. (This may be HW-specific. It would be interesting to know what HW you share with CwH4225 on the Dell site: CPU etc.?) If all else fails, you may have to test with another UPS that outputs a clean pure-sine AC wave (as suggested by CwH4225).
 
I confirm :)

Before the self-test:

Code:
root@pve2-r250:~# cat /proc/cpuinfo | grep "MHz"
cpu MHz         : 3401.182
cpu MHz         : 3404.657
cpu MHz         : 3399.991
cpu MHz         : 800.000
cpu MHz         : 800.000
cpu MHz         : 800.000
cpu MHz         : 800.000
cpu MHz         : 1890.949

After:

Code:
root@pve2-r250:~# cat /proc/cpuinfo | grep "MHz"
cpu MHz         : 200.008
cpu MHz         : 200.011
cpu MHz         : 199.996
cpu MHz         : 199.989
cpu MHz         : 199.999
cpu MHz         : 200.065
cpu MHz         : 200.032
cpu MHz         : 200.000
 
I imagine there must be others (besides you and the poster on the Dell site) who are suffering from the same issue but are unaware of it or have yet to diagnose it. Since a complete power-off and restart resets the frequency back to normal, it will be hard to spot. However, this will also be your "workaround". I really hope that Dell actually addresses this issue. (It may require a physical change/recall. You'd better hope it can be fixed through a BIOS setting or update.)
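
Since it is hard to spot, a small check run from cron or a monitoring agent might help catch it. The sketch below is only an illustration: the function name `check_throttle` is made up, and the 400 MHz default threshold is my assumption, picked because it sits below the 800 MHz idle clock and above the ~200 MHz seen in the stuck state in the output posted above.

```shell
#!/bin/sh
# Print "throttled" and return 1 when EVERY core is at or below the
# threshold (in MHz); otherwise print "ok" and return 0.
# $1 = threshold (default 400, an assumed value), $2 = cpuinfo file.
check_throttle() {
    threshold="${1:-400}"
    file="${2:-/proc/cpuinfo}"
    awk -F': *' -v t="$threshold" '
        /^cpu MHz/ { n++; if ($2 + 0 <= t) low++ }
        END {
            # a single idle core is normal; all cores stuck low is not
            if (n > 0 && low == n) { print "throttled"; exit 1 }
            print "ok"
        }' "$file"
}
```

Requiring all cores to be low avoids false alarms from cores that are simply idle. The non-zero return code makes it easy to hook into whatever alerting you already have.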
 
We have the same server and the same problem.
It is connected to an Eaton UPS that performs a periodic self-test every week; again, we believe this is what triggers the problem.
We will check the connector mentioned by Dell in the next few days.
We are also waiting for answers from Dell support.
 
Thanks for the update.
Today we found that we can reproduce the problem by disconnecting the AC power to the UPS (Eaton Ellipse Pro 1200).
 
