CPU load increases when SMB/CIFS share is no longer online

ragsna

New Member
Jan 31, 2023
I have a strange phenomenon that I can't explain:

I have an SMB/CIFS share mounted on a PVE 8.2.2 host.
The share is located on a drive in an Openmediavault server.
As soon as the server with the shared drive goes offline, the CPU load on the PVE 8.2.2 host increases significantly.
This is easy to see and reproducible.

And what is even stranger:
I removed the SMB/CIFS share from the PVE 8.2.2 host, but the behaviour is exactly the same!

I have a second PVE host, still running 8.0.4, which does not show this behaviour even though it has the same shared folder mounted.

Any idea where to look?
 
Have you checked which process is causing the CPU load? Afterwards, I would look at that process's log to see what's happening.
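One way to do that check without eyeballing top is to sample each process's accumulated CPU time from /proc twice and compare. The Python below is a minimal sketch, assuming a Linux host with the standard /proc/<pid>/stat layout; the names `parse_stat`, `snapshot` and `top_cpu` are hypothetical helpers made up for illustration, not part of any Proxmox tool. Usage is reported as a percentage of one core, similar to top's %CPU column.

```python
import os
import time


def parse_stat(line: str) -> tuple[int, str, str, int]:
    """Parse one /proc/<pid>/stat line into (pid, comm, state, utime + stime).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the *last* ')' instead of on whitespace.
    """
    pid_part, rest = line.split(" (", 1)
    comm, tail = rest.rsplit(")", 1)
    fields = tail.split()
    # fields[0] is field 3 (state) of the full line, so field n is fields[n - 3];
    # utime and stime are fields 14 and 15, measured in clock ticks.
    state = fields[0]
    ticks = int(fields[11]) + int(fields[12])
    return int(pid_part), comm, state, ticks


def snapshot(proc_root: str = "/proc") -> dict[int, tuple[str, str, int]]:
    """Read the accumulated CPU ticks of every process visible in proc_root."""
    snap = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state, ticks = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited between listdir() and open()
        snap[pid] = (comm, state, ticks)
    return snap


def top_cpu(interval: float = 5.0, n: int = 10) -> list[tuple[float, int, str, str]]:
    """Return the n processes that used the most CPU during `interval` seconds.

    Usage is a percentage of one core, so a busy multi-threaded process
    (e.g. a kvm process) can exceed 100%.
    """
    hz = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second
    before = snapshot()
    time.sleep(interval)
    after = snapshot()
    usage = []
    for pid, (comm, state, ticks) in after.items():
        if pid in before:  # ignore processes that started mid-interval
            pct = 100.0 * (ticks - before[pid][2]) / hz / interval
            usage.append((pct, pid, comm, state))
    usage.sort(reverse=True)
    return usage[:n]
```

Calling `top_cpu()` once while the share's server is offline and once while it is online, then comparing the two lists, should show whether any single process accounts for the difference.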
 
Yes, I tried that already, but I could not identify a specific process.
The "highest" loads are on the VMs, which I don't think are causing the issue. And in fact, the server load is not really high at all.
It isn't causing any problems; it's just strange to see the load drop when I turn on the server with the shared drive.
 
Yes, you would be right if the share were still enabled.
But I have completely removed the share under PVE Datacenter => Storage.
 
So I'm not really sure what you mean: does the load of the VMs increase when you disconnect the share? If so, you have to check what's happening on the VMs. Are they trying to connect to that share? Which processes are causing the load?
Otherwise, some process on the Proxmox host must be significantly increasing its load.
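One possible reason no process stands out in top: the Linux load average counts not only tasks that are running or runnable (state R) but also tasks in uninterruptible sleep (state D), which are typically blocked on I/O and use no CPU at all. So the load can rise while nothing looks busy. The Python below is a minimal sketch, assuming the usual /proc/<pid>/task/<tid>/stat layout; `parse_task_stat` and `tasks_in_state` are hypothetical helper names, and whether D-state tasks explain the behaviour here is untested.

```python
import os


def parse_task_stat(line: str) -> tuple[str, str]:
    """Return (comm, state) from one /proc/<pid>/task/<tid>/stat line.

    comm sits in parentheses and may contain spaces or ')', so the state
    letter is located relative to the *last* ')' on the line.
    """
    close = line.rindex(")")
    comm = line[line.index("(") + 1 : close]
    state = line[close + 2]  # ") X": the character after the space
    return comm, state


def tasks_in_state(states: str = "RD", proc_root: str = "/proc") -> list[tuple[int, int, str, str]]:
    """List (pid, tid, comm, state) for every thread whose state is in `states`.

    The default "RD" selects running and uninterruptible threads, i.e. the
    ones the kernel counts toward the load average at this instant.
    """
    found = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip entries like /proc/self or /proc/meminfo
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    comm, state = parse_task_stat(f.read())
            except (OSError, ValueError, IndexError):
                continue  # thread exited, or unreadable/short line
            if state in states:
                found.append((int(pid), int(tid), comm, state))
    return found
```

If `tasks_in_state("D")` keeps returning the same threads after the share's server goes offline, that would suggest something on the host is stuck waiting on I/O rather than burning CPU.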
 
I can't identify the process whose load increases.
None of the VMs ever had access to the share. Only the PVE host had access, and I have since removed the share again (Datacenter => Storage).

Today I started the Openmediavault server with the share at 15:25.
You can clearly see the drop in the graph:

pve_load.png

I took some screenshots of top.
The first one at 15:25, just before starting the server with the share:

pve_top1.png

Then after the server with the share had been online for 1 h:
A clear drop in PVE load.

pve_top2.png


And then 5 min after switching the server with the share off again:
A clear increase in PVE load again.

pve_top3.png



So I have no real clue what it could be linked to.
 
I mean, you do see that the only recognizable load variance is on two KVM processes, right? It might be purely coincidental, but you can find out which VMs are affected by checking the VMs' graphs (as your top output suggests, two of them drop significantly in CPU usage) or by comparing the uptime of the VMs with that of the processes.

Then check the processes inside those VMs while you turn the share off; you should see which process causes it.
Or, if possible, shut those VMs down before you deactivate the share again. The host load should then not increase.
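Matching PIDs is a more direct way to tell which VMs those two kvm processes belong to than comparing uptimes. `qm list` prints a PID column for running VMs; as a scriptable alternative, the Python below is a minimal sketch that assumes Proxmox keeps one `<vmid>.pid` file per running VM under /var/run/qemu-server (worth verifying on your host). `kvm_pid_to_vmid` is a hypothetical helper name.

```python
import glob
import os


def kvm_pid_to_vmid(pid_dir: str = "/var/run/qemu-server") -> dict[int, str]:
    """Map kvm process IDs to Proxmox VM IDs.

    Assumes one '<vmid>.pid' file per running VM in pid_dir, each holding
    the PID of that VM's kvm process.
    """
    mapping = {}
    for path in glob.glob(os.path.join(pid_dir, "*.pid")):
        vmid = os.path.basename(path)[: -len(".pid")]
        try:
            with open(path) as f:
                mapping[int(f.read().strip())] = vmid
        except (OSError, ValueError):
            continue  # VM stopped meanwhile, or the file is empty/unreadable
    return mapping
```

Looking up the PIDs of the two busy kvm processes from top in the returned dict gives the VM IDs whose guests are worth inspecting.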
 
So, when SMB goes offline, I have been seeing PVE crash completely: VMs become unresponsive, Ceph becomes unhappy, etc. What I found out is that 8.2.2 has a really short SMB reconnect time that seems to trigger a race condition.

In our initial incident, an SMB server filled up because of a recycle-bin issue (backups being fleeced were not actually being deleted from the SAN, but the SAN reported the space as free over the SMB protocol). At 99% full, the NAS dropped SMB, which caused a cascading failure across several PVE nodes.

During RCA (root cause analysis) testing, I found that as long as the SMB volume was enabled, PVE would start to become slow and unresponsive after a few days. The only way to restore normal operation was to reboot the affected PVE nodes. As long as we took the SMB volume offline at the datacenter level while running the test scenarios, PVE was perfectly fine. So we added this step to our SMB maintenance plan going forward.
 
Thanks to both of you. I'll investigate in more detail and keep you posted if there's any news.
 