Proxmox v6 started to freeze... Bare metal server has to be restarted to get all VMs running again...

mstgeo

Member
May 15, 2022
Hi!

We have recently started having a problem with our Proxmox v6 machine. Please see the attached text file.

Is this a HARDWARE issue or rather a SOFTWARE one? How can we fix it and get rid of it? Can anyone help here? Thanks in advance.

regards,
Grzegorz Leskiewicz
 

Hi!

During that time: IO delay ~13, CPU usage 52%.

It is a big machine with 1.5 TB RAM and 96 CPUs, NVMe RAID-1 for the system, and 6 x SSD (RAID-5) for VM storage. It runs 100+ VMs, mostly LXC and some KVM-based. No vzdump was running during that time.

Any thoughts?

regards,
G.
 
Are there no other kernel messages that might indicate more precisely where the issue is? I/O errors or similar?

Have you done a memory test?
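
If it helps with the search for such kernel messages, here is a minimal Python sketch that filters a saved log for lines that often accompany storage problems or hangs. The pattern list is my own assumption and is not exhaustive, and the script name and log path in the usage comment are only placeholders.

```python
import re
import sys

# Assumed patterns: fragments of kernel messages that commonly show up with
# disk errors, hung tasks, OOM kills, CPU lockups, or machine-check reports.
# Illustrative only; extend or trim the list for your own system.
SUSPECT_PATTERNS = [
    r"I/O error",
    r"blocked for more than \d+ seconds",
    r"Out of memory",
    r"soft lockup",
    r"Hardware Error",
]
SUSPECT_RE = re.compile("|".join(SUSPECT_PATTERNS), re.IGNORECASE)


def find_suspect_lines(lines):
    """Return the log lines that match at least one suspect pattern."""
    return [line.rstrip("\n") for line in lines if SUSPECT_RE.search(line)]


# Hypothetical usage: python3 scan_log.py /var/log/syslog
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        for hit in find_suspect_lines(fh):
            print(hit)
```

An empty result does not prove the hardware is healthy; it only means none of the listed patterns appeared in that file.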
 
I found this today:

Jan 10 21:13:19 proxmox-server kernel: [70971.086489] proc: Bad value for 'hidepid'
Jan 10 21:13:19 proxmox-server lxcfs[820788]: utils.c: 254: recv_creds: Timed out waiting for scm_cred: No such process
Jan 10 21:13:19 proxmox-server lxcfs[820788]: utils.c: 254: recv_creds: Timed out waiting for scm_cred: No such process
Jan 10 21:13:19 proxmox-server lxcfs[820788]: utils.c: 254: recv_creds: Timed out waiting for scm_cred: No such process
Jan 10 21:13:19 proxmox-server lxcfs[820788]: utils.c: 254: recv_creds: Timed out waiting for scm_cred: No such process
Jan 10 21:13:19 proxmox-server lxcfs[820788]: utils.c: 254: recv_creds: Timed out waiting for scm_cred: No such process
Jan 10 21:13:19 proxmox-server lxcfs[820788]: utils.c: 254: recv_creds: Timed out waiting for scm_cred: No such process
Jan 10 21:13:19 proxmox-server lxcfs[820788]: utils.c: 254: recv_creds: Timed out waiting for scm_cred: No such process
Jan 10 21:13:19 proxmox-server lxcfs[820788]: utils.c: 254: recv_creds: Timed out waiting for scm_cred: No such process
Jan 10 21:13:19 proxmox-server lxcfs[820788]: utils.c: 254: recv_creds: Timed out waiting for scm_cred: No such process
Jan 10 21:13:19 proxmox-server lxcfs[820788]: utils.c: 254: recv_creds: Timed out waiting for scm_cred: No such process
Jan 10 21:13:19 proxmox-server lxcfs[820788]: utils.c: 254: recv_creds: Timed out waiting for scm_cred: No such process
Jan 10 21:13:19 proxmox-server lxcfs[820788]: utils.c: 254: recv_creds: Timed out waiting for scm_cred: No such process
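
Since the same lxcfs line repeats many times, a short Python sketch like the one below can condense such an excerpt into one line per distinct message with a count. It assumes the classic syslog prefix (month, day, time, hostname) seen above; the function name `summarize` is just an illustrative choice, and the sample input is built from the two distinct lines in this excerpt.

```python
import re
from collections import Counter

# Classic syslog prefix: "Jan 10 21:13:19 hostname rest-of-message".
# Assumption: all lines follow this layout; others are counted verbatim.
SYSLOG_RE = re.compile(r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+(.*)$")


def summarize(lines):
    """Count identical messages, ignoring timestamp and hostname."""
    counts = Counter()
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        match = SYSLOG_RE.match(text)
        counts[match.group(1) if match else text] += 1
    return counts.most_common()  # most frequent message first


sample = [
    "Jan 10 21:13:19 proxmox-server kernel: [70971.086489] proc: Bad value for 'hidepid'",
] + [
    "Jan 10 21:13:19 proxmox-server lxcfs[820788]: utils.c: 254: recv_creds: "
    "Timed out waiting for scm_cred: No such process",
] * 12

for message, count in summarize(sample):
    print(f"{count}x {message}")
```

Feeding it a full syslog instead of this excerpt would show whether the lxcfs timeouts cluster around the time of the freezes or appear all the time.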
 
Hi. Yes, each one has a MySQL database running on it.

Given the details of your current setup and the issues encountered, I recommend considering a migration from LXC to VMs. VMs typically offer better isolation and resource management, especially beneficial for database applications sensitive to I/O fluctuations. This change could potentially improve both performance and stability in your environment.

However, to dig deeper, please provide the full syslog, which may help us identify the cause of the issue. Additionally, as recommended above, consider upgrading your Proxmox VE node to the latest supported version.