I have just migrated from ESXi, where my server and VMs ran for months on end with zero issues. Since migrating, however, my host has been extremely unstable and will hang and lock up after a few hours. I have one large VM with 34GB of RAM and all 24 threads, plus 2 smaller ones with 1GB each. The host is running Proxmox 6.4-9 and has 48GB of RAM. That comes to 36GB of RAM for the VMs alone, leaving 12GB for Proxmox and whatever it wants to do, yet the host always seems to be running out of RAM and maxing out the CPU.
I can't find much info about it, but top says pmxcfs is eating 500% CPU across my 24 threads, and the Proxmox web GUI reports a consistent 50% CPU utilization even with my VMs shut off. In addition, over a few hours my RAM utilization slowly rises from 30GB to the full 48GB, then the entire swap fills up and Proxmox grinds to a halt. The web GUI eventually breaks too, with all VMs and storage nodes showing as offline and the performance graphs cutting off (attached image). Shutting down or restarting VMs does nothing, and I have no idea what is using the RAM, as no process in top accounts for it. A few hours ago I was having horrible performance issues in my VMs, so I disabled swap on the host, which greatly improved performance. That shouldn't be a problem, since I've left 12GB free for the host, yet Proxmox still manages to eat all 12GB of it along with 100% CPU, then I get kernel soft lockup messages before everything freezes completely and I'm forced to reset the server.
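In case it helps, these are roughly the commands I've been using to try to track down the memory. My guess is it's kernel-side, since no userspace process accounts for it, but I honestly don't know:
Code:
# sort processes by resident memory; nothing here adds up to the missing RAM
ps aux --sort=-rss | head -n 15

# full memory breakdown, including Slab and page cache
cat /proc/meminfo

# one-shot view of kernel slab allocator usage (run as root)
slabtop -o

# how I disabled swap on the host earlier
swapoff -a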
I feel like I'm going insane. I've spent hours staring at this and researching, but no matter what I try, Proxmox eventually uses 100% of my CPU and 100% of my RAM/swap until I have to reset the system. In the 10 minutes it took me to write this, I've watched my RAM usage rise from 40GB to 46GB, and it's still climbing ever closer to 48GB... My CPU utilization has also risen to 70% even though the VMs are currently idle. Does anybody have any idea what could be wrong? I could try reinstalling Proxmox, I guess, but this install is already brand new, so I don't know how that would help. I apologize for this post ending up as kind of a rant; I'm not sure how best to present the information, but I can try to provide more if necessary. Thank you for your time.
Some other info I thought might be important: Proxmox is running on a Dell R710 with a PERC H700 that has 3 SATA SSDs configured as single-drive RAID 0 arrays: one for Proxmox, another for VMs, and a 3rd unused drive. I'm using ext4 everywhere rather than ZFS because of concerns about running ZFS on top of the RAID card. I have no dedicated GPU. I do have 2 unused QLogic QLE2560s in the server, though. I also can't find a single log anywhere about the crashes or a possible cause.
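For completeness, this is where I've been looking for crash logs, without finding anything useful so far (the journalctl one assumes persistent journaling, which I'm not sure is enabled by default):
Code:
# kernel ring buffer from the current boot; the soft lockup warnings show up here
dmesg -T | grep -iE "lockup|oom|hung"

# errors and worse from the previous boot (needs a persistent journal)
journalctl -b -1 -p err

# syslog, plus the state of the core Proxmox services, including pmxcfs (pve-cluster)
tail -n 100 /var/log/syslog
systemctl status pve-cluster pvestatd pvedaemon pveproxy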
EDIT: I forgot to add: on ESXi, host CPU utilization was usually around 10%, and RAM usage was very consistent at around 46GB. I realize RAM usage isn't all that comparable between hypervisors, but CPU usage is fairly comparable and shouldn't be jumping from 10% up to 50% or more. I even shut down the VM I use for experimenting to try to help Proxmox out, reducing RAM consumption by another 8GB, but it's still performing worse.
Here is the output from "free -m" shortly before I posted this:
Code:
              total        used        free      shared  buff/cache   available
Mem:          48339       46985         216         528        1137         318
Swap:          8191         247        7944
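(If I'm reading that right, "available" being only ~318MB means the host really is out of reclaimable memory; it's not just buff/cache inflating the "used" number.)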