Proxmox Crashes/Freezes when installing a VM OS or VM Downloads any Data

Stav1994

New Member
May 7, 2022
CPU: 10700K

RAM: 128 GB

1x 480 GB SSD boot drive

8x 10 TB Seagate NAS drives

LSI 9207 HBA

750 Watt PSU

When my VMs are idle the server is stable. However, when a VM downloads or uploads data (installing Windows or uploading data to Nextcloud, for example), the whole server crashes shortly afterwards. There is no error output on the screen before the crash. I'm wondering what could be causing this if it isn't hardware related. My guess is that it is hardware, but which part?

Currently on PVE 7.2
 
Your RAIDZ on HDDs is probably quite slow, partly because of write amplification. The pool delivers no more IOPS than a single drive can handle (if I recall correctly), and all your VMs share it. If Proxmox is installed on the SSD, that should not have too much of an impact. What are the ashift of the pool and the volblocksize of the virtual machines? And what does "the whole server crashes" look like?
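For reference, the two values asked about here can normally be read with zpool get ashift <pool> and zfs get volblocksize <pool>/<zvol> (pool and zvol names are placeholders). The sketch below is only a rough illustration of why they matter: it estimates how much raw space one zvol block occupies on a RAIDZ vdev, using the sector rounding that OpenZFS applies as far as I understand it. The 8-disk raidz1 layout and the 8K volblocksize are assumptions, since the thread states neither.

```shell
#!/bin/sh
# Rough estimate of the raw bytes one block occupies on a RAIDZ vdev.
# Usage: raidz_alloc_bytes BLOCK_BYTES ASHIFT DISKS PARITY
# The function name and the example values are illustrative assumptions.
raidz_alloc_bytes() {
    psize=$1; ashift=$2; cols=$3; nparity=$4
    # Data sectors: block size rounded up to whole 2^ashift sectors.
    sectors=$(( ((psize - 1) >> ashift) + 1 ))
    # Parity sectors: nparity for every (started) row of data sectors.
    sectors=$(( sectors + nparity * ((sectors + cols - nparity - 1) / (cols - nparity)) ))
    # Padding: the total is rounded up to a multiple of (nparity + 1).
    rem=$(( sectors % (nparity + 1) ))
    if [ "$rem" -ne 0 ]; then
        sectors=$(( sectors + nparity + 1 - rem ))
    fi
    echo $(( sectors << ashift ))
}

# Assumed layout: 8 disks, raidz1, 8K volblocksize.
echo "ashift=12: $(raidz_alloc_bytes 8192 12 8 1) bytes on disk per 8192-byte block"
echo "ashift=9:  $(raidz_alloc_bytes 8192 9 8 1) bytes on disk per 8192-byte block"
```

Under those assumptions a small volblocksize wastes a lot of space at ashift=12, while ashift=9 packs tighter but splits each block into many more sectors. This estimates space only, not IOPS.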
 
Proxmox is indeed installed on an SSD, and ashift is the default for pool creation. As for "the whole server crashes": when a VM is downloading or uploading data, after some time the server freezes, and shortly thereafter it shuts down completely. Memtest passes with no issues as well.
 
It actually powers off? An ashift of 9 with raidz will result in many, many IOPS for large files, and I have seen systems become poorly responsive until the write is finished, but not as bad as that (especially when Proxmox is on a separate drive). Does it run out of memory for the ARC? Did you limit ZFS memory usage, or is it at the default 50%? ZFS can crash the system when swap is on ZFS and it runs out of memory (and swap). Did you make sure that the VMs leave 2 GB for Proxmox and enough memory for the ARC?
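The memory questions above can be turned into a quick back-of-the-envelope check. This is a minimal sketch, assuming the ARC is allowed to grow to the default 50% of RAM mentioned here and that 2 GB is reserved for the host; the function names are made up for illustration, and the 128 GB and 20 GB figures are the ones reported in this thread.

```shell
#!/bin/sh
# Worst-case memory headroom in GiB if the ARC grows to its limit.
# Usage: mem_headroom_gib TOTAL_GIB VM_GIB HOST_RESERVE_GIB ARC_PERCENT
# (Illustrative helper, not part of any Proxmox or ZFS tooling.)
mem_headroom_gib() {
    total=$1; vms=$2; host=$3; arc_pct=$4
    arc=$(( total * arc_pct / 100 ))
    echo $(( total - vms - host - arc ))
}

# Convert an ARC limit in GiB to the byte value zfs_arc_max expects.
arc_max_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Figures from the thread: 128 GB RAM, under 20 GB for VMs/CTs.
echo "headroom with default ARC limit: $(mem_headroom_gib 128 20 2 50) GiB"

# If a limit were wanted, a line like this in /etc/modprobe.d/zfs.conf
# would set it (the 8 GiB value is only an example):
echo "options zfs zfs_arc_max=$(arc_max_bytes 8)"
```

The live ARC size and limit can be compared against this by reading the size and c_max rows in /proc/spl/kstat/zfs/arcstats.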
 
Yep, it actually powers off. All my VMs/CTs together take less than 20 GB of the 128 GB of RAM, so I should still have enough for operations. This just started happening, if that makes any difference; it ran for 6+ months with no issues. That leads me to believe it's either the disks or the PSU, or maybe something else entirely that I'm not thinking of.
 
Sounds like a PSU issue, or at least a power issue. Maybe it's the specific power draw of the 8 drives, which are all active for each little write action? Maybe in combination with heat build-up in the PSU or the chassis (because of the drives, maybe)? I assume the CPU also gets busy with lots of little write actions, but the drives should be the bottleneck.
EDIT: Maybe keep an eye on the voltages (5V and 3.3V for SATA?) with watch -n0.3 sensors (run sensors-detect first) just before it powers off?
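Since the screen is gone once the machine powers off, it may help to log the readings to disk as well as watching them. The sketch below is one possible way to do that, plus a small check against the ±5% tolerance that the ATX specification allows on the +3.3V, +5V and +12V rails. The function names and the log path are assumptions, and which sensors label maps to which rail depends on the motherboard.

```shell
#!/bin/sh
# Succeeds if MEASURED is within 5% of NOMINAL (ATX tolerance for the
# +3.3V, +5V and +12V rails). Usage: rail_ok NOMINAL MEASURED
rail_ok() {
    awk -v n="$1" -v m="$2" 'BEGIN {
        d = (m - n) / n
        if (d < 0) d = -d
        exit !(d <= 0.05)
    }'
}

# Append timestamped sensors output to a log once per second and flush
# it to disk, so the last readings survive a sudden power-off.
# The default path is hypothetical; stop the loop with Ctrl-C.
log_sensors() {
    logfile=${1:-/root/sensors.log}
    while :; do
        { date '+%F %T'; sensors; } >> "$logfile"
        sync
        sleep 1
    done
}

# Example check with made-up readings:
if rail_ok 5 4.9; then echo "5V rail reading 4.9 V: within tolerance"; fi
if ! rail_ok 12 11.2; then echo "12V rail reading 11.2 V: out of tolerance"; fi
```

To use it, run log_sensors in one terminal, start a large download in a VM, and after the next power-off compare the last logged values with rail_ok. Motherboard voltage sensors are often poorly calibrated, so a sudden drop is more telling than the absolute number.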
 
I already replaced all SATA cables on a hunch, and I've ordered a new 750 Watt PSU to try. I will update this post once I get it installed and have tested it for a few days.
 
Yes, desktop machines like this do not have proper error detection capabilities. On enterprise hardware you would have a system event log (SEL) to check.

EDIT: Maybe keep an eye on the voltages (5V and 3.3V for SATA?) with watch -n0.3 sensors (run sensors-detect first) just before it powers off?
Yeah, I would have suggested the same thing. In addition, try using a Kill A Watt (or a similar power meter available in your region) between the outlet and your machine to measure the actual power consumption.

Replaced all SATA cables as a thought already, ordered a new 750 Watt PSU
Isn't this a bit huge for 8 drives? I'm running 24 drives plus two processors on one 750W PSU and there are still hundreds of watts available.
So do you have any other PSU available just for testing? 500W should also be enough.
 
I agree, but I worry about the non-12V rails of some PSUs. Every PSU can deliver a lot of 12V power nowadays, but maybe the 5V and 3.3V rails are weak and trip the PSU.
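To put a number on that worry, the per-rail load can be compared with the rating printed on the PSU label. This is a minimal sketch with hypothetical helper names; the 0.9 A per drive and the 20 A rail rating are placeholder values, not figures for the drives or PSU in this thread, and should be replaced with the numbers from the drive datasheet and the PSU label.

```shell
#!/bin/sh
# Total current on one rail for COUNT identical drives.
# Usage: rail_amps COUNT AMPS_PER_DRIVE
rail_amps() {
    awk -v c="$1" -v a="$2" 'BEGIN { printf "%.2f\n", c * a }'
}

# Succeeds if LOAD_AMPS fits within the rail rating LIMIT_AMPS.
# Usage: rail_within LIMIT_AMPS LOAD_AMPS
rail_within() {
    awk -v l="$1" -v x="$2" 'BEGIN { exit !(x <= l) }'
}

# Placeholder values: 8 drives at 0.9 A each on 5V, 20 A rail rating.
load=$(rail_amps 8 0.9)
if rail_within 20 "$load"; then
    echo "5V load of ${load} A fits the assumed 20 A rating"
else
    echo "5V load of ${load} A exceeds the assumed 20 A rating"
fi
```

This covers the drives only; the motherboard, HBA and USB devices also draw from the 5V and 3.3V rails, so the real margin is smaller.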
 
