VMs become unresponsive after 1-2 weeks, believe it is caused by consumer SSDs

Rubb

New Member
Jan 6, 2025
Hi,

Previously I used Proxmox on an HP EliteDesk G2 to host an Ubuntu Server VM and pfSense, just to try it out. It worked smoothly and I found it interesting, so I bought some old server hardware and wanted to build a NAS solution. That caused a lot of headache, as the VMs became unresponsive after a few weeks. After rebooting the server it worked again for a few weeks.
I believe I have traced the problem to the consumer SSDs, especially the cheap Crucial BX500 disks. So I see this post as something that might help others, as well as a way to get some input on how I should lay out the storage solution in Proxmox.

Current hardware specification:
Motherboard: Asrock EP2C612-WS
CPUs: 2x Intel(R) Xeon(R) CPU E5-2650L v4 (14 cores/CPU)
RAM: 128 GB DDR4 ECC memory

Disks:
2x Seagate IronWolf 4TB 5400rpm 256MB
2x Crucial BX500 SSD 1TB
1x Crucial BX500 SSD 240GB
1x Samsung SSD 840 EVO 250GB

Storage Setup:
The whole controller for the IronWolf disks is passed through as a PCI device to a TrueNAS VM.
So they are administered through TrueNAS and set up as a ZFS mirror.
The 2x Crucial BX500 SSD 1TB are set up as a ZFS mirror in Proxmox named "ProxmoxStorage", used for VM storage.
The 1x Crucial BX500 SSD 240GB and the 1x Samsung SSD 840 EVO 250GB are set up as a ZFS mirror, and this is where Proxmox is installed.
NAS_ISO is shared storage from TrueNAS for ISO files for Proxmox.
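
For reference, the layout on the Proxmox side can be verified like this (output omitted here since it is system specific):

Code:
# Mirror layout of the VM storage pool and the root pool
zpool status ProxmoxStorage
zpool status rpool
# All storages Proxmox knows about
pvesm status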

Troubleshooting:
When the VMs became unresponsive it was impossible to even get any information out of them.
What I could see were messages like:
"INFO: task python3:2770 blocked for more than 120 seconds."
"INFO: task kworker/u32:3:232270 blocked for more than 120 seconds."

After some googling and finding similar threads, my attention turned to my SSDs:
Code:
root@RUUBS:~# pvesm status
Name                  Type     Status           Total            Used       Available        %
NAS_ISO               cifs     active      2059730304        10226432      2049503872    0.50%
ProxmoxStorage     zfspool     active       942931968       518718808       424213160   55.01%
local                  dir     active       180755712         2548608       178207104    1.41%
local-zfs          zfspool     active       178207316              96       178207220    0.00%

Code:
root@RUUBS:~# pveperf /rpool
CPU BOGOMIPS:      123357.08
REGEX/SECOND:      2571720
HD SIZE:           169.95 GB (rpool)
FSYNCS/SECOND:     458.48
DNS EXT:           24.29 ms
DNS INT:           21.43 ms (home)

Code:
root@RUUBS:~# pveperf /ProxmoxStorage
CPU BOGOMIPS:      123357.08
REGEX/SECOND:      2574546
HD SIZE:           404.56 GB (ProxmoxStorage)
FSYNCS/SECOND:     0.02
DNS EXT:           24.78 ms
DNS INT:           20.62 ms (home)

So I have terrible FSYNCS/SECOND on ProxmoxStorage, and as I found out, the BX500 is just really bad at this.
I found a review with a picture that I think shows the cause of the problem:
[Attached image: sustained write performance chart for the Crucial BX500]
Source: https://www.tomshardware.com/reviews/crucial-bx500-ssd,5377-3.html

There it is obvious that the BX500 drive can't sustain its write speed.
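
If someone wants to check their own disks, a simple synchronous write test with fio should show the same weakness that pveperf's FSYNCS/SECOND hints at (the file path below is just an example, use any path on the pool you want to test):

Code:
# 4k writes with an fsync after every write, for 60 seconds
fio --name=synctest --filename=/ProxmoxStorage/fio-test.bin --size=1G \
    --bs=4k --rw=write --ioengine=psync --fsync=1 --runtime=60 --time_based
# Remove the test file afterwards
rm /ProxmoxStorage/fio-test.bin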
I have therefore ordered 2x used Kingston DC500M 960 GB disks.
My plan is to exchange the 2x Crucial BX500 SSD 1TB for these DC500M disks.
Hopefully I have found the root cause of my problems and they will go away.

Getting into this trouble, I also realize that I am constantly learning things about virtualization, and as always with hobbies it is turning into a costly experience :)
So I thought I should ask for some advice on how to lay out my storage.
Having 4x SSDs for Proxmox + VM storage feels costly, and I have not bought any replacements for the Proxmox OS disks. Here I'm thinking of going cheaper and using 2x MX500 disks in a mirrored ZFS configuration, or, if possible, using the 2x DC500M disks for both Proxmox and VM storage in a mirrored ZFS setup.
I plan to set up backups to some USB disks as a first implementation.
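
(Roughly what I have in mind, with the VMID and the USB mount point as placeholders:)

Code:
# Snapshot-mode backup of VM 100, compressed with zstd, written to the mounted USB disk
vzdump 100 --dumpdir /mnt/usb-backup --mode snapshot --compress zstd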

I hope this post can help others who have problems with VMs hanging after some uptime, and that I get some advice on how to lay out storage for a hobby installation.
 
it is possible to use the 2x DC500M disks for both Proxmox and VM storage in a mirrored ZFS setup.
Sure it works. PVE only uses a few GB. The only downside is that if you need to reinstall, you also need to wipe your VMs. If you can live with this, go for it.
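
As a rough check: on a default Proxmox ZFS install the host root lives under rpool/ROOT and the VM disks under rpool/data, so something like this shows how little space the host itself takes:

Code:
# Space used by the host root dataset vs. the VM dataset
zfs list -o name,used,avail rpool rpool/ROOT rpool/data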

I believe I have identified the problem to the consumer SSDs and especially the cheap Crucial BX500 disks.
This is no surprise to the regular forum user. Consumer or prosumer SSDs are not good for running ZFS.
 
Sure it works. PVE only uses a few GB. The only downside is that if you need to reinstall, you also need to wipe your VMs. If you can live with this, go for it.


This is no surprise to the regular forum user. Consumer or prosumer SSDs are not good for running ZFS.
Thanks for your answer.
I went with the option to use only the 2x DC500M disks for both the Proxmox host and VM storage in a mirrored ZFS setup.
If I ever need to reinstall, I guess I'll just do as usual and restore the VMs from their backups.
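
(In that case, restoring would be something like this; the archive name and VMID are just placeholders:)

Code:
# Restore a vzdump archive from the USB backup disk into the new pool
qmrestore /mnt/usb-backup/vzdump-qemu-100-2025_01_06-00_00_00.vma.zst 100 --storage local-zfs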


With the new disks I now get:
Code:
root@RUUBS:~# zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
rpool   848G   174G   674G        -         -     0%    20%  1.00x    ONLINE  -

Code:
root@RUUBS:~# pveperf /rpool
CPU BOGOMIPS:      190410.64
REGEX/SECOND:      2507170
HD SIZE:           647.63 GB (rpool)
FSYNCS/SECOND:     3073.74
DNS EXT:           1005.05 ms
DNS INT:           1001.75 ms (home)

So the FSYNCS/SECOND increased a lot, and I hope the unresponsiveness and the need for reboots are now gone.
 
If your boot disks are in ZFS, just mirror them to your new drives, then once resilvered, remove the old disk(s).

I’ve had similar problems on Windows clients with that brand. Updating the SSD firmware fixed it.
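
(Roughly, the attach/resilver/detach approach looks like this; the disk names are placeholders, and on a Proxmox boot pool you attach the new disk's ZFS partition and handle the boot/EFI partitions separately, e.g. with proxmox-boot-tool:)

Code:
# Add the new disk to the existing mirror; it resilvers as an extra member
zpool attach rpool /dev/disk/by-id/OLD-DISK /dev/disk/by-id/NEW-DISK
# Wait until "zpool status rpool" shows the resilver has finished, then drop the old disk
zpool detach rpool /dev/disk/by-id/OLD-DISK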
 
If your boot disks are in ZFS, just mirror them to your new drives, then once resilvered, remove the old disk(s).

I’ve had similar problems on Windows clients with that brand. Updating the SSD firmware fixed it.
Hi,

I don't quite follow what you are answering here.

I went from having
2x 240 GB in mirrored ZFS for the Proxmox host
2x 1 TB in mirrored ZFS for the VMs

To having both the Proxmox host and the VMs on
2x 960 GB disks in mirrored ZFS.

Even if I were to just replace the 2x 1 TB disks, would I even be able to resilver onto the 2x 960 GB disks when they are smaller?

For the SSD firmware, are you talking about the Kingston DC500M or the Crucial BX500 disks?
 
Well, Proxmox developer Aaron Lauterer did a write-up on how to move to smaller root disks:
https://aaronlauterer.com/blog/2021/proxmox-ve-migrate-to-smaller-root-disks/
There are also related discussions in the forum; searching for https://aaronlauterer.com/blog/2021/proxmox-ve-migrate-to-smaller-root-disks/ with the forum's search function should find them all.