All VMs unusably slow under disk load

gothbert

Dear community,

I run PVE on a Supermicro X10SLM+-LN4F board with 8 x Intel(R) Xeon(R) E3-1230 v3 @ 3.30GHz and 32 GB ECC DDR3 RAM in a Supermicro chassis, with 4 x 1 TB SATA SSDs in a ZFS RAID10 array (as per a recommendation in this forum a few years ago). Although the hardware is now somewhat dated, I am satisfied with the performance, except when a VM has increased disk activity. This has been an issue all along (also before, with a 3 x 2 TB HDD RAIDZ3), but yesterday's experience was so bad that I am now forced to look for a solution.

The scenario is a home lab. PVE hosts one Debian Linux server VM for home automation with low performance requirements and several Windows 10 and 11 VMs for personal software development and testing purposes. The Linux VM is always on and I mostly work with one Windows VM at a time. If I run a second Windows VM simultaneously, the whole setup becomes unusable (unresponsive).

This is related to disk activity, e.g. when Windows goes into or wakes up from hibernation, or downloads and installs updates. The actual R/W rates on the underlying filesystem of the PVE host hardly exceed 15 MB/s, and there is plenty of RAM left. Yesterday evening I had to restore one VM from Proxmox Backup Server, which locked up the Linux VM for hours (I found it in STOPPED state this morning and it refuses to boot - no bootable device - but that's another issue).

Any suggestions on how to remedy this issue would be appreciated.

Kind regards
Boris
 
A few more details would be helpful.

What SSD models do you use? Enterprise or consumer drives? Did you check wearout levels and ZFS health states?

How many VMs are running, and how much RAM is assigned to each VM? Are the vdisks attached via VirtIO SCSI single? 32 GB overall is not that much when ZFS is in use.
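
For example, something along these lines on the PVE host shows both wearout and pool health (device names are just examples, adjust to your layout):

Code:
# SMART data incl. wearout/total writes for one SSD (repeat for the other drives)
smartctl -a /dev/sda
# pool health, errors and scrub state
zpool status -v
# capacity, fragmentation and health overview
zpool list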
 
Thanks for the swift reply.

The 4 SSDs are SanDisk SSD Plus 1000 GB. Wearout is not reported. Here is the SMART data for the fourth device (the others have similar readings with fewer bad blocks):


Code:
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.5.11-7-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Marvell based SanDisk SSDs
Device Model:     SanDisk SSD PLUS 1000GB
Serial Number:    <redacted>
LU WWN Device Id: <redacted>
Firmware Version: UH5100RL
User Capacity:    1,000,207,286,272 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic
Device is:        In smartctl database 7.3/5319
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Dec  9 12:18:55 2023 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x15) SMART execute Offline immediate.
No Auto Offline data collection support.
Abort Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 182) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       12431
12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       52
165 Total_Write/Erase_Count 0x0032   100   100   000    Old_age   Always       -       10319
166 Min_W/E_Cycle           0x0032   100   100   ---    Old_age   Always       -       91
167 Min_Bad_Block/Die       0x0032   100   100   ---    Old_age   Always       -       0
168 Maximum_Erase_Cycle     0x0032   100   100   ---    Old_age   Always       -       186
169 Total_Bad_Block         0x0032   100   100   ---    Old_age   Always       -       1630
170 Unknown_Marvell_Attr    0x0032   100   100   ---    Old_age   Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Avg_Write/Erase_Count   0x0032   100   100   000    Old_age   Always       -       91
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       28
184 End-to-End_Error        0x0032   100   100   ---    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   ---    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   066   073   000    Old_age   Always       -       34 (Min/Max 11/73)
199 SATA_CRC_Error          0x0032   100   100   ---    Old_age   Always       -       0
230 Perc_Write/Erase_Count  0x0032   100   100   000    Old_age   Always       -       7472 4628 7472
232 Perc_Avail_Resrvd_Space 0x0033   100   100   005    Pre-fail  Always       -       100
233 Total_NAND_Writes_GiB   0x0032   100   100   ---    Old_age   Always       -       94137
234 Perc_Write/Erase_Ct_BC  0x0032   100   100   000    Old_age   Always       -       161236
241 Total_Writes_GiB        0x0030   100   100   000    Old_age   Offline      -       26104
242 Total_Reads_GiB         0x0030   100   100   000    Old_age   Offline      -       17528
244 Thermal_Throttle        0x0032   000   100   ---    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported


Total RAM is 32 GB. Running the Linux VM with 2 GB and one Windows VM with 8 GB is fine; running another Windows VM with 8 GB brings the PVE host to its knees. I previously experimented with smaller memory allocations (4 GB for the Windows VMs), but the situation was equally bad.

I doubt that the SSDs are to blame. I had the same bad experience previously with the 3 WD Red HDDs (I actually switched to SSDs hoping to overcome the problem).

The vdisks are attached via VirtIO SCSI single.
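
For illustration, the relevant lines in /etc/pve/qemu-server/<vmid>.conf look roughly like this (VMID, storage name and disk options are placeholders, not my exact values):

Code:
scsihw: virtio-scsi-single
scsi0: local-zfs:vm-101-disk-0,iothread=1,size=64G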

I suspect an issue with the on-board SATA controller or backplane. I experimented with better and shorter SATA cables earlier with no measurable difference.

A restore of the damaged Linux VM is running while I am writing this. The kernel log shows blocked tasks (intermittently for either 120 or 241 seconds). Messages like these keep repeating:
Code:
[ 5196.709031] INFO: task txg_sync:470 blocked for more than 241 seconds.
[ 5196.709053]       Tainted: P           O       6.5.11-7-pve #1
[ 5196.709061] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5196.709072] task:txg_sync        state:D stack:0     pid:470   ppid:2      flags:0x00004000
[ 5196.709075] Call Trace:
[ 5196.709077]  <TASK>
[ 5196.709080]  __schedule+0x3fd/0x1450
[ 5196.709086]  schedule+0x63/0x110
[ 5196.709088]  schedule_timeout+0x95/0x170
[ 5196.709091]  ? __pfx_process_timeout+0x10/0x10
[ 5196.709096]  io_schedule_timeout+0x51/0x80
[ 5196.709099]  __cv_timedwait_common+0x140/0x180 [spl]
[ 5196.709114]  ? __pfx_autoremove_wake_function+0x10/0x10
[ 5196.709118]  __cv_timedwait_io+0x19/0x30 [spl]
[ 5196.709128]  zio_wait+0x13a/0x2c0 [zfs]
[ 5196.709301]  dsl_pool_sync+0xce/0x4e0 [zfs]
[ 5196.709465]  spa_sync+0x57a/0x1030 [zfs]
[ 5196.709630]  ? spa_txg_history_init_io+0x120/0x130 [zfs]
[ 5196.709789]  txg_sync_thread+0x1fd/0x390 [zfs]
[ 5196.709948]  ? __pfx_txg_sync_thread+0x10/0x10 [zfs]
[ 5196.710106]  ? __pfx_thread_generic_wrapper+0x10/0x10 [spl]
[ 5196.710118]  thread_generic_wrapper+0x5f/0x70 [spl]
[ 5196.710128]  kthread+0xf2/0x120
[ 5196.710130]  ? __pfx_kthread+0x10/0x10
[ 5196.710134]  ret_from_fork+0x47/0x70
[ 5196.710136]  ? __pfx_kthread+0x10/0x10
[ 5196.710139]  ret_from_fork_asm+0x1b/0x30
[ 5196.710143]  </TASK>
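
In case it helps to narrow this down, per-vdev and per-disk latency during such a stall can be watched with something like this (the pool name rpool is an assumption, and iostat needs the sysstat package):

Code:
# per-vdev latency every 5 s (-l adds latency columns, -y skips the since-boot summary)
zpool iostat -vly rpool 5
# per-disk utilisation and wait times
iostat -x 5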

I am thinking now about investing in a new board and chassis.

Best regards,
Boris
 
The 4 SSDs are SanDisk SSD Plus 1000 GB. Wearout is not reported. Here is the SMART data for the fourth device (the others have similar readings with fewer bad blocks):
Other people on the internet also complain about those consumer drives writing slower over time. The current advice here is to get (second-hand) enterprise SSDs with PLP for ZFS.
I am thinking now about investing in a new board and chassis.
Maybe it's better and cheaper to replace those 4 drives with 2 enterprise 2TB SSDs, which will give you much better IOPS and performance?
 
SanDisk Plus are (sorry) crap. I had several customers who used these drives, and various strange errors occurred on those systems: stuttering, drastic performance decreases, BSODs, non-bootable environments, etc. On 2 systems the bricked SSDs even completely froze the machine during the BIOS POST. And the "funny" thing was that on most of these drives the SMART reports were OK.

Long story short: if you have two spare SSDs from a different manufacturer, you could try to set up a simple ZFS mirror and check whether performance gets back to normal.
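
For example, with two spare SSDs at /dev/sdX and /dev/sdY (placeholders) a throwaway test pool plus a quick sync-write check could look like this:

Code:
# creates a mirror pool on the two spare SSDs - this destroys their contents!
zpool create -f testpool mirror /dev/sdX /dev/sdY
# pveperf reports FSYNCS/SECOND among other things; compare with the same test on the current pool
pveperf /testpool
# clean up afterwards
zpool destroy testpool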
 
Thank you for your replies. I certainly cannot ignore two people blaming the SanDisk SSDs, with cwt reporting similar symptoms to mine.

What's nagging me is that I remember the same issue with 3 enterprise HDDs in a RAIDZ3, which led to the upgrade to 4 consumer SSDs in a RAID 10.

It took me so long to answer because I was, and still am, juggling upgrade paths in my mind. I want to virtualize my NAS (TrueNAS Core), which runs on a second, identical machine, to cut the electricity bill. This would free up 3 WD Red HDDs to give MD/ext4 a try instead of ZFS. It requires some more thinking, as I want to put the variable data of the always-on home server and the NAS on a separate disk so that the data disks can sleep most of the time to save power. But this is getting off-topic now.
 
It has been a while since the last activity in this thread, but I would like to conclude with what I did in the end.

I decided against virtualizing the NAS because of the dependencies it would create between storage, VMs and hosts on the local net.
I also decided against switching to enterprise SSDs with PLP for ZFS, simply because of the cost (of new ones; I do not trust used SSDs).
Thus, I recently reinstalled PVE on top of a RAID10 of my existing 4 SSDs, using ext4 instead of ZFS.
The performance of storage under load is still poor, but at least the multi-minute total standstills (post #3) have not occurred since.
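
For anyone wanting to compare numbers: a rough way to quantify the sync-write side is pveperf (or a small fio run) against the VM storage path; /var/lib/vz below is just the default local directory storage, adjust to where the vdisks actually live.

Code:
pveperf /var/lib/vz
# or a 4k sync random-write test (creates a 1 GiB test file, delete it afterwards)
fio --name=synctest --filename=/var/lib/vz/fio.test --size=1G --rw=randwrite --bs=4k --fsync=1 --runtime=30 --time_based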

Someday I will need to replace the now 10-year-old mainboard, CPU and memory and include a pair of enterprise-grade SSDs in the setup, in the 2.5 kEUR range. For now I am satisfied with what I have.
 
