Unexpected drop in disk subsystem performance

docent

Renowned Member
Jul 23, 2009
Hi!
My PVE 6.2 had been running for a few months without any problems, but three days ago the latency of the disk subsystem increased dramatically.
PVE is installed on an HPE DL380 Gen8 with RAID 6 across eight 2 TB SSDs.
No changes were made to the server around the time the problem started.
There are no suspicious messages in the logs.
Code:
root@vmc1-3:~# pveversion
pve-manager/6.2-12/b287dd27 (running kernel: 5.4.65-1-pve)
root@vmc1-3:~# df -h /pool1
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb         11T  6.8T  3.4T  67% /pool1
root@vmc1-3:~# mount | grep pool1
/dev/sdb on /pool1 type ext4 (rw,relatime,stripe=384)
Running fio on this storage hangs:
Code:
root@vmc1-3:~# ps axfu | grep D | grep -v kvm
    PID TTY      STAT   TIME COMMAND
    882 ?        D      0:07 [jbd2/sdb-8]
1963833 ?        D      0:01 [kworker/u129:0+flush-8:16]
1976190 pts/1    D+     0:13 fio --ioengine=libaio --filename=testfile --size=9G --direct=1 --sync=1 --rw=randread --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=fio
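A side note on the filter above: `grep D` matches any line that contains a capital D anywhere, not only processes in uninterruptible sleep. A minimal sketch of a stricter filter follows, assuming the input is the text produced by `ps -eo pid,stat,args` (one header row, then PID, STAT and command columns). The function name `d_state_processes` is made up for this illustration.

```python
def d_state_processes(ps_output):
    """Return (pid, stat, command) for rows whose STAT field starts with 'D'.

    Assumes the three-column layout of `ps -eo pid,stat,args`.
    """
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)          # pid, stat, rest of the line
        if len(parts) == 3 and parts[1].startswith("D"):
            rows.append((int(parts[0]), parts[1], parts[2]))
    return rows


# Sample input modelled on the listing above, plus one process that a
# plain `grep D` would match by accident because of its `-D` argument.
SAMPLE = """\
    PID STAT COMMAND
    882 D    [jbd2/sdb-8]
   1000 Ss   /usr/sbin/sshd -D
1976190 D+   fio --name=fio
"""

for pid, stat, command in d_state_processes(SAMPLE):
    print(pid, stat, command)
```

On a live system the same function could be fed the captured output of `ps -eo pid,stat,args` instead of the sample text.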
There are no hardware errors.
How can I diagnose this problem?
 

Attachments

  • CPU_Usage_Week.png
  • Server_Load_Week.png
  • fio test.png
What type of SSDs do you have, and how old are they?
Steady-state performance is often much lower than peak performance.
Sometimes that starts to happen if the controller doesn't support TRIM and the SSDs have been written in full once.
The read-modify-write cycle then causes write amplification, and this can become noticeable, always depending on the workload.
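To put a rough number on that effect, here is a minimal sketch. It is not taken from this thread: it uses the common simplified model of garbage collection in which the drive must relocate the still-valid pages of an erase block before it can reuse that block. The parameters `pages_per_block` and `valid_pages` are hypothetical; the real flash geometry and firmware behaviour of these drives are not known here.

```python
def write_amplification(pages_per_block, valid_pages):
    """Flash pages written per host page written when one block is reclaimed.

    Reclaiming a block that still holds `valid_pages` valid pages means
    copying those pages elsewhere, which frees only
    (pages_per_block - valid_pages) pages for new host data.
    """
    if not 0 <= valid_pages < pages_per_block:
        raise ValueError("valid_pages must be in [0, pages_per_block)")
    return pages_per_block / (pages_per_block - valid_pages)


# With TRIM, stale pages are known to be invalid and blocks are cheap to reclaim.
print(write_amplification(256, 0))
# Without TRIM, the drive treats deleted data as valid and keeps copying it.
print(write_amplification(256, 192))
```

Under these assumptions, a block that is three quarters "valid" from the drive's point of view costs four flash writes per host write, which is one way sustained performance can fall well below the figures seen on a fresh drive.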
 
What type of SSDs do you have, and how old are they?
New Samsung SSD 860 EVO drives on a P420i with firmware v8.32:
Smart Array P420i in Slot 0 (Embedded)
Bus Interface: PCI
Slot: 0
Serial Number: 001438022762B90
Cache Serial Number: PBKUC0BRH7A8JT
RAID 6 (ADG) Status: Enabled
Controller Status: OK
Hardware Revision: B
Firmware Version: 8.32-0
Rebuild Priority: Low
Expand Priority: Medium
Surface Scan Delay: 3 secs
Surface Scan Mode: Idle
Parallel Surface Scan Supported: No
Queue Depth: Automatic
Monitor and Performance Delay: 60 min
Elevator Sort: Enabled
Degraded Performance Optimization: Disabled
Inconsistency Repair Policy: Disabled
Wait for Cache Room: Disabled
Surface Analysis Inconsistency Notification: Disabled
Post Prompt Timeout: 15 secs
Cache Board Present: True
Cache Status: OK
Cache Ratio: 10% Read / 90% Write
Drive Write Cache: Disabled
Total Cache Size: 1.0
Total Cache Memory Available: 0.8
No-Battery Write Cache: Disabled
SSD Caching RAID5 WriteBack Enabled: False
SSD Caching Version: 1
Cache Backup Power Source: Capacitors
Battery/Capacitor Count: 1
Battery/Capacitor Status: OK
SATA NCQ Supported: True
Spare Activation Mode: Activate on physical drive failure (default)
Controller Temperature (C): 71
Cache Module Temperature (C): 29
Capacitor Temperature (C): 19
Number of Ports: 2 Internal only
Encryption: Not Set
Driver Name: hpsa
Driver Version: 3.4.20
Driver Supports SSD Smart Path: True
PCI Address (Domain:Bus:Device.Function): 0000:02:00.0
Port Max Phy Rate Limiting Supported: False
Host Serial Number: USE312YXA3
Sanitize Erase Supported: False
Primary Boot Volume: None
Secondary Boot Volume: None



Internal Drive Cage at Port 1I, Box 2, OK

Power Supply Status: Not Redundant
Drive Bays: 4
Port: 1I
Box: 2
Location: Internal

Physical Drives
physicaldrive 1I:2:1 (port 1I:box 2:bay 1, SATA SSD, 2 TB, OK)
physicaldrive 1I:2:2 (port 1I:box 2:bay 2, SATA SSD, 2 TB, OK)
physicaldrive 1I:2:3 (port 1I:box 2:bay 3, SATA SSD, 2 TB, OK)
physicaldrive 1I:2:4 (port 1I:box 2:bay 4, SATA SSD, 2 TB, OK)



Internal Drive Cage at Port 2I, Box 2, OK

Power Supply Status: Not Redundant
Drive Bays: 4
Port: 2I
Box: 2
Location: Internal

Physical Drives
physicaldrive 2I:2:5 (port 2I:box 2:bay 5, SATA SSD, 2 TB, OK)
physicaldrive 2I:2:6 (port 2I:box 2:bay 6, SATA SSD, 2 TB, OK)
physicaldrive 2I:2:7 (port 2I:box 2:bay 7, SATA SSD, 2 TB, OK)
physicaldrive 2I:2:8 (port 2I:box 2:bay 8, SATA SSD, 2 TB, OK)


Port Name: 1I
Port ID: 0
Port Connection Number: 0
SAS Address: 5001438022762B90
Port Location: Internal

Port Name: 2I
Port ID: 1
Port Connection Number: 1
SAS Address: 5001438022762B94
Port Location: Internal

Array: A
Interface Type: Solid State SATA
Unused Space: 2 MB (0.00%)
Used Space: 14.55 TB (100.00%)
Status: OK
Array Type: Data
Smart Path: disable


Logical Drive: 1
Size: 150.00 GB
Fault Tolerance: 1+0
Heads: 255
Sectors Per Track: 32
Cylinders: 38550
Strip Size: 256 KB
Full Stripe Size: 1024 KB
Status: OK
Unrecoverable Media Errors: None
Caching: Enabled
Unique Identifier: 600508B1001C1EC87E6343288115D2CE
Disk Name: /dev/sda
Mount Points: None
Logical Drive Label: A0706F33001438022762B9086D5
Mirror Group 1:
physicaldrive 1I:2:1 (port 1I:box 2:bay 1, SATA SSD, 2 TB, OK)
physicaldrive 1I:2:2 (port 1I:box 2:bay 2, SATA SSD, 2 TB, OK)
physicaldrive 1I:2:3 (port 1I:box 2:bay 3, SATA SSD, 2 TB, OK)
physicaldrive 1I:2:4 (port 1I:box 2:bay 4, SATA SSD, 2 TB, OK)
Mirror Group 2:
physicaldrive 2I:2:5 (port 2I:box 2:bay 5, SATA SSD, 2 TB, OK)
physicaldrive 2I:2:6 (port 2I:box 2:bay 6, SATA SSD, 2 TB, OK)
physicaldrive 2I:2:7 (port 2I:box 2:bay 7, SATA SSD, 2 TB, OK)
physicaldrive 2I:2:8 (port 2I:box 2:bay 8, SATA SSD, 2 TB, OK)
Drive Type: Data
LD Acceleration Method: Controller Cache

Logical Drive: 2
Size: 10.70 TB
Fault Tolerance: 6
Heads: 255
Sectors Per Track: 32
Cylinders: 65535
Strip Size: 256 KB
Full Stripe Size: 1536 KB
Status: OK
Unrecoverable Media Errors: None
Caching: Enabled
Parity Initialization Status: Initialization Completed
Unique Identifier: 600508B1001C7C1E163F142690C7B100
Disk Name: /dev/sdb
Mount Points: /pool1 10.7 TB Partition Number 0
Logical Drive Label: A0706F54001438022762B90F3E3
Drive Type: Data
LD Acceleration Method: Controller Cache


physicaldrive 1I:2:1
Port: 1I
Box: 2
Bay: 1
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 2 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Firmware Revision: RVQ02B6Q
Serial Number: S4CYNF0N101780P
WWID: 3001438022762B80
Model: ATA Samsung SSD 860
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 21
Maximum Temperature (C): 43
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

physicaldrive 1I:2:2
Port: 1I
Box: 2
Bay: 2
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 2 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Firmware Revision: RVQ02B6Q
Serial Number: S4CYNF0N101770M
WWID: 3001438022762B81
Model: ATA Samsung SSD 860
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 20
Maximum Temperature (C): 43
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

physicaldrive 1I:2:3
Port: 1I
Box: 2
Bay: 3
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 2 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Firmware Revision: RVQ02B6Q
Serial Number: S4CYNF0N101775F
WWID: 3001438022762B82
Model: ATA Samsung SSD 860
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 20
Maximum Temperature (C): 43
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

physicaldrive 1I:2:4
Port: 1I
Box: 2
Bay: 4
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 2 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Firmware Revision: RVQ02B6Q
Serial Number: S4CYNF0N101784L
WWID: 3001438022762B83
Model: ATA Samsung SSD 860
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 21
Maximum Temperature (C): 43
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

physicaldrive 2I:2:5
Port: 2I
Box: 2
Bay: 5
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 2 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Firmware Revision: RVQ02B6Q
Serial Number: S4CYNF0N101769H
WWID: 3001438022762B84
Model: ATA Samsung SSD 860
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 19
Maximum Temperature (C): 42
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

physicaldrive 2I:2:6
Port: 2I
Box: 2
Bay: 6
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 2 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Firmware Revision: RVQ02B6Q
Serial Number: S4CYNF0N101772K
WWID: 3001438022762B85
Model: ATA Samsung SSD 860
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 27
Maximum Temperature (C): 42
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
Drive Authentication Status: OK
Carrier Application Version: 11
Carrier Bootloader Version: 6
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

physicaldrive 2I:2:7
Port: 2I
Box: 2
Bay: 7
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 2 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Firmware Revision: RVQ02B6Q
Serial Number: S4CYNF0N101783M
WWID: 3001438022762B86
Model: ATA Samsung SSD 860
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 20
Maximum Temperature (C): 42
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None

physicaldrive 2I:2:8
Port: 2I
Box: 2
Bay: 8
Status: OK
Drive Type: Data Drive
Interface Type: Solid State SATA
Size: 2 TB
Drive exposed to OS: False
Logical/Physical Block Size: 512/512
Firmware Revision: RVQ02B6Q
Serial Number: S4CYNF0N101776J
WWID: 3001438022762B87
Model: ATA Samsung SSD 860
SATA NCQ Capable: True
SATA NCQ Enabled: True
Current Temperature (C): 20
Maximum Temperature (C): 43
SSD Smart Trip Wearout: Not Supported
PHY Count: 1
PHY Transfer Rate: 6.0Gbps
Drive Authentication Status: Not Authenticated. Smart Array will not control drive LEDs.
Sanitize Erase Supported: False
Shingled Magnetic Recording Support: None


SEP (Vendor ID PMCSIERA, Model SRCv8x6G) 380
Device Number: 380
Firmware Version: RevB
WWID: 5001438022762B9F
Vendor ID: PMCSIERA
Model: SRCv8x6G
Steady-state performance is often much lower than peak performance.
Sometimes that starts to happen if the controller doesn't support TRIM and the SSDs have been written in full once.
It looks like that is the case.
I'll investigate in this direction.
Thanks for the tip.
 
Yeah, the 860 EVO isn't exactly what I would expect to deliver predictable performance, especially not behind a RAID controller.
I know it's tempting to use these kinds of drives, but in the end they are "cheap junk" produced for home use, not meant to be placed in servers.
You could try using them on a JBOD controller with software RAID (LVM, mdadm) or ZFS, although even ZFS is not optimal because of its architecture and write amplification. With this approach, however, the disks can expose TRIM, which is currently likely masked by the RAID controller.
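Whether a block device actually advertises TRIM/discard to Linux can be checked from sysfs. The sketch below assumes the standard `queue/discard_granularity` and `queue/discard_max_bytes` attributes under `/sys/block/<device>`, where a value of 0 indicates no discard support. The function name and the `sysfs_root` parameter are made up for this illustration (the latter only so the function can be tried against a scratch directory).

```python
from pathlib import Path


def supports_discard(device, sysfs_root="/sys/block"):
    """Return True if the kernel reports discard (TRIM) support for `device`.

    `device` is a bare name such as "sdb". Both attributes must be
    non-zero for discard requests to be passed down to the device.
    """
    queue = Path(sysfs_root) / device / "queue"
    granularity = int((queue / "discard_granularity").read_text().strip())
    max_bytes = int((queue / "discard_max_bytes").read_text().strip())
    return granularity > 0 and max_bytes > 0
```

For example, `supports_discard("sdb")` would report on the RAID 6 logical drive from this thread. `lsblk --discard` shows the same information as DISC-GRAN and DISC-MAX columns.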
 
I found the cause.
I switched the P420i to HBA mode and tested all the disks.
It turned out that three of them are degraded:
Code:
iops        : min=  376, max=  474, avg=461.68
iops        : min=  380, max=  478, avg=460.37
iops        : min=  370, max=  472, avg=459.93
iops        : min=   26, max=  340, avg=192.43
iops        : min=   34, max=  340, avg=187.63
iops        : min=    8, max=  228, avg=22.35
iops        : min=  370, max=  478, avg=459.92
iops        : min=  374, max=  470, avg=460.29
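For anyone repeating this kind of per-disk comparison, the slow drives can be picked out automatically. The sketch below parses the `iops` lines exactly as posted above and flags every result whose average falls below half of the median. The 50% cutoff and the function names are arbitrary choices for this illustration, and positions refer only to the order of the lines above, since the thread does not say which bay each line belongs to.

```python
import re
from statistics import median

# Matches fio summary lines such as "iops : min= 376, max= 474, avg=461.68".
IOPS_LINE = re.compile(
    r"iops\s*:\s*min=\s*(\d+),\s*max=\s*(\d+),\s*avg=\s*([\d.]+)"
)

RESULTS = """\
iops        : min=  376, max=  474, avg=461.68
iops        : min=  380, max=  478, avg=460.37
iops        : min=  370, max=  472, avg=459.93
iops        : min=   26, max=  340, avg=192.43
iops        : min=   34, max=  340, avg=187.63
iops        : min=    8, max=  228, avg=22.35
iops        : min=  370, max=  478, avg=459.92
iops        : min=  374, max=  470, avg=460.29
"""


def parse_avg_iops(fio_text):
    """Extract the avg= value from every iops summary line, in order."""
    return [float(m.group(3)) for m in IOPS_LINE.finditer(fio_text)]


def slow_positions(averages, fraction=0.5):
    """Return 1-based positions whose average is below fraction * median."""
    cutoff = median(averages) * fraction
    return [i for i, avg in enumerate(averages, start=1) if avg < cutoff]


print(slow_positions(parse_avg_iops(RESULTS)))  # prints [4, 5, 6]
```

Comparing against the median rather than the mean keeps the three slow results from dragging the baseline down.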
 
