Slow server all of a sudden

FlorinMarian

Well-Known Member
Nov 13, 2017
Howdy!
I have an HP server with two Intel Xeon processors and 16 drives (12 HDDs and 4 SSDs), arranged in several ZFS pools, each built from disks of the same type.
The problem I'm facing is that last night the server suddenly started consuming a huge amount of resources.
In the tests so far, smartctl reports no errors, yet with the same benchmark script the IOPS on the disks (regardless of the pool) come out roughly twenty times lower than before.
Any idea is welcome.
Thanks!

[Attached screenshots from 2022-08-30: 22-46-43.png, 22-47-10.png, 22-46-57.png]

Benchmark before issues:
Code:
Basic System Information:
---------------------------------
Uptime     : 0 days, 0 hours, 0 minutes
Processor  : Intel Core Processor (Haswell, no TSX)
CPU cores  : 2 @ 2299.998 MHz
AES-NI     : ✔ Enabled
VM-x/AMD-V : ❌ Disabled
RAM        : 3.6 GiB
Swap       : 0.0 KiB
Disk       : 500.0 GiB
Distro     : CentOS Stream 9
Kernel     : 5.14.0-0.rc7.54.el9.x86_64

fio Disk Speed Tests (Mixed R/W 50/50):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 61.49 MB/s   (15.3k) | 881.80 MB/s  (13.7k)
Write      | 61.61 MB/s   (15.4k) | 886.44 MB/s  (13.8k)
Total      | 123.10 MB/s  (30.7k) | 1.76 GB/s    (27.6k)
           |                      |
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 4.32 GB/s     (8.4k) | 5.03 GB/s     (4.9k)
Write      | 4.55 GB/s     (8.9k) | 5.37 GB/s     (5.2k)
Total      | 8.88 GB/s    (17.3k) | 10.40 GB/s   (10.1k)

iperf3 Network Speed Tests (IPv4):
---------------------------------
Provider        | Location (Link)           | Send Speed      | Recv Speed
                |                           |                 |
Clouvider       | London, UK (10G)          | 720 Mbits/sec   | 832 Mbits/sec
Online.net      | Paris, FR (10G)           | 557 Mbits/sec   | 828 Mbits/sec
Hybula          | The Netherlands (40G)     | 686 Mbits/sec   | 777 Mbits/sec
Uztelecom       | Tashkent, UZ (10G)        | 262 Mbits/sec   | 626 Mbits/sec
Clouvider       | NYC, NY, US (10G)         | 515 Mbits/sec   | 632 Mbits/sec
Clouvider       | Dallas, TX, US (10G)      | 392 Mbits/sec   | 319 Mbits/sec
Clouvider       | Los Angeles, CA, US (10G) | 338 Mbits/sec   | 476 Mbits/sec

Geekbench 5 Benchmark Test:
---------------------------------
Test            | Value
                |
Single Core     | 377
Multi Core      | 747
Full Test       | https://browser.geekbench.com/v5/cpu/16917935

Benchmark today:
Code:
Basic System Information:
---------------------------------
Uptime     : 0 days, 0 hours, 17 minutes
Processor  : Common KVM processor
CPU cores  : 2 @ 2299.998 MHz
AES-NI     : ❌ Disabled
VM-x/AMD-V : ❌ Disabled
RAM        : 1.8 GiB
Swap       : 0.0 KiB
Disk       : 40.0 GiB
Distro     : CentOS Linux 7 (Core)
Kernel     : 3.10.0-1160.71.1.el7.x86_64

fio Disk Speed Tests (Mixed R/W 50/50):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 90.94 MB/s   (22.7k) | 159.98 MB/s   (2.4k)
Write      | 91.18 MB/s   (22.7k) | 160.82 MB/s   (2.5k)
Total      | 182.13 MB/s  (45.5k) | 320.81 MB/s   (5.0k)
           |                      |                     
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ----
Read       | 121.49 MB/s    (237) | 108.20 MB/s    (105)
Write      | 127.94 MB/s    (249) | 115.41 MB/s    (112)
Total      | 249.43 MB/s    (486) | 223.62 MB/s    (217)

iperf3 Network Speed Tests (IPv4):
---------------------------------
Provider        | Location (Link)           | Send Speed      | Recv Speed     
                |                           |                 |               
Clouvider       | London, UK (10G)          | 650 Mbits/sec   | 886 Mbits/sec 
Online.net      | Paris, FR (10G)           | busy            | 857 Mbits/sec 
Hybula          | The Netherlands (40G)     | 836 Mbits/sec   | 882 Mbits/sec 
Uztelecom       | Tashkent, UZ (10G)        | 611 Mbits/sec   | 510 Mbits/sec 
Clouvider       | NYC, NY, US (10G)         | 658 Mbits/sec   | 294 Mbits/sec 
Clouvider       | Dallas, TX, US (10G)      | 577 Mbits/sec   | 607 Mbits/sec 
Clouvider       | Los Angeles, CA, US (10G) | 615 Mbits/sec   | 662 Mbits/sec 

Geekbench 5 Benchmark Test:
---------------------------------
Test            | Value                         
                |                               
Single Core     | 248                           
Multi Core      | 439                           
Full Test       | https://browser.geekbench.com/v5/cpu/16979355
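
For reference, the 4k mixed test in these reports corresponds roughly to a fio invocation like the following. This is a sketch only: the benchmark script's exact parameters are an assumption, and the target path is a placeholder.
Code:
# Approximate reproduction of the 4k mixed 50/50 test above (parameters assumed);
# point --filename at a file on the pool under test.
# Note: --direct=1 may need to be dropped on older ZFS without O_DIRECT support.
fio --name=randrw4k --ioengine=libaio --direct=1 --rw=randrw --rwmixread=50 \
    --bs=4k --iodepth=64 --numjobs=2 --size=2G --runtime=30 --time_based \
    --group_reporting --filename=/tank/fio-test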
 
having several ZFS pools, each built from disks of the same type.
Why? You're splitting the available ARC resources over different pools, which makes understanding what goes wrong #pools times more complicated. With ZFS in such a setup, a lot can go wrong in the background with respect to performance. The lower benchmark looks reasonable for disks; the upper one is just wrong. There's no way you would get roughly 9 GB/s in reads ...
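
A quick way to check how big the ARC actually is and how it is doing (arc_summary ships with OpenZFS on Proxmox VE; a minimal check):
Code:
# Overall ARC size, target size and hit rates
arc_summary | head -n 40
# Raw counters, if arc_summary is not installed
grep -E '^(size|c_max|hits|misses) ' /proc/spl/kstat/zfs/arcstats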
 
Thank you for your answer, but what about getting just a few hundred IOPS with only one VM running?
 
Independent of any VM: one (spinning rust) disk is capable of 120-200 IOPS (depending on the rpm), and in a RAIDz setup with one vdev, that is exactly what you're going to see in write IOPS. That's just the way RAID works without any cache or fast SLOG device.

If that were my hardware, I'd create ONE fast ZFS pool with everything you have (see the sketch below):
- the 12 HDDs as striped mirrors (so 6x single-disk write performance, with the space of 6 devices)
- additionally, the SSDs as striped-mirror special allocation class devices for metadata and other local datasets (adding 2x SSD to the total space of the pool).

If you had two dedicated, small Optane NVMe drives, you could add them as a SLOG device to get fast sync write speeds.
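
A rough sketch of that layout with placeholder device names (in practice use /dev/disk/by-id paths, and note that zpool create wipes the named disks):
Code:
# 6 mirrored HDD pairs striped together (device names are placeholders)
zpool create -o ashift=12 tank \
    mirror sda sdb mirror sdc sdd mirror sde sdf \
    mirror sdg sdh mirror sdi sdj mirror sdk sdl
# the 4 SSDs as two mirrored special vdevs for metadata/small blocks
zpool add tank special mirror sdm sdn
zpool add tank special mirror sdo sdp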
 
Did you check when your last scrub ran? A scrub can consume a lot of resources; it typically occurs only once per month or so and can take multiple hours or even days.
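
The "scan:" line of zpool status shows when the last scrub finished or whether one is still running:
Code:
zpool status
# scan: scrub in progress since ...             <- still running
# scan: scrub repaired 0B in ... with 0 errors  <- finished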
 
Yes. The problem still persists even now.
Half of the RAM is in use, but disk IO performance is very low.
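
Per-vdev latency figures might show where the time is going (assuming a reasonably recent OpenZFS; the -l flag adds wait-time columns to zpool iostat):
Code:
# Throughput plus total/disk/queue wait times per vdev, refreshed every 2 seconds
zpool iostat -v -l 2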
 
I've discovered something interesting using the "iostat -d -x -m" command:
Code:
Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sdg              0.31      0.00     0.00   0.54    7.05     4.52    0.00      0.00     0.08  96.24   83.38   456.90    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.02
sdh              0.29      0.00     0.00   0.00    0.15     4.32    0.00      0.00     0.08  98.78  153.24   332.40    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.01
sdp              0.00      0.00     0.00   0.00    0.68    50.79    0.00      0.00     0.08  98.66   17.81   301.35    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.32    0.00   0.00

All three drives show a high %wrqm, yet all three are unused (no partitions, and not LVM members or ZFS vdevs).
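
To keep watching only those drives, and to check whether any process is actually issuing the writes, something like this might help (iotop comes from a separate package):
Code:
# Extended stats for the suspect drives only, refreshed every 2 seconds
iostat -d -x -m sdg sdh sdp 2
# Show only processes/threads currently doing IO
iotop -o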

Any idea?
 
