Using HDD pool slows down SSD pool on the same server

mailinglists

Hi guys,

A few days ago, I imported a backup onto the HDD pool on a server that also has an SSD pool.
After a few minutes, all guest VMs (they run only on the SSD pool) started reporting hung tasks and similar problems, and the services on them stopped working.
The host showed high IOWait and low CPU usage.
I canceled the import, and after a few minutes everything went back to normal.
(The graphs, which I can attach later, show VM read/write speeds of 40 to 80 gigabytes/s during that time... :-) )
Most of the writing was done by a [vma] process.

The storage system is normally not under any significant load, and iowait is under 1 most of the time.
To be frank, there is one NVMe device shared between the pools (it is one half of the special mirror and the SLOG for the HDD pool, and also the SLOG for the SSD pool), but it is plenty fast.
The source of the backup can do 100 - 200 Mbps over a 10Gb line.
The disks are on SATA ports on a Supermicro board.
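
For anyone who wants to put a number on the iowait mentioned above, here is a minimal sketch, assuming a Linux host where `/proc/stat` has the usual field order (user, nice, system, idle, iowait, irq, softirq, steal, ...). The function names are my own and purely illustrative. It takes two samples and reports the share of CPU time spent in iowait between them.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters of the aggregate 'cpu' line of /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples.

    Only the first eight counters are summed, because guest time is
    already accounted for in the user and nice counters.
    """
    deltas = [b - a for a, b in zip(before[:8], after[:8])]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total > 0 else 0.0


def sample_iowait(interval=1.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return iowait %."""
    with open(path) as f:
        first = parse_cpu_line(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_cpu_line(f.read())
    return iowait_percent(first, second)
```

Calling `sample_iowait(5.0)` in a loop during an import would show whether the host-wide iowait climbs together with the HDD pool writes. It says nothing about which device is responsible, so it is only a first indicator.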

Why would I experience a slowdown of all SSD-backed VMs when using the HDD pool?

Code:
  pool: hddpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 01:46:41 with 0 errors on Sun Jan 10 02:10:42 2021
remove: Removal of vdev 2 copied 180K in 0h0m, completed on Mon Oct 26 23:46:17 2020
    192 memory used for removed device mappings
config:

        NAME                                                   STATE     READ WRITE CKSUM
        hddpool                                                ONLINE       0     0     0
          mirror-0                                             ONLINE       0     0     0
            scsi-35000c5008455beff                             ONLINE       0     0     0
            scsi-35000c50084560403                             ONLINE       0     0     0
        special
          mirror-3                                             ONLINE       0     0     0
            nvme-INTEL_SSDPE21K100GA_PHKE831500DJ100EGN-part4  ONLINE       0     0     0
            ata-INTEL_SSDSC2BA400G4_BTHV513606WT400NGN-part1   ONLINE       0     0     0
        logs
          nvme-INTEL_SSDPE21K100GA_PHKE831500DJ100EGN-part2    ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 01:42:59 with 0 errors on Sun Jan 10 02:07:02 2021
config:

        NAME                                                     STATE     READ WRITE CKSUM
        rpool                                                    ONLINE       0     0     0
          mirror-0                                               ONLINE       0     0     0
            ata-SAMSUNG_MZ7LM960HCHP-000MV_S2LBNXAH324668-part3  ONLINE       0     0     0
            ata-SAMSUNG_MZ7LM960HCHP-000MV_S2LBNXAH324365-part3  ONLINE       0     0     0
          mirror-1                                               ONLINE       0     0     0
            ata-SAMSUNG_MZ7LM960HCHP-000MV_S2LBNXAH324419        ONLINE       0     0     0
            ata-SAMSUNG_MZ7LM960HCHP-000MV_S2LBNXAH324096        ONLINE       0     0     0
          mirror-2                                               ONLINE       0     0     0
            ata-SAMSUNG_MZ7LM960HCHP-000MV_S2LBNXAH324415        ONLINE       0     0     0
            ata-SAMSUNG_MZ7LM960HCHP-000MV_S2LBNXAH324414        ONLINE       0     0     0
        logs
          nvme-INTEL_SSDPE21K100GA_PHKE831500DJ100EGN-part1      ONLINE       0     0     0

errors: No known data errors
 
For an actual server storage controller, it can be an issue with command queue depths being filled with commands waiting on the slower HDDs. I've even seen decent server storage controllers choke a bit when there are SSDs and HDDs connected and the HDDs are loaded with a large write task.

However, seeing as this is ZFS, there could be some other queue depth issue, or a DB that is filling up with transactions while the SSDs wait. My ZFS experience is rather limited, since I have mostly worked with Ceph for VM storage.
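
As a starting point for checking the ZFS side, here is a small sketch that reads a few of the OpenZFS I/O scheduler and dirty-data module parameters on Linux. It assumes the parameters are exposed under `/sys/module/zfs/parameters`, and the selection of parameters is only my guess at what is relevant here. The function name is hypothetical.

```python
from pathlib import Path

# OpenZFS module parameters related to per-vdev queueing and write throttling.
# Which of these matter for this particular slowdown is an assumption.
ZFS_QUEUE_PARAMS = (
    "zfs_vdev_max_active",
    "zfs_vdev_sync_write_max_active",
    "zfs_vdev_async_write_min_active",
    "zfs_vdev_async_write_max_active",
    "zfs_dirty_data_max",
)


def read_zfs_params(names=ZFS_QUEUE_PARAMS, root="/sys/module/zfs/parameters"):
    """Return {parameter: int value}, or None where it cannot be read."""
    values = {}
    for name in names:
        try:
            values[name] = int((Path(root) / name).read_text().strip())
        except (OSError, ValueError):
            # Parameter missing in this ZFS version, or not an integer.
            values[name] = None
    return values
```

These are module parameters, so they are set once per host rather than per pool. That alone does not prove the two pools compete, but it is one place to look.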

EDIT: Out of curiosity, what are those HDDs? I sure hope they aren't SMR, because that would be the source of all your storage problems.
 
I have read recently that a special device plus SLOG (and/or L2ARC) can create side effects.

I also have an issue where one pool pulls down other pools by going into suspended mode:
https://forum.proxmox.com/threads/s...on-node-even-if-no-vdisks-are-affected.70213/

I have not found the reason for it yet. I guess it is some corner case like your situation.
I'd try removing the SLOG from the HDD pool and doing the import again. If it still behaves the same, I think it is not the SLOG per se.
 
I have the same suspicion about queue depth, but I have no idea how to monitor queue depth on a hardware (SATA/SAS) controller. I may be able to monitor and adjust queue depth in ZFS, as I vaguely remember such options.
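
One thing that can be checked from the host is the per-device queue as the Linux block layer sees it. The sketch below assumes the disks show up as `sd*` devices and that sysfs exposes `device/queue_depth` and `inflight` for them. The function name is made up for illustration.

```python
from pathlib import Path


def block_queue_state(sys_block="/sys/block"):
    """Return queue depth and in-flight request counts for each sd* device."""
    state = {}
    for dev in sorted(Path(sys_block).glob("sd*")):
        entry = {"queue_depth": None, "inflight_reads": None, "inflight_writes": None}
        try:
            # Maximum number of commands the kernel will queue to this device.
            entry["queue_depth"] = int((dev / "device" / "queue_depth").read_text())
        except (OSError, ValueError):
            pass
        try:
            # Two counters: read and write requests currently in flight.
            reads, writes = (dev / "inflight").read_text().split()[:2]
            entry["inflight_reads"] = int(reads)
            entry["inflight_writes"] = int(writes)
        except (OSError, ValueError):
            pass
        state[dev.name] = entry
    return state
```

Polling this during an import would show whether the HDDs sit at their queue depth while the SSDs have requests waiting. It only covers the per-device view in the kernel, not any queue limit inside the controller itself.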

The HDDs are Seagate Enterprise Capacity 3.5 HDD 6TB 7200RPM 12Gb/s SAS drives with 128 MB cache (ST6000NM0034).
 
I agree I could do some more testing, such as removing the SLOG from the HDD pool, but I do not have the time at the moment, nor do I wish to experiment on a production cluster.
I will just pull these HDDs out and create HDD-only nodes. When I have the time, I will set up another node to test this scenario.

I also think I know what your problem is and will reply there shortly.
 
For an actual server storage controller, it can be an issue with command queue depths being filled with commands waiting on the slower HDDs.
That is true.
But IMHO it is mostly tied to a single controller and its backplane. Depending on the system, you might have only one, so it pulls everything down.
What's the situation here, actually?
How are things connected, controller-wise?
The NVMe is dedicated, sure, but what about the rest?

Are the SAS HDDs (actually they are SATA if we are talking 7200RPM) running on the same controller/bus as the SSDs? Those are SATA, right?
I have had issues with this in the past, especially when expanders were in place...
 
The fact that you have a SLOG does not mean it is used. It is only used for O_DIRECT, IIRC.

So, even though you have an NVMe-backed HDD pool, the NVMe is not always used, and thus you are limited to the speed of the HDDs.
 
Tuxis, tnx for the info.
I know the SLOG is used only for sync writes, so it really should not be the link that also slows down the SSDs.
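
To illustrate what "sync writes" means from an application's point of view, here is a minimal sketch of the three common cases. As far as I understand, with the default `sync=standard` dataset setting only the last two go through the ZIL, and therefore the SLOG if one is present. The file paths and function names are just placeholders.

```python
import os


def write_async(path, data):
    """Plain buffered write: returns once the data is in memory."""
    with open(path, "wb") as f:
        f.write(data)


def write_sync(path, data):
    """Write with O_SYNC: each write returns only after it is on stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)  # may write fewer bytes than requested
            view = view[written:]
    finally:
        os.close(fd)


def write_then_fsync(path, data):
    """Buffered write followed by an explicit fsync before closing."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to commit it to stable storage
```

A bulk import that writes like `write_async` would not touch the SLOG at all, which fits the point that the SLOG should not be what ties the two pools together.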

tburger, tnx for the ideas.
I have one SAS controller:
Code:
02:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2308 PCI-Express Fusion-MPT SAS-2 (rev 05)
        Subsystem: Dell SAS2308 PCI-Express Fusion-MPT SAS-2
        Kernel driver in use: mpt3sas
        Kernel modules: mpt3sas
All drives are in hot-swap trays (Dell server) connected to the aforementioned SAS controller.
The NVMe is on its own PCIe.
HDD disks are SAS, but SSDs are SATA. :-)