The 1 millionth thread about L2ARC

Hello,

I stumbled across a Kingston DC1000B the other day, which I formatted to 4K sectors. I have no use for it since I already run mirrored SLOGs in my main PVE, but my PBS server is pretty slow, so I thought "why not".
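For anyone curious, this is roughly how I set it up - from memory, so treat the device name, the LBA format index and the partition sizes as examples rather than a recipe:
Bash:
# find the LBA format whose "lbads" is 12 (= 4096-byte sectors)
nvme id-ns /dev/nvme0n1
# reformat the namespace to 4K sectors - this wipes the drive!
nvme format /dev/nvme0n1 --lbaf=1
# one small partition for the SLOG, the rest for L2ARC
sgdisk -n1:0:+16G -n2:0:0 /dev/nvme0n1
# attach both partitions to the pool
zpool add backupstorage log /dev/disk/by-id/nvme-KINGSTON_SEDC1000BM8240G_50redacted644-part1
zpool add backupstorage cache /dev/disk/by-id/nvme-KINGSTON_SEDC1000BM8240G_50redacted644-part2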

These are some of the stats right now, from arc_summary:
Bash:
L2ARC size (adaptive):                                         120.8 GiB
        Compressed:                                     6.7 %    8.1 GiB
        Header size:                                    0.1 %  101.0 MiB
        MFU allocated size:                            58.1 %    4.7 GiB
        MRU allocated size:                            41.8 %    3.4 GiB
        Prefetch allocated size:                        0.2 %   16.4 MiB
        Data (buffer content) allocated size:           0.0 %    0 Bytes
        Metadata (buffer content) allocated size:     100.0 %    8.1 GiB

L2ARC breakdown:                                                    1.9M
        Hit ratio:                                     49.6 %     944.4k
        Miss ratio:                                    50.4 %     958.5k
    
ARC total accesses:                                                18.3M
        Total hits:                                    89.5 %      16.4M
        Total I/O hits:                                 0.1 %      11.8k
        Total misses:                                  10.5 %       1.9M

Bash:
NAME                        PROPERTY        VALUE           SOURCE
backupstorage               secondarycache  metadata        local
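For completeness, that property gets set with nothing fancier than:
Bash:
# only cache metadata, not file data, on the L2ARC device
zfs set secondarycache=metadata backupstorage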

And this is my setup:
Bash:
    NAME                                                     STATE     READ WRITE CKSUM
    backupstorage                                            ONLINE       0     0     0
      raidz2-0                                               ONLINE       0     0     0
        ata-HGST_HUS726T6TALN6L4_V8redacted                  ONLINE       0     0     0
        ata-HGST_HUS726T6TALN6L4_V8redacted                  ONLINE       0     0     0
        ata-HGST_HUS726T6TALN6L4_V8redacted                  ONLINE       0     0     0
        ata-HGST_HUS726T6TALN6L4_V8redacted                  ONLINE       0     0     0
        ata-HGST_HUS726T6TALN6L4_V8redacted                  ONLINE       0     0     0
        ata-HGST_HUS726T6TALN6L4_V8redacted                  ONLINE       0     0     0
        ata-HGST_HUS726T6TALN6L4_V9redacted                  ONLINE       0     0     0
        ata-HGST_HUS726T6TALN6L4_V9redacted                  ONLINE       0     0     0
        ata-HGST_HUS726T6TALN6L4_V9redacted                  ONLINE       0     0     0
        ata-HGST_HUS726T6TALN6L4_V9redacted                  ONLINE       0     0     0
    logs
      nvme-KINGSTON_SEDC1000BM8240G_50redacted644-part1      ONLINE       0     0     0
    cache
      nvme-KINGSTON_SEDC1000BM8240G_50redacted644-part2      ONLINE       0     0     0

Now I'm trying to figure out if it's better, the same, or worse. But a whole day of reading (this was very nice: https://klarasystems.com/articles/openzfs-all-about-l2arc/) didn't make me an expert.
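For anyone who wants to watch this live, something like the following should show the hit rates as they happen (field names can differ a bit between OpenZFS versions):
Bash:
# live ARC and L2ARC counters, one line every 5 seconds
arcstat -f time,read,miss,miss%,l2read,l2hits,l2miss,l2size 5
# or just the summary counters shown above
arc_summary | grep -A 3 "L2ARC breakdown"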

Server has 32 GB RAM, and holds this much data:
Bash:
NAME            SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
backupstorage  54.6T  24.9T  29.6T        -         -     5%    45%  1.00x    ONLINE  -


So, what do you guys say about this? I know, "buy more RAM!!", but if this is just sliiiightly better I think it's worth it. I got the NVMe cheap.
 
Your I/O problem is the pool design: raidz2 with just one vdev of 10 disks. By adding an L2ARC cache device (which also needs RAM for its housekeeping) you made your I/O even worse, because filling and updating the cache permanently costs IOPS that your normal pool usage needs. Whether you get a benefit or lose more than you gain depends on how large the pool's active data set is.
If you want the best performance you should have a pool of mirrored HDDs plus a special device (see the sketch below); maybe also add a SLOG, and only in rare cases a cache device on top. But as can be seen, if you rebuilt your pool as mirrors with your current amount of data on it, it would be unusably full.
You could think about a 2-vdev raidz2, but you won't be happy with that performance either: even if it were twice as fast as today, it would still be lacking.
You have too much data, disks that are too small, or too few disks for a mirrored performance pool right now, sorry.
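For reference, such a mirror + special device layout would be created roughly like this; the disk names are only placeholders and the log/cache lines are the optional part:
Code:
zpool create tank \
  mirror disk1 disk2 \
  mirror disk3 disk4 \
  mirror disk5 disk6 \
  mirror disk7 disk8 \
  special mirror ssd1 ssd2 \
  log ssd3 \
  cache ssd4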
But the best thing in this thread is the title!! :cool:
 
Thanks for your pointers! Yeah, a special device would be the best option - I had already figured that out, actually. And maybe also split the large pool into two vdevs... But I have no room for any more disks, I'm afraid, and the special device would be a crucial part of the pool, so I'd better mirror the heck out of it - but I can't, since I'm out of slots to put any disks in. I've been thinking of getting a PCIe adapter with 2 M.2 slots for a special device, but then the adapter card itself would be a single point of failure...

That's also the reason why I only cache metadata on L2. I read in some thread, deep down the internet L2ARC rabbit hole, that this works well.

Since PBS is "just a backup" and I have an offsite PBS as well, I thought I'd play with my main backup a little bit. Also, since I only have one NVMe and neither SLOG nor L2ARC is a critical device, I thought it was at least worth trying to partition that single device for both.

So just to get this straight: I see something like 23 µs latency on the NVMe and around 2 ms on the rust when running iostat, but you still think that's worse than without L2ARC? I also just did a scrub, and it was reading at about 1.3 GB/s.
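(In case it matters, zpool iostat can also show the per-vdev latencies as ZFS sees them, which keeps the cache device and the raidz2 vdev apart:)
Bash:
# per-vdev latency statistics, refreshed every 5 seconds
zpool iostat -v -l backupstorage 5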

My problem is verification, which takes around 2 days!
 
> My problem is verification, which takes around 2 days!

Wait, what? Define "verification" - I have a 14x4TB DRAID (SAS disks + shelf) that takes less than 6-8 hours to scrub, and it has ~23TB allocated
 
> > My problem is verification, which takes around 2 days!
>
> Wait, what? Define "verification" - I have a 14x4TB DRAID (SAS disks + shelf) that takes less than 6-8 hours to scrub, and it has ~23TB allocated
It's the PBS "Verify" that takes more than two days (48 hours) for one of the datasets. Getting a special device here would help a lot, or at least that's what I've read. But I've also read that L2ARC would help with this.

Scrubbing the pool "only" takes one day.

Bash:
scan: scrub repaired 0B in 1 days 00:31:57 with 0 errors on Thu Dec 19 00:49:17 2024
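If I ever manage to fit a special device I'll try to size it first; as far as I understand, something like this gives a rough idea of how much metadata the pool actually holds (it walks every block, so it can run for hours on a pool this size):
Bash:
# per-type block statistics; the metadata totals hint at how big
# a special vdev would need to be
zdb -Lbb backupstorage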
 
Yeah, if you have lots of little files, that could be part of it.

Code:
 pool: zshelf15
 state: ONLINE
  scan: scrub repaired 0B in 03:53:50 with 0 errors on Fri Oct  4 23:35:30 2024

config:
        NAME                                                   STATE     READ WRITE CKSUM
        zshelf15                                               ONLINE       0     0     0
          draid2:7d:14c:1s-0                                   ONLINE       0     0     0
            wwn-0x5000cca07321bea8                             ONLINE       0     0     0
            wwn-0x5000cca07325f6b0                             ONLINE       0     0     0
            wwn-0x5000cca05d546dcc                             ONLINE       0     0     0
            wwn-0x5000cca05d54a848                             ONLINE       0     0     0
            wwn-0x5000cca03b6d92ec                             ONLINE       0     0     0
            wwn-0x5000cca03b6be528                             ONLINE       0     0     0
            wwn-0x5000cca03b6f6090                             ONLINE       0     0     0
            scsi-35000cca244360c24                             ONLINE       0     0     0
            wwn-0x5000cca25c099f8c                             ONLINE       0     0     0
            wwn-0x5000cca244389584                             ONLINE       0     0     0
            wwn-0x5000cca25d57dc60                             ONLINE       0     0     0
            wwn-0x5000cca25d555290                             ONLINE       0     0     0
            wwn-0x5000cca03b6f6c84                             ONLINE       0     0     0
            wwn-0x5000cca03b6fc490                             ONLINE       0     0     0
        special 
          mirror-1                                             ONLINE       0     0     0
            ata-THNSF8800CCSE_57FS104YTBUT-part1               ONLINE       0     0     0
            wwn-0x58ce38ee2032d0bd-part1                       ONLINE       0     0     0
        cache
          ata-Samsung_SSD_860_PRO_512GB_S5GBNS0NB01060P-part4  ONLINE       0     0     0
          wwn-0x58ce38ee2032d0bd-part2                         ONLINE       0     0     0
          ata-THNSF8800CCSE_57FS104YTBUT-part2                 ONLINE       0     0     0
        spares
          draid2-0-0                                           AVAIL   
errors: No known data errors


zpool iostat -v zshelf15
                                                         capacity     operations     bandwidth 
pool                                                   alloc   free   read  write   read  write
-----------------------------------------------------  -----  -----  -----  -----  -----  -----
zshelf15                                               25.6T  21.8T      1    565  25.2K   185M
  draid2:7d:14c:1s-0                                   25.5T  21.8T      0    509  13.1K   184M
 
> draid2:7d:14c:1s-0

I am just curious: "draid2" means this one has two parity drives, i.e. two drives can be lost without affecting data, right?

The special device is crucial: if it is lost, the pool is toast. Yours is a two-way mirror, so the special device can only lose one drive --> it does not have the same redundancy level as the dRAID. Is there a reason for this, or did you just not have the option of a triple mirror?

Disclaimer: I have never used dRAID - I am too small for that.
 
I see no reason to triple-mirror a Special device on what is essentially only a tertiary backup. It only gets turned on once a month (or two) for updates and dedup.

The mirror is already made up of 2 different brands/models, so the odds of both failing at the same time are pretty low.

Technically my 14-disk DRAID is "too small" as well, according to some, but it works fine for me. I needed free space over redundancy, and the drives are only 4TB. Just don't try to run an interactive VM off of 14 spinning disks - I learned that the hard way.
 
Code:
        special
          mirror-1                                             ONLINE       0     0     0
            ata-THNSF8800CCSE_57FS104YTBUT-part1               ONLINE       0     0     0
            wwn-0x58ce38ee2032d0bd-part1                       ONLINE       0     0     0
That right there is what I would need, but I don't have any room for it currently. I'm on a Fujitsu RX2540 M2, so no NVMe for me. :/

Just installed Netdata yesterday, and well, look at this. :)

ARC: [Netdata graph - 1734642205511.png]

L2ARC: [Netdata graph - 1734642054917.png]

I can answer my own question: no need for L2ARC at all. Just add more RAM. Now it's confirmed.
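If/when I pull it back out, removing a cache device should be as simple as this (device name as in my zpool status above):
Bash:
# cache (and log) devices can be removed from a pool at any time
zpool remove backupstorage nvme-KINGSTON_SEDC1000BM8240G_50redacted644-part2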
 