Help diagnosing ZFS storage issues

magic_ren · New Member · May 25, 2024
Hey folks,

I signed up to the forums because I've been having some issues lately and haven't been able to find the root cause. I've been running my Proxmox server with the same config for about the last 6 years (with fairly regular Proxmox version updates), and these issues only started in the last month or so. Nothing major has changed in what the server is doing or how it's being accessed.

CPU: 2x X5670
RAM: 80GB ECC
OS Pool: 2x120GB SSD - ZFS Mirror
Storage Pool: 2x6x3TB 3.5" - ZFS RAIDZ2 (two 6-disk vdevs)

Workload:
PiHole VM, Plex VM, Deluge VM etc.
Occasional random test VMs for lab purposes

The problem I'm seeing is very slow pool performance and very high iowait; VMs become unresponsive for hours until the iowait clears and whatever stuck process was causing it finally dies off. Rebooting while things are locked up doesn't help: I get console errors for a few hours until the storage calms down again. It is usually possible to mount the pool read-only with no issues when this happens. The main storage pool did reach about 75% usage recently, so I thought maybe it was getting full. I deleted about 5TB of data and tried to move some of the newest VM disks around to 're-write' them, and that just seems to have made things worse. It feels like if I even look at the storage wrong right now, or ask it to do anything even remotely disk-intensive, it will lock up and can take a few hours to come good again.

There are no errors in zpool status. I run Scrutiny to review HDD SMART data and I'm not seeing any significant issues with the drives, so I'm not sure if it's a hardware issue or something else. There's 80GB of RAM total, and the VMs I'm running use about 24GB or less.

Is anyone able to help me narrow down where the performance issue is coming from and how I can get back to some stability? I can't log into it right now to collect any more data or logs, but I have a few screenshots. Thanks in advance!

Example:
Screenshot 2024-05-25 181016.png

Example of iowait when the pool is struggling:
Screenshot 2024-05-24 232439.png

Example of 'normal' performance:
Screenshot 2024-05-25 123529.png

Thanks!
 
Can you post the output of "zpool iostat -vy POOLNAME 1 1", just to see if it's unbalanced?
"zpool list -v POOLNAME" to check fragmentation.
"zfs get all POOLNAME" for additional information.

That should be enough for the first part.

PS: If you delete 5TB of data and move VMs around, the space isn't always freed instantly; sometimes it only gets released after a while. It should be done within an hour at most, usually within 10 minutes. I just mention it in case you ran your tests directly afterwards.
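If you want to check whether that's what is going on, something like this should show it (POOLNAME as above; the "freeing" property is the space ZFS still has queued to release in the background after deletes):

Code:
# a non-zero value means ZFS is still releasing freed space asynchronously
zpool get freeing POOLNAME

# watch capacity and fragmentation settle afterwards
zpool list -v POOLNAME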
 
Ah, and you're right about the 75%: above roughly 70% full, zpools usually get very slow, and above 90% they barely work at all.

But as far as I understood you've freed up a lot of space again, so let's just look at your output and see what can help based on that.
 
Thanks for the info. Currently it's still stuck trying to boot after I rebooted it earlier, so I'll log in when I can and try to collect the requested info.
 
> Can you post the output of "zpool iostat -vy POOLNAME 1 1", just to see if it's unbalanced?
> "zpool list -v POOLNAME" to check fragmentation.
> "zfs get all POOLNAME" for additional information.
>
> That should be enough for the first part.
>
> PS: If you delete 5TB of data and move VMs around, the space isn't always freed instantly; sometimes it only gets released after a while. It should be done within an hour at most, usually within 10 minutes. I just mention it in case you ran your tests directly afterwards.

Thanks for these suggestions, I've checked this and you may be onto something so far.

Here is the data after a fresh boot with no workload running.

Code:
# zpool iostat -vy files 1 1
                                                  capacity     operations     bandwidth
pool                                            alloc   free   read  write   read  write
----------------------------------------------  -----  -----  -----  -----  -----  -----
files                                           21.8T  10.7T      6  1.13K  27.9K  8.66M
  raidz2-0                                      15.0T  1.23T      2    581  12.0K  2.91M
    ata-ST3000DM001-1ER166_Z500GBZ6                 -      -      0     89      0   466K
    ata-ST3000VN000-1H4167_Z300SAJT                 -      -      0    105      0   538K
    ata-WDC_WD30EFRX-68AX9N0_WD-WMC1T2171151        -      -      0     99  3.98K   502K
    ata-WDC_WD3003FZEX-00Z4SA0_WD-WCC130603183      -      -      0     99  3.98K   514K
    ata-WDC_WD30EFRX-68AX9N0_WD-WMC1T2163230        -      -      0     95      0   490K
    ata-ST8000VN0022-2EL112_ZA1BJD5X                -      -      0     91  3.98K   474K
  raidz2-1                                      6.77T  9.48T      3    573  15.9K  5.75M
    ata-WDC_WD30EFRX-68EUZN0_WD-WMC4N0D9ZAEV        -      -      0    102      0  1024K
    ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N0KTSTJS        -      -      1     94  7.97K   968K
    ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N0KTSD35        -      -      0     92      0   924K
    ata-WDC_WD30EFRX-68EUZN0_WD-WCC4N1SJ317S        -      -      0     84      0   912K
    ata-WDC_WD30EFRX-68AX9N0_WD-WMC1T1687917        -      -      1     94  7.97K  1004K
    scsi-35000c5008643a2c3                          -      -      0    104      0  1.03M
----------------------------------------------  -----  -----  -----  -----  -----  -----

Code:
# zfs get all files
NAME   PROPERTY              VALUE                  SOURCE
files  type                  filesystem             -
files  creation              Wed Jan 27  2:39 2021  -
files  used                  15.6T                  -
files  available             6.33T                  -
files  referenced            14.5T                  -
files  compressratio         1.00x                  -
files  mounted               yes                    -
files  quota                 none                   default
files  reservation           none                   default
files  recordsize            128K                   default
files  mountpoint            /files                 default
files  sharenfs              rw=192.168.1.0/24      local
files  checksum              on                     default
files  compression           on                     default
files  atime                 on                     default
files  devices               on                     default
files  exec                  on                     default
files  setuid                on                     default
files  readonly              off                    default
files  zoned                 off                    default
files  snapdir               hidden                 default
files  aclmode               discard                default
files  aclinherit            restricted             default
files  createtxg             1                      -
files  canmount              on                     default
files  xattr                 on                     default
files  copies                1                      default
files  version               5                      -
files  utf8only              off                    -
files  normalization         none                   -
files  casesensitivity       sensitive              -
files  vscan                 off                    default
files  nbmand                off                    default
files  sharesmb              on                     local
files  refquota              none                   default
files  refreservation        none                   default
files  guid                  2650839464353105516    -
files  primarycache          all                    default
files  secondarycache        all                    default
files  usedbysnapshots       0B                     -
files  usedbydataset         14.5T                  -
files  usedbychildren        1.08T                  -
files  usedbyrefreservation  0B                     -
files  logbias               latency                default
files  objsetid              51                     -
files  dedup                 off                    local
files  mlslabel              none                   default
files  sync                  standard               default
files  dnodesize             legacy                 default
files  refcompressratio      1.00x                  -
files  written               14.5T                  -
files  logicalused           14.7T                  -
files  logicalreferenced     14.4T                  -
files  volmode               default                default
files  filesystem_limit      none                   default
files  snapshot_limit        none                   default
files  filesystem_count      none                   default
files  snapshot_count        none                   default
files  snapdev               hidden                 default
files  acltype               off                    default
files  context               none                   default
files  fscontext             none                   default
files  defcontext            none                   default
files  rootcontext           none                   default
files  relatime              on                     default
files  redundant_metadata    all                    default
files  overlay               on                     default
files  encryption            off                    default
files  keylocation           none                   default
files  keyformat             none                   default
files  pbkdf2iters           0                      default
files  special_small_blocks  0                      default
 
Yeah, your pool is hugely unbalanced.
The first raidz2 vdev is nearly full at about 92% usage (15.0T allocated of roughly 16.2T), while the other one is not even half full by comparison xD

In my opinion that comes down to one thing: your drives. It looks to me like the hard drives in the first vdev (the full one) are a lot faster, so they ended up taking most of the writes. That means even if you rebalance the pool, you will run into the same issue again soon.

The only proper way to fix that for good is to rebuild the entire pool; otherwise you will keep hitting the same problem.
Copy the data off somewhere, then destroy the pool.

Then you have to measure the speed of each drive.
I made a script for myself once; I actually always use it before I create a new pool.
But use my script with caution: read through it at least once so you understand what it does.

Once you know how fast your drives are, go ahead and rebuild your pool accordingly.
Try to balance both raidz2 vdevs based on the speed of the disks, especially the slowest ones.

So if you have, for example, two similarly very slow disks, put one into each vdev, so that each vdev ends up with roughly the same write speed. Write speed is what you care about here; read speed matters much less.

Otherwise, if you can't do that, you can rebalance the pool somehow, but I'm not a guru on that; you can google how to rebalance it.
Cheers

PS: I test the speeds with fairly large block sizes of 1M and 2M; you can simply change that to whatever fits your use case (I think 64k and 512k are more common). I needed to test for big files, since I wrote the script to decide how to use the disks for Proxmox Backup Server, and PBS writes only large "chunk" files.
Your use case is more mixed I think, so you should change it to 128k, 256k, or 512k; that's the "bs=" in the dd commands.
If you use the default recordsize of ZFS, definitely test 128k and base your decision on the 128k write speeds.
Ah, and if you use 128k, increase the count variable to at least 10000, otherwise the speed tests will be too unreliable.
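For reference, a rough sketch of the kind of per-disk dd test I mean (this is not the attached script; the device glob and the bs/count values are just the examples discussed above, and the write variant destroys data, so only run it on disks that are about to be wiped anyway, e.g. after the pool is destroyed for the rebuild):

Code:
#!/bin/bash
# rough sequential READ speed per disk, 128k blocks as suggested for the default recordsize
for d in /dev/disk/by-id/ata-*; do
    [[ "$d" == *-part* ]] && continue        # skip partition entries
    echo "== $d =="
    dd if="$d" of=/dev/null bs=128k count=10000 iflag=direct 2>&1 | tail -n1
done

# DESTRUCTIVE write test, only for empty disks before rebuilding the pool:
#   dd if=/dev/zero of=/dev/disk/by-id/ata-XXXX bs=128k count=10000 oflag=direct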
 

Attachments

  • danger_testspeed.txt
    2.4 KB · Views: 1
I edited it for ZFS defaults (64k + 128k), which should be more appropriate for anyone using the default recordsize of ZFS.
But you can compare both scripts to see what I changed and modify it yourself.
Cheers
 

Attachments

  • danger_testspeed_zfsdefault_recordsize.txt
    2.4 KB · Views: 0
Thanks, I was leaning towards that conclusion a little myself, but I was hoping there might be an easier way. I can delete some more replaceable data, but I'm not sure how easily I can find another 12-15TB of storage to use temporarily. Is this the most likely cause of the poor performance and the storage locking up? I just tried to delete some more files now, and even deleting some 800GB backup images caused things to lock up again, so it's getting pretty bad.

When I first set up this pool I thought that 2x 6-drive vdevs would be optimal. Some drives have died over the years and been replaced; the majority of the drives have over 10 years of uptime, or close to it.

If I were to re-create the pool in future, is there a more optimal way to configure it, rather than trying to test the individual speed of each drive and balancing multiple vdevs around that? Would a single 12-drive raidz2 or raidz3 be better?

I was thinking instead of maybe 6 pairs of mirrors? As I get new drives, it's going to be easier to get them in pairs rather than in batches of 6 or 12. Open to suggestions.
 
Six pairs of mirrors will give you a huge increase in IOPS and throughput, plus no parity calculation, but at a big cost in total usable size.

I googled "rebalance zfs" and there was a script called something like "in-place rebalance" on GitHub; maybe you could use that (see the sketch below).
With my "you will run into the same issue soon" sentence I'm probably wrong, since you said you've already run it for about 6 years, so you might only need to rebalance once every year or two.

Additionally, I've seen some people on google claim that a zfs scrub will rebalance the data too. To be honest I don't believe it, since a scrub only reads and verifies existing checksums, but you could give it a try anyway; if it magically works, that would be the world's easiest solution.
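As a rough illustration, the core of those in-place rebalancing scripts is just a copy-then-rename of every file, so ZFS re-allocates the blocks across the vdevs. Something like the sketch below (the real script on GitHub does more, e.g. checksum verification; the path is only an example, and don't run it on datasets with snapshots or on disks of running VMs):

Code:
#!/bin/bash
# rewrite each file in place so ZFS re-allocates its blocks across the vdevs
TARGET=/files/some-folder   # example path

find "$TARGET" -type f -print0 | while IFS= read -r -d '' f; do
    tmp="$f.rebalance.tmp"
    cp -a "$f" "$tmp" && mv "$tmp" "$f"   # copy, then replace the original
done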

I'm not sure what your use case is, but if you need that pool, let's say, only for Samba, I would simply add a special vdev mirror based on NVMe drives.
Then set xattr=sa, recordsize to 1M, and special_small_blocks to 128k (with 2TB NVMe drives) or 64k (with 1TB NVMe drives); see the sketch below.
Then change the pool layout to whatever gives you the most space, maybe one raidz2 vdev of all 12 drives, since speed isn't important anymore.

With the special vdev, all your Samba searches, file listings, latency etc. will get a huge performance boost. Small files below 128k or 64k go to your NVMe drives, so IOPS will no longer be an issue: the HDDs only deliver big files, where IOPS doesn't matter, and the NVMe drives deliver the small files, where it does.
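Roughly like this, as a sketch (the NVMe device names are placeholders, the pool name is taken from your output, and the properties only affect newly written data, so existing files keep their current layout):

Code:
# add an NVMe special vdev mirror to the pool (example device names)
zpool add files special mirror /dev/disk/by-id/nvme-AAAA /dev/disk/by-id/nvme-BBBB

# store xattrs in the dnode and use large records for the bulk data
zfs set xattr=sa files
zfs set recordsize=1M files

# blocks at or below this size (plus all metadata) land on the special vdev
zfs set special_small_blocks=128K files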

I know that means additional cost, and maybe you aren't even able to add another two SSDs or NVMe drives because you're out of ports etc.
Also, those SSDs or NVMe drives should be something reliable with a high TBW rating. So not the cheapest: something like a PM9A3, not a 970/980/990 Pro.

If you can't go that route, you have to decide yourself: 6 pairs of mirrors gives you a lot of speed at the cost of space. But don't forget that you are capped by network speed.
If you only have 1 Gbit/s to the server, then ~125 MB/s is the maximum you can reach anyway, and a RAID10 layout will give you around 1200 MB/s, roughly 10x more than you can use over the network.

It's a hard decision, but you have to make it yourself, accept it and live with it. Your stripe of 2x raidz2 is not bad in any way; it's a good balance of reliability, speed, and space.
The decision is hard, but there is actually nothing wrong with either way, nothing you will regret afterwards. Just remember afterwards why you decided that way and that the alternatives at the time were worse.

PS: What some people don't know is that you don't have to stick with identical drives. I usually replace dead disks in my pool with bigger ones, but I always make sure the bigger ones are also faster and have the same or more cache, because most of the time newer/bigger disks are cheaper and a little faster.
And once all the disks have been replaced with bigger ones, ZFS can autogrow to the new size (limited by the smallest disk in the vdev); see the note below.
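For that autogrow to happen the pool needs autoexpand enabled (or you expand each replaced disk by hand), roughly:

Code:
# let a vdev grow automatically once every disk in it has been replaced with a larger one
zpool set autoexpand=on files

# or expand an already-replaced disk manually
zpool online -e files ata-WDC_WD30EFRX-68AX9N0_WD-WMC1T2171151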

Cheers
 
Actually, that's not completely true about the smallest disk in the pool.
In a RAID10 layout, ZFS can grow each mirror vdev independently to the size of its own disks; here is an example:
Code:
                                                      capacity     operations     bandwidth
pool                                                alloc   free   read  write   read  write
--------------------------------------------------  -----  -----  -----  -----  -----  -----
HDD-SAS                                             2.12T  52.0T      0      0      0      0
  mirror-0                                           604G  14.0T      0      0      0      0
    wwn-0x5000cca2c1848944                              -      -      0      0      0      0
    wwn-0x5000cca2cc0005b4                              -      -      0      0      0      0
  mirror-1                                           518G  12.2T      0      0      0      0
    wwn-0x5000cca25806aee0                              -      -      0      0      0      0
    wwn-0x5000cca259029b90                              -      -      0      0      0      0
  mirror-2                                           510G  12.2T      0      0      0      0
    wwn-0x5000cca25806ef60                              -      -      0      0      0      0
    wwn-0x5000cca2db04db2c                              -      -      0      0      0      0
  mirror-3                                           520G  12.2T      0      0      0      0
    wwn-0x50000399d8914779                              -      -      0      0      0      0
    wwn-0x50000399d892a6a1                              -      -      0      0      0      0
special                                                 -      -      -      -      -      -
  mirror-4                                          16.9G  1.44T      0      0      0      0
    nvme-SAMSUNG_MZPLJ1T6HBJR-00007_S55JNC0W900178      -      -      0      0      0      0
    nvme-SAMSUNG_MZPLJ1T6HBJR-00007_S55JNC0W900193      -      -      0      0      0      0
--------------------------------------------------  -----  -----  -----  -----  -----  -----
HDD_Z2                                              28.1T   118T      0      0      0      0
  raidz2-0                                          28.1T   117T      0      0      0      0
    ata-WDC_WUH722020BLE6L4_8LG7M8RA                    -      -      0      0      0      0
    ata-WDC_WUH722020BLE6L4_8LG7ER9A                    -      -      0      0      0      0
    ata-WDC_WUH722020BLE6L4_8LG7KSTA                    -      -      0      0      0      0
    ata-WDC_WUH722020BLE6L4_8LG7XKNA                    -      -      0      0      0      0
    ata-WDC_WUH722020BLE6L4_8LG64ZHE                    -      -      0      0      0      0
    ata-WDC_WUH722020BLE6L4_8LG7WHTE                    -      -      0      0      0      0
    ata-WDC_WUH722020BLE6L4_8LG7Y79A                    -      -      0      0      0      0
    ata-WDC_WUH722020BLE6L4_8LG7VRXA                    -      -      0      0      0      0
special                                                 -      -      -      -      -      -
  mirror-1                                          13.0G   241G      0      0      0      0
    nvme-eui.0025384831b022f3-part5                     -      -      0      0      0      0
    nvme-eui.0025384431414875-part5                     -      -      0      0      0      0
  mirror-2                                          12.7G   241G      0      0      0      0
    nvme-eui.00253844314008c6-part5                     -      -      0      0      0      0
    nvme-Samsung_SSD_990_PRO_2TB_S7DNNJ0WA01354R-part5  -      -      0      0      0      0
--------------------------------------------------  -----  -----  -----  -----  -----  -----

But please don't take the 990 Pro special vdev as an example; I have to replace a 990 Pro almost every 2 months xD
What you can see in that example is that in the HDD-SAS pool, one mirror is bigger than the others. That's what I mean.
Cheers
 
> majority of the drives have over 10 years uptime, or close to that for some others

Dude, you're literally running on fumes. Most drives last around 5 years. Your whole setup could fall over at any moment with multiple drives dying at once.

> I'm not sure how easily I can find another 12-15TB of storage to use temporarily

If you have the budget, that should all fit on one 18TB drive. You should be budgeting to replace all the drives in your pool at this point if they're well past their warranty period. And it looks like you built it on mixed-manufacturer 3TB drives; I'm flat out amazed it's lasted this long, as those were never known for reliability.

For a budget rebuild, what I would recommend (since I did this myself) is to buy used 4TB SAS drives on eBay if you have a disk shelf. I only got 2 bad drives out of 14. You can get a SAS/SATA 15-bay disk shelf for ~$300 these days (plus a bit for shipping).

https://www.ebay.com/itm/194192688167?epid=18010651151&hash=item2d36c94427:g:7p8AAOSw98lggXrI
 
> But don't take the 990 Pro special vdev as an example please, I have to replace a 990 Pro almost every 2 months

Every 2 months?? If I were you I would stop buying that model and go with something else, like a used enterprise SSD.
But you should also check that you have the latest firmware on the existing ones; there were problems reported with those.
 
> > But don't take the 990 Pro special vdev as an example please, I have to replace a 990 Pro almost every 2 months
>
> Every 2 months?? If I were you I would stop buying that model and go with something else, like a used enterprise SSD.
> But you should also check that you have the latest firmware on the existing ones; there were problems reported with those.

The 990 Pro firmware updates don't help; they only help Samsung for marketing reasons.
The 970 and 980 are somewhat okay, but the 990 Pro is pure crap, and no firmware will fix that xD

And yeah, I will replace them at some point. That HDD_Z2 pool is just a movie storage pool that isn't important anyway, plus some backups and series, so basically I don't care. The 990 Pros are still under warranty; I have 6 of them and only need 4, so I'll let Samsung bleed and keep sending their crap back to be replaced xD
I think 2 of those 990 Pros (out of probably 10 by now) reached 100TB of written data; all the others died before that with no indication from smartctl.

Long story short, I know what I'm doing. Cheers.
 
> > majority of the drives have over 10 years uptime, or close to that for some others
>
> Dude, you're literally running on fumes. Most drives last around 5 years. Your whole setup could fall over at any moment with multiple drives dying at once.
>
> > I'm not sure how easily I can find another 12-15TB of storage to use temporarily
>
> If you have the budget, that should all fit on one 18TB drive. You should be budgeting to replace all the drives in your pool at this point if they're well past their warranty period. And it looks like you built it on mixed-manufacturer 3TB drives; I'm flat out amazed it's lasted this long, as those were never known for reliability.
>
> For a budget rebuild, what I would recommend (since I did this myself) is to buy used 4TB SAS drives on eBay if you have a disk shelf. I only got 2 bad drives out of 14. You can get a SAS/SATA 15-bay disk shelf for ~$300 these days (plus a bit for shipping).
>
> https://www.ebay.com/itm/194192688167?epid=18010651151&hash=item2d36c94427:g:7p8AAOSw98lggXrI

Thanks, if you think this is bad, you should see my other pool of 7x 2TB WD Greens in raidz1 in an HP N40L that's about 14 years old and still going strong. :D

I do feel like these performance issues are more than just the pool being out of balance or getting full; I feel like maybe there's a disk that isn't responding properly, isn't logging errors either, and is dragging down performance.

I've had a few drives die over the years. I'm in IT, so I grab old drives from server decoms etc. where I can get them for free and keep a stack of spares on hand; I replace maybe one drive every 18 months or so. I find that usually if a drive makes it past 2-3 years it will pretty much last forever, especially if it's left running 24x7 and not touched too much. Most of my drives look something like this, and there are a couple that are a bit newer, like 5-6 years. :) I know it's a risk, but I have backups of the stuff I care about.

Screenshot 2024-05-26 013611.png

My main pool (the one having issues) is running in an old Dell R510, so if I can find some decent SAS drives that's an option.
 
> I do feel like these performance issues are more than just the pool being out of balance or getting full; I feel like maybe there's a disk that isn't responding properly, isn't logging errors either, and is dragging down performance.

My old N40L hasn't been on for a while. I turned it on tonight to see how much free space it has: it's over 80% full but still capable of ~240MB/s while resilvering a disk.

The pool on the left locks up for hours at a time and becomes totally unresponsive doing the simplest tasks, like trying to copy or move a 100GB file; the iowait is constantly 15-50% even though there are no VMs running and nothing happening on the server at all right now. The pool on the right is working perfectly even with 1 disk down and a scrub/resilver in progress; iowait is only about 2%.

Screenshot 2024-05-26 015646.png

I'm not sure this poor performance can be attributed purely to the pool being full and unbalanced. If my disks aren't reporting any SMART errors, how can I try to diagnose a problem disk or two dragging down the overall performance of the pool?
 
From earlier research, most drives either die early or don't die at all, because they were replaced already.
There is no rule that says drives will die after 5 or 10 years of usage, just indicators that out of 5k disks of the same model, most die in the first 6 months or start dying after 5-6 years.
But those indicators don't tell you that your drive will die after 5-6 years.

For that we would need a statistic where, say, all 500 drives of the same model have died, and then a graph of the years in which they died. But as long as only 300 out of the 500 drives have died, there is no use in those graphs and you can't make a rule of thumb out of them.
And such statistics don't exist, because no one was able to wait long enough for all of them to die.

In short: if the drives have already been running for 5+ years and have survived everything so far, there is no indication that they will fail anytime soon, especially with CMR disks.
With HAMR, or anything that needs to be heated with lasers before it can be written, that will change, since the cells degrade with each heating cycle.

I can imagine, though, that old drives simply get slower with time, as it gets harder to write data to them; something to do with remagnetization physics, I'd have to google it.
I'm fairly sure I've read something about that even for normal CMR drives, but there is no real proven data about it either; it's just possible.

However, back to the topic:
@magic_ren there is maybe an easier way to identify the slow drive:
apt install sysstat
iostat -dx 2

Then put some intensive write load on the pool; by monitoring utilization and write speeds per disk you may be able to find the bad drive (see the sketch below).
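As a rough sketch of what to look for (the column layout differs a bit between sysstat versions, and the 90% threshold is only an illustrative cut-off): the suspect drive is the one sitting near 100% utilization with a much higher await than its neighbours while the rest of the vdev idles.

Code:
# take 10-second samples (boot-average report suppressed) and print drives that were >90% busy
# (%util is the last column in recent sysstat versions)
iostat -dxy 10 2 | awk '/^sd|^nvme/ && $NF+0 > 90 {print $1, "util=" $NF "%"}'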

But try to delete at least some more data to get that one raidz2 vdev down to at least 80%, or try moving data around and check with the zpool iostat command whether it rebalances.

Cheers
 
Thanks, I've been trying to delete some more data and move stuff around, but even trying to move a single 100GB file seems to lock up the pool right now and trigger massive iowait for hours. Unfortunately I can't get enough load on it to really test anything with iostat.

I did find a few rebalance-type scripts I'd like to try too, but I'm not confident that much churn won't crash the whole pool, given how sensitive it is right now.

Also, I deleted about another 2-3TB earlier and haven't got any of that space back yet. I'm going to leave it alone for a day and see if it 'finishes' doing something.
 
> Thanks, I've been trying to delete some more data and move stuff around, but even trying to move a single 100GB file seems to lock up the pool right now and trigger massive iowait for hours. Unfortunately I can't get enough load on it to really test anything with iostat.
>
> I did find a few rebalance-type scripts I'd like to try too, but I'm not confident that much churn won't crash the whole pool, given how sensitive it is right now.
>
> Also, I deleted about another 2-3TB earlier and haven't got any of that space back yet. I'm going to leave it alone for a day and see if it 'finishes' doing something.

If it's one drive causing these issues, maybe you get lucky and ZFS suspends the pool with a warning, or you get a read error or something. The thing is, it makes complete sense that the pool is extremely slow, but it shouldn't lock up for hours; that's not normal.
On the other hand, I've read that pools lock up and become unusable past 90% full; I always thought people were overreacting, but according to you it really seems to be the case.

Rebooting etc. doesn't help? Or do you fear that the pool won't get imported at boot?
Another thing that makes absolutely no sense is that reading from that pool locks up your drives too. To be honest I can't imagine why reading should be a problem on a full pool.
Writing is definitely an issue on any CoW filesystem, but reading shouldn't be affected, at least in my understanding of ZFS.

However, I wish you the best of luck; hopefully you don't lose that data if something weird happens that we didn't think of.

I'll provide some other scripts I made instead. (They are all safe to run, as they are only info scripts based on smartctl.)
get_disk_errors -> Shows/counts errors that are logged in the disk firmware itself. That's the best indicator of whether a disk is failing.
get sas/sata info -> That only shows the physical/logical block size of the disk; it's almost useless for you, but I often use it to check whether I can reformat disks to a 4k logical size with sg_format or hdparm. Not all disks support that, and there are no performance gains, because the disks write 8x 512-byte blocks at the same speed as one 4k block. It only makes sense on NVMe to change the block size for more speed.
However, I needed that for a use case nobody else has.

But get_disk_errors you should definitely run; it may help you. (Hopefully.)
0 errors is good, everything else is not so good xD
Cheers

PS: They only work if your environment is English (the default), not if your CLI and smartctl output their content in German or any other language xD
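For context, a minimal sketch of the kind of check get_disk_errors does (the attached script may differ; this just counts the entries in each ATA drive's firmware error log, so 0 is what you want to see):

Code:
#!/bin/bash
# count errors recorded in each ATA disk's firmware error log
for d in /dev/sd?; do
    errs=$(smartctl -l error "$d" | grep -c '^Error ')
    echo "$d: $errs logged errors"
done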
 

Attachments

  • get_disk_errors.txt
    1.3 KB · Views: 3
  • get_sas_info.txt
    1.2 KB · Views: 0
  • get_sata_info.txt
    1.1 KB · Views: 1
Thanks again, just another update. I left it alone overnight and didn't touch it all day either, and I think it has finally finished doing whatever it was doing; it probably took about 18 hours, but the space from my deletions has now been released. Unfortunately, for some reason everything that remains seems to be on the first vdev, and the second one now only has 3TB allocated to it.

I tried to move a 100GB VM disk and that took 58 minutes, with the storage pretty unresponsive the whole time, so it's really still struggling. The first vdev is still about 85% full, so I need to try and free up some more space; if I can get it below 75-80% through deletion, I might feel more comfortable trying a rebalancing script on a few other folders to see if that also helps. I'll keep this thread updated with progress. Worst case, I have found some extra drives I can use to copy off all the data I want to keep and back onto the pool if I re-create it.
 
I'm going to go out of the box here (not that I disagree with most of the excellent advice given above), but what stands out to me is this, quoting your original post:

> I've been running my Proxmox server with the same config for about the last 6 years (with fairly regular Proxmox version updates), and these issues only started in the last month or so.

I find it interesting that until a month ago you saw NO problems. I would imagine that the issues described above should have built up over time, and you should have seen at least some form of gradual degradation. (I'm well aware I may be wrong with this assessment, but it's my hunch.)

However, I would lean towards the kernel change to 6.8 (which I imagine you updated to about a month ago), which has proven to have its fair share of problems. You should search these forums on that subject, and maybe try pinning a previous kernel to see if that fixes your issues.

PLEASE ignore this whole post if you are not on the above kernel or you know for a fact that I'm wrong for your specific HW/setup.
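If you want to try that, pinning an older kernel with proxmox-boot-tool is quick; roughly like this (the version string is just an example, pick one of the older entries the list command shows, then reboot):

Code:
# show installed kernels
proxmox-boot-tool kernel list

# pin an older kernel, e.g. a 6.5 series one, and reboot
proxmox-boot-tool kernel pin 6.5.13-5-pve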
 
