Extremely high i/o delay

clanktron

New Member
Dec 23, 2021
I've recently set up a few LXCs on my PVE node and am seeing terrible performance due to ridiculously high I/O delay (a minimum of 5-10% under little to no load, spiking to 40-70% when doing pretty much anything). My containers are running on a mirrored ZFS array of two 2TB HDDs, and I've made sure to allocate 6GB as the maximum RAM usage for the ARC cache. Not sure what the culprit is, but any help is appreciated. (Screenshot attached: Screen Shot 2021-12-29 at 12.29.33 PM.png)
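For reference, a 6GB ARC cap is normally set via the `zfs_arc_max` module parameter. A sketch, with the byte math spelled out (the commented lines require root on the PVE host):

```shell
# 6 GiB expressed in bytes for the zfs_arc_max module parameter
ARC_MAX=$((6 * 1024 * 1024 * 1024))
echo "$ARC_MAX"   # 6442450944

# Persist the cap across reboots:
# echo "options zfs zfs_arc_max=$ARC_MAX" > /etc/modprobe.d/zfs.conf
# update-initramfs -u
# Apply immediately without a reboot:
# echo "$ARC_MAX" > /sys/module/zfs/parameters/zfs_arc_max
```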
 
What HDDs do you use exactly?

A mirror of two HDDs will not perform very well when it comes to random IOPS. What are your containers doing more of: reading or writing data?

If it is reading, then increasing the ARC size (if you have the memory available) is a good way to reduce the IO delay during normal operation. The ARC will use up to 50% of the memory if available. If you run arcstat, you can check how often data reads can be served from the ARC and how often it needs to go down to the actual disks (costing performance). Check the manual page to see what each column represents: man arcstat.
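For example, a quick look at the hit ratio (the live command is commented out since it needs a running ZFS host; the counter values below are made up purely for illustration):

```shell
# Live view: print ARC stats every 5 seconds; watch 'hit%' (reads served
# from the ARC) vs 'miss' (reads that had to go to the physical disks):
# arcstat 5

# The hit ratio arcstat reports is just hits / (hits + misses).
# Illustrative numbers:
hits=950000
misses=50000
ratio=$((100 * hits / (hits + misses)))
echo "ARC hit ratio: ${ratio}%"   # prints "ARC hit ratio: 95%"
```

A sustained low hit ratio under your normal workload is the sign that the reads are disk-bound and a bigger ARC would help.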

If it is writing, and they do a lot of sync writes (a DB, for example), you could add a ZIL/SLOG device on a faster SSD (a small Intel Optane, for example) to store the ZIL on the fast SSD before it is written to the slower HDDs. It really does not need to be large. In my setup (luckily switched over to SSD only by now), the ZIL rarely used up more than 1GB.
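Attaching a SLOG is a one-liner; the sketch below uses placeholder pool and device names ('rpool' and the by-id path), so substitute your own:

```shell
# Attach a fast SSD partition as a dedicated log (SLOG) vdev:
zpool add rpool log /dev/disk/by-id/nvme-YOUR_SSD-part1
zpool status rpool   # the new 'logs' section should list the device
```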
 
What HDDs do you use exactly?
The HDDs are repurposed PS4 drives (Seagate 2TB, 5400RPM, 128MB cache), so not ideal for performance, but I didn't think it would be this bad.
A mirror of two HDDs will not perform very great when it comes to random IOPS. What are your containers doing more? Reading or writing data?
My containers are primarily reading data, with two of them being extremely small running a vpn and dns respectively. The third is running docker with a few web servers, nothing too intensive though.
If it is reading, then increasing the ARC size (if you have the memory available) is a good way to reduce the IO delay during normal operation. The ARC will use up to 50% of the memory if available. If you run arcstat, you can check how often data reads can be served from the ARC and how often it needs to go down to the actual disks (costing performance). Check the manual page to see what each column represents: man arcstat.
I could be interpreting my ARC data wrong, but it seems to be fine (see the attachments below).
If it is writing, and they have a lot of sync writes (DB for example), you could add a ZIL/SLOG device on a faster SSD (a small intel Optane for example) to store the ZIL on the fast SSD before it is written to the slower HDDs. It really does not need to be large. In my setup (luckily switched over to SSD only by now), the ZIL rarely used up more than 1GB.
Despite the lack of writes, I added a SLOG partition on the same SSD that the main OS runs on, just in case it might help, though it doesn't seem to be having any effect even after a reboot.
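One way to see whether the new SLOG is being touched at all (it only absorbs sync writes, so a read-heavy workload will leave it idle; 'rpool' is a placeholder for the actual pool name):

```shell
# Per-vdev activity every 5 seconds; the 'logs' row shows SLOG writes:
zpool iostat -v rpool 5
```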
 

Attachments

  • Screen Shot 2021-12-30 at 3.16.28 PM.png (31.4 KB)
  • Screen Shot 2021-12-30 at 3.20.05 PM.png (95.9 KB)
The HDDs are repurposed PS4 drives (Seagate 2TB, 5400RPM, 128MB cache), so not ideal for performance, but I didn't think it would be this bad.
Hehe, I checked the product number in the photos, and if it really is an ST2000LM007, then it uses SMR:
Recording process: Shingled Magnetic Recording (SMR), Drive Managed SMR
https://geizhals.eu/seagate-mobile-hdd-2tb-st2000lm007-a1394770.html
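To confirm the exact model on the host itself (the device path is a placeholder; smartctl comes from the smartmontools package), plus an illustrative check of the reported model string:

```shell
# Print drive identity, including the model number (needs root):
# smartctl -i /dev/sda

# Matching the reported model against known 2.5" Seagate SMR models
# (this list is illustrative, not exhaustive):
model="ST2000LM007"
case "$model" in
  ST2000LM007|ST2000LM015) echo "SMR drive" ;;
  *) echo "check the datasheet" ;;
esac
```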


Even if they performed okayish, as soon as one fails and you replace it with the same kind of drive, it is possible that the pool will never finish resilvering: those SMR drives tend to be problematic when large amounts of data are being written to them, causing the kernel to consider them failed if they don't respond in time. There was quite the drama in 2020 when WD sold Red drives with SMR without telling anyone about it (and they were not the only vendor who did that).

See https://blocksandfiles.com/2020/04/15/shingled-drives-have-non-shingled-zones-for-caching-writes/ for some background.

I recommend that you get some non-SMR disks to avoid problems in the future. Geizhals is a website that I really like because you can filter for exactly that, for example: https://geizhals.eu/?cat=hde7s&xf=13745_2000~3772_2.5~8457_Conventional+Magnetic+Recording+(CMR)
 
Even if they performed okayish, as soon as one fails and you replace it with the same kind of drive, it is possible that the pool will never finish resilvering: those SMR drives tend to be problematic when large amounts of data are being written to them, causing the kernel to consider them failed if they don't respond in time. There was quite the drama in 2020 when WD sold Red drives with SMR without telling anyone about it (and they were not the only vendor who did that).

Damn that really blows. Guess I can still use them for backups or something. Thanks for the help!
 
Damn that really blows. Guess I can still use them for backups or something.
That depends. They are really bad at writing a lot of data at once and might drop to 1MB/s or so. So if you want to use them for backups, you might want to only back up a few GB at a time, which is most of the time not what you want from a backup drive. But you could try them as a PBS datastore. It's not recommended to use HDDs for that, but at least the backups will be incremental, so only the difference from the last backup needs to be written, and that should write far less data at once than a vzdump backup.
 
But you could try them as a PBS datastore. It's not recommended to use HDDs for that, but at least the backups will be incremental, so only the difference from the last backup needs to be written, and that should write far less data at once than a vzdump backup.
That was my intention; I already have PBS running alongside PVE. Though I just checked, and the drives are luckily still within their return period, so I'll be getting reimbursed instead lol.
 
That's of course the best option.

BTW: If you really want to have some fun with your server, you should get some SSDs (enterprise SSDs are recommended; refurbished ones will also work and won't cost more than new consumer SSDs). The more VMs/LXCs you run, the more IOPS you need, and HDDs are really terrible at handling IOPS.
 
