LVM-Thin Performance Degradation after VM restores

PwrBank

Active Member
Nov 12, 2024
I've been tracking down an issue for a few weeks now and have come to the conclusion that it may have something to do with LVM-thin not properly trimming after a VM is deleted.

Using the same workload, you can compare LVM-thin vs ZFS vs EXT4. I'm starting to think this is an issue with the drive not being TRIM'd properly. I'm not sure why EXT4 doesn't have this issue, and Windows may not have it either. I'll see if I can get a Windows OS installed on it and try to replicate this issue.

You'll notice that at about the 5th or 6th run the drive drops off substantially. I'm using a 300GB file for this test (so 5 or 6 runs x 300GB = 1.5 to 1.8TB written), so unless it's just a crazy coincidence, it looks like the drive isn't trimming. I'd imagine that would produce the performance you can see below: I'm assuming the drive has to check each sector for a writable spot before writing, resulting in a 2-6x increase in write latency.
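To make the arithmetic concrete, here is a minimal Python sketch of the cumulative data written after each 300GB restore. The function name is made up for illustration, and it only models the total volume written, not anything about how the drive behaves internally.

```python
def cumulative_written_gb(file_size_gb, runs):
    """Return the total data written (in GB) after each restore run."""
    return [file_size_gb * n for n in range(1, runs + 1)]


FILE_SIZE_GB = 300  # size of the restore used in this test

totals = cumulative_written_gb(FILE_SIZE_GB, 6)
for run, total in enumerate(totals, start=1):
    print(f"run {run}: {total} GB written so far")

# Runs 5 and 6 correspond to 1500 GB and 1800 GB of cumulative writes.
```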

The Y axis is the speed in MB/s for a VM restore, and the X axis is the number of times the restore has been run sequentially. The VM is deleted after each restore.

This is with LVM
1751984524710.png
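If anyone wants to pin down the drop-off run from their own restore logs rather than eyeballing a chart, a rough Python sketch could look like this. The function name, the 50% threshold, and the sample numbers are all assumptions for illustration; the sample values are placeholders and are not the measurements from the chart above.

```python
def first_dropoff(speeds_mb_s, threshold=0.5):
    """Return the 1-based run number where throughput first falls below
    threshold * (mean of all earlier runs), or None if it never does."""
    for i in range(1, len(speeds_mb_s)):
        baseline = sum(speeds_mb_s[:i]) / i
        if speeds_mb_s[i] < threshold * baseline:
            return i + 1
    return None


# Placeholder values for illustration only; these are NOT measured results.
example = [1000, 1000, 1000, 1000, 1000, 400, 350]
print(first_dropoff(example))  # 6
```

Comparing against the mean of the earlier runs instead of just the previous run keeps a single noisy restore from hiding a later, sustained drop.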

Now, the interesting part. I'm not sure how to run a trim on LVM without completely destroying the partitions; maybe y'all know more about that. With ZFS the issue was the same as with LVM, though I'm not sure if that's because the drive had been formatted as LVM beforehand and that "poisoned" the TRIM state. I got super slow speeds on the ZFS pool, ran a manual TRIM, and bam, it's back at full speed, even after writing to it 12 more times, which would be over 2x the size of the drive in writes. LVM-thin will consistently tank at about the 5th or 6th 300GB write.
1751984569244.png
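One sanity check related to the TRIM question is whether the kernel reports discard support at each layer of the stack, both for the physical drive and for the device-mapper device that LVM-thin sits on. Below is a minimal Python sketch that reads the standard Linux sysfs queue attributes `discard_granularity` and `discard_max_bytes`. The device names are placeholders, and the `sysfs_root` parameter is only there so the function can be pointed at a test directory. This only shows whether discards are accepted, not whether they actually reach the SSD.

```python
from pathlib import Path


def discard_limits(device, sysfs_root="/sys/block"):
    """Return (discard_granularity, discard_max_bytes) for a block device.

    Both values are read from the kernel's sysfs queue attributes.
    A discard_max_bytes of 0 means the device does not accept discards.
    """
    queue = Path(sysfs_root) / device / "queue"
    granularity = int((queue / "discard_granularity").read_text().strip())
    max_bytes = int((queue / "discard_max_bytes").read_text().strip())
    return granularity, max_bytes


def supports_discard(device, sysfs_root="/sys/block"):
    """True if the kernel reports that the device accepts discard (TRIM) requests."""
    return discard_limits(device, sysfs_root)[1] > 0


if __name__ == "__main__":
    # "nvme0n1" and "dm-0" are placeholder names; substitute the real devices.
    for dev in ("nvme0n1", "dm-0"):
        try:
            print(dev, supports_discard(dev))
        except OSError:
            print(dev, "not readable on this system")
```

If the physical drive reports support but the dm device does not, that would point at the LVM layer rather than the drive firmware.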

Another interesting point is that this behavior is not replicated on EXT4, regardless of whether LVM was on the drive beforehand.
1751984592693.png

Hopefully that all makes sense. I feel like we're close to figuring out the issue, just not entirely sure if it's a hardware/firmware issue with the drives or if it's the file system not behaving like it should or at least isn't talking with the hardware properly.
 
Here are screenshots of the data tables. I can't post the tables directly here because of the character limit.

LVM writes
1751984956756.png

EXT4 Writes
1751985027655.png

ZFS Writes with the manual trim at Seq 11
1751985192767.png
 
I tried to replicate this issue in Windows as well, but was unable to.

I ran https://panthema.net/2013/disk-filltest/ a few times and didn't see anything out of the norm.

The only odd behavior is that the drive is fast at the start of the writes, then slows down to 1.6GiB/s.
1752150321767.png
After about a minute the speeds would drop to this
1752150339506.png

But that seems to be consistent for this drive.
https://www.techpowerup.com/ssd-specs/sk-hynix-platinum-p41-2-tb.d587
1752150369857.png

Once disk-filltest (DFT) has gotten through the first test, it clears out all of the random write data and does another pass, where the drive does stumble for a little bit but eventually recovers. I'd imagine this is the drive cleaning itself up and marking sectors as free, as expected.
1752150444612.png
It does recover shortly after this
1752150463917.png

CrystalDiskMark isn't showing anything strange either.


However, there are reports from users that this drive, which is the PC801 model, has a firmware issue similar to its counterpart, the P41.

There doesn't seem to be a firmware update available anywhere for the PC801. I tried flashing the P41 firmware to it with SK Hynix's tool, but the tool didn't consider them compatible.
1752150611890.png

This Reddit post shows someone with a similar issue:
https://www.reddit.com/r/pcmasterrace/comments/1jd7ti2/comment/mk36mec/

HP user with the same issue
https://h30434.www3.hp.com/t5/Noteb.../SK-hynix-PC801-slow-write-speed/td-p/9375322

I even tried Solidigm's firmware tool, since this drive is also branded and sold through them.



So, is this just an issue with LVM-thin not being a good sport and cleaning up after itself properly after a large number of writes and deletes?
I can't seem to replicate this issue with EXT4, ZFS (after the manual TRIM), or NTFS. The only other test I can think of is seeing whether XFS is affected.