[SOLVED] Possible slow down due to zfs compression

Valerio Pachera

Aug 19, 2016
Hi all, I have a strange behaviour of a windows 2016 guest.
It runs warehouse management software, and some simple queries take 4-5 minutes.
The guest is hosted on a zfs pool named 'storage' on top of ssd disks.
To diagnose the problem, I moved the disk to an nfs storage (on top of ext4 rotating disks) and the same query takes a few seconds.
I moved again the disk on another zfs pool named 'storage2' on top of rotating disks.
The query is still fast!
The pool storage2 has no compression while the pool storage has lz4 compression enabled.
Right now, this is the only thing that may explain this weird behaviour.
What do you think about it?

Here are the zfs values of the two pools that differ:

Code:
storage  used                  304G                   -
storage  available             113G                   -
storage  compressratio         1.25x                  -
storage  compression           lz4                    local
storage  usedbydataset         140K                   -
storage  usedbychildren        304G
storage  written               140K                   -
storage  logicalused           142G

storage2  used                  355G                   -
storage2  available             544G                   -
storage2  compressratio         1.00x                  -
storage2  compression           off                    default
storage2  usedbydataset         96K                    -
storage2  usedbychildren        355G                   -
storage2  written               96K                    -
storage2  logicalused           311G                   -

Proxmox 5.1-43.

[EDIT] the issue isn't related to proxmox, zfs or anything else. It's a os guest issue.
 
rather unlikely, but you can check with perf top whether the compression threads keep the CPU busy.
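Something along these lines should show it (the symbol and thread names are a guess; on ZFS on Linux the compression work is done by the z_wr_iss taskq threads):

```shell
# Sample CPU usage system-wide while the slow query runs; watch
# whether lz4-related symbols dominate the profile.
perf top -g

# Alternatively, check whether the ZFS write-issue taskq threads
# are busy while the query runs:
ps -eLo pid,tid,pcpu,comm | grep z_wr_iss
```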
 
If you migrate your VM back to SSD from spinning disk, does the query take just a few seconds, or is it back to minutes?
 
Hi,

Do not compare nfs with zfs. Yes, it could be a compression problem if your SSDs also do some kind of internal compression. If the SSD compression algorithm is not so good, you can see some delays for your queries.
Check that your SSDs are ok (smart values, scrub, fstrim). I guess that your VM was created with the defaults (raw format, virtio scsi, 4k volblocksize). For a Windows VM, try to use 16-32k instead of 4k, and disable any disk optimisation inside the VM (scheduled defrag).
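Note that volblocksize is fixed at zvol creation time, so an existing disk would have to be recreated and the data moved over. A sketch, with example names rather than anything from this thread:

```shell
# Create a new zvol with a 16k block size (volblocksize cannot be
# changed after creation; existing disks must be recreated and the
# data copied over, e.g. via a backup/restore or move-disk).
zfs create -V 100G -o volblocksize=16k storage/vm-103-disk-1

# In Proxmox, the default block size for new disks on a ZFS pool can
# be set per storage in /etc/pve/storage.cfg with the 'blocksize'
# option:
#   zfspool: storage
#       pool storage
#       blocksize 16k
```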
 
If you migrate your VM back to SSD from spinning disk, does the query take just a few seconds, or is it back to minutes?

Amazing, I tried to import back the virtual disk and the query is now quick.
Is there any data I can send you to compare?
Right now I have
- vm id 103, which is running in production and has the slow query issue
- vm id 199 (test vm), which is running on the same ssd storage / zfs pool and can't replicate the slow query issue.
 
Amazing, I tried to import back the virtual disk and the query is now quick.

This is happening because when you import your VM again, most of the data is not fragmented. But over time, your data will become more and more fragmented. This process is normal for any COW fs like zfs, and it accelerates if the zfs free space is small.
 
Slightly OT, but when talking about ZFS & Compression:

It seems ZSTD is coming to OpenZFS (smaller and faster compression, developed by Facebook). Probably first to BSD; not sure when it comes to ZoL...
 
This is happening because when you import your VM again, most of the data is not fragmented. But over time, your data will become more and more fragmented. This process is normal for any COW fs like zfs, and it accelerates if the zfs free space is small.

I thought fragmentation shouldn't impact ssd disks.
Now I'm going to move the disk of the production vm out and back.
We'll see if the issue shows up again after some time.

Any other consideration is welcome :)
 
Unbelievable, I moved the disk to the NFS storage and then back to the ZFS storage.
The issue is still there.

My test vm (the one without the issue) was restored from a backup.
It has not been possible to replicate the problem by moving its disk to the ssd / zfs storage.

Note: when I moved the disk to the NFS storage, it was converted to qcow2 format.

At this point, I believe ZFS fragmentation has nothing to do with it, and I have no clue how to fix it.
The only way out seems to be backup and restore, but it would be important to understand the reason.
 
Ok guys, now I'm 100% sure that it's not a storage related problem.
Believe it or not, this query issue doesn't show up if windows network card is disabled.
My test vm had the network card disabled (to avoid conflicts with the original vm), and that's the reason why it was working on any storage.
PS: I also tried changing the guest nic from virtio to Intel E1000 to rule out any virtualization "trick", but the only fix is to fully disable the nic.
We'll find out why (I hope).

Sorry for bothering and thank you for the support.
 