Windows VM Trimming

infinityM
Hey guys,

I have scoured the forum to try and answer my question, with no success...
I have three Proxmox nodes running Windows VMs, with ZFS-backed storage and thin provisioning enabled.

I noticed one ZFS array reporting 17TB used, while the VM has 20TB of storage assigned and only 9TB actually used.
I've run the optimize (retrim) in Windows and the reported usage stays the same. Am I missing something?
I've read that I need the Discard and SSD emulation flags set (yes, the storage is SSDs), which I have also done.

I have the guest agent drivers installed and I am using SCSI for the storage... What am I missing?
Is there some golden step that none of the forums mention?

All the nodes are up to date, and the Windows servers are Server 2019 (also up to date).
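For reference, this is roughly what I have set and run so far (the drive letter is just an example; 10000 is one of the affected VMs):

Code:
# on the PVE host: the disks are attached via SCSI with discard and SSD emulation enabled
qm config 10000 | grep -E '^(agent|scsi)'
# expecting something along the lines of: scsi0: <storage>:vm-10000-disk-0,discard=on,ssd=1,...

# inside the Windows guest (PowerShell): force a retrim so freed blocks get discarded
Optimize-Volume -DriveLetter D -ReTrim -Verbose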
 
Maybe you are using a raidz1/2/3 with the default volblocksize and it is "padding overhead"?
The output of zpool list -v and zfs list -o space would give some hints.
 
I have added the output below. I'm unsure what to look for in this regard, @Dunadan?
Code:
root@pm2:~# zpool list -v
NAME                                   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
rpool                                  135G  3.08G   132G        -         -     1%     2%  1.00x    ONLINE  -
  mirror-0                             135G  3.08G   132G        -         -     1%  2.28%      -    ONLINE
    scsi-35000c50047464a23-part3          -      -      -        -         -      -      -      -    ONLINE
    scsi-35000cca00a7ba57c-part3          -      -      -        -         -      -      -      -    ONLINE
zStorage                              25.5T  18.6T  6.83T        -         -    10%    73%  1.00x    ONLINE  -
  raidz1-0                            25.5T  18.6T  6.83T        -         -    10%  73.2%      -    ONLINE
    ata-CT4000MX500SSD1_2311E6BC3520      -      -      -        -         -      -      -      -    ONLINE
    ata-CT4000MX500SSD1_2320E6D56F22      -      -      -        -         -      -      -      -    ONLINE
    ata-CT4000MX500SSD1_2320E6D876BE      -      -      -        -         -      -      -      -    ONLINE
    ata-CT4000MX500SSD1_2311E6BA7071      -      -      -        -         -      -      -      -    ONLINE
    ata-CT4000MX500SSD1_2351E88AAA46      -      -      -        -         -      -      -      -    ONLINE
    ata-CT4000MX500SSD1_2351E88AED4A      -      -      -        -         -      -      -      -    ONLINE
    ata-CT4000MX500SSD1_2351E88AA970      -      -      -        -         -      -      -      -    ONLINE
root@pm2:~# zfs list -o space
NAME                      AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
rpool                      128G  3.08G        0B    104K             0B      3.08G
rpool/ROOT                 128G  3.04G        0B     96K             0B      3.04G
rpool/ROOT/pve-1           128G  3.04G      190M   2.85G             0B         0B
rpool/data                 128G    96K        0B     96K             0B         0B
zStorage                  5.63T  15.7T        0B    162K             0B      15.7T
zStorage/vm-10000-disk-0  5.63T   271G     40.8G    231G             0B         0B
zStorage/vm-10000-disk-1  5.63T  28.5G     22.3G   6.19G             0B         0B
zStorage/vm-10000-disk-2  5.63T  1.04T     6.46G   1.04T             0B         0B
zStorage/vm-10000-disk-3  5.63T  2.28T     11.6G   2.27T             0B         0B
zStorage/vm-10000-disk-4  5.63T  2.90T     14.8G   2.89T             0B         0B
zStorage/vm-10000-disk-5  5.63T  3.03T     13.6G   3.01T             0B         0B
zStorage/vm-10000-disk-6  5.63T  2.03T     15.0G   2.01T             0B         0B
zStorage/vm-10000-disk-7  5.63T  2.02T     6.54G   2.02T             0B         0B
zStorage/vm-10000-disk-8  5.63T  2.09T     32.3G   2.06T             0B         0B
zStorage/vm-10000-disk-9  5.63T   922M        0B    922M             0B         0B
 
I just noticed the block size on ZFS is 128K; we use ReFS with a 64K block size.
Would that difference be what's causing a lot of wasted space?
 
Those tables are very hard to read if you don't put them between CODE tags.

I just noticed the block size on ZFS is 128K; we use ReFS with a 64K block size.
Are you sure it's a 128K volblocksize and not just a 128K recordsize? You could check that with zfs get volblocksize,recordsize, because a 16K (previously 8K) volblocksize and a 128K recordsize would be the defaults.
With a 7-disk raidz1 you will lose 50% of the raw capacity with a 4K/8K volblocksize, 33% with 16K, 20% with 32K/64K, 16% with 128K/256K, and 15% with 512K/1M. This lost raw capacity is indirect: ZFS will tell you that 6 of the 7 disks are usable for data, but because of padding overhead every zvol gets bigger, so 1TB of data on a zvol ends up consuming, for example, 1.66TB of space.
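To illustrate the arithmetic, assuming ashift=12 (4K sectors): raidz1 pads every allocation to a multiple of parity + 1 = 2 sectors, which is where those percentages come from.

Code:
#  8K volblocksize:  2 data + 1 parity = 3 sectors -> padded to 4   => 2 of 4 sectors hold data (50% of raw lost)
# 16K volblocksize:  4 data + 1 parity = 5 sectors -> padded to 6   => 4 of 6 (~33% lost)
# 64K volblocksize: 16 data + 3 parity = 19 sectors -> padded to 20 => 16 of 20 (20% lost)
#                   (3 parity because 16 data sectors span 3 stripe rows of at most 6 data each)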
Also keep in mind that the volblocksize can only be set at creation time and can't be changed later without destroying and recreating those virtual disks. So even if the "Block size" field of the ZFS storage in PVE tells you it's 128K, the virtual disks could be using something different if the value of that field was changed after the disks were created.

There is no refreservation consuming space, and the space consumed by snapshots is negligible. So my guess would be that it's padding overhead and not that trimming/discarding isn't working.
 
OK, that's interesting.
What would you recommend I do to resolve the issue?

I don't want a massive performance hit, but I assume I need to tweak it to reduce the loss?
 
First I would verify that you actually have a high enough volblocksize on those zvols, using zfs get volblocksize.
 
It is defaulting to 8K, with a recordsize of 64K:

Code:
root@pm2:~# zfs get volblocksize,recordsize zStorage
NAME      PROPERTY      VALUE     SOURCE
zStorage  volblocksize  -         -
zStorage  recordsize    64K       local
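The pool dataset itself shows "-" because only zvols carry a volblocksize; the per-disk values could be checked with something like:

Code:
# query the actual zvols (type volume) rather than the pool dataset
zfs get -r -t volume volblocksize zStorage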
 
So with an 8K volblocksize and ashift=12 you would lose 50% of the raw capacity (14% because of parity, 36% because of padding overhead), and everything on those virtual disks should consume roughly 71% more space. To fix that you would need to destroy and recreate those virtual disks. The easiest way would be to change the "Block size" of the ZFS storage via the web UI to something like 32K or 128K, then back up the VM, destroy it, and restore it.
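If you prefer the CLI, a rough sketch of the same procedure (the storage ID "zStorage" and the backup storage name are assumptions; adjust them to your setup):

Code:
# raise the block size used for newly created zvols on this ZFS storage
pvesm set zStorage --blocksize 32k
# back up the VM, then restore it so the disks get recreated with the new volblocksize
vzdump 10000 --storage backupstore --mode stop
qmrestore /path/to/vzdump-qemu-10000-<timestamp>.vma.zst 10000 --force 1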
 
I am playing with it now, thank you.
Side question though: does this not also happen with Proxmox Backup Servers?
At that scale, with 3 stripes of 12 drives on raidz, it would be hard for me to spot by myself. How can one check whether padding is an issue in that array too?
 
Padding overhead only affects zvols, as only zvols have a volblocksize. LXCs and PBS use datasets, which use the recordsize instead (and therefore have no padding overhead). That's one of the many reasons why you usually don't want to use a raidz1/2/3 for storing VMs, but a striped mirror instead.
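If you want a rough check on an existing pool, comparing allocated with logical space per dataset/zvol usually shows it:

Code:
# on zvols (type volume) hit by padding, USED is much larger than LOGICALUSED;
# datasets (type filesystem), as used by PBS datastores and LXCs, don't show that inflation.
# note: compression pulls USED down again, so treat this only as a hint.
zfs list -r -o name,type,volblocksize,used,logicalused zStorage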

PS: Also, don't expect great performance from those MX500 consumer SSDs. They have no PLP, so all workloads doing sync writes, like running DBs, will see orders of magnitude less performance and far more SSD wear: without PLP the SSD can't cache those writes in DRAM and has to write directly to the slow NAND without optimizing them.
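And regarding the striped mirror mentioned above, the layout would look roughly like this (pool name and device paths are placeholders; with 7 disks, one would be left over as a spare):

Code:
# striped mirrors ("RAID10"): no raidz padding overhead for zvols and better IOPS,
# but only half of the raw capacity is usable
zpool create -o ashift=12 tank \
  mirror /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2 \
  mirror /dev/disk/by-id/ata-DISK3 /dev/disk/by-id/ata-DISK4 \
  mirror /dev/disk/by-id/ata-DISK5 /dev/disk/by-id/ata-DISK6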
 
That's fine. The setup is an experiment. Once I'm happy with it, I plan on swapping them out for Microns, which do have PLP and handle the load better.
But thanks for the advice :)
 
