[SOLVED] Unexplained Storage Growth in Windows VM on ZFS.

taumeister

Member
Aug 16, 2022
Germany
Hello Proxmox Team,

I'm reaching out about an issue I've recently encountered with a Windows VM running on ZFS storage. I've noticed that this VM is progressively using more storage space. This became apparent in my daily full backups, where the VM's storage consumption increased by more than 100 GB within about 8 weeks.

Interestingly, the VM itself doesn't show any increase in storage usage. In fact, I have deliberately deleted some data, yet both the backup volume and the ZFS store volume continue to grow. This phenomenon is baffling to me.

I read about the Discard option for VM disks here in the forum. I've enabled it and restarted both the VM and the host, but there has been no noticeable change.
Running zpool trim didn't help either.

Other VMs do not show this behavior.
All VMs are stored on a RAIDZ1 pool of four NVMe drives.
The guest OS is Windows Server 2019.
- Used space inside the VM: ~300 GB
- Used space on ZFS: ~700 GB (see screenshots)
UsedSpace.png
UsedSpace_ZFS.png
I just saw the email about the successful backup of the VMs, and it's quite evident there. The backup size has increased by over 14GB within two days, but no one is working in the office, no jobs are running, and the internal consumption of the VM remains the same... I'm starting to get a bit worried.
Monosnap pve - Proxmox Virtual Environment 2023-12-22 00-42-21.png




Could you please shed some light on what might be causing this issue and suggest how I can resolve it? Thank you in advance for your support.

Best regards,
Thomas
Proxmox Version is 8.1.3
 
Hello sb-jw,
I am curious to see what you want to look at in the configuration, here it is:

Code:
qm config 200
agent: 1,fstrim_cloned_disks=1
boot: order=virtio0;net0
cores: 4
cpu: kvm64
description: usb0%3A host=2-3%0A%0A Bus 001 Device 006%3A ID 0baf%3A0303 U.S. Robotics USR5637 56K Faxmodem%0A%0A Bus 01.Port 1%3A Dev 1, Class=root_hub, Driver=xhci_hcd/12p, 480M%0A |__ Port 3%3A Dev 6, If 0, Class=Communications, Driver=cdc_acm, 480M%0A |__ Port 3%3A Dev 6, If 1, Class=CDC Data, Driver=cdc_acm, 480M
machine: pc-q35-6.2
memory: 6144
meta: creation-qemu=6.2.0,ctime=1660385192
name: ZEUS
net0: virtio=42:51:08:7B:45:8C,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: win11
scsihw: virtio-scsi-pci
smbios1: uuid=e1f0477d-a24d-4300-b6d2-fffeb1a93dc6
sockets: 1
startup: order=2
tpmstate0: NVME-Datastore:vm-200-disk-2,size=4M,version=v2.0
usb0: host=3-12
virtio0: NVME-Datastore:vm-200-disk-3,cache=writethrough,discard=on,size=1550G
vmgenid: dd16eb11-2867-4f19-9647-7729b0e44d92

Thanks for time,
Regards
Thomas
 
Okay, thank you for your help so far,
but could you explain what you expect from this, or why you are recommending it?
Does it solve the problem? Have you seen this failure before with my settings, or is it just a matter of design?

And, will these settings work without reinstalling drivers?
Greetings
Thomas
 
TRIM probably doesn't work for you because Windows doesn't know the disk is an SSD (hence the SSD emulation flag). With the VirtIO SCSI single controller and the disk attached as SCSI, each disk gets its own controller, so its I/O can be handled by a dedicated thread instead of a shared one. In addition, enabling iothread could get more performance out of your NVMe drives.
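A minimal sketch to see which of these flags a disk entry already has: it splits a disk line from `qm config` (volume name followed by comma-separated `key=value` options, as shown in this thread) and lists which of `discard=on`, `ssd=1` and `iothread=1` are not set. The helper names are hypothetical and not part of any Proxmox tool.

```python
def parse_disk_options(entry: str) -> tuple[str, dict[str, str]]:
    """Split 'STORAGE:VOLUME,key=value,...' into the volume and its options."""
    volume, *opts = entry.split(",")
    options = {}
    for opt in opts:
        key, _, value = opt.partition("=")
        options[key] = value
    return volume, options


def missing_recommended_flags(entry: str) -> list[str]:
    """Return which of discard/ssd/iothread are not switched on for this disk."""
    _, options = parse_disk_options(entry)
    wanted = {"discard": "on", "ssd": "1", "iothread": "1"}
    return [key for key, value in wanted.items() if options.get(key) != value]


# The virtio0 entry from the config posted above.
disk = "NVME-Datastore:vm-200-disk-3,cache=writethrough,discard=on,size=1550G"
print(missing_recommended_flags(disk))  # ['ssd', 'iothread']
```

For the posted `virtio0` line, `discard` is already on, but `ssd` and `iothread` are missing. As far as I know, Proxmox only offers `ssd=1` for IDE, SATA and SCSI disks, not for VirtIO Block, which would be one reason to move the disk to `scsi0`.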

This is also described in the following wiki article (there are versions for other Windows releases too; just type "windows" into the search field above and the suggestions will appear): https://pve.proxmox.com/wiki/Windows_2022_guest_best_practices

And, will these settings work without reinstalling drivers?
I assume that you have installed all packages from the VirtIO ISO, as described here: https://pve.proxmox.com/wiki/Windows_VirtIO_Drivers#Wizard_Installation

If that's the case, the new controller should be detected straight away without any problems. To be on the safe side, take a snapshot beforehand so you can roll back if something stops working.

//EDIT:

boot: order=virtio0;net0
Please don't forget to adjust the boot order once the disk is on scsi0.
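Once the disk is renamed, a boot order that still names `virtio0` points at a device that no longer exists. A minimal sketch of that check, assuming the `key: value` lines printed by `qm config` (both helper names are hypothetical; the script only reads text and changes nothing on the host):

```python
def parse_qm_config(text: str) -> dict[str, str]:
    """Turn 'key: value' lines from `qm config` into a dict."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            config[key.strip()] = value.strip()
    return config


def first_boot_device_exists(config: dict[str, str]) -> bool:
    """True if the first device in 'boot: order=a;b;c' is defined in the config."""
    boot = config.get("boot", "")
    if not boot.startswith("order="):
        return False
    first = boot[len("order="):].split(";")[0]
    return first in config


# Disk already moved to scsi0, but the boot order still points at virtio0.
sample = """boot: order=virtio0;net0
scsi0: NVME-Datastore:vm-200-disk-3,discard=on,size=1550G
net0: virtio=42:51:08:7B:45:8C,bridge=vmbr0"""
print(first_boot_device_exists(parse_qm_config(sample)))  # False
```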
 
I'm wondering whether you have the VSS (Volume Shadow Copy Service) feature enabled; shadow copies might simply be taking up space on your disk. Check the shadow copies on your volumes from the command line:
vssadmin list shadows

Another feature that could be occupying space without being immediately visible is System Restore. You can limit how much space it uses on each disk.
 
Super answer, thank you.
I'll schedule this for this evening and report back.
 
Hello milew,
thanks for your thoughts. Yes, I have VSS active on all disks, but if I remember correctly I allocated 10 GB per disk for it.
Still, it is a good point and I will check this too.
Thanks so far.
 

I checked VSS, and it does not seem to be the cause.
But thanks for the idea anyway.
usedspace-vss.png
 
Unfortunately, the attempt did not work. As soon as I change the virtual disk from virtio0 to scsi0, the server runs into a blue screen with 'inaccessible disk'. The VirtIO software was already installed, and I reinstalled it as well. Additionally, I manually reinstalled the vioscsi driver as explained in your link, though that wasn't necessary. When installing a Windows VM, I always install all drivers and software from the virtio-win.iso.

I'm pretty confident I didn't make any mistakes, as it's not overly complex. Do you perhaps have another idea? Attached is the current configuration.


Code:
root@pve:~/2023-12-22-virtio-2-scsi# qm config 200
agent: 1,fstrim_cloned_disks=1
boot: order=scsi0;net0;ide2
cores: 4
cpu: kvm64
description: usb0%3A host=2-3%0A%0A Bus 001 Device 006%3A ID 0baf%3A0303 U.S. Robotics USR5637 56K Faxmodem%0A%0A Bus 01.Port 1%3A Dev 1, Class=root_hub, Driver=xhci_hcd/12p, 480M%0A    |__ Port 3%3A Dev 6, If 0, Class=Communications, Driver=cdc_acm, 480M%0A    |__ Port 3%3A Dev 6, If 1, Class=CDC Data, Driver=cdc_acm, 480M
ide2: local:iso/virtio-tools.iso,media=cdrom,size=612812K
machine: pc-q35-6.2
memory: 6144
meta: creation-qemu=6.2.0,ctime=1660385192
name: ZEUS
net0: virtio=42:51:08:7B:45:8C,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: win11
parent: driver_re-installed
scsi0: NVME-Datastore:vm-200-disk-3,cache=writethrough,discard=on,size=1550G
scsihw: virtio-scsi-single
smbios1: uuid=e1f0477d-a24d-4300-b6d2-fffeb1a93dc6
sockets: 1
startup: order=2
tpmstate0: NVME-Datastore:vm-200-disk-2,size=4M,version=v2.0
usb0: host=3-12
vmgenid: dd16eb11-2867-4f19-9647-7729b0e44d92
 
What has sometimes helped is adding a second disk with the target configuration. Then you can switch the original disk over and the system boots again. The helper disk can then be deleted.

Do you still have the blue screen message?

Are you running Windows 11 or Windows Server? Then I'll see whether I can reproduce or fix it.
 
That's really kind of you to take the time for this.
Yes, I would still get blue screens, but of course I have already reverted to the snapshot.
But it's no problem to redo everything from the beginning.
It is indeed a Windows Server 2019.

Just so I fully understand and we're not talking past each other, are you suggesting that I should
- add another disk with the new layout before changing the main disk, so as to 'encourage' Windows to load the correct driver,
- then change the main disk, and later
- remove the 'catalyst' disk?
 
To be completely honest, I would have called the idea with the additional disk far-fetched, but I was wrong: it worked right away. You never stop learning. Thanks a lot for the tip.

What's the best way to proceed now? Should I wait until Windows trims the disks during its maintenance time, or do I need to trigger something manually?
 
The tips about switching to SCSI, and thereby enabling SSD emulation and IOThread, have noticeably improved performance, but unfortunately they haven't solved the original problem with the strange data growth.
Does anyone have an idea what I could do to counteract this? Normally I wouldn't be so passive about it, but I just can't explain where it could be coming from. The server itself hardly generates any new data and still occupies about 300 GB internally.
However, about 700 GB is used in the ZFS pool, which I find odd, and the backup currently takes up 640 GB.
The screenshot shows the backup growing by 140 GB within 13 days; it is spooky.
What other options do I have? Maybe a restore into a new virtual disk?
Monosnap pve - Proxmox Virtual Environment 2023-12-23 20-20-32.png
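For a sense of scale, a quick back-of-the-envelope calculation of the average growth rate, using only the figures given in this thread. "Over 100 GB in ~8 weeks" is rounded to 100 GB over 56 days, so that first value is a rough lower bound; the labels are mine.

```python
def gb_per_day(growth_gb: float, days: float) -> float:
    """Average growth rate over an observation window."""
    return growth_gb / days


# (growth in GB, window in days), as reported in the posts above
observations = {
    "first report (~8 weeks)": (100, 56),
    "backup mail (2 days)": (14, 2),
    "backup screenshot (13 days)": (140, 13),
}

for label, (growth, days) in observations.items():
    print(f"{label}: {gb_per_day(growth, days):.1f} GB/day")

# Gap between what Windows reports and what the zvol occupies on the pool.
print(f"unexplained: {700 - 300} GB")
```

This prints roughly 1.8, 7.0 and 10.8 GB/day plus the 400 GB gap. The windows are rough and measured differently (backup size vs. pool usage), so this only suggests the growth is speeding up; it does not explain it.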
 
It's great that you were able to change the disk and at least notice an improvement in performance.

It could be that a few settings were never applied in Windows, because you first installed Windows on a disk it saw as an "HDD" and only later presented the same disk as an "SSD".

You can try to trigger an fstrim via the guest agent: qm guest cmd 200 fstrim
(Note that the command may time out; this can happen because the job takes a long time when it has a lot to process.)
Another thought: have you created any snapshots so far, either via the GUI or directly with ZFS?
 
Hi, and thank you for not leaving me alone with this problem.

Yes, I certainly do use snapshots, and I always delete them promptly: the typical scenario is testing something or applying updates. I remove these snapshots after two days; usually you don't want to roll back further than that anyway.

Regarding the trim command, I ran it, and after about 5 seconds I get:

Code:
qm guest cmd 200 fstrim
{ 
   "paths" : [
      { 
         "path" : "C:\\"
      },
      { 
         "path" : "V:\\"
      },
      { 
         "path" : "G:\\"
      },
      { 
         "path" : "E:\\"
      },
      { 
         "path" : "D:\\"
      }
   ]
}
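The agent answered with one entry per volume. In the QEMU guest agent protocol, each entry may also carry optional `trimmed`, `minimum` and `error` fields; only `path` came back here, so at least no volume reported an error. A minimal sketch that checks this, assuming the JSON printed by `qm guest cmd 200 fstrim` is available as a string (`trim_errors` is a hypothetical helper):

```python
import json


def trim_errors(agent_output: str) -> dict[str, str]:
    """Map each volume path to its error message, if the agent reported one."""
    result = json.loads(agent_output)
    return {
        entry["path"]: entry["error"]
        for entry in result.get("paths", [])
        if "error" in entry
    }


# Shortened version of the output posted above (raw string keeps the JSON escapes).
output = r'{"paths": [{"path": "C:\\"}, {"path": "V:\\"}, {"path": "D:\\"}]}'
print(trim_errors(output))  # {}
```

An empty result only means the agent saw no failure (as far as I know, the Windows agent runs a `defrag /L` retrim per volume); it does not prove that any space was released on the ZFS side.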

* ran zpool trim on my pool
* ran a retrim on the Windows Server itself with defrag /L
* restarted, of course, many times
No success.

Do you have other ideas?
 
If you get a 1 back with the command fsutil behavior query DisableDeleteNotify, then set it to 0 with fsutil behavior set DisableDeleteNotify 0.
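If you want to check this from a script, here is a minimal sketch that parses the query output. It assumes the format of recent Windows versions, one line per file system such as `NTFS DisableDeleteNotify = 0`; older releases may print a single line without the file-system name, which this regex would not pick up. A value of 0 means delete notifications (TRIM) are passed to the disk; 1 means they are suppressed.

```python
import re

# Assumed line format: "<FS> DisableDeleteNotify = <0|1>", optionally followed by text.
LINE = re.compile(r"^(\w+) DisableDeleteNotify = (\d)", re.MULTILINE)


def trim_disabled(query_output: str) -> dict[str, bool]:
    """Return {file system: True if TRIM is disabled} from `fsutil behavior query`."""
    return {fs: value == "1" for fs, value in LINE.findall(query_output)}


# Illustrative sample, not real output from this server.
sample = "NTFS DisableDeleteNotify = 0\nReFS DisableDeleteNotify = 1\n"
print(trim_disabled(sample))  # {'NTFS': False, 'ReFS': True}
```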

Then you could try defrag C: /O and/or Optimize-Volume -DriveLetter C -ReTrim -Verbose.

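You could then use sdelete.exe -z c: to zero the free space and see if it helps. If compression is enabled on the pool, ZFS does not need to store all-zero blocks, so the zeroed space can be released.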
=> https://learn.microsoft.com/en-us/sysinternals/downloads/sdelete

You could also try moving the image to another storage and then back to the original one. During the move the image is usually rewritten, which can shrink it again. I think there is a very good chance this will work, and with the other optimizations you shouldn't end up in this situation again.
 
Let us know if it helps, otherwise Merry Christmas and please enjoy the holidays with the family and not in front of the server ;)
 
