High Disk IO and Disk Usage over a short period of time.

Jarvar

I have had Proxmox 6.0.4 installed for just under 3 months now.
I am fairly new to this hypervisor, but I would consider the disk usage high for the use case.

The Proxmox server has one active Windows VM running 24/7: Windows Server 2019 Standard with a 100GB OS drive and a 250GB virtual data drive.

It is backed up to NFS storage nightly. Until a week ago the backup mode was Stop with GZIP compression; I changed it a week ago to a Snapshot backup with LZO. The backup used to be ~150GB when the VM had a 1TB data drive and a 250GB OS drive; I resized it a while ago, so the result of vzdump is now ~66GB. Could this be accounting for the high disk usage?
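(For reference, a nightly job like that is equivalent to roughly the following on the CLI; the VMID and storage name are placeholders:)

Code:
# snapshot-mode backup with LZO compression to the NFS storage
vzdump <vmid> --mode snapshot --compress lzo --storage <nfs-storage-id>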

The pool is two 1TB drives set up as a ZFS RAID1 mirror: an Adata XPG SX8200 Pro and a Crucial MX500.

So far the drives are showing ~13.4 TBW since the beginning of October.
I have tried to keep logs of how much data is written.
At first I kept the Veeam Windows Agent backups running, but have since stopped them; I am still making full Windows Server Backups inside the VM daily to an external USB drive.
I also have a Duplicati backup running nightly inside the VM to a network-attached NAS; the backup set is about 20GB of data, but only changes are transferred, not a whole backup.

Here's the data from smartctl -a /dev/sdc, which is the Crucial MX500 drive. The Adata shows roughly the same amount of data written, just a larger percentage of life remaining; it reports 2% of its life used.

10/21/2019  SMART 246 (Total LBAs Written): 13,901,006,860  (6.47 TB)
10/23/2019  SMART 246: 14,155,026,948
10/24/2019  SMART 246: 14,262,437,348  (6.64 TB)
10/25/2019  SMART 246: 14,373,922,564  (6.69 TB)
10/29/2019  SMART 246: 16,489,684,252  (7.68 TB)
11/05/2019  SMART 246: 17,602,185,748  (8.20 TB)
11/06/2019  SMART 246: 17,781,186,804  (8.28 TB)   95% remaining (attribute 173 shows 75)
12/02/2019  SMART 246: 22,090,879,516  (10.29 TBW) 92% remaining
12/23/2019  SMART 246: 28,831,473,373  (13.43 TB)  90% remaining
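(In case it helps anyone reproduce this log: a minimal cron-able sketch, assuming smartmontools is installed and /dev/sdc is the MX500; the log path is just an example.)

Code:
#!/bin/sh
# grab SMART attribute 246 (host sectors written, 512-byte LBAs) and append it to a log
LBA=$(smartctl -A /dev/sdc | awk '$1 == 246 {print $10}')
TIB=$(awk -v l="$LBA" 'BEGIN {printf "%.2f", l*512/1024^4}')
echo "$(date +%F)  246=$LBA  (~$TIB TiB)" >> /var/log/ssd-writes.log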


Any help would be much appreciated.
Thank you.
 
This is a lot of data written. Without knowing where you back up this data and what workload you are running in your Windows guest, it is hard to determine the cause.

If you do not have any other monitoring, you could check whether the graphs on the VM's Summary page give any hints about when, and how much, disk IO is happening.
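If you prefer the command line, a rough sketch of pulling the same data the summary graphs are drawn from (node name and VMID are placeholders):

Code:
# per-VM RRD history for the last day; the output should include
# diskread/diskwrite columns (bytes-per-second averages)
pvesh get /nodes/<nodename>/qemu/<vmid>/rrddata --timeframe day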
 
Thanks. When we had the server running bare metal, it had Windows Server Backup running nightly as well as a Veeam Windows Agent bare-metal backup, both going to an external USB drive. I also had a backup of specific files with a program called Duplicati, which saved all the data on our separate data drive to a Synology NAS attached to the network.

For now, I have turned off Windows Server Backup and the Veeam Agent for Windows bare-metal backups inside the VM. I just have a nightly vzdump of that VM, which comes out to about 66GB, going to an NFS share on our Synology NAS.
I checked the SMART data: yesterday it was at 14.2 TBW and today we are at 14.3 TBW, which still seems like a lot.
Could there be some runaway program or process that could be causing this?
Disk IO is showing an 85-120K hourly average.
But in the first week, the disk IO was in the M range, not K. Even now, if I switch the charts from Average to Maximum, I see some spikes into M territory.
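(For a host-side view independent of the GUI graphs, something like this can be left running for a few intervals; the pool name below assumes the default rpool:)

Code:
# pool-wide read/write bandwidth every 5 seconds; the first line is the
# average since the pool was imported, the following lines are live values
zpool iostat rpool 5
# per-device breakdown
zpool iostat -v rpool 5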
 

I'm wondering if the high IO is related to the vzdumps I have been running nightly on our server VM. Each one takes up about 68GB on ZFS, which really only leaves ~30-40GB of daily drive activity unaccounted for. However, I don't think that should be very high. On our bare-metal server it has been a whole year and it is showing 98% used, albeit that one is on Intel S4600 or S4500 drives...

Are nightly vzdumps too much? Should I be running them less often, or is that about right?
Thanks.
 
Regarding SSD quality:
The Adata has a TBW of 640 TB (https://geizhals.eu/adata-xpg-sx8200-pro-1tb-asx8200pnp-1tt-c-a1927167.html)
The Crucial has a TBW of 360 TB (https://geizhals.eu/crucial-mx500-1tb-ct1000mx500ssd4-a1745481.html)

The Intel SSDs that you have in the other server play in a totally different league.
I looked up the ~1 TB models to have a fair comparison:

Intel S4500 960 GB has a TBW of 1.86 PB (https://geizhals.eu/intel-ssd-dc-s4500-960gb-ssdsc2kb960g701-a1688269.html)
Intel S4600 960 GB has a TBW of 5.25 PB (https://geizhals.eu/intel-ssd-dc-s4600-960gb-ssdsc2kg960g701-a1688382.html)
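As a rough sanity check against those ratings, using the numbers tracked in the first post (~6.47 TB on 10/21 rising to ~13.43 TB on 12/23, i.e. roughly 0.11 TB of host writes per day):

Code:
360 TB (MX500 rating)   / 0.11 TB/day ~ 3,200 days ~ 9 years
640 TB (SX8200 rating)  / 0.11 TB/day ~ 5,800 days ~ 16 years

That only compares the rated TBW against the host-write figures; it says nothing about internal write amplification, which can make the drive's own wear counter drop faster.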

Can you explain to me how exactly you got this output? I do not see anything like it when I run smartctl on my SSDs.
10/21/2019  SMART 246 (Total LBAs Written): 13,901,006,860  (6.47 TB)
10/23/2019  SMART 246: 14,155,026,948
10/24/2019  SMART 246: 14,262,437,348  (6.64 TB)
10/25/2019  SMART 246: 14,373,922,564  (6.69 TB)
 
Thank you Aaron for getting back to me.

The Adata XPG SX8200 Pro NVMe and the Crucial MX500 SATA are what I have installed in the Proxmox server at one location, running as ZFS RAID1. That is where I am pulling those SMART stats from. It is not a direct output, but something I kept track of over time to show the quick increase in TBW over a short period.

I have another server in my home lab using 2 x Intel S4600 240GB in ZFS RAID1 for the OS and another pair of S4500 960GB drives in ZFS RAID1 as storage for the VMs.

Those are my comparisons.

This is the output for the MX500

Code:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       1505
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       46
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   089   089   000    Old_age   Always       -       174
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       5
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000    Pre-fail  Always       -       45
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   057   035   000    Old_age   Always       -       43 (Min/Max 0/65)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   089   089   001    Old_age   Offline      -       11
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_Host_Sector_Write 0x0032   100   100   000    Old_age   Always       -       28072456384
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       4565428615
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       3294812764
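
(For reference, attribute 246 on these Crucial/Micron drives appears to count 512-byte host sectors, which matches the TB figures tracked above, so the raw value converts to total host writes roughly like this:)

Code:
# 28,072,456,384 sectors * 512 bytes per sector
echo $((28072456384 * 512))   # 14,373,097,668,608 bytes = ~14.4 TB (~13.1 TiB)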


And this is for the Adata XPG SX8200 Pro

Code:
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        35 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    2%
Data Units Read:                    27,072,192 [13.8 TB]
Data Units Written:                 28,771,046 [14.7 TB]
Host Read Commands:                 1,120,979,702
Host Write Commands:                844,289,147
Controller Busy Time:               5,725
Power Cycles:                       48
Power On Hours:                     2,709
Unsafe Shutdowns:                   22
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Thermal Temp. 1 Transition Count:   23
Thermal Temp. 2 Transition Count:   15
Thermal Temp. 1 Total Time:         124
Thermal Temp. 2 Total Time:         160

Error Information (NVMe Log 0x01, max 256 entries)
No Errors Logged


Thank you,
Let me know if there is any other information that would be helpful.
 
Okay. Well, this line
Code:
202 Percent_Lifetime_Remain 0x0030   089   089   001    Old_age   Offline      -       11
for the Crucial shows that it is through 11% of its life and has 89% left. See the documentation of the Percent Lifetime Remaining value: https://www.micron.com/-/media/clie...torage/tnfd22_client_ssd_smart_attributes.pdf

I don't know what kind of server it is, but the two SSDs have very different Power On Hours and Power Cycles, so my guess is that they were also used differently before they ended up in the ZFS mirror? That would be another reason why the wear-out differs between them, besides the fact that the Adata is specified for a lot more TB written.

Regarding what can cause writes:
Is Proxmox VE itself installed on this ZFS Pool? It will create a few GB of logs per day.

VZDump writes the data only to the target storage when backing up VMs, in your case the NFS share. It does create temporary files when backing up containers, but you only have Windows VMs, right?

Other than that, you can try to find out what is causing the writes inside your VMs. On Windows you have the Resource Monitor, for example.
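If you also want to see which processes on the Proxmox host itself are writing, one option (iotop is not installed by default, but is available from the Debian repos):

Code:
apt install iotop
# -a: accumulate I/O totals since start, -o: only show processes actually doing I/O
iotop -ao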
 

Well, now that you mention it, I'm not sure why the power-on hours are that different. I installed them both new at the same time.
One is NVMe and the other is a SATA SSD; would that make a difference?
For the Crucial, my understanding was that 11% of the life is used, so it has 89% remaining. That drive is supposed to have at least a 360 TBW endurance rating. The Adata XPG SX8200 is supposed to be around 640 TBW, which makes sense as that is almost double.

I pretty much installed them both at the same time. Maybe I installed the Adata first and tested things out before putting in the MX500, but a discrepancy of 1,000 hours would mean something like a month of constant uptime difference, which definitely wasn't the case.

Is Proxmox VE installed on the ZFS pool? I'm not really sure; when I set it up, I chose ZFS RAID1 and pretty much let it do its thing automatically.
Is there a command I can type that would show that, like lsblk?
I'm kind of new to Proxmox.
Thanks.
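(For anyone wondering the same thing, something like the following should show whether the root filesystem lives on the ZFS pool; the dataset names are what a default install typically uses:)

Code:
# shows what "/" is mounted from, e.g. rpool/ROOT/pve-1 on a ZFS install
findmnt -no SOURCE,FSTYPE /
# lists the pool's datasets; VM disks normally live under rpool/data
zfs list -o name,used,mountpoint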

As for the vzdump, I was testing whether it increases the disk writes when I turn it on or off. I haven't had enough time to test yet, since I turned the Windows Server full backups and the Veeam Agent backup back on when I altered the vzdump schedule. I really don't want to lose any data.
 
Hi,

I guess that you use the default 8k block size for your VM disks. RAIDZ will split each 8k block into 2x 4k (the default ZFS ashift is 4k) plus 4k for parity, but in most cases SSDs use at minimum a 16k internal block size. So, to summarize, 8k of ZFS data will land as at least 16k on each SSD (this is also why you see an increased backup size). And if we take into account the default block size inside the VM (512 bytes), we have a bad case: Windows may need to modify only a single 512-byte block, but for that ZFS needs to read 2x 4k blocks of data, modify only 512 bytes in RAM, and then write one or two new 4k blocks (read-modify-write -> high IO). This is called write amplification.
Also take into account that Windows has a scheduled task to optimise your vdisk (defragment the filesystem), which is another source of writes that needlessly reduces the life of your SSDs (better to enable SSD emulation for this vdisk in PVE).

What can you do? Increase the ZFS block size!

Good luck / Bafta !
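A rough sketch of what that looks like in practice (storage name and disk path are examples; note that volblocksize can only be set when a zvol is created, so existing disks have to be recreated or moved after changing it):

Code:
# check the current volblocksize of an existing VM disk
zfs get volblocksize rpool/data/vm-100-disk-0

# make the Proxmox ZFS storage create future zvols with a larger block size
pvesm set local-zfs --blocksize 16k   # or 32k/64k depending on the workload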
 

Is there a method to increase the ZFS block size, or would I need to reinstall the Proxmox VE OS? Or reinstall the VM (I didn't see an option there)?
Thank you so much for all this information. I really appreciate it.
Yes, I went with the default settings when I installed...
 
Or reinstall the VM (didn't see an option there)

You can create a new vdisk and add it to the existing VM. Then format this new vdisk from the OS, but specify an increased NTFS allocation unit size (32-64K?). Then reinstall your VM without formatting the vdisk (so you keep your 32-64K block size).

Good luck /Bafta
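A sketch of the guest side of that (run inside Windows on the new, empty vdisk; the drive letter is an example, and /A sets the NTFS allocation unit size):

Code:
format D: /FS:NTFS /A:64K /Q
rem verify afterwards: "Bytes Per Cluster" should now show 65536
fsutil fsinfo ntfsinfo D: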
 
Wow thanks so much.
So I am trying this out in a home lab environment first.
I have created a new ZFS dataset:

zfs create rpool/VM64K

Then I've added it under Datacenter -> Storage -> Add -> ZFS.

I remember you mentioning we should change the default block size; the default is 8K.
What should the new block size for this ZFS storage be? Do we need to specify a size as well, or do we just start transferring VMs from the old pool to the new one?
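(For reference, the CLI equivalent of that GUI step looks roughly like this; the "Block Size" field in the Add: ZFS dialog corresponds to the blocksize option, and it only affects newly created disks:)

Code:
# register the dataset as VM disk storage with a 64k volblocksize for new zvols
pvesm add zfspool VM64K --pool rpool/VM64K --blocksize 64k --content images
# confirm what ended up in the storage config
cat /etc/pve/storage.cfg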

Thank you so much again. I really appreciate your assistance.
 

I created the new ZFS dataset and added it in the Datacenter > Storage view with a 64K block size, which is what the VM64K name refers to.
Now do I need to create new vdisks for the Windows VM, or is it just a simple matter of moving the disks from the old ZFS pool to the new one?

Well I tried to move the disk and it seems to work, but is it really that simple?
I got an error at first, then I realized I had typed in 64 without the "k"; once I added it, it worked.

I probably need a little more guidance...

You have been great, guletz. Thanks.
 
I just found this link about being able to change the block size on storage pools.

https://forum.proxmox.com/threads/zfs-is-block-size-actually-record-size.55897/

Will this work if I create a new 64K ZFS pool, move my existing VM disks to the new pool, change the block size for the old pool, and then move them back? I would like to keep the same ZFS pool name and structure.

I am still trying to wrap my head around everything.
Thanks.
 
Well I tried to move the disk and it seems to work, but is it really that simple?

You can check using the command line:

zfs get all rpool/.../vm-id-disk

and look at the volblocksize value.

Remember that you must also change the NTFS block size to 64K.

Another tip: not all VMs need 32-64K, it depends... For example, MSSQL uses 64k, MySQL uses 16k, for some file servers 128k is better, and so on. On my servers I have different datasets with different volblocksize values (mostly 16-64k).

Good luck / Bafta.
 

Thank you immensely. This has been a huge help to me.
So if I created the VM64K pool, do I need to reinstall Windows on the VM to get 64K blocks?
I found the volblocksize value is 64k after moving it.
I also checked the NTFS block size using fsutil fsinfo ntfsinfo c:

I got this output, which shows 4096 bytes and which I think translates to 4K.

The server is running SQL Server Express 2008 or 2012, and I wouldn't even say it is used intensively.
How did you choose between 16-64k volblocksizes?

I checked some previous installs, even on bare metal, and 4k seems to be the default. Why is it causing so much wear once we move it to a VM in Proxmox? Is there some incompatibility or something?
Also, if I create a 64k block size, is that good to use with Linux VMs as well, like CentOS 8 or Debian?

Thanks again.
 
I got this output, which shows 4096 bytes and which I think translates to 4K.


You have solved only half of the problem. fsutil must show 64k, if you ask me :)


The server is running SQL Server Express 2008 or 2012.
.... and for MSSQL you will need 64k, as Microsoft recommends!


I wouldn't even say it is used intensively.
How did you choose between 16-64k volblocksizes?

Trial and error ;), intuition, and .... a lot of time spent!


I checked some previous installs, even on bare metal, and 4k seems to be the default

Only for compatibility .... but ZFS is so different.
Why is it causing so much wear once we move it to a VM in Proxmox?

It is not about Proxmox here. It is about any CoW (copy-on-write) system.
Also, if I create a 64k block size, is that good to use with Linux VMs as well, like CentOS 8 or Debian?

For a file server it will be OK. For other loads, maybe not. Think of a family car... will it be OK if I drive at 120 miles per hour? It depends whether you have a 200 kg load, but sometimes you need to carry, let's say, 500 kg!!! And even with 200 kg, driving in a mountain region is very different from driving on a flat road!

This has been a huge help to me.

Give me a "virtual beer" ;)

Good luck / Bafta.
 
In that case guletz, here is a *virtual beer*. Cheers!

I really do appreciate the help.
I was contemplating XCP-ng for a while; when I was on their forums, their replies were quite fast. My preference has been for Proxmox, though.

You and some others have been very helpful and accommodating, considering some of my basic questions. I am trying to learn as much as I can.

I forgot to post the picture of the fsutil output earlier, so here it is:
[screenshot: Capture1.PNG]

I've been looking into changing the NTFS block size, and the general consensus I keep finding is that it is usually set when the disk is formatted.
Although maybe it's possible to create new drives and clone the existing ones onto a disk with 64k blocks, or will that reformat them during cloning as well?

Thank you for all the comparisons.
If I switch Linux VMs to be stored on 64k blocks, do I need to change the filesystem inside the VMs to a 64k block size as well, or does Linux have a more flexible system?
 
