Weird performance problem with NTFS disk after expanding

Sebastian Salmhofer

Hi,

I have a pretty weird problem and I am out of ideas and can't find anyone that has the same problem. I would be really grateful for some ideas.

Background info: I have one Proxmox host with an AMD EPYC 7443P, 128 GB RAM and three ZFS pools.
One is a mirror of two 1TB NVMe SSDs.
One is a mirror of two 4TB NVMe SSDs.
And the third is a RAIDZ1 of three 6TB WD Red Plus HDDs.

I originally had a Windows Server 2016 with a system disk on the 4TB SSD pool and a data disk on the HDD pool. The Windows server is a secondary domain controller and is also used as a file server. I have wanted to replace it with something else for a long time, but never got around to it. The data disk was initially also on the 4TB SSD pool, but I recently added the HDD pool and moved it over there. I then extended the virtual disk a couple of times and extended the NTFS partition on it. I didn't notice anything until a couple of days ago, when I saw that the performance was really bad. When I copy files (big or small, it doesn't matter), the transfer speed fluctuates a lot and sits at pretty much nothing most of the time, with spikes up to 100 or 200 MBps. The copy usually finishes eventually if it's just a couple of GB up to 100 or 200 GB. I haven't tried more yet, because even that takes hours. Sometimes it hangs completely.
I thought maybe there was something wrong with the Windows Server installation, so I moved the virtual disk to a Windows 11 VM. But that didn't change anything.
I had the drive attached with VirtIO and also tested whether the other bus options make a difference, but they didn't.
I didn't think about the connection to expanding the partition initially, but I now believe that is probably the reason. I am sure it worked normally a couple of months ago.
In the meantime I set up two new file servers. One is on my new Univention domain controller that I am migrating to, for normal files. The other is an OMV for my of course legally obtained movie and TV show collection, which makes up most of the data. That collection was the reason why I got the HDD pool and extended the disk.
The data drive for OMV is on the same HDD pool and it works great. I get 200-300 MBps transfer speed over SMB. So I can for sure rule out the physical disks.
I managed to copy some of the data to the new file server, but it's very very slow. What's interesting is that the disk active time in Windows is almost always at 100% as soon as anything is using the disk. Even just browsing files in explorer. When I start copying files, when the transfer rate goes up, the disk active time goes down. Sometimes the transfer rate goes up to 100 or 200 MBps, but it drops down quickly again to zero or almost zero.
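
To put numbers on the stalls instead of watching the copy dialog, a small script can read one large file sequentially and record the throughput for each time window. This is only a minimal sketch: the function name `sample_read_throughput` and its parameters are made up for illustration, and the OS page cache can inflate the numbers for files that were read recently.

```python
import time


def sample_read_throughput(path, chunk_size=1024 * 1024, interval=1.0):
    """Read `path` sequentially; return (total_bytes, [MB/s per time window])."""
    samples = []
    total_bytes = 0
    bytes_in_window = 0
    window_start = time.monotonic()
    # buffering=0 avoids Python's own buffering (the OS cache still applies).
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total_bytes += len(chunk)
            bytes_in_window += len(chunk)
            now = time.monotonic()
            elapsed = now - window_start
            if elapsed >= interval:
                # Close the current window and start a new one.
                samples.append(bytes_in_window / elapsed / 1e6)
                bytes_in_window = 0
                window_start = now
    # Account for whatever was read after the last full window.
    elapsed = max(time.monotonic() - window_start, 1e-9)
    if bytes_in_window:
        samples.append(bytes_in_window / elapsed / 1e6)
    return total_bytes, samples


# Example (hypothetical path):
# total, rates = sample_read_throughput(r"D:\media\some_large_file.mkv")
# print(total, [round(r, 1) for r in rates])
```

A healthy sequential read should give fairly even values. Long runs of near-zero windows with occasional bursts would match the pattern described above.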
I think that something might be wrong with the NTFS file system. I ran chkdsk -f -x but it didn't fix anything. I tried chkdsk -f -x -r too, but that would take forever and I don't know if it makes any sense with a virtual disk. The -r is supposed to find bad sectors on the disk and move that data somewhere else.
A zpool scrub ran around two weeks ago and didn't find any errors. I am running it again right now and it says it will take over 7 days. So maybe there is something wrong with the ZFS volume? The new one on the same pool works great, as I mentioned.
I am going to let it run, but so far it hasn't repaired anything.
That's the status:
1.48T / 7.02T scanned at 45.8M/s, 374G / 7.01T issued at 11.3M/s
0B repaired, 5.20% done, 7 days 03:15:21 to go
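
The remaining time in that status can be roughly reproduced from the other numbers: remaining data divided by the issue rate. The following is only a sketch of that arithmetic. It assumes the sizes are in binary units (powers of 1024) and that the estimate is based on the "issued" rate, and the helper names are made up for illustration.

```python
# Binary unit multipliers (assumption: zpool reports powers of 1024).
UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size such as '374G' or '7.01T' to bytes."""
    return float(text[:-1]) * UNITS[text[-1]]


def scrub_eta_seconds(issued, total, rate_per_sec):
    """Remaining bytes divided by the issue rate in bytes per second."""
    remaining = parse_size(total) - parse_size(issued)
    return remaining / parse_size(rate_per_sec)


def format_eta(seconds):
    """Format seconds as 'D days HH:MM'."""
    days, rest = divmod(int(seconds), 86400)
    hours, rest = divmod(rest, 3600)
    minutes, _ = divmod(rest, 60)
    return f"{days} days {hours:02d}:{minutes:02d}"


# Values taken from the status output above.
eta = scrub_eta_seconds("374G", "7.01T", "11.3M")
print(format_eta(eta))
```

That comes out at a bit over 7 days, in line with what the pool reports. An issue rate of 11.3M/s is very low for three HDDs, so the long estimate is just a consequence of the slow reads.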

I am not sure if that will solve anything or how long it will really take, so I wanted to ask if anyone has experienced something similar before, or if someone has any ideas what else I could try in the meantime. I just want to copy everything to the new file servers. I do have a cloud backup for the normal data that I could download, but it would take a long time since we only have a 20Mbps internet connection. For the media data I don't have a backup and would really like to save it. I could get it again of course, but it would also take a very long time.

Is there maybe some way I can check and repair the virtual disk outside of Windows? I can move it to some VM or boot off an ISO, but I don't know what to use. I never had to deal with a problem like this. If it wasn't a VM I would say the disk is bad, but I can rule that out. The disks are new, SMART is perfect and the other ZFS volume on there works great.
But it's also weird that the zpool scrub takes so long. Now the estimate went up again by a couple of hours. I've been trying to fix this for days now and I am out of ideas.

Thanks for any input
 
Here you can see the active time. It is copying the entire time. Most of the time it isn't transferring anything; sometimes there are bursts where it does copy, and the active time goes down while it does.
copy.PNG
 
Little update: the scrub is still running, but the remaining time is now way shorter. Still no errors found.
scrub in progress since Sat Jul 27 12:46:30 2024
2.60T / 7.02T scanned at 34.3M/s, 1.69T / 7.01T issued at 22.3M/s
0B repaired, 24.07% done, 2 days 21:36:27 to go

Any other ideas?
 
They are Samsung PM9A3 SSDs. But the problem is with the HDD pool made of three Western Digital Red Pro HDDs. I don't think it's the HDDs, though, because I have no problem at all with another volume in the same pool. There I get transfer speeds of about 300MBps over SMB on 10Gbps Ethernet. I think that's absolutely fine for a RAIDZ1 of 7200RPM NAS disks.
 
I deliberately chose datacenter SSDs because the consumer SSDs in my old home server started causing problems after just a few years. The performance is more than sufficient for my use case.
 
I just had another idea and tried to mount the NTFS partition in Linux and copy the data that way. It doesn't change anything. It copied about 12MB in ten minutes. I also ran ntfsfix without any success.
 
I finally figured it out. It turned out to be a bad SATA cable. It was just chance that it worked when I tested on the other volume. The fault was very intermittent, probably caused by the cable shifting due to vibration or something like that.
That was quite the troubleshooting.
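
For anyone landing here with similar symptoms: one thing that might be worth checking is SMART attribute 199 (UDMA_CRC_Error_Count), which counts CRC errors on the link between drive and controller and often points at cabling rather than the disk itself. Below is a minimal sketch that pulls the raw value out of `smartctl -A` style text. The function name and the sample table are made up for illustration and are not output from this system. A bad cable will not necessarily show up in this counter.

```python
def udma_crc_errors(smartctl_output):
    """Return the raw value of SMART attribute 199, or None if it is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns; the raw value is the last one.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


# Illustrative sample only, not taken from the drives discussed here.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       37
"""

print(udma_crc_errors(SAMPLE))
```

The absolute number matters less than whether it keeps growing: running the check for each pool member on different days and comparing the results shows if a link is still producing errors.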
 
