Moving Disks between Storage Types is oddly slow

dagutman · New Member · Oct 30, 2023
I ran out of space on a Ceph pool, so I am moving data to a local partition. The data is a 512 GB VirtIO-based hard drive used by one of my virtual machines. The move has taken almost 40 minutes and is only partway complete. I did the same with some other virtual disks on other virtual machines; in that case I was moving the data from an SSD Ceph pool to an NVMe-based one.

Those migrations were also surprisingly slow: even a 128 GB VirtIO disk took 1-2 hours to migrate. I am not sure exactly what performance to expect, but this seems far slower than it should be.
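(For anyone following along, the moves themselves are just Proxmox's Move Disk operation. As a minimal sketch, the CLI equivalent boils down to qm move_disk; the VM ID, disk name, and target storage below are placeholders, not my actual values.)

Code:
# Minimal sketch: move a VM disk to another storage from the node's shell.
# VMID, DISK and TARGET are placeholders; adjust to the VM in question.
import subprocess

VMID = "101"          # hypothetical VM ID
DISK = "virtio0"      # the VirtIO disk being moved
TARGET = "local-lvm"  # hypothetical target storage ID

# qm move_disk copies the disk image to the target storage; adding
# "--delete", "1" removes the source copy once the move succeeds.
subprocess.run(["qm", "move_disk", VMID, DISK, TARGET], check=True)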

I currently have 9 machines in my cluster. They are Dell R730s, and I have a dedicated 10 Gb NIC specifically for the Ceph backend. Each machine has 384 GB of RAM, and all the drives I am referencing are SSD or NVMe, so I am just trying to wrap my head around any potential misconfiguration that could be making things so glacial.
 
Is the local partition on an SSD that uses QLC flash? Those become (much) slower than HDDs once the cache is full during sustained writes.

EDIT: This does not seem to be the problem. I have no idea, since I/O delay is non-existent.
 
Good question. All of the SSDs I use are Samsung 970 EVOs, which I believe are TLC.

The machine that currently hosts the VM in question has a 4 TB Samsung 870 EVO as the boot disk and a Samsung 980 PRO NVMe stick. The other drives in the machine are 16 TB Seagate drives (four of them) that I am using for ZFS.

I just checked the machine: CPU load is very low (~3%), and RAM usage isn't pegged either (I forgot, this machine only has 192 GB of RAM). Disk space usage is also quite low.

Also, any Ceph traffic should be moving over the dedicated 10 Gb port I have for the Ceph backend, and the other main network I use internally is also 10 Gb. The move actually just finished, but it took 45 minutes, which seems bafflingly slow.
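For context, 512 GB in about 45 minutes works out to roughly 190 MB/s if the whole image was copied, which is nowhere near what either a 10 Gb link or these drives should manage. To rule the network out, I can run iperf3 between two nodes on the Ceph NIC; here is a rough sketch (the server address is a placeholder, and iperf3 -s has to be running on the other node first).

Code:
# Rough sketch: check that the dedicated Ceph link actually delivers ~10 Gbit/s.
# Start `iperf3 -s` on another node first; SERVER is a placeholder address.
import json
import subprocess

SERVER = "10.10.10.2"  # hypothetical peer address on the Ceph network

out = subprocess.run(
    ["iperf3", "-c", SERVER, "-t", "10", "-J"],  # 10 s test, JSON output
    check=True, capture_output=True, text=True,
).stdout

gbits = json.loads(out)["end"]["sum_received"]["bits_per_second"] / 1e9
print(f"receive throughput: {gbits:.2f} Gbit/s")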

 
Thanks. Yeah, I find it a bit baffling to be honest, and I'm not sure how or where to troubleshoot. It went at a snail's pace.
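One thing I may try is benchmarking the pool directly with rados bench, which takes the VM and VirtIO layer out of the picture. A rough sketch, with a placeholder pool name:

Code:
# Rough sketch: write-benchmark the Ceph pool itself with `rados bench`.
# POOL is a placeholder; use the actual pool name.
import subprocess

POOL = "ceph-ssd"  # hypothetical pool name

# 30 seconds of 4 MiB object writes (the defaults), keeping the test
# objects so a read benchmark could follow, then cleaning them up.
subprocess.run(["rados", "bench", "-p", POOL, "30", "write", "--no-cleanup"], check=True)
subprocess.run(["rados", "-p", POOL, "cleanup"], check=True)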
 
Digging a bit further, I profiled my storage. I expected low IOPS from the SATA drives I use for bulk storage, but even the SSDs only pulled ~100 IOPS, and the NVMe drives only ~340 IOPS. These are predominantly Samsung 970 and 980 EVO drives. I know there is a strong recommendation to use enterprise-grade drives (which are on order), but I am still a bit surprised that a relatively decent NVMe drive performs this poorly in Ceph. Is this par for the course without an enterprise drive with supercapacitors?
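For anyone who wants to reproduce this kind of number, the workload that hurts consumer drives is small synchronous writes, which is roughly what Ceph's journaling does and where the supercapacitor/power-loss-protection question matters. A minimal fio sketch along those lines (the test file path is a placeholder; point it at a filesystem on the drive you want to test, not at a raw device holding data):

Code:
# Minimal sketch: 4k synchronous random writes at queue depth 1, roughly the
# pattern where consumer SSDs without power-loss protection fall to a few
# hundred IOPS. TEST_FILE is a placeholder path on the drive being tested.
import json
import subprocess

TEST_FILE = "/mnt/nvme-test/fio.test"  # hypothetical path, NOT a raw device

out = subprocess.run(
    ["fio", "--name=synctest", f"--filename={TEST_FILE}", "--size=1G",
     "--rw=randwrite", "--bs=4k", "--iodepth=1", "--numjobs=1",
     "--direct=1", "--fsync=1", "--runtime=30", "--time_based",
     "--output-format=json"],
    check=True, capture_output=True, text=True,
).stdout

iops = json.loads(out)["jobs"][0]["write"]["iops"]
print(f"4k sync write: {iops:.0f} IOPS")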

I also haven't gone through each drive to make sure write caching is disabled, which I saw as another recommendation. If the performance is this bad, though, I may just wait for the enterprise SSDs rather than manually disabling the cache on 60+ drives.
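If I do end up disabling it, something like this sketch could handle the SATA drives in one go instead of doing it by hand (illustration only: hdparm only covers SATA here, NVMe write cache is a separate feature, and the setting does not survive a reboot on its own).

Code:
# Sketch: turn off the volatile write cache on all SATA disks via hdparm
# rather than one drive at a time. Illustration only; not persistent
# across reboots, and NVMe drives need a different mechanism.
import json
import subprocess

lsblk = subprocess.run(
    ["lsblk", "-d", "-J", "-o", "NAME,TYPE,TRAN"],
    check=True, capture_output=True, text=True,
).stdout

for dev in json.loads(lsblk)["blockdevices"]:
    if dev["type"] == "disk" and dev.get("tran") == "sata":
        path = f"/dev/{dev['name']}"
        subprocess.run(["hdparm", "-W", "0", path], check=True)  # cache off
        print(f"write cache disabled on {path}")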


 
