VM Disk Performance Issue After Moving from Local RAID SSD to NFS Storage

powersupport

Well-Known Member
Jan 18, 2020
We have a Proxmox VE Essentials 1xCPU subscription and are experiencing a disk performance drop after migrating a VM’s virtual disk.


Originally, the VM’s disk was on a local RAID SSD. We moved it to an NFS shared disk (15K RPM RAID array) hosted on an ESXi server over the network.


After the migration, we observed the following:


  • NFS shared storage write speed: ~305 MB/s
  • Local RAID SSD performance is now noticeably poorer after virtualization (significantly below expected speeds)

VM Configuration:


agent: 1
boot: order=scsi0;ide2;net0
cores: 4
memory: 8192
scsi0: local-lvm2:vm-121-disk-1,size=1T
scsihw: virtio-scsi-single


The VM’s disk controller is virtio-scsi-single (a dedicated VirtIO SCSI controller per disk).


We suspect the performance degradation might be related to:


  • How the VirtIO SCSI controller is handling I/O
  • Possible misconfiguration or driver issue inside the guest OS
  • Potential bottleneck in the local RAID SSD after virtualization

Questions:

  1. Could the choice of virtio-scsi-single be contributing to this performance drop?
  2. Would switching to VirtIO SCSI (multiple queue) or another controller type improve throughput?
  3. Are there recommended tuning parameters (both on Proxmox and in the guest OS) for optimal performance in this setup?
  4. Could the NFS storage setup or the move itself have impacted the local RAID SSD performance for this VM?
 

Attachments

Hi @powersupport ,
Would you clarify your setup for me? You moved from local SSD to NFS hosted in a VM on ESX backed by spinning rust? At the very least, I’d suggest starting your fio testing inside the ESX VM.

Your VM configuration file shows a single disk on storage called "local-lvm2". The name could be arbitrary, but did you actually name an NFS storage “local-lvm2”? Out of curiosity, could you share the output of:
cat /etc/pve/storage.cfg

You can find some best practices listed here: https://kb.blockbridge.com/technote/proxmox-tuning-low-latency-storage
These are geared toward block storage rather than file storage, but some tips may still help now that your data is traveling over the network instead of locally.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Hi @bbgeek17

Below is the result; could you please look into it and advise?

dir: local
        path /var/lib/vz
        content vztmpl,iso,backup

lvmthin: local-lvm
        thinpool data
        vgname pve
        content rootdir,images

nfs: iso
        export /export
        path /mnt/pve/iso
        server 192.168.201.218
        content vztmpl,iso
        prune-backups keep-all=1

nfs: pvebk2
        export /export
        path /mnt/pve/pvebk2
        server 192.168.201.219
        content backup,images
        prune-backups keep-all=1

dir: lvm2
        path /lvm2
        content rootdir,images
        nodes pve-quorum2,pve4
        prune-backups keep-all=1
        shared 0

lvmthin: local-lvm2
        thinpool data
        vgname pve
        content images,rootdir
        nodes pve4
 
Originally, the VM’s disk was on a local RAID SSD.
We moved it to an NFS shared disk
The originally attached VM config shows:
scsi0: local-lvm2:vm-121-disk-1,size=1T

The VM configuration you attached contains only one virtual disk/image - "scsi0". It is located on the storage called "local-lvm2". You subsequently provided your storage configuration, which shows:
lvmthin: local-lvm2
        thinpool data
        vgname pve
        content images,rootdir
        nodes pve4
This is a node-specific duplicate of:
lvmthin: local-lvm
        thinpool data
        vgname pve
        content rootdir,images
The former is restricted to node "pve4", while the latter is available on all nodes. I have never attempted such a strange configuration, but I suspect "pve4" sees both local-lvm and local-lvm2, which both point to the same thin pool.

Rather than trying to untangle this, it may be easier for you to carefully review everything and repost/restate the issue with relevant data.

If you are indeed dealing with a virtualized NFS server, you must start by establishing a baseline at all points of the chain.

Cheers


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
@bbgeek17
Originally VM 121 was on storage local-lvm at pve4. To troubleshoot the I/O issue, I created storage local-lvm2 on the same VG at pve4 to see if it would help (it didn't). I have already moved the VM disk back to local-lvm. I am sorry for confusing you.

HW RAID model? HPE SR932i-p Gen10+
RAID type? RAID 6. How many disks? 8
SSD model? MO003200PZWSK (3.2 TB 22.5G SAS SSD)
 
Hi,


Could someone kindly provide an update on the resolution of this issue?


Looking forward to your response.


Thank you.
 
Hi @powersupport ,

It's been a while since I looked at this thread due to being away. It seems the last deliverables are for you to clarify your setup and retest at all points to establish the baseline.

After giving the original post another read, I am left with even more questions. You stated:
Originally, the VM’s disk was on a local RAID SSD
So the disk was hosted locally? I.e. on local-lvm. It is still hosted locally based on the configuration attached to the opening post.
We moved it to an NFS shared disk (15K RPM RAID array) hosted on an ESXi server over the network.
It was moved to an NFS share, but the configuration you provided does not show this. It still shows the disk on "local-lvm2" which, reportedly, was an experiment you ran and is no longer present?

So what do the FIO results show? local-lvm directly vs. inside the VM? NFS vs. the VM? NFS vs. local-lvm? It probably does not matter, as you need to redo the tests with better descriptions and more consistency.

After the migration, we observed the following:
Did you migrate it back to local-lvm after migrating to NFS?

  • Local RAID SSD performance is now noticeably poorer after virtualization (significantly below expected speeds)
Is it worse for the disk you migrated back and forth? For another unrelated VM?

  1. Could the choice of virtio-scsi-single be contributing to this performance drop?
  2. Would switching to VirtIO SCSI (multiple queue) or another controller type improve throughput?
  3. Are there recommended tuning parameters (both on Proxmox and in the guest OS) for optimal performance in this setup?
  4. Could the NFS storage setup or the move itself have impacted the local RAID SSD performance for this VM?
It may be worthwhile for you to go over the recommendations in this article: https://kb.blockbridge.com/technote/proxmox-tuning-low-latency-storage/index.html
As well as this one: https://kb.blockbridge.com/technote/proxmox-aio-vs-iouring/index.html#summary

In the end you seem to be running a business environment and have a support subscription. Opening a case with PVE support may be more efficient than relying on volunteers in the forum.

Cheers


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Hi @bbgeek17

1. So the disk was hosted locally, i.e. on local-lvm? Yes. Is it still hosted locally based on the configuration attached to the opening post? Yes.
2. It was moved to NFS share, but the configuration you provided does not show this. It still shows disk on "local-lvm2" which, reportedly, was an experiment you ran and is no longer present?

Yes, I was doing an experiment to see if the I/O issue is related to the VM configuration.

3. So what do FIO results show? A local-lvm direct vs VM? NFS vs VM? NFS vs local-lvm? It probably does not matter, as you need to redo it with better descriptions and more consistency.

In short, the local-lvm I/O performance is not good. That's why I created a ticket for help. If what I have done confused you, please ignore it. Please advise what we can do to diagnose the issue.

4. Did you migrate it back to local-lvm after migrating to NFS? Yes

5. Is it worse for the disk you migrated back and forth? For another unrelated VM?

Not really. I did a fresh OS install on local-lvm; the disk performance is not good.
 
It might be an overhead issue.

I would try ZFS and put the disks in as raw format; it has better performance.
And remember: backup, backup, and backup ;)
 
Alright,

NFS, migration, ESXi, network - all of that is extraneous and not related to your problem.

It really boils down to this: you have a RAID-backed LVM group, and you’re not happy with its performance inside the VM. It didn’t get worse - it simply was never satisfactory to begin with?

@powersupport , you did not answer the most critical question - "What do FIO outputs show?".

The information in the fio TXT files you originally attached is not helpful. We don't know where the test files are directed.
root@pve4:~# fio --name=test --filename=testfile
[georgetse@ee19c-dev ~]$ fio --name=test --filename=testfile

Please redo your tests, use a bigger dataset size, and document the exact configuration during each test, i.e. where the "testfile" is directed and the entire storage layer beneath it.

@powersupport , please note you did not open a ticket - the forum is not an official support space. I believe you can open an official ticket with Proxmox support via your customer portal interface.

@weconnect , OP is running hardware RAID6; he'd have to tear everything down to transition to ZFS. Even then, I am not sure he would get 10x read performance in the VM (making an assumption about what the original FIO results show).


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Hi @bbgeek17,


Thanks for your patience. Please find attached the I/O performance comparison between host and guest. As there is no ssacli package in the Proxmox repository, I've captured the RAID controller information from iLO.

Thank you
 

Attachments

Hi @powersupport,

We’re having trouble communicating in a way that allows us to assist you. The information shared so far isn’t sufficient for diagnosing the issue. At this point, I recommend opening a support ticket through your Proxmox support contract, where you can get dedicated one-on-one assistance: that’s precisely what it’s there for.

When troubleshooting performance, it’s essential to take a structured approach: measure carefully at both the host and VM level, eliminate filesystem layers in your testing, and document every step.

For what it’s worth, I asked one of our performance engineers to review the limited details provided. Their impression is that it could be a CPU scheduling issue on the host, possibly due to system age, misconfiguration, or simply system overload.

Best of luck, and please update this thread once you’ve identified the solution.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox