I/O spikes on VMs

filip.nikoloski

Jun 19, 2023

Hello everyone,

We have a cluster with 5 nodes and 140 LXC containers/VMs (around 90 running).
We recently migrated to new NAS storage (TrueNAS with 3 x 3 vdevs, RAID-Z1, 4 TB disks).
At the same time, we upgraded all Proxmox nodes to 8.0.3.
We mount the storage via iSCSI over a separate 100 Gbit network.
There are five 2 TB volumes shared over iSCSI, with LVM created on top of them.
The LXC containers and VMs are then created on that LVM storage.

Most of the LXC containers run Debian; around 10-20 VMs run Windows.
On the VMs, I notice small I/O spikes over time, as in the graphs below:
[Images: VM disk I/O graphs showing the spikes]

Sometimes the spikes go up to 1 GB read/write.

On the host itself, I cannot see this as network traffic.
This is the host network traffic over the same period:
[Image: host network traffic graph over the same period]

This suggests that the I/O spikes are local to the VMs.

What may cause this issue?
I don't know where to look next. I've tried tcpdump/Wireshark on the host (because the iSCSI storage is network-mounted), but I cannot correlate the spikes.

Any suggestions? I don't know whether it's caused by the NAS or by the Proxmox upgrade to v8.
On the NAS itself, I don't see any unusual spikes on the network or disks.
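If the spikes in the VM graphs are real block I/O, they would be expected to show up on the host block devices backing that VM as well (the LVM logical volume, usually a `dm-*` device, and the iSCSI disk below it). Below is a minimal Python sketch, assuming a Linux host, that samples `/proc/diskstats` twice and computes read/write throughput per device. The names `parse_diskstats`, `rates` and `sample` are hypothetical helpers, not part of Proxmox or any library.

```python
"""Per-device read/write throughput from /proc/diskstats (sketch, Linux only)."""
import time
from typing import Dict, Tuple

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units

Snapshot = Dict[str, Tuple[int, int]]


def parse_diskstats(text: str) -> Snapshot:
    """Return {device: (sectors_read, sectors_written)} from diskstats text."""
    stats: Snapshot = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or malformed lines
        # Field layout: major minor name reads merged sectors_read ms_read
        #               writes merged sectors_written ...
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def rates(before: Snapshot, after: Snapshot,
          interval_s: float) -> Dict[str, Tuple[float, float]]:
    """Bytes/s read and written per device between two snapshots."""
    out = {}
    for dev, (r1, w1) in after.items():
        if dev not in before:
            continue  # device appeared between the two samples
        r0, w0 = before[dev]
        out[dev] = ((r1 - r0) * SECTOR_BYTES / interval_s,
                    (w1 - w0) * SECTOR_BYTES / interval_s)
    return out


def sample(interval_s: float = 1.0,
           path: str = "/proc/diskstats") -> Dict[str, Tuple[float, float]]:
    """Take two snapshots interval_s apart and return per-device rates."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return rates(before, after, interval_s)
```

On a node, calling `sample()` in a loop and printing a timestamp next to each non-zero entry gives numbers that can be lined up against the spikes in the VM graph. Device names depend on how the LUNs and volumes are mapped; `lsblk` shows the tree of iSCSI disks and the LVM volumes on top of them.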
 
Hi,
Caching on the host might influence the traffic behaviour: not all writes in the VM are passed through to the physical disks directly. This depends on the cache settings and on whether the writes are sync or async. See also https://pve.proxmox.com/pve-docs/pve-admin-guide.html#qm_hard_disk_cache
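For reference, the disk cache options in Proxmox VE map onto QEMU's cache modes. The sketch below encodes the mode table from the QEMU block-device documentation (`cache.writeback`, `cache.direct`, `cache.no-flush`) as a Python lookup; `CacheMode`, `uses_host_page_cache` and `write_completes_before_durable` are hypothetical names for illustration. In `none` (shown in Proxmox VE as "Default (No cache)"), the host page cache is bypassed, but the guest is still told it has a volatile write cache, so data is only guaranteed durable after the guest sends a flush.

```python
"""QEMU cache modes as a lookup table (sketch based on the QEMU docs)."""
from typing import NamedTuple


class CacheMode(NamedTuple):
    writeback: bool  # guest sees a volatile write cache; durable only after a flush
    direct: bool     # host page cache is bypassed (O_DIRECT)
    no_flush: bool   # guest flush requests are ignored


CACHE_MODES = {
    "writeback": CacheMode(writeback=True, direct=False, no_flush=False),
    "none": CacheMode(writeback=True, direct=True, no_flush=False),
    "writethrough": CacheMode(writeback=False, direct=False, no_flush=False),
    "directsync": CacheMode(writeback=False, direct=True, no_flush=False),
    "unsafe": CacheMode(writeback=True, direct=False, no_flush=True),
}


def uses_host_page_cache(mode: str) -> bool:
    """True if guest I/O goes through the host page cache (no O_DIRECT)."""
    return not CACHE_MODES[mode].direct


def write_completes_before_durable(mode: str) -> bool:
    """True if a guest write may be acknowledged before it reaches stable
    storage, i.e. durability depends on a guest flush (never guaranteed
    in 'unsafe', where flushes are ignored)."""
    return CACHE_MODES[mode].writeback
```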
 
All of them use the Default (No cache) policy. Is this normal?
I hadn't noticed any of these spikes before.