VM Freeze Issue

Oct 12, 2023
We're having a VM freeze/CPU usage issue on our Proxmox server.

We are running a program in the host OS, and another program inside a KVM virtual machine.

We send the output of the host program to the program in the VM using a host-only bridge interface. We send the data using multicast.

There is a high volume of traffic flowing from the host to the VM (around 600 Mbps).

Here is what I've noticed while the system is running: when I pull up 'top' on the host and switch to thread mode by hitting 'H', I can see that the main thread of the KVM process uses 99.9% of a core. The VM is receiving the network traffic and doing a lot of disk I/O.
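For logging the same per-thread view over time instead of watching 'top', here is a minimal Python sketch that reads the per-thread CPU counters from /proc. The function names are made up for illustration, and the PID of the KVM process has to be supplied by the caller.

```python
import os
import time


def parse_thread_stat(line):
    """Return (tid, name, cpu_ticks) from one /proc/<pid>/task/<tid>/stat line."""
    # The thread name sits in parentheses and may itself contain spaces or
    # parentheses, so split on the first "(" and the last ")".
    head, _, tail = line.rpartition(")")
    tid_text, _, name = head.partition("(")
    fields = tail.split()
    # fields[0] is the state (field 3 in proc(5)); utime and stime are
    # fields 14 and 15, which are indexes 11 and 12 here.
    return int(tid_text), name, int(fields[11]) + int(fields[12])


def thread_ticks(pid):
    """Map tid -> (name, cpu_ticks) for every thread of a process."""
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as handle:
                parsed_tid, name, ticks = parse_thread_stat(handle.read())
        except FileNotFoundError:
            continue  # thread exited between listdir() and open()
        result[parsed_tid] = (name, ticks)
    return result


def busiest_threads(pid, interval=1.0, top_n=5):
    """Sample twice and return [(percent_of_one_core, tid, name), ...]."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    before = thread_ticks(pid)
    time.sleep(interval)
    after = thread_ticks(pid)
    usage = []
    for tid, (name, ticks) in after.items():
        if tid in before:
            delta = ticks - before[tid][1]
            usage.append((100.0 * delta / ticks_per_second / interval, tid, name))
    return sorted(usage, reverse=True)[:top_n]
```

Calling busiest_threads(qemu_pid) in a loop, where qemu_pid is a placeholder for the VM's process ID, and printing the result with a timestamp would show whether the busy thread stays pinned near 100% right up to a freeze.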

The issue I'm having is that once in a while the VM seems to lock up for half a second or so. When the VM comes back, I get a CPU spike as it tries to make up for lost time. During these outages, I also get packet loss on the network bridge interface.
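To line up the packet loss with the freezes, the interface drop counters under /sys/class/net can be polled. The sketch below is a rough illustration rather than a finished tool: the helper names are hypothetical, and it assumes that the drops show up in the tx_dropped counter of the VM's tap interface on the host, which may not hold for every bridge setup.

```python
import time
from pathlib import Path


def read_counter(iface, name, root="/sys/class/net"):
    """Read one integer counter from <root>/<iface>/statistics/<name>."""
    return int((Path(root) / iface / "statistics" / name).read_text())


def drop_deltas(samples):
    """Turn [(timestamp, counter), ...] into [(timestamp, new_drops), ...]."""
    # Only keep the sample points where the counter actually went up.
    return [
        (later_time, later - earlier)
        for (_, earlier), (later_time, later) in zip(samples, samples[1:])
        if later > earlier
    ]


def sample_drops(iface, count, interval=0.1, counter="tx_dropped"):
    """Poll the counter `count` times and report when it increased."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), read_counter(iface, counter)))
        time.sleep(interval)
    return drop_deltas(samples)
```

Running sample_drops("tap100i1", 600) would watch the tap interface for about a minute; timestamps where new drops appear can then be compared against the times the VM was seen to freeze.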

My hunch is that the main KVM thread is trying to do too much, since it is using 99.9% of a core, and once in a while it causes the entire VM to lock up.

I have tried quite a few things to help:
- Giving the VM exclusive access to its cores (by pinning the VM to specific cores and setting the host OS not to schedule onto those cores).
- Using aio=io_uring and setting iothread=1 for disk access.
- For the network, enabling multiqueue with the queue count set to the total number of cores I've assigned to the VM (22 cores).
- Using virtio for the GPU, SCSI controller, and network devices.
- Trying both an OVS bridge and the default Linux bridge.

So far none of these things has helped. I have been able to work around the issue by setting the txqueuelen on the bridge port to 4096 (with the command 'ip link set tap100i1 txqueuelen 4096'), but this seems less than ideal.
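As a back-of-the-envelope check on the txqueuelen workaround, the arithmetic below estimates how long a stall a queue of a given length can absorb at a given rate. The 1500-byte packet size is an assumption for illustration only; the actual size of the multicast packets would scale the result proportionally.

```python
import math


def absorbable_stall_seconds(queue_len_packets, packet_bytes, rate_bits_per_second):
    """Time it takes the sender to fill an empty queue while nothing drains it."""
    return queue_len_packets * packet_bytes * 8 / rate_bits_per_second


def packets_needed(stall_seconds, packet_bytes, rate_bits_per_second):
    """Queue length (in packets) required to ride out a stall without drops."""
    return math.ceil(stall_seconds * rate_bits_per_second / (packet_bytes * 8))


if __name__ == "__main__":
    rate = 600e6          # around 600 Mbps, as measured above
    packet_bytes = 1500   # ASSUMPTION: not stated anywhere in this post
    print(absorbable_stall_seconds(4096, packet_bytes, rate))
    print(packets_needed(0.5, packet_bytes, rate))
```

With those assumed 1500-byte packets, a queue of 4096 packets covers roughly 82 ms at 600 Mbps, while a full half-second stall would need about 25000 packets of queue.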

I'm wondering if anyone has ideas or suggestions about what may be going on, and whether there is a way to get the main KVM thread to use less CPU.

Thanks for any help.
Mark S
 
This will not be much help, I'm afraid, but as you mention, the issue seems to be that the process feeding the VM is literally overwhelming it with data. Did you consider rate-limiting the process that is spewing out all the data so that the VM can keep up?

Of course, I am not going to say the originating process would be best contained itself, because I suspect you have a reason for doing all that with multicast, etc.
 
