Guest VM hang/stalled

Paulo Maligaya

Jul 23, 2016
Hi, I ran into an issue today with my Proxmox setup. I have a cluster with two nodes: the primary node (proxmox01) and the secondary node (proxmox02). I have guest VMs running on both nodes. The problem is that every time I ssh into a guest VM on the secondary node (proxmox02) and run a simple command such as ps -aux, top, ifconfig, or opening a file in vi, the ssh session hangs.

Example:

root@vps200:~# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.8 0.1 28684 4704 ? Ss 03:46 0:00 /sbin/init
root 2 0.0 0.0 0 0 ? S 03:46 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S 03:46 0:00 [ksoftirqd/0]

It just hangs and the ssh session freezes. I can't even terminate the ssh session (invoking ~).

Initially I thought this was an MTU size issue. However, I have the same MTU size (1410) on my primary node (proxmox01), and none of the guest VMs there have this issue. So I think it is safe to rule out an MTU issue here and look for something else.
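One way to test the MTU hypothesis directly, rather than by comparing configured values, is to ping with the "don't fragment" bit set and a payload sized to the MTU in question. The sketch below only builds the command; it assumes IPv4 without options and the Linux iputils `ping` (`-M do` sets DF, `-s` is the ICMP payload size). The function names are made up for illustration, and `vps200` is just the guest hostname from the prompt above.

```python
from typing import List

# Header sizes assume plain IPv4 (no options) carrying an ICMP echo request.
IPV4_HEADER = 20
ICMP_HEADER = 8


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in a single IPv4 packet of size `mtu`."""
    payload = mtu - IPV4_HEADER - ICMP_HEADER
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small for an IPv4 ICMP echo")
    return payload


def df_ping_command(host: str, mtu: int, count: int = 3) -> List[str]:
    """Build a ping command that sends `mtu`-sized packets with DF set."""
    return [
        "ping",
        "-M", "do",                             # forbid fragmentation (Linux iputils)
        "-c", str(count),                       # number of echo requests
        "-s", str(icmp_payload_for_mtu(mtu)),   # ICMP payload bytes
        host,
    ]


if __name__ == "__main__":
    for mtu in (1410, 1500):
        print(mtu, " ".join(df_ping_command("vps200", mtu)))
```

If the 1410-sized probe gets replies and the 1500-sized one does not, something on the path is limiting packets to less than 1500 bytes.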

Can somebody help? TIA!
 
From the web GUI of your second node proxmox02, start a local console session in the VM and try to find out what happens:
* is the network of the VM down?
* is the ssh session still active? ( who )
* did the ssh daemon forcefully disconnect the ssh client? ( /var/log/auth.log )
 

@manu Thanks for your response! Here are the answers:

1. The network of the VM is up. I'm able to ssh into the VMs, and I can ping external hosts (e.g. google.com) from the console of the VM.

2. Yes, the ssh session is still active when I run "w" in the console. Behind the console window you'll see that the "ifconfig" command was hanging/stalled (refer to the ssh_session_hung.png screenshot).

3. It didn't, at least not for the first couple of minutes. After that I think it just times out, and the hung session gets disconnected after a while.

Do you have any other clue as to what is causing this issue? Unfortunately, I really need to get this working. Thanks!
 

Attachments

  • Screenshot - ssh_session_hung.png
Hey, so we found the issue. It seems the weave bridge had a max MTU of 1410 while the VMs had a max of 1500. A quick test/fix to prove this was to change the VM MTU to 1410, then ssh in and run commands that produce large outputs. Everything was fine after that.
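To illustrate why only commands with large output stalled, here is a rough sketch of the arithmetic. It assumes IPv4 and TCP headers without options, ignores SSH's own framing overhead, and assumes oversized packets are silently dropped at the bridge (the thread does not confirm how they were handled). The helper names are invented for this example.

```python
IPV4_HEADER = 20
TCP_HEADER = 20  # assumes no TCP options


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload per segment for a sender with this interface MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def largest_packet(output_bytes: int, sender_mtu: int) -> int:
    """Size of the biggest IP packet produced when sending `output_bytes`."""
    return min(output_bytes, tcp_mss(sender_mtu)) + IPV4_HEADER + TCP_HEADER


def fits(packet_bytes: int, path_mtu: int) -> bool:
    """True if the packet can cross a link with this MTU unfragmented."""
    return packet_bytes <= path_mtu


if __name__ == "__main__":
    bridge_mtu = 1410
    for label, size, vm_mtu in [
        ("short output, VM MTU 1500", 200, 1500),
        ("long output,  VM MTU 1500", 4000, 1500),
        ("long output,  VM MTU 1410", 4000, 1410),
    ]:
        packet = largest_packet(size, vm_mtu)
        print(f"{label}: {packet}-byte packet, fits bridge: {fits(packet, bridge_mtu)}")
```

Under these assumptions, a short reply fits in a small packet either way, while a long reply from a VM with MTU 1500 fills a 1500-byte packet that exceeds the 1410-byte bridge. That matches the session working until a command prints a lot.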
 
Is the default weave bridge MTU set to 1410? If so, other users could run into this problem too.
 
@manu Yes, weave defaults its MTU to 1410 according to this doc. The way we worked around this was to set the default Proxmox MTU size to 8900, as we support this, and then set the default Weave MTU to 1500.
According to @Paulo Maligaya, if we set the MTU size of weave to the same size as Proxmox, 1500, it would complain and break the cluster.
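As a sanity check on those numbers, an overlay network needs headroom below the underlying MTU for its encapsulation headers. The sketch below assumes an overhead of 90 bytes, which is only inferred from the 1410 default against a 1500-byte underlay; the real overhead depends on how Weave is encapsulating traffic, so treat `ASSUMED_OVERHEAD` as a placeholder. The function names are invented for this example.

```python
# Assumption: encapsulation overhead inferred from 1500 - 1410; not an official figure.
ASSUMED_OVERHEAD = 90


def overlay_fits(overlay_mtu: int, underlay_mtu: int, overhead: int = ASSUMED_OVERHEAD) -> bool:
    """True if a full-size overlay packet plus encapsulation fits the underlay MTU."""
    return overlay_mtu + overhead <= underlay_mtu


def max_overlay_mtu(underlay_mtu: int, overhead: int = ASSUMED_OVERHEAD) -> int:
    """Largest overlay MTU that still fits once encapsulation is added."""
    return underlay_mtu - overhead


if __name__ == "__main__":
    for overlay, underlay in [(1410, 1500), (1500, 1500), (1500, 8900)]:
        print(f"overlay {overlay} on underlay {underlay}: fits = {overlay_fits(overlay, underlay)}")
    print("max overlay MTU on an 8900 underlay:", max_overlay_mtu(8900))
```

With that assumed overhead, a 1500-byte overlay does not fit on a 1500-byte underlay but fits easily on 8900, which is in line with the workaround described above.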
 
