VMs Frozen and ssh broken pipe randomly

ryan.cph

Nov 17, 2024
Hi everyone,

I have an issue where the VM console (VNC) randomly freezes and SSH sessions randomly drop with a broken pipe.
For VNC, I have to refresh the page to reconnect, and for SSH I have to keep reconnecting with the `ssh` command. It is quite annoying...
Only the VMs freeze randomly; LXC containers work perfectly fine.

Hardware:
CPU: AMD Ryzen 7 2700
32 GB DDR4 RAM
M.2 SSD

I have spent days trying to solve this issue, so any ideas would be highly appreciated. Thanks.
 
Hello Ryan,

Is this issue only happening with one VM, or are other VMs affected as well? How about other VMs in the same network?

Did you check the output of the journalctl -u ssh.service command?

When did this issue start? Has it been occurring since the beginning, or did it start recently?

Respectfully,

Seiji
 
Hello Seiji,

Thanks for the reply. The issue is happening on all VMs in the same network. The output of journalctl -u ssh.service looks perfectly normal to me.

Code:
Nov 19 21:38:16 media-server sshd-session[31795]: Accepted password for ryan from 192.168.1.126 port 57885 ssh2
Nov 19 21:38:16 media-server sshd-session[31795]: pam_unix(sshd:session): session opened for user ryan(uid=1000) by (uid=0)
Nov 19 21:38:16 media-server sshd-session[31795]: pam_systemd(sshd:session): New sd-bus connection (system-bus-pam-systemd-31795) opened.
Nov 19 21:40:42 media-server sshd-session[31833]: Accepted password for ryan from 192.168.1.126 port 59069 ssh2
Nov 19 21:40:42 media-server sshd-session[31833]: pam_unix(sshd:session): session opened for user ryan(uid=1000) by (uid=0)
Nov 19 21:40:42 media-server sshd-session[31833]: pam_systemd(sshd:session): New sd-bus connection (system-bus-pam-systemd-31833) opened.

This issue has been happening since day one, so I have been using LXC instead for weeks.

Best regards,
Ryan
 
Hello Ryan,

1. Could you try running journalctl -xe to check if there are any errors?

2. Could you run tcpdump to capture packets and investigate what might be happening?
Only SSH:

tcpdump -i vmbr0 tcp port 22 -w /tmp/ssh.pcap

Only host 192.168.1.126:

tcpdump -i vmbr0 host 192.168.1.126 -w /tmp/vm126.pcap

Only VNC:

tcpdump -i vmbr0 tcp port 5900 -w /tmp/vnc.pcap
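Once a capture has been saved, it may help to look at which side closes the connection when the session drops. The following is only a rough sketch: `show_teardowns` is a made-up helper name, and it assumes the capture was written to one of the paths used above. It reads the file back and prints only packets with the FIN or RST flag set.

```shell
#!/bin/sh
# Hypothetical helper: list TCP FIN/RST packets from a saved capture file.
show_teardowns() {
    pcap="$1"
    if [ ! -r "$pcap" ]; then
        echo "cannot read capture: $pcap" >&2
        return 1
    fi
    # -nn: do not resolve host names or port names
    # -r:  read packets from a file instead of a live interface
    tcpdump -nn -r "$pcap" 'tcp[tcpflags] & (tcp-fin|tcp-rst) != 0'
}

# Example (not run automatically):
#   show_teardowns /tmp/ssh.pcap
```

If an RST shows up at the moment the session breaks, its source address is a hint about where to look next; if nothing shows up at all, the packets may simply stop arriving.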

3. It seems unlikely that all VMs are encountering a bug simultaneously. I suspect there might be an issue elsewhere that's causing SSH/VNC to be inaccessible for all VMs.
Are there any specific configurations applied, such as changes to MTU settings?
4. How is the host's CPU, memory, or disk I/O utilization? Is the system under high load?
5. Does the same issue occur with a newly created VM?
6. During the issue, can you successfully SSH from a different network?
7. Have you tried updating Proxmox to the latest version?
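For the MTU question in point 3, here is a minimal sketch of one way to check it, assuming a Linux host with the iputils version of ping. The helper names `ping_payload_for_mtu` and `iface_mtu` are made up for illustration, and `TARGET` is a placeholder for whichever machine you want to test against. The idea is to send a ping with the don't-fragment bit set and a payload that exactly fills the MTU (MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header).

```shell
#!/bin/sh
# Payload size for a don't-fragment ping that exactly fills a given MTU:
# MTU - 20 (IPv4 header) - 8 (ICMP header).
ping_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Print the MTU of an interface as reported by sysfs (Linux only).
iface_mtu() {
    cat "/sys/class/net/$1/mtu"
}

# Example (not run automatically; TARGET is a placeholder address):
#   TARGET=192.168.1.126
#   mtu=$(iface_mtu vmbr0)
#   ping -M do -c 3 -s "$(ping_payload_for_mtu "$mtu")" "$TARGET"
```

With a standard MTU of 1500 the payload works out to 1472 bytes. If that ping fails while a smaller payload succeeds, something on the path is probably using a smaller MTU than the bridge.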

Hope this helps.

Respectfully.

Seiji
 