While PVE Host Works Normally
Problem Description
Hello fellow DevOps and sysadmin experts, I’ve run into a puzzling SSH connection issue while deploying Debian 12 cloud image virtual machines on the Proxmox VE (PVE) virtualization platform, and I’m reaching out for advice:
- Phenomenon Discrepancy
- For newly created Debian 12 cloud image VMs, the first SSH connection attempt immediately fails with a "Connection refused" error. A connection only succeeds after several retries, spaced 1-2 minutes apart.
- In contrast, when initiating an SSH connection to the PVE physical host itself (or to older VMs that have been running for some time) from the same client, the connection succeeds on the first try without ever encountering a connection refusal.
- Environment Details
- Network layer: The VMs and the PVE host are on the same internal network segment (10.0.0.0/24), with pfSense as the gateway. The firewall allows traffic on the SSH port (22), and the VMs can successfully ping both the gateway and the PVE host.
- VM configuration: Official Debian 12 cloud images are used, with static IPs configured. The sshd service inside the VMs is enabled (the systemctl status sshd command shows the service is in an active state).
- Client side: SSH requests are initiated from the same admin host, with no port restrictions or IP blacklisting policies in place.
Troubleshooting Steps Already Performed
- Checked the status of the sshd service on the VMs: Confirmed the service is running, set to start on boot, and has no error logs.
- Verified port listening: Executed ss -tulpn | grep 22 inside the VMs and confirmed the sshd process is listening normally on 0.0.0.0:22.
- Tested loopback SSH on the VMs: Successfully connected via ssh localhost within the VM, ruling out sshd configuration anomalies.
- Cleared the ARP cache on pfSense and retried: The problem persisted.
- Compared with the PVE host: The sshd configuration on the PVE host is identical to that of the VMs, with no connection refusal issues.
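To pin down how long the refusal window lasts after a new VM boots, the checks above could be complemented by a timed probe from the admin host. The following is only a minimal sketch, not part of the troubleshooting already done: the function name probe_port and the address 10.0.0.50 are hypothetical placeholders. It also separates "refused" from "timeout". A refusal generally means the target (or a rejecting firewall) actively answered with a reset, whereas a timeout points more towards dropped packets or unresolved addressing.

```python
import socket
import time


def probe_port(host, port=22, attempts=10, interval=1.0, timeout=3.0):
    """Try a TCP connect repeatedly and record when each attempt happened.

    Returns a list of (elapsed_seconds, outcome) tuples, where outcome is
    "open", "refused", "timeout", or "error: <details>". Probing stops at
    the first successful connect.
    """
    results = []
    start = time.monotonic()
    for _ in range(attempts):
        elapsed = round(time.monotonic() - start, 1)
        try:
            # Only the TCP handshake is tested; no SSH login is attempted.
            with socket.create_connection((host, port), timeout=timeout):
                outcome = "open"
        except ConnectionRefusedError:
            outcome = "refused"   # peer answered with a reset
        except socket.timeout:
            outcome = "timeout"   # no answer within the timeout
        except OSError as exc:
            outcome = f"error: {exc}"
        results.append((elapsed, outcome))
        if outcome == "open":
            break
        time.sleep(interval)
    return results


# Example call (10.0.0.50 is a hypothetical VM address in 10.0.0.0/24):
# for elapsed, outcome in probe_port("10.0.0.50", attempts=120, interval=2.0):
#     print(f"{elapsed:7.1f}s  {outcome}")
```

Starting the probe right after the VM is powered on and comparing the first "open" timestamp with the boot log inside the VM (for example the times at which networking and the SSH service came up) may show whether the refusals simply coincide with first-boot initialization.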
Questions
- Why does the "Connection refused" error only occur on the first SSH attempts to a new VM, while the PVE host has no such issue?
- Could this be a network initialization delay at the PVE virtualization layer, or a problem with the sshd service startup sequence in the Linux guest?
- Is there any way to optimize the VM’s network or service startup configuration to avoid the first SSH connection failure?