Proxmox 4 / KVM / Network Connectivity issues

LeeS

New Member
Mar 30, 2015
I volunteer for a small charity that provides hospital radio, and I'm in the process of upgrading their infrastructure to something more from this decade than the last. I have relatively good experience with PVE <= 3.4, but 4.0 seems to be beating me. This should "just work", but it doesn't... and I've been banging my head on the desk for the last 4 hours trying various things. Any help appreciated.

All of the network is on 192.168.0.0/24: client PCs, the Proxmox host, and the VMs. Everything is connected via a Cisco Gigabit switch (managed, but with all management 'off', so it is just in dumb unmanaged switch mode). The router for the subnet is 192.168.0.1, out to the internet. eth0 and eth1 are bonded, and this works fine, too.

cat /proc/net/bonding/bond0

Code:
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)


Bonding Mode: load balancing (round-robin)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0


Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:23:7d:5b:ae:ae
Slave queue ID: 0


Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:23:7d:5b:ae:ac
Slave queue ID: 0
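
When re-checking the bond repeatedly during testing, the status output above can be condensed to one line per slave. This is only a minimal sketch, assuming the /proc/net/bonding layout shown above; bond_summary is a hypothetical helper name, not a Proxmox or kernel tool.

```shell
# bond_summary: read /proc/net/bonding/<bond> text on stdin and print
# "<slave> <MII status>" for each slave, followed by the bonding mode.
# Hypothetical helper; it assumes the field layout shown in the output above.
bond_summary() {
    awk -F': ' '
        /^Bonding Mode:/    { mode = $2 }      # e.g. "load balancing (round-robin)"
        /^Slave Interface:/ { slave = $2 }     # start of a per-slave section
        /^MII Status:/ && slave != "" { printf "%s %s\n", slave, $2 }
        END { print "mode " mode }
    '
}
```

Usage on the host would be `bond_summary < /proc/net/bonding/bond0`.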



pveversion -v
Code:
proxmox-ve: 4.0-16 (running kernel: 4.2.2-1-pve)
pve-manager: 4.0-48 (running version: 4.0-48/0d8559d0)
pve-kernel-4.2.2-1-pve: 4.2.2-16
lvm2: 2.02.116-pve1
corosync-pve: 2.3.5-1
libqb0: 0.17.2-1
pve-cluster: 4.0-22
qemu-server: 4.0-30
pve-firmware: 1.1-7
libpve-common-perl: 4.0-29
libpve-access-control: 4.0-9
libpve-storage-perl: 4.0-25
pve-libspice-server1: 0.12.5-1
vncterm: 1.2-1
pve-qemu-kvm: 2.4-9
pve-container: 1.0-6
pve-firewall: 2.0-12
pve-ha-manager: 1.0-9
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.3-1
lxcfs: 0.9-pve2
cgmanager: 0.37-pve2
criu: 1.6.0-1
zfsutils: 0.6.5-pve4~jessie


/etc/network/interfaces
Code:
auto lo
iface lo inet loopback

#auto eth0
#iface eth0 inet manual

#auto eth1
#iface eth1 inet manual

auto bond0
iface bond0 inet manual
        slaves eth0 eth1
        bond_miimon 100
        bond-mode balance-rr
#       bond-lacp-rate 0

auto vmbr0
iface vmbr0 inet static
        address  192.168.0.210
        netmask  255.255.255.0
        gateway  192.168.0.1
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0

Proxmox itself (192.168.0.210): Can access the local network and internet, and can ping all VMs.
OpenMediaVault KVM (192.168.0.213): as above.
lxc LEMP stack (192.168.0.212): as above.
Windows 7 Pro (192.168.0.100): No local network, no internet. Can ping itself and the Proxmox IP only. Also cannot ping other VMs.

What magic am I missing here? The OpenMediaVault install went flawlessly, and the same details were provided for the Windows 7 VM (excepting the IP address of course), and that isn't working at all.

Windows 7 is using virtio, just as the OpenMediaVault install is. Drivers installed and working fine (allegedly). I even tried switching to E1000 just to prove to myself that it wasn't driver-related, but no change.

Any help really, really, appreciated :(

Proxmox to router:
Code:
root@proxmox0:/etc/vz# ping 192.168.0.1
PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
64 bytes from 192.168.0.1: icmp_seq=1 ttl=64 time=1.54 ms
64 bytes from 192.168.0.1: icmp_seq=2 ttl=64 time=0.892 ms
64 bytes from 192.168.0.1: icmp_seq=3 ttl=64 time=0.897 ms
64 bytes from 192.168.0.1: icmp_seq=4 ttl=64 time=0.918 ms
^C
--- 192.168.0.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2998ms
rtt min/avg/max/mdev = 0.892/1.062/1.544/0.280 ms

OpenMediaVault KVM to router:
Code:
root@fileserver:~# ping 192.168.0.1
PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
64 bytes from 192.168.0.1: icmp_req=1 ttl=64 time=1.60 ms
64 bytes from 192.168.0.1: icmp_req=2 ttl=64 time=1.07 ms
64 bytes from 192.168.0.1: icmp_req=3 ttl=64 time=1.06 ms
64 bytes from 192.168.0.1: icmp_req=4 ttl=64 time=1.10 ms
^C
--- 192.168.0.1 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 1.061/1.212/1.605/0.229 ms



Windows to router:
No easy way to copy and paste this one, but essentially 100% packet loss with 192.168.0.100: Destination host unreachable.

Pinging the Proxmox host or itself works fine, but it cannot reach any other VMs, anything else on the network, or the internet.
 
Are there release notes for the latest 4.0 version, so we know what the fixes are ...
Thank you.
 
Can you please let us know when the ISO file will be updated to the latest 4.0?
The current version is:
4.0-0d8559d0-17
 
There is no plan to update the 4.0 ISO; instead there will be a 4.1 (end of Q4 2015).
 
without digging deeper, you run a quite "old" version, upgrade to latest.
http://pve.proxmox.com/wiki/Downloads#Update_a_running_Proxmox_Virtual_Environment_4.x_to_latest_4.0
there were fixes regarding bonding.

Tom, sorry for the slow reply, I was driving home. Thank you for at least confirming that nothing 'major' has changed and what I thought should happen, should happen! I've been suspecting the bond interface for a couple of those 4 hours of head-banging, but quite what laid my suspicion on it I couldn't say now.

My next step then is to approach the charity for the funds to buy the license. I tried the update/upgrade/dist-upgrade, but didn't get anything besides a few kerberos packages that were updated a few days ago. I'll keep you posted, and if you have any more brainwaves in the meantime please post :) it's for a good cause! :p

Edit to add: Forgot about the no-subscription repo. Been ages since I've done something on Proxmox with no subscription. As a point of "did it or didn't it get fixed" I could try that tomorrow. Don't worry, still bugging the charity for the funds. This needs to be rock-solid come production time.
 
dist-upgrade done. Versions now:

Code:
root@proxmox0:~# pveversion -v
proxmox-ve: 4.0-22 (running kernel: 4.2.3-2-pve)
pve-manager: 4.0-57 (running version: 4.0-57/cc7c2b53)
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve1
corosync-pve: 2.3.5-1
libqb0: 0.17.2-1
pve-cluster: 4.0-24
qemu-server: 4.0-35
pve-firmware: 1.1-7
libpve-common-perl: 4.0-36
libpve-access-control: 4.0-9
libpve-storage-perl: 4.0-29
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-12
pve-container: 1.0-21
pve-firewall: 2.0-13
pve-ha-manager: 1.0-13
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.4-3
lxcfs: 0.10-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve6~jessie

Just by way of an update, I came in this morning to find the VM in question had done Windows Update overnight... which confused me entirely, to say the least.

From some testing with other KVM VMs and a quick comparison to the Windows one, it appears there is a noticeable delay under Proxmox 4 before KVM machines get bridged to the real network. I just didn't sit still long enough yesterday for it to 'get going'. The OpenMediaVault VM cannot ping the router until about 2 minutes after boot. The Windows VM can't ping the router until about 10+ minutes after boot. When it does finally happen, it's solid for the remainder of the uptime (from very early testing).

Any theories? I'm stumped. Obviously it's not the show-stopping disaster I initially thought, but it's still concerning if we have to allow a 20+ minute window for reboots rather than 10 minutes for the DL380 itself to boot and start all containers. LXC containers get their network inside and out instantly.
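
To put numbers on the delays described above rather than eyeballing them, a small timing loop can record how many seconds pass before a probe first succeeds. This is only a sketch: wait_until is a hypothetical helper name, and the ping flags in the usage line assume Linux iputils (`-c` for count, `-W` for the reply timeout in seconds).

```shell
# wait_until LIMIT INTERVAL CMD [ARGS...]
# Re-run CMD every INTERVAL seconds until it succeeds, then print the
# number of seconds that elapsed. Give up (exit status 1) once LIMIT
# seconds have passed. Hypothetical helper, POSIX sh only.
wait_until() {
    limit=$1
    interval=$2
    shift 2
    start=$(date +%s)
    while ! "$@" >/dev/null 2>&1; do
        now=$(date +%s)
        if [ $((now - start)) -ge "$limit" ]; then
            echo "gave up after $((now - start))s" >&2
            return 1
        fi
        sleep "$interval"
    done
    echo $(( $(date +%s) - start ))
}
```

Run inside a Linux guest right after boot, for example in the OpenMediaVault VM, `wait_until 1800 5 ping -c 1 -W 1 192.168.0.1` would print roughly how long the router stayed unreachable. For the Windows VM, the same loop could be run from another Linux machine on the LAN against 192.168.0.100. Comparing the figure before and after a change to the bond or bridge settings gives a concrete before/after value.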