Updating OVS packages breaks network connectivity

We are running PVE 7.3.3 with OVS and ifupdown1 (through upgrades from older PVE versions).

During apt upgrade of the current OVS packages
openvswitch-common: 2.15.0+ds1-2+deb11u1 ==> 2.15.0+ds1-2+deb11u2
openvswitch-switch: 2.15.0+ds1-2+deb11u1 ==> 2.15.0+ds1-2+deb11u2

the following error occurs:
ovs-vswitchd.service is a disabled or a static unit not running, not starting it.

...and at the same moment, the PVE server loses all network connectivity. This causes instant trouble: we lose the SSH connection, and the Ceph OSDs on that host lose all their connections.

VMs on the affected server are still reachable via network, but stop functioning because their Ceph storage is unresponsive.

Resorting to hands-on access in the server room, we find that the IP configuration was removed from all network interfaces.

Following the original error message, we looked into ovs-vswitchd.service, but to no avail. Instead, ifdown -a ; ifup -a fixes the problem.
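For anyone stuck at the local console (iDRAC/IPMI), the two recovery commands used in this thread can be wrapped as below. This is a minimal sketch, assuming it is run as root on the affected node: ifupdown1 setups use ifdown -a / ifup -a (as above), while ifupdown2 setups can use ifreload -a (as later in the thread). The function names restore_network and run, and the DRY_RUN variable, are hypothetical helpers, not part of PVE.

```shell
#!/bin/sh
# Re-apply /etc/network/interfaces after the IP configuration was dropped.
# Hypothetical helpers; DRY_RUN=1 only prints the commands.

run() {
    # Print instead of executing when DRY_RUN=1
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

restore_network() {
    mode=${1:-}
    if [ -z "$mode" ]; then
        # Auto-detect: ifupdown2 ships the ifreload command
        if command -v ifreload >/dev/null 2>&1; then
            mode=ifupdown2
        else
            mode=ifupdown1
        fi
    fi
    case $mode in
        ifupdown2) run ifreload -a ;;
        ifupdown1) run ifdown -a; run ifup -a ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

Usage from the console: restore_network (auto-detect), or restore_network ifupdown1 to force the classic ifdown/ifup sequence.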

However, this shouldn't even happen in the first place.

We are now in the process of updating all our PVE servers with the physical-visit-to-the-server-room workaround. So we will not be able to easily reproduce this issue after today.

Though we are wondering: Are we the only ones experiencing this issue? Maybe an edge case due to our customized /etc/network/interfaces? Or is this a general issue you might want to be aware of?

In case it is the latter, I'm writing this forum post.

regards,
Andreas
 
Are we the only ones experiencing this issue?
No, I've experienced the same behavior: all network connectivity down.

OVS is installed in my Test-Cluster as an ongoing experiment. I am not 100% sure that it is really installed and configured correctly. So I didn't bother and just rebooted the nodes via iDRAC/shell.

So this was no problem in my situation but I would have been really unhappy if something like this happens in my productive cluster (which uses the "classic" linux-bridges etc.)
 
No, I've experienced the same behavior: all network connectivity down.

Thank you for the confirmation.

OVS is installed in my Test-Cluster as an ongoing experiment. I am not 100% sure if it really is installed and configured absolutely correct.

I should add to my post above: We have been using OVS in our production PVE cluster since 2017. This is the first time we had an issue with it during updates.

Might be just this particular package version causing the issue. We hope this won't happen again on the next package update.
 
Hi.
The same problem with ovs :(
What's wrong with ovs?
Temporarily switched to Linux bridges.
 
openvswitch-switch (and the other packages from the openvswitch source) version 2.15.0+ds1-2+deb11u2.1 is now available in the pvetest repository and contains the fix described in the Debian bug report.

Feedback if the upgrade works smoothly for you would be much appreciated!

For the pvetest repository see: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysadmin_test_repo
Repository management is also available in the GUI: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#_repositories_in_proxmox_ve
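On the shell, enabling pvetest for PVE 7.x (Debian bullseye) comes down to one APT source line, as listed in the linked documentation. A minimal sketch, assuming a bullseye-based node; the separate file name pvetest.list and the helper names are our own choice, so the repository can be removed again after testing.

```shell
#!/bin/sh
# Enable/disable the pvetest repository via a dedicated sources file.
# LIST and the function names are hypothetical; adjust as needed.
LIST=${LIST:-/etc/apt/sources.list.d/pvetest.list}

enable_pvetest() {
    # Repository line for PVE 7.x on Debian bullseye
    echo "deb http://download.proxmox.com/debian/pve bullseye pvetest" > "$LIST"
}

disable_pvetest() {
    rm -f "$LIST"
}
```

Usage as root: enable_pvetest && apt update && apt install openvswitch-common openvswitch-switch, then disable_pvetest && apt update once the fixed packages are installed.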
 
Feedback if the upgrade works smoothly for you would be much appreciated!
I am willing to sacrifice a running node in my test-cluster: how would I test / trigger that behavior on purpose?

Code:
~# uname -a; dpkg -l |grep vswitch
Linux pm0 6.1.6-1-pve #1 SMP PREEMPT_DYNAMIC PVE 6.1.6-1 (2023-01-28T00:00Z) x86_64 GNU/Linux
ii  openvswitch-common                   2.15.0+ds1-2+deb11u2           amd64        Open vSwitch common components
ii  openvswitch-switch                   2.15.0+ds1-2+deb11u2           amd64        Open vSwitch switch implementations
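To see at a glance which of the versions discussed in this thread a node has installed, the dpkg -l output can be classified as below. A minimal sketch: the function name is hypothetical, the labels only summarise this thread, and it assumes the package names appear without an architecture suffix such as ":amd64", as in the output above.

```shell
#!/bin/sh
# Classify installed openvswitch-common/-switch versions against the
# versions discussed in this thread. Reads `dpkg -l` output on stdin.

classify_ovs_versions() {
    awk '$1 == "ii" && $2 ~ /^openvswitch-(common|switch)$/ {
        if ($3 == "2.15.0+ds1-2+deb11u1")
            note = "previous version"
        else if ($3 == "2.15.0+ds1-2+deb11u2")
            note = "affected version"
        else if ($3 == "2.15.0+ds1-2+deb11u2.1")
            note = "fixed build from pvetest"
        else
            note = "not discussed here"
        # Columns: package name, version, classification
        print $2, $3, note
    }'
}
```

Usage on a node: dpkg -l | classify_ovs_versions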
 
I am willing to sacrifice a running node in my test-cluster: how would test / trigger that behavior on purpose?
Thanks!

a) Simply upgrading to the version currently in pvetest would also help: if your OVS bridges (and thus probably network access to the PVE node) are still configured after the update, that would be good! (And please notify us if they are not!)

b) Since the issue is only in version 2.15.0+ds1-2+deb11u2 (which you already have installed), you'd need to downgrade to 2.15.0+ds1-2+deb11u1:
Code:
apt install openvswitch-switch=2.15.0+ds1-2+deb11u1 openvswitch-common=2.15.0+ds1-2+deb11u1
and then run apt dist-upgrade (to my knowledge, this is what triggered the issue last time, and this is what the new version should fix).
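The reproduction steps can be collected into one sequence to run from the local console rather than over SSH, since networking is expected to drop (in the report that follows, the downgrade alone already did so). A minimal sketch, assuming an ifupdown2 node where ifreload -a is available; the function names and DRY_RUN variable are hypothetical. With pvetest enabled and apt update run, the dist-upgrade step should pull the fixed deb11u2.1 build instead of deb11u2.

```shell
#!/bin/sh
# Reproduce (or verify the fix for) the OVS upgrade outage on a test node.
# Run from iDRAC/IPMI console. DRY_RUN=1 only prints the commands.
OLD="2.15.0+ds1-2+deb11u1"

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

reproduce() {
    # 1. Downgrade both packages to the previous version (apt will prompt)
    run apt install "openvswitch-switch=$OLD" "openvswitch-common=$OLD"
    # Recover in case the downgrade already dropped the IP configuration
    run ifreload -a
    # 2. Upgrade again: this is the step that triggered the outage
    run apt dist-upgrade
    # 3. Recover again if the interfaces lost their configuration
    run ifreload -a
}
```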
 
Downgrade via ssh:

Code:
root@pm0:~# apt install openvswitch-switch=2.15.0+ds1-2+deb11u1 openvswitch-common=2.15.0+ds1-2+deb11u1
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages will be DOWNGRADED:
  openvswitch-common openvswitch-switch
0 upgraded, 0 newly installed, 2 downgraded, 0 to remove and 0 not upgraded.
Need to get 1,828 kB of archives.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://ftp.de.debian.org/debian bullseye/main amd64 openvswitch-common amd64 2.15.0+ds1-2+deb11u1 [1,773 kB]
Get:2 http://ftp.de.debian.org/debian bullseye/main amd64 openvswitch-switch amd64 2.15.0+ds1-2+deb11u1 [54.6 kB]
Fetched 1,828 kB in 0s (10.9 MB/s)           
dpkg: warning: downgrading openvswitch-common from 2.15.0+ds1-2+deb11u2 to 2.15.0+ds1-2+deb11u1
(Reading database ... 83550 files and directories currently installed.)
Preparing to unpack .../openvswitch-common_2.15.0+ds1-2+deb11u1_amd64.deb ...
Unpacking openvswitch-common (2.15.0+ds1-2+deb11u1) over (2.15.0+ds1-2+deb11u2) ...
dpkg: warning: downgrading openvswitch-switch from 2.15.0+ds1-2+deb11u2 to 2.15.0+ds1-2+deb11u1
Preparing to unpack .../openvswitch-switch_2.15.0+ds1-2+deb11u1_amd64.deb ...
Unpacking openvswitch-switch (2.15.0+ds1-2+deb11u1) over (2.15.0+ds1-2+deb11u2) ...
Setting up openvswitch-common (2.15.0+ds1-2+deb11u1) ...
Setting up openvswitch-switch (2.15.0+ds1-2+deb11u1) ...
ovs-vswitchd.service is a disabled or a static unit not running, not starting it.

Terminal output stops here ;-)

"ifreload -a" via iDRAC resurrects that ssh-session. Now reboot. (Always takes some time, these are old Dell R720.) Verify state:

Code:
~# uname -a; dpkg -l |grep vswitch
Linux pm0 6.1.6-1-pve #1 SMP PREEMPT_DYNAMIC PVE 6.1.6-1 (2023-01-28T00:00Z) x86_64 GNU/Linux
ii  openvswitch-common                   2.15.0+ds1-2+deb11u1           amd64        Open vSwitch common components
ii  openvswitch-switch                   2.15.0+ds1-2+deb11u1           amd64        Open vSwitch switch implementations

Code:
~# apt dist-upgrade 
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
  openvswitch-common openvswitch-switch
2 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 1,830 kB of archives.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n] 
Get:1 http://security.debian.org bullseye-security/main amd64 openvswitch-common amd64 2.15.0+ds1-2+deb11u2 [1,775 kB]
Get:2 http://security.debian.org bullseye-security/main amd64 openvswitch-switch amd64 2.15.0+ds1-2+deb11u2 [54.7 kB]
Fetched 1,830 kB in 0s (7,724 kB/s)      
Reading changelogs... Done
(Reading database ... 83550 files and directories currently installed.)
Preparing to unpack .../openvswitch-common_2.15.0+ds1-2+deb11u2_amd64.deb ...
Unpacking openvswitch-common (2.15.0+ds1-2+deb11u2) over (2.15.0+ds1-2+deb11u1) ...
Preparing to unpack .../openvswitch-switch_2.15.0+ds1-2+deb11u2_amd64.deb ...
Unpacking openvswitch-switch (2.15.0+ds1-2+deb11u2) over (2.15.0+ds1-2+deb11u1) ...
Setting up openvswitch-common (2.15.0+ds1-2+deb11u2) ...
Setting up openvswitch-switch (2.15.0+ds1-2+deb11u2) ...
ovs-vswitchd.service is a disabled or a static unit not running, not starting it.

SSH stops here. Via iDRAC: all interfaces are down... "ifreload -a" via Console works.

So..., this didn't work for me.

For this problem the actual interface settings may be relevant: ~# grep -v \# /etc/network/interfaces

Code:
auto lo
iface lo inet loopback

auto enp3s0f0
iface enp3s0f0 inet static
        address 192.0.2.100/28

iface eno1 inet manual

iface eno2 inet manual

iface eno3 inet manual

iface eno4 inet manual

auto enp3s0f1
iface enp3s0f1 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0

auto vlan816
iface vlan816 inet static
        address 172.28.0.100/16
        gateway 172.28.255.254
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=816

auto vlan819
iface vlan819 inet static
        address 192.168.3.100/24
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=819

auto vmbr0
iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports enp3s0f1 vlan816 vlan819
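To list which stanzas in such a configuration depend on OVS (and therefore on a running ovs-vswitchd), a small awk pass over the file is enough. A minimal sketch with a hypothetical function name: it only reads the given file and ignores any source/source-directory includes.

```shell
#!/bin/sh
# Print "interface ovs_type address" for every OVS-managed stanza in an
# ifupdown config file ($1, default /etc/network/interfaces).

ovs_stanzas() {
    awk '
        $1 == "iface" {
            iface = $2
            if (!(iface in seen)) { seen[iface] = 1; order[++n] = iface }
        }
        $1 == "address" && iface != "" { addr[iface] = $2 }
        $1 == "ovs_type" && iface != "" { type[iface] = $2 }
        END {
            for (i = 1; i <= n; i++) {
                name = order[i]
                # Only OVS stanzas; "-" marks stanzas without a static address
                if (type[name] != "")
                    print name, type[name], (addr[name] != "" ? addr[name] : "-")
            }
        }
    ' "${1:-/etc/network/interfaces}"
}
```

For the configuration above, this should list enp3s0f1, vlan816, vlan819 and vmbr0, with the two VLAN addresses.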
 
Get:1 http://security.debian.org bullseye-security/main amd64 openvswitch-common amd64 2.15.0+ds1-2+deb11u2 [1,775 kB] Get:2 http://security.debian.org bullseye-security/main amd64 openvswitch-switch amd64 2.15.0+ds1-2+deb11u2 [54.7 kB]
This is not the version from the pvetest repository.
* Do you have it enabled, and did you run apt update before?
(Sorry for not being more explicit before.)
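Both questions can be answered from the Installed/Candidate lines of apt policy: after apt update with pvetest active, the candidate should be the deb11u2.1 build. A minimal sketch with a hypothetical function name that summarises that output and warns if the candidate is still the affected deb11u2 build.

```shell
#!/bin/sh
# Summarise `apt policy <package>` output read from stdin.
AFFECTED="2.15.0+ds1-2+deb11u2"

policy_summary() {
    awk -v affected="$AFFECTED" '
        $1 == "Installed:" { inst = $2 }
        $1 == "Candidate:" { cand = $2 }
        END {
            print "installed=" inst, "candidate=" cand
            # Exact match only: deb11u2.1 does not trigger the warning
            if (cand == affected)
                print "WARNING: candidate is the affected version " affected
        }'
}
```

Usage on a node, after apt update: apt policy openvswitch-switch | policy_summary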
 
I have pve-test enabled, it is my Test-Cluster ;-)

Code:
root@pm0:~# grep proxmox /etc/apt/sources.list
deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription

And I was able to install 6.1:

Code:
root@pm0:~# apt policy pve-kernel-6.1
pve-kernel-6.1:
  Installed: 7.3-3
  Candidate: 7.3-3
  Version table:
 *** 7.3-3 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
        100 /var/lib/dpkg/status
     7.3-2 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages
     7.3-1 500
        500 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages

I have downgraded again; now I see this:

Code:
root@pm0:~# apt policy openvswitch-common
openvswitch-common:
  Installed: 2.15.0+ds1-2+deb11u1
  Candidate: 2.15.0+ds1-2+deb11u2
  Version table:
     2.15.0+ds1-2+deb11u2 500
        500 http://security.debian.org bullseye-security/main amd64 Packages
 *** 2.15.0+ds1-2+deb11u1 500
        500 http://ftp.de.debian.org/debian bullseye/main amd64 Packages
        100 /var/lib/dpkg/status

Only Debian. I do not have a caching proxy or similar, so what am I missing?
 
Ooops - obviously my fault. Sorry.

Success!

Code:
~# apt install openvswitch-common openvswitch-switch
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages will be upgraded:
  openvswitch-common openvswitch-switch
2 upgraded, 0 newly installed, 0 to remove and 3 not upgraded.
Need to get 0 B/1,828 kB of archives.
After this operation, 0 B of additional disk space will be used.
Reading changelogs... Done
(Reading database ... 83550 files and directories currently installed.)
Preparing to unpack .../openvswitch-common_2.15.0+ds1-2+deb11u2.1_amd64.deb ...
Unpacking openvswitch-common (2.15.0+ds1-2+deb11u2.1) over (2.15.0+ds1-2+deb11u1) ...
Preparing to unpack .../openvswitch-switch_2.15.0+ds1-2+deb11u2.1_amd64.deb ...
Unpacking openvswitch-switch (2.15.0+ds1-2+deb11u2.1) over (2.15.0+ds1-2+deb11u1) ...
Setting up openvswitch-common (2.15.0+ds1-2+deb11u2.1) ...
Setting up openvswitch-switch (2.15.0+ds1-2+deb11u2.1) ...
ovs-vswitchd.service is a disabled or a static unit not running, not starting it.
Processing triggers for man-db (2.9.4-2) ...
Processing triggers for libc-bin (2.31-13+deb11u5) ...
Scanning processes...
Scanning processor microcode...
Scanning linux images...

Running kernel seems to be up-to-date.

Failed to check for processor microcode upgrades.

No services need to be restarted.

No containers need to be restarted.

No user sessions are running outdated binaries.

Without a hiccup. (But I will return to pve-no-subscription nevertheless.)
 
Can confirm that enabling the test repository and updating openvswitch-common openvswitch-switch works as intended and resulted in no loss of access to the node.

I encountered this issue on my secondary node and it required hands-on access to resolve; the terminal was spammed with a particular interface entering and exiting promiscuous mode over and over.

However, I noticed that after the upgrade my SSH session would pause for a second or two, and pings confirmed my suspicion of packet loss. Restarting openvswitch-switch.service resolved that issue for me. I disabled the test repository afterwards.
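A quick way to quantify such short stalls is to ping a host on the affected network and read the loss percentage from ping's summary line. A minimal sketch: the function names are hypothetical, the target is a placeholder (for example your gateway), and the parsing assumes the usual "N% packet loss" summary printed by iputils ping.

```shell
#!/bin/sh
# Measure packet loss to a target host over a number of pings.

loss_percent() {
    # Reads ping output on stdin, prints the number before "% packet loss"
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

check_loss() {
    target=$1
    count=${2:-20}
    loss=$(ping -c "$count" -q "$target" | loss_percent)
    echo "packet loss to $target: ${loss:-unknown}%"
    # Exit status 0 only if no loss was observed
    [ "${loss:-100}" = 0 ]
}
```

Usage: check_loss <your-gateway-ip> 30. If loss persists after the upgrade, restarting openvswitch-switch.service is what resolved it in the report above.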
 
