Proxmox VE 5.0 released!

BloodyIron · Jul 6, 2017

Another bit of info.

Prod node 2, upgraded from 4.4 to 5.0, keeps the old eth0/eth1 interface naming, and it has the /etc/udev/rules.d/70-persistent-net.rules file

Prod node 1, reinstalled from scratch 5.0 release, has the "new" renaming of interfaces to enp4s0 or whatever, and I manually created the /etc/udev/rules.d/70-persistent-net.rules file to match the relevant MAC address and other info

However, Prod node 1 does not honour the udev file. My reading suggests I need to add GRUB_CMDLINE_LINUX_DEFAULT="net.ifnames=0" to grub. Prod node 2 does NOT have this in its grub conf file, though, so I am very confused as to how Prod node 2 is doing what I want without having the configuration I would expect.
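For anyone wanting to compare two nodes the same way, here is a rough sketch of a check for the two mechanisms mentioned above: the 70-persistent-net.rules file and the net.ifnames=0 kernel parameter. The function name check_ifnames and its two optional arguments (a root prefix and a cmdline file) are made up for illustration so the check can be pointed at test data; with no arguments it looks at / and /proc/cmdline. It only reports what is present and does not explain why a node names its interfaces the way it does.

```shell
#!/bin/sh
# Hypothetical helper: report which interface-naming mechanisms are present.
# $1 = root prefix (default: empty, i.e. the live system)
# $2 = file holding the kernel command line (default: /proc/cmdline)
check_ifnames() {
    root="${1:-}"
    cmdline="${2:-/proc/cmdline}"

    # Old-style udev rule file that pins names to MAC addresses.
    if [ -f "$root/etc/udev/rules.d/70-persistent-net.rules" ]; then
        echo "persistent-net.rules: present"
    else
        echo "persistent-net.rules: absent"
    fi

    # Kernel parameter of the running kernel, not the grub config on disk.
    if grep -qwF 'net.ifnames=0' "$cmdline" 2>/dev/null; then
        echo "net.ifnames=0: set"
    else
        echo "net.ifnames=0: not set"
    fi
}
```

Running check_ifnames with no arguments on each node and comparing the two outputs side by side should show whether the nodes really differ in either respect.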

I have been traipsing across the config files for quite a while now, and I'm going to hold off on adding the grub flag, but I'd like some dev direction as to how this was handled, as I want both nodes to be consistently configured.

BloodyIron · Jul 6, 2017

Another update to my quest for GLORY!

Turns out Prod node 1, which was a from-scratch 5.0 install in our last episode, had a few packages to update. I updated them, and now have the bond0 LACP working. I don't think I did anything special: apt update, apt upgrade, install the presented packages (not from the enterprise repo, btw). Then I configured bond0 and such identically to Prod node 2.

The logs are not throwing the address errors any more, the switch ports report matching configuration with Prod node 2, and I was able to successfully migrate a test VM on AND off it.

It looks like it's working, except I would like to eventually get the interface naming back to eth0/eth1 stuff for consistency purposes, but I don't yet understand how Prod node 2 _can_ do that without having that grub flag.

This has been some adventure so far :S

PigLover · Jul 6, 2017

BloodyIron said:
Okay so the node I upgraded from 4.4 to 5.0 has ifconfig

but the node I rebuilt from scratch 5.0 does not have ifconfig... what.. the hell...

EDIT: fresh installs do not get the package "net-tools", but upgraded ones retain it. This is how I got my precious ifconfig back.

This isn't really a Proxmox issue; it's Debian. ifconfig (and the rest of net-tools) has been deprecated in Stretch. The Debian community is trying to force the transition to the new tools (ip, iw, etc). The old tools are still in the repos and can be installed with apt (as you noted), but they are not installed by default anymore.
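For reference, the usual replacements can be summed up in a tiny lookup. This is only a sketch: the function name iproute2_equiv is made up, and the mapping covers just the most common net-tools commands (ip and ss both come from the iproute2 package).

```shell
#!/bin/sh
# Hypothetical helper: print the iproute2 replacement for a net-tools command.
iproute2_equiv() {
    case "$1" in
        ifconfig) echo "ip addr" ;;   # addresses; "ip link" covers link state
        route)    echo "ip route" ;;  # routing table
        arp)      echo "ip neigh" ;;  # neighbour (ARP) table
        netstat)  echo "ss" ;;        # socket statistics
        *)        echo "no mapping for: $1" >&2; return 1 ;;
    esac
}
```

For example, iproute2_equiv ifconfig prints "ip addr", which is the long form of the "ip a" shorthand.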

gsupp · Jul 6, 2017

BloodyIron said:
has the "new" renaming of interfaces to enp4s0

Yeah, there seem to be a lot of changes in Debian Stretch like that. I have a server that has interfaces enp3s0, enp5s0 and enp11s0. Talk about confusing, it's not even sequential. If it wasn't for Proxmox allowing me to put a comment for each network interface in the GUI, I'd never keep them straight. Losing "ifconfig" is another one. I've been trying to get used to using "ip a" (hey it's shorter!) but I'm not a huge fan of the way it displays info. At any rate, glad updating all system packages solved your LACP issue. I believe I read somewhere that "apt dist-upgrade" was the preferred way to make sure all the Proxmox packages are updated... but I'd probably have to dig a bit to find where I saw that.

PigLover · Jul 6, 2017

The NIC naming is also more of a Debian issue than a Proxmox issue (actually, more of a "mainline Linux" issue, since it is currently being adopted by almost all major distributions as they get onto the 4.x kernel train).

I actually got through the "new" naming convention for network interfaces last year with Ubuntu 16.04. It is confusing at first, but the reason they did it is sound. Having deterministic and predictable interface names that aren't re-evaluated on every boot is really important. And the old method of setting up a udev rule to match the MAC was a bit crude - it broke if you had to replace the NIC for maintenance or move the image to a different, though identically configured, server. The "new" method results in stable interface naming across both of those - and many other - real world scenarios.
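To make the naming a bit less mysterious: a name like enp4s0 is built from the device's PCI location, so the numbers are the bus and slot rather than a sequence, which is why enp3s0, enp5s0 and enp11s0 can sit side by side on one box. Here is a minimal sketch that splits such a name apart. The function name decode_ifname is made up, and it only handles the PCI-path form (type, p<bus>, s<slot>, optional f<function>); other forms such as eno1, ens1 or enx<MAC> are deliberately not covered.

```shell
#!/bin/sh
# Hypothetical helper: decode a PCI-path style interface name.
# Handles only <type>p<bus>s<slot>[f<function>], e.g. enp4s0 or enp11s0f1.
decode_ifname() {
    parsed=$(printf '%s\n' "$1" |
        sed -n -E 's/^(en|wl|ww)p([0-9]+)s([0-9]+)(f([0-9]+))?$/\1 \2 \3 \5/p')
    if [ -z "$parsed" ]; then
        echo "not a PCI-path name: $1" >&2
        return 1
    fi
    # Split "type bus slot [function]" into positional parameters.
    set -- $parsed
    echo "type=$1 bus=$2 slot=$3 function=${4:-0}"
}
```

Calling decode_ifname enp11s0 reports bus 11, slot 0, which matches what lspci would show for that card if the name was derived from the PCI path.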

BloodyIron · Jul 6, 2017

From what I've been reading, the naming is actually a systemd thing, not kernel or Debian specific. Hence Ubuntu 16.04, which uses systemd, has it too.

BloodyIron · Jul 7, 2017

Okay after a few hours or something the "received packet on bond0 with own address as source address" error is coming back up again. This is really frustrating and I'm just going to disable the bonding until I get some dev response here :/

I have absolutely no clue what the root cause is, but I see no evidence that it's the switch.

Perhaps, actually, the issue might arise when I put a large amount of load over the LACP, like migrating 8 VMs, but done 3 at a time. That seems to be a consistent possible cause.
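One thing that might help narrow this down, offered as a sketch rather than a diagnosis: the bonding driver exposes per-slave state in /proc/net/bonding/bond0, including an "Aggregator ID" for each slave. In 802.3ad mode, slaves that negotiated into the same LAG share one aggregator ID, so differing IDs would suggest one link is not actually part of the bundle. The function name bond_aggregators is made up, and the optional file argument exists only so the parser can be tried on a saved copy.

```shell
#!/bin/sh
# Hypothetical helper: list each bond slave with its 802.3ad aggregator ID.
# $1 = bonding status file (default: /proc/net/bonding/bond0)
bond_aggregators() {
    awk '
        # Remember which slave section we are in.
        /^Slave Interface:/ { slave = $3 }
        # First "Aggregator ID" line after a slave header belongs to that slave;
        # the one under "Active Aggregator Info" is skipped (no slave seen yet).
        /Aggregator ID:/ && slave != "" { print slave, $3; slave = "" }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

Capturing the output on both nodes, ideally once while idle and once during a heavy migration, would show whether the slaves on Prod node 1 ever end up in different aggregators.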

mir · Jul 7, 2017

BloodyIron said:
"received packet on bond0 with own address as source address"

Could also be a switch problem. Bonding works by replacing the original MAC in the packet with the MAC of the actual NIC used for the transport.

BloodyIron · Jul 7, 2017

I don't mean to be rude, but have you read through all of my prior messages? I've been very exhaustive in my testing, and I have performed a good amount of switch-centric testing. If you haven't had a chance to review what I wrote, please do, and share your thoughts.

mir said:
Could also be a switch problem. Bonding works by replacing the original MAC in the packet with the MAC of the actual NIC used for the transport.

mir · Jul 7, 2017

BloodyIron said:
I don't mean to be rude, but have you read through all of my prior messages? I've been very exhaustive in my testing, and I have performed a good amount of switch-centric testing. If you haven't had a chance to review what I wrote, please do, and share your thoughts.

I was referring to this:
"Perhaps, actually, the issue might arise when I put a large amount of load over the LACP, like migrating 8 VMs, but done 3 at a time. That seems to be a consistent possible cause."
It's a known fact that if you stress a SOHO switch it will begin to broadcast packets on every port in a LAGG. This is also true of enterprise switches if your LAGG is configured on the default VLAN (VLAN 1). The default VLAN in any switch is supported in software, so never use the default VLAN for anything except perhaps as a management VLAN.

BloodyIron · Jul 7, 2017

Yeah, I'm using an Avaya 4548GT, and the "Prod" node 2 is also using LACP in literally the exact same configuration, and has not failed once. This single node is the consistent failure point. When doing the live migration of this many VMs, it's coming from Prod 2, to Prod 1, and Prod 2 is on the same switch with the same port configuration.

Also consider this was working _before_ the 5.0 upgrade when both were on 4.4, with the same switch.

mir said:
I was referring to this:
"Perhaps, actually, the issue might arise when I put a large amount of load over the LACP, like migrating 8 VMs, but done 3 at a time. That seems to be a consistent possible cause."
It's a known fact that if you stress a SOHO switch it will begin to broadcast packets on every port in a LAGG. This is also true of enterprise switches if your LAGG is configured on the default VLAN (VLAN 1). The default VLAN in any switch is supported in software, so never use the default VLAN for anything except perhaps as a management VLAN.

Leo David · Jul 7, 2017

Hi.
I'm using PVE 4.3 and an external Ceph Jewel storage. Can I safely upgrade to PVE 5.0?

wolfgang · Jul 7, 2017

Leo David said:
Hi.
I'm using PVE 4.3 and an external Ceph Jewel storage. Can I safely upgrade to PVE 5.0?

~~If you use this as production system I would wait for the upgrade until ceph is no longer RC.~~
Sorry, I overlooked the "external".

By default we use the Jewel librbd, which works perfectly with an external Ceph cluster.

joulester · Jul 7, 2017

Do I still use the "sed -i 's/jessie/stretch/g' /etc/apt/sources.list.d/pve-enterprise.list" if I don't have a subscription?
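Not an official answer, but a cautious way to approach this is to first see which apt source files still mention jessie, and to preview the substitution before running sed -i on anything. As far as I know, installs without a subscription normally carry a pve-no-subscription entry instead, often in /etc/apt/sources.list, so the file that needs editing may not be pve-enterprise.list at all. The function names jessie_sources and preview_stretch are made up, and the layout (sources.list plus sources.list.d/*.list under /etc/apt) is the standard Debian one, assumed here.

```shell
#!/bin/sh
# Hypothetical helper: list apt source files that still reference jessie.
# $1 = apt config directory (default: /etc/apt)
jessie_sources() {
    etc_apt="${1:-/etc/apt}"
    grep -l 'jessie' "$etc_apt/sources.list" \
        "$etc_apt"/sources.list.d/*.list 2>/dev/null
}

# Hypothetical helper: show what the sed from the upgrade notes would produce,
# without modifying the file (no -i).
preview_stretch() {
    sed 's/jessie/stretch/g' "$1"
}
```

Once the preview looks right for each file that jessie_sources lists, the same sed expression with -i applies the change in place.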

Leo David · Jul 7, 2017

wolfgang said:
If you use this as production system I would wait for the upgrade until ceph is no longer RC.

Thanks. So it is not recommended to use PVE 5 with an external Ceph older than Luminous?
I've just reinstalled Ceph Jewel in production, and thought that I could benefit from PVE 5 features with this new Ceph cluster...

Rhinox · Jul 7, 2017

BloodyIron said:
From what I've been reading, the naming is actually a systemd thing, not kernel or Debian specific. Hence Ubuntu 16.04, which uses systemd, has it too.

More exactly, it is a udev thing. Even distros without systemd use these new "predictable network interface names"...

martin · Jul 7, 2017

Thanks a lot for all the feedback!

As this thread has filled up with too many topics covering a lot of different issues, it's too hard to follow, so I am closing it for further posting.

For all further questions and issues, please just open new threads!

Martin
