[SOLVED] Epic SOS - Proxmox 7.0.10

ieronymous

Please, I need some info, ASAP if possible.

Sorry if this has been written elsewhere, but my anxiety won't let me think straight right now. After upgrading from 6.4.13 to 7.0.10, and keeping copies of my files
/etc/kernel/cmdline
/etc/default/grub
/etc/issue
/etc/modules (in case someone has passthrough mods)
/etc/ssh/ssh_config
/etc/ssh/sshd_config
the output of `ip address` before and after (it's a mess there)
I can't log in to Proxmox (I do have access to the CLI via the machine itself) and it doesn't have network access, of course... probably related to this, from a picture I've taken during boot:
Failed to start Import ZFS pool HHproxData (see systemctl status zfs-import@HHproxData.service for details)

I tried booting both 5.11.22-2-pve and 5.4.124-1-pve, with the same results of course: no GUI access and no network.

What I've checked so far (much of it irrelevant, but then again I didn't expect to have issues, since 3 other machines went smoothly) on the specific machine that is meant to replace the main server at production level: I had every VM, connection, etc. set up, and after the update I can't do anything. Proxmox access is based on vmbr0, which is based on bond0, and bond0 is based on ports enp7s0 and enp0s25. The specific ports had been configured as 802.3ad on both sides (Proxmox and switch).

The files below are the same:
/etc/modules
/etc/issue

The files below changed in some lines, though not necessarily in a mandatory way, so I need to ask:
/etc/kernel/cmdline
root=ZFS=rpool/ROOT/pve-1 boot=zfs
Even though I gave each of my other machines a different node name, here both before and after the upgrade it has the name pve-1. Does it need the name of the Proxmox server that I gave during installation?

/etc/default/grub
Here the upgrade changed the lines
from GRUB_DISTRIBUTOR="Proxmox Virtual Environment" to GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`. Do I change it back?
The line GRUB_DISABLE_OS_PROBER=true doesn't exist anymore. Do I add it back?
The line GRUB_DISABLE_RECOVERY="true" is now commented out. Do I revert it?

/etc/ssh/ssh_config
now includes the line `Include /etc/ssh/ssh_config.d/*.conf`. It is probably there to take other custom files into consideration as well, but shall I comment it out?

/etc/ssh/sshd_config
also now includes the line `Include /etc/ssh/sshd_config.d/*.conf`. It is probably there to take other custom files into consideration as well, but shall I comment it out?

Finally, the `ip address` output before and after (I'll post only the "after" lines; in my <<<<comments>>>> you can see what has been altered):
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever
2: enp7s0: <BROADCAST,MULTICAST <<<<SLAVE is missing>>>>,UP,LOWER_UP> mtu 1500 qdisc mq <<<<"master bond0" is missing>>>> state UP group default qlen 1000 link/ether f8:b1:56:d1:25:cf <<<<now became f8:b1:56:d1:26:28>>>> brd ff:ff:ff:ff:ff:ff <<<<the line "inet6 fe80::fab1:56ff:fed1:2628/64 scope link" now exists>>>> <<<<the line "valid_lft forever preferred_lft forever" now exists>>>>
3: enp0s25: <BROADCAST,MULTICAST,<<<<SLAVE is missing>>>>,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast <<<<"master bond0" does not exist>>>> state UP group default qlen 1000 link/ether f8:b1:56:d1:25:cf brd ff:ff:ff:ff:ff:ff <<<<the line "inet6 fe80::fab1:56ff:fed1:25cf/64 scope link" now exists>>>> <<<<the line "valid_lft forever preferred_lft forever" now exists>>>>
4: enp131s0f0: <BROADCAST,MULTICAST,<<<<SLAVE is missing>>>>,UP,LOWER_UP> mtu 1500 qdisc mq <<<<"master bond1" does not exist>>>> state UP group default qlen 1000 link/ether 00:1b:21:25:fb:98 brd ff:ff:ff:ff:ff:ff <<<<the line "inet6 fe80::21b:21ff:fe25:fb98/64 scope link" now exists>>>> <<<<the line "valid_lft forever preferred_lft forever" now exists>>>>
5: enp131s0f1: <BROADCAST,MULTICAST,<<<<SLAVE is missing>>>>,UP,LOWER_UP> mtu 1500 qdisc mq <<<<"master bond1" does not exist>>>> state UP group default qlen 1000 link/ether 00:1b:21:25:fb:99 <<<<before it was 00:1b:21:25:fb:98>>>> brd ff:ff:ff:ff:ff:ff <<<<the line "inet6 fe80::21b:21ff:fe25:fb99/64 scope link" now exists>>>> <<<<the line "valid_lft forever preferred_lft forever" now exists>>>>
6: enp132s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq <<<<"master bond2" does not exist>>>> state UP group default qlen 1000 link/ether 00:1b:21:25:fb:99 <<<<before it was 00:1b:21:25:fb:9c>>>> brd ff:ff:ff:ff:ff:ff <<<<the line "inet6 fe80::21b:21ff:fe25:fb9c/64 scope link" now exists>>>> <<<<the line "valid_lft forever preferred_lft forever" now exists>>>>
7: enp132s0f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq <<<<"master bond2" does not exist>>>> state UP group default qlen 1000 link/ether 00:1b:21:25:fb:9d <<<<before it was 00:1b:21:25:fb:9c>>>> brd ff:ff:ff:ff:ff:ff <<<<the line "inet6 fe80::21b:21ff:fe25:fb9d/64 scope link" now exists>>>> <<<<the line "valid_lft forever preferred_lft forever" now exists>>>>
8: bond0: <<<<NO CARRIER exists now>>>>,BROADCAST,MULTICAST,MASTER,UP,<<<<LOWER_UP doesn't exist anymore>>>> mtu 1500 qdisc noqueue master vmbr0 state <<<<was UP, now it is DOWN>>>> group default qlen 1000 link/ether 1e:b7:df:4e:4a:f0 brd ff:ff:ff:ff:ff:ff <<<<before it was f8:b1:56:d1:25:cf brd ff:ff:ff:ff:ff:ff>>>>
9: bond1: <<<<NO CARRIER exists now>>>>,BROADCAST,MULTICAST,MASTER,UP,<<<<LOWER_UP doesn't exist anymore>>>> mtu 1500 qdisc noqueue master vmbr1 state <<<<was UP, now it is DOWN>>>> group default qlen 1000 link/ether 8a:81:b0:36:c4:82 brd ff:ff:ff:ff:ff:ff <<<<before it was 00:1b:21:25:fb:98 brd ff:ff:ff:ff:ff:ff>>>>
10: bond2: <<<<NO CARRIER exists now>>>>,BROADCAST,MULTICAST,MASTER,UP,<<<<LOWER_UP doesn't exist anymore>>>> mtu 1500 qdisc noqueue master vmbr2 state <<<<was UP, now it is DOWN>>>> group default qlen 1000 link/ether d2:0b:48:25:8e:eb brd ff:ff:ff:ff:ff:ff <<<<before it was 00:1b:21:25:fb:9c brd ff:ff:ff:ff:ff:ff>>>>
11: vmbr0: <<<<NO CARRIER exists now>>>>,BROADCAST,MULTICAST,UP,<<<<LOWER_UP doesn't exist anymore>>>> mtu 1500 qdisc noqueue state <<<<was UP, now it is DOWN>>>> group default qlen 1000 link/ether 16:fc:21:59:fa:81 brd ff:ff:ff:ff:ff:ff <<<<before it was f8:b1:56:d1:25:cf brd ff:ff:ff:ff:ff:ff>>>> inet 192.168.1.201/24 brd 192.168.1.255 scope global vmbr0 valid_lft forever preferred_lft forever <<<<the line "inet6 fe80::fab1:56ff:fed1:25cf/64 scope link" now exists>>>> <<<<the line "valid_lft forever preferred_lft forever" now exists>>>>
12: vmbr1: <<<<NO CARRIER exists now>>>>,BROADCAST,MULTICAST,UP,<<<<LOWER_UP doesn't exist anymore>>>> mtu 1500 qdisc noqueue state <<<<was UP, now it is DOWN>>>> group default qlen 1000 link/ether fa:33:04:ca:e5:a9 brd ff:ff:ff:ff:ff:ff <<<<before it was 00:1b:21:25:fb:98 brd ff:ff:ff:ff:ff:ff>>>> <<<<the line "inet6 fe80::21b:21ff:fe25:fb98/64 scope link" doesn't exist anymore>>>> <<<<the line "valid_lft forever preferred_lft forever" doesn't exist anymore>>>>
13: vmbr2: <<<<NO CARRIER exists now>>>>,BROADCAST,MULTICAST,UP,<<<<LOWER_UP doesn't exist anymore>>>> mtu 1500 qdisc noqueue state <<<<was UP, now it is DOWN>>>> group default qlen 1000 link/ether 22:2f:dd:ec:5a:70 brd ff:ff:ff:ff:ff:ff <<<<before it was 00:1b:21:25:fb:9c brd ff:ff:ff:ff:ff:ff>>>> <<<<the line "inet6 fe80::21b:21ff:fe25:fb9c/64 scope link" doesn't exist anymore>>>> <<<<the line "valid_lft forever preferred_lft forever" doesn't exist anymore>>>>
<<<<All the lines below don't exist anymore>>>>
14: tap101i0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master fwbr101i0 state UNKNOWN group default qlen 1000 link/ether b2:78:b3:22:d8:de brd ff:ff:ff:ff:ff:ff
15: fwbr101i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 06:8b:6e:d2:62:db brd ff:ff:ff:ff:ff:ff
16: fwpr101p0@fwln101i0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr2 state UP group default qlen 1000 link/ether e2:b4:85:1d:42:c1 brd ff:ff:ff:ff:ff:ff
17: fwln101i0@fwpr101p0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master fwbr101i0 state UP group default qlen 1000 link/ether 06:8b:6e:d2:62:db brd ff:ff:ff:ff:ff:ff

As you can see above, the port names stayed the same, but the MAC address changed on some ports, and on some bonds and vmbrs as well. On all bonds and vmbrs I have the message NO CARRIER, LOWER_UP doesn't exist anymore, and there are some more differences which you can find by looking at my <<<<.........>>>> comments.

New edit... here is what /etc/network/interfaces looks like now:
(attached screenshot: net_interface.jpg)

I have spent 4+ hours trying, first to regain access, and then to see what happened with that "Failed to start Import ZFS pool HHproxData" message, which is also crucial.

Thank you in advance for any thoughts, guidance.

PS - Probably I should have kept a copy of /etc/network/interfaces as well, but now it is too late for that :( :(
- During the upgrade, because I figured it wouldn't cause any problems (based on the 3 machines I did previously), I answered not to keep the current files but the maintainer's ones. So the answer was yes to all 4-5 questions during the upgrade.
- Do I need to change the MAC addresses of those ports, bonds, and vmbrs back to what they were? Is there a CLI way (of course, if there is one, it would only be a CLI way)?
 
New edit!!!!! I went to /etc/network/interfaces and changed vmbr0 to be based on the port enp7s0 instead of on bond0, rebooted, and now I have access to the GUI. The thing is, as you can see below, except for vmbr0 (which I changed in order to regain access to the GUI and give Proxmox network access in general), everything seems OK, as it was before the update. OK, the MACs changed, but where does it use MACs for things like bonds and vmbrs? I don't know what to change in order to make it work as before.


(attached screenshot: 1626435984673.png)

Also, nothing changed on the switch side:
3 trunks are used, each trunk has 2 ports, and the trunk type is LACP with STP mode enabled. Everything is still as before.
 
Failed to start Import ZFS pool HHproxData (see systemctl status zfs-import@HHproxData.service for details)
As long as the pool HHproxData is imported after booting this can be ignored:
* PVE creates this service for each pool you create via GUI - since quite a few users ran into problems because their pools were not in the cache-file and thus did not get imported... - If the pool is ok and present after booting then it's already imported from the cache file and you need not worry about it.
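For example, you can verify that from the CLI after a boot (using the pool name from this thread):

    # the pool should show up here if it got imported
    zpool list
    # or check its health directly
    zpool status HHproxData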

OK, the MACs changed, but where does it use MACs for things like bonds and vmbrs? I don't know what to change in order to make it work as before.

Check the upgrade instructions - there are quite a few hints regarding the new MacAddress policy and how to deal with that:
https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0#Network

I hope this helps!
 
This is also the outcome from plugging and unplugging the net cables from the switch

(attached screenshot: ports.jpg)

After each plug-in, enp7s0 comes up with a flow control status of RX,
but the enp0s25 port comes up with a Flow Control Status of None.
 
This is also the outcome from plugging and unplugging the net cables from the switch
yes? - these are the messages I'd expect if the cable gets unplugged and plugged back in.
 
As long as the pool HHproxData is imported after booting this can be ignored:
* PVE creates this service for each pool you create via GUI - since quite a few users ran into problems because their pools were not in the cache-file and thus did not get imported... - If the pool is ok and present after booting then it's already imported from the cache file and you need not worry about it.
Thank you, but until I managed to see the GUI I wasn't sure whether the pool gets imported after boot (probably I could have checked it from the CLI as well, but I am not thinking straight right now). Is there a way to stub that pool in a .conf file and make that message disappear?
 
yes? - these are the messages I'd expect if the cable gets unplugged and plugged back in.
Probably yes (I didn't get whether you were being ironic or not; I don't care right now), but since those ports are included in bond0, and bond0 is included in vmbr0, and they have been set up as LACP (802.3ad), maybe the second port shouldn't report None. I don't know, I'm just trying to troubleshoot by trial and error here.
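If it helps, here is how I think the bond/LACP state can be checked from the CLI (a guess on my side, assuming the standard Linux bonding driver):

    # shows bonding mode, 802.3ad/LACP info and per-slave MII status for bond0
    cat /proc/net/bonding/bond0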
 
Check the upgrade instructions - there are quite a few hints regarding the new MacAddress policy and how to deal with that:
https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0#Network

I hope this helps!
If you know how to do it, I would appreciate it if you told me how, instead of pointing me to general-idea paragraphs.

<<<<<A unique and persistent MAC address is now calculated using the bridge name and the unique machine-id (/etc/machine-id), which is generated at install time. OK, I nano'ed there and noticed a hexadecimal number; now what?

Please either ensure that any ebtable or similar rules that use the previous bridge MAC-Address are updated (how to do that) or configure the desired bridge MAC-Address explicitly, by switching to ifupdown2 and adding hwaddress to the respective entry in /etc/network/interfaces.>>>>>
What is the syntax? All words, no examples. See what I mean?

Once again, even if it seems this way, I don't have an attitude; I am extremely stressed until morning. I had an appointment with the SQL programmer to fix the database inside a VM running SQL Server, and not only am I unable to see the Proxmox GUI since the upgrade, but the network configuration broke as well.
 
The path /etc/systemd/network/ is empty (although it mentions "total 9").

I tried deleting both vmbr0 and bond0 and creating them again; same thing, no GUI and no network this way.
 
(I didn't get whether you were being ironic or not; I don't care right now)
sorry if I wasn't clear - what I wanted to say is that these messages are normal if you unplug/plug the cable.

Thank you, but until I managed to see the GUI I wasn't sure whether the pool gets imported after boot (probably I could have checked it from the CLI as well, but I am not thinking straight right now). Is there a way to stub that pool in a .conf file and make that message disappear?
you can just remove the unit for importing the pool (`rm /etc/systemd/system/zfs-import@HHproxData.service`)

but the enp0s25 port comes up with a Flow Control Status of None
I would not worry too much about that - I think it is related to the NIC driver and should work irrespective of the Flow Control settings
You can check what the nics support with `ethtool -a <nic>` (e.g. `ethtool -a enp7s0`)
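For illustration only, the output looks roughly like this (the values here are made up, not read from your system):

    # ethtool -a enp7s0
    Pause parameters for enp7s0:
    Autonegotiate:  on
    RX:             on
    TX:             off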

Please either ensure that any ebtable or similar rules that use the previous bridge MAC-Address are updated (how to do that) or configure the desired bridge MAC-Address explicitly, by switching to ifupdown2 and adding hwaddress to the respective entry in /etc/network/interfaces.>>>>>
What is the syntax? All words, no examples. See what I mean?
Read the complete HOWTO - there is an example a few sections further up:
https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0#Check_Linux_Network_Bridge_MAC
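Roughly, the relevant entry in /etc/network/interfaces would end up looking like this - a sketch only; the address and MAC below are simply the values seen earlier in this thread, adapt them to your setup:

    auto vmbr0
    iface vmbr0 inet static
            address 192.168.1.201/24
            bridge-ports bond0
            bridge-stp off
            bridge-fd 0
            # pin the bridge MAC to the previous (physical NIC) address
            hwaddress f8:b1:56:d1:25:cf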

Once again, even if it seems this way, I don't have an attitude; I am extremely stressed until morning. I had an appointment with the SQL programmer to fix the database inside a VM running SQL Server, and not only am I unable to see the Proxmox GUI since the upgrade, but the network configuration broke as well.
Maybe you could let the system run without the bonds (simply put one of the NICs of the bond as switchport) for today - and check the situation on your next work day, when you're a bit less stressed?

The path /etc/systemd/network/ is empty (although it mentions "total 9").
PVE does not use systemd-networkd for interface configuration - but ifupdown2 - the network config is in '/etc/network/interfaces'


I hope this helps!
 
sorry if I wasn't clear - what I wanted to say is that these messages are normal if you unplug/plug the cable.
OK, it is always nice to double-check when you are at the point of having tried everything.

you can just remove the unit for importing the pool (`rm /etc/systemd/system/zfs-import@HHproxData.service`)
Why would I want that? Isn't there a reason it is trying to import it in the first place? Are there any other, more important units that import the pool afterwards? Which?

You can check what the nics support with `ethtool -a <nic>` (e.g. `ethtool -a enp7s0`)
Nice info for my custom troubleshooting guides.

Maybe you could let the system run without the bonds (simply put one of the NICs of the bond as switchport) for today - and check the situation on your next work day, when you're a bit less stressed?
It would have been my next step if I hadn't found the solution, as you'll see in my next post.

PVE does not use systemd-networkd for interface configuration - but ifupdown2 - the network config is in '/etc/network/interfaces'
Nice to know. Somewhere I've read about a bug with ifupdown2; I can't remember what it was, though. Does ifupdown2 have its own conf file? What path is it at?
 
Founddddddddddddddddddddddddddddddddddd ittttttttttttttttttttttttttttttttttttttt !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Sorry for the enthusiasm, but I got extremely excited having found it by myself.

The solution was inside /etc/network/interfaces (if I had a backup file to compare against, I would have known earlier!)
What I had to do was remove or comment out the `auto` line above each port, i.e.
`auto enp7s0` and `auto enp0s25`. Both of those ports were part of the bond. So with `auto` they were trying to bring themselves up on their own, and that was the problem: being in the bond while each had a different configuration (OK, probably the explanation is this or something similar, but it works!!!)

Of course, even if it works now, I don't know whether it matters that bond0 and vmbr0 still have the lines `auto bond0` and `auto vmbr0` above them... too tired to try that as well. New edit... the examples on the official Proxmox page for networking with bonds and vmbrs in /etc/network/interfaces do show the `auto bond...` and `auto vmbr...` lines above the bond and vmbr entries. A sketch of my current file follows below.
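For reference, here is roughly what the relevant part of my /etc/network/interfaces looks like now (a sketch from memory; I left out the gateway and the other bonds/bridges):

    # no "auto" line above the enslaved ports anymore
    iface enp7s0 inet manual

    iface enp0s25 inet manual

    auto bond0
    iface bond0 inet manual
            bond-slaves enp7s0 enp0s25
            bond-miimon 100
            bond-mode 802.3ad

    auto vmbr0
    iface vmbr0 inet static
            address 192.168.1.201/24
            bridge-ports bond0
            bridge-stp off
            bridge-fd 0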

Bottom line... for those out there preparing to do the upgrade (especially with unusual network configurations): make a copy of /etc/network/interfaces along with the other files (you can see them above in my initial post).

@Stoiko Ivanov I am not marking this as solved yet because we have a conversation going about other matters too. I will afterwards.

Thank you though!!!!!!!!!!!!
 
@Stoiko Ivanov to sum up, in case you can help further with the points below as well...

1./etc/kernel/cmdline
root=ZFS=rpool/ROOT/pve-1 boot=zfs
Even though I gave each of my other machines a different node name, here both before and after the upgrade it has the name pve-1.
Does it need the name of the Proxmox server that I gave during installation?

2./etc/default/grub
Here the upgrade changed the lines
from GRUB_DISTRIBUTOR="Proxmox Virtual Environment" to GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
Do I change it back? Do you know what this line does?
The line GRUB_DISABLE_OS_PROBER=true doesn't exist anymore. Do I add it back? Do you know what this line does?
The line GRUB_DISABLE_RECOVERY="true" is now commented out. Do I revert it? Do you know what this line does?

3. For the pool that doesn't get imported during boot, I also mentioned above <<<<<Failed to start Import ZFS pool HHproxData (see systemctl status zfs-import@HHproxData.service for details)>>>>>>
You suggested either to ignore it, based on your answer
<<<As long as the pool HHproxData is imported after booting this can be ignored:
* PVE creates this service for each pool you create via GUI - since quite a few users ran into problems because their pools
were not in the cache-file and thus did not get imported... - If the pool is ok and present after booting
then it's already imported from the cache file and you need not worry about it.>>>>>
or to remove the unit for importing the pool (`rm /etc/systemd/system/zfs-import@HHproxData.service`)

I ran the command `systemctl status zfs-import@HHproxData.service`, which showed:
● zfs-import@HHproxData.service - Import ZFS pool HHproxData
     Loaded: loaded (/lib/systemd/system/zfs-import@.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Fri 2021-07-16 17:32:06 EEST; 22h ago
       Docs: man:zpool(8)
    Process: 2196 ExecStart=/sbin/zpool import -N -d /dev/disk/by-id -o cachefile=none HHproxData (code=exited, status=1/FAILURE)
   Main PID: 2196 (code=exited, status=1/FAILURE)
        CPU: 58ms

Jul 16 17:32:05 HHnod systemd[1]: Starting Import ZFS pool HHproxData...
Jul 16 17:32:06 HHnod zpool[2196]: cannot import 'HHproxData': no such pool available
Jul 16 17:32:06 HHnod systemd[1]: zfs-import@HHproxData.service: Main process exited, code=exited, status=1/FAILURE
Jul 16 17:32:06 HHnod systemd[1]: zfs-import@HHproxData.service: Failed with result 'exit-code'.
Jul 16 17:32:06 HHnod systemd[1]: Failed to start Import ZFS pool HHproxData.
Is there any way to fix it, instead of ignoring it or deleting it?
 
Even though I gave each of my other machines a different node name, here both before and after the upgrade it has the name pve-1.
pve-1 is not the hostname of the machine but the name of the zfs dataset containing '/' - this is normal

2./etc/default/grub
Here upgrade changed the lines
check if the file `/etc/default/grub.d/proxmox-ve.cfg` exists - it should contain the necessary changes for PVE.
However - if you have /etc/kernel/cmdline - it sounds like the system is booted using systemd-boot +proxmox-boot-tool
see the reference documentation to find out how your system is actually booted:
https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysboot_determine_bootloader_used

if you're using systemd-boot then the grub configs are not relevant for you
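A quick way to check is:

    # prints how the ESPs are set up (grub vs systemd-boot)
    proxmox-boot-tool status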

The line GRUB_DISABLE_OS_PROBER=true doesn't exist anymore. Do I add it back? Do you know what this line does?
The line GRUB_DISABLE_RECOVERY="true" is now commented out. Do I revert it? Do you know what this line does?
the config-file for grub is best described in the reference manual: https://www.gnu.org/software/grub/manual/grub/grub.html#Simple-configuration
Jul 16 17:32:06 HHnod zpool[2196]: cannot import 'HHproxData': no such pool available
This sounds more like the disks are not yet recognized when the system is booting (and the pool gets imported later when the system is running).
you can try to add a bit of root-delay as described in:
https://pve.proxmox.com/wiki/ZFS:_Tips_and_Tricks#Boot_fails_and_goes_into_busybox
(you need to add the rootdelay parameter in /etc/kernel/cmdline for systemd-boot and in /etc/default/grub for grub)
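For systemd-boot that would be a sketch like this (the rootdelay value is just an example):

    # /etc/kernel/cmdline - append rootdelay (in seconds) to the existing single line
    root=ZFS=rpool/ROOT/pve-1 boot=zfs rootdelay=10

afterwards run `proxmox-boot-tool refresh` so the updated command line gets written to the boot partitions.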

I hope this explains it!
 
