Proxmox VE 8.0 (beta) released!

EDIT: Disregard. Works as designed, it was user error.

Clean ISO install.

Couldn't get Ceph to install. No matter which repositories I added or removed in the GUI, it always tried to install from the enterprise repository.

Finally just did "pveceph install -repository no-subscription" from the CLI, which worked.
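For anyone comparing notes, this is a quick way to see which Ceph repository actually got configured (assuming the default file layout):

Bash:
# after "pveceph install", the chosen repo should be written here
cat /etc/apt/sources.list.d/ceph.list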
 
I installed the driver with the following commands:
chmod +x ./NVIDIA-Linux-x86_64-525.105.14-vgpu-kvm.run
./NVIDIA-Linux-x86_64-525.105.14-vgpu-kvm.run --dkms
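Side note: a DKMS build like this needs the headers for the running kernel, so a quick pre-check along these lines can rule that out (package name assuming a standard PVE setup):

Bash:
# the DKMS build needs headers matching the running kernel
uname -r
apt install pve-headers-$(uname -r)
# the installer log referenced in the errors below contains the actual build failure
tail -n 40 /var/log/nvidia-installer.log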

And here are the error messages it gives me:

ERROR: An error occurred while performing the step: "Building kernel modules". See /var/log/nvidia-installer.log
for details.

ERROR: An error occurred while performing the step: "Checking to see whether the nvidia-vgpu-vfio kernel module
was successfully built". See /var/log/nvidia-installer.log for details.

ERROR: The nvidia-vgpu-vfio kernel module was not created.

ERROR: Installation has failed. Please see the file '/var/log/nvidia-installer.log' for details. You may find
suggestions on fixing installation problems in the README available on the Linux driver download page at
www.nvidia.com.

What could be the problem?
My system and config:
See https://pve.proxmox.com/wiki/Upgrade_from_7_to_8#NVIDIA_vGPU_Compatibility

Essentially, NVIDIA could not be bothered to get their drivers working on 5.16+ kernels. We tried to backport a patch to restore compatibility with their out-of-tree mdev approach, namely a patch proposed by an NVIDIA employee themselves, but it alone seemingly wasn't enough (our local tests with it fail too). We asked, in a reply to that patch on the Ubuntu kernel mailing list, how it is supposed to work with just this patch, but so far we got no reply.

As the driver is proprietary, we cannot really do more for now. So, as the upgrade guide (and the pve7to8 checker script) mentions, people relying on NVIDIA SR-IOV vGPU should stay on Proxmox VE 7 with the 5.15 kernel for the time being. Let's hope this gets resolved by NVIDIA (or that the FOSS community finishes doing the heavy lifting again for that PITA of a company, and the new open-source NVIDIA driver that is in the works gets finished and supports this too).
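The checker script mentioned above can be run on the PVE 7 node before upgrading; a minimal invocation:

Bash:
# warns about known blockers, including the NVIDIA vGPU/kernel incompatibility
pve7to8 --full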
 
However, the update today to 6.2.16-2 fixed all my issues, lol.
Not sure what you changed and if you actually merged some fixes related to that or not,
but I rebooted 5x into 6.2.16-1 and 6.2.16-2 just to be sure, and it's definitely a kernel issue, lol
I mean it's great to hear that it works now for you, but that update was rather minor (newer ZFS and two small backports).
The only potentially relevant change could be the backported mdev patch, where we (or rather NVIDIA) tried to restore compatibility with their driver.
 
Clean ISO install.

Couldn't get Ceph to install. No matter which repositories I added or removed in the GUI, it always tried to install from the enterprise repository.

Finally just did "pveceph install -repository no-subscription" from the CLI, which worked.
W.r.t. adding/removing, do you mean the Repository GUI or the Ceph-specific installation wizard?
I.e., what are you actually doing? While I was sure QA tested this for the beta release, I just did a completely fresh installation to be sure, and successfully used our Ceph wizard with the no-subscription repository to install Ceph:

[Screenshots: Ceph installation wizard completing successfully with the no-subscription repository]
 
See https://pve.proxmox.com/wiki/Upgrade_from_7_to_8#NVIDIA_vGPU_Compatibility

Essentially, NVIDIA could not be bothered to get their drivers working on 5.16+ kernels. We tried to backport a patch to restore compatibility with their out-of-tree mdev approach, namely a patch proposed by an NVIDIA employee themselves, but it alone seemingly wasn't enough (our local tests with it fail too). We asked, in a reply to that patch on the Ubuntu kernel mailing list, how it is supposed to work with just this patch, but so far we got no reply.

As the driver is proprietary, we cannot really do more for now. So, as the upgrade guide (and the pve7to8 checker script) mentions, people relying on NVIDIA SR-IOV vGPU should stay on Proxmox VE 7 with the 5.15 kernel for the time being. Let's hope this gets resolved by NVIDIA (or that the FOSS community finishes doing the heavy lifting again for that PITA of a company, and the new open-source NVIDIA driver that is in the works gets finished and supports this too).
So the problem is in the driver itself; it simply does not support 5.16+ kernels.
I did a test:
This driver does not install on the host: "NVIDIA-Linux-x86_64-525.105.14-vgpu-kvm.run"
This driver does install on the host: "NVIDIA-Linux-x86_64-525.105.17-grid.run"
It turns out "NVIDIA-Linux-x86_64-525.105.17-grid.run" supports kernel 5.16+.
 
So the problem is in the driver itself; it simply does not support 5.16+ kernels.
I did a test:
This driver does not install on the host: "NVIDIA-Linux-x86_64-525.105.14-vgpu-kvm.run"
This driver does install on the host: "NVIDIA-Linux-x86_64-525.105.17-grid.run"
It turns out "NVIDIA-Linux-x86_64-525.105.17-grid.run" supports kernel 5.16+.
But the '-grid.run' driver is the guest driver, not the host one. If you want to split the card into vGPUs, you need to install the '-vgpu-kvm.run' driver on the PVE host, which is currently not possible with our 6.2 kernel.
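If in doubt which variant actually ended up on a host, checking the loaded kernel modules should tell (module name as shipped by the vgpu-kvm variant):

Bash:
# present only with the host (vgpu-kvm) driver, not the guest (grid) one
lsmod | grep nvidia_vgpu_vfio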
 
I upgraded one node of our test cluster and tried migrating machines from a PVE 7 node to the upgraded node; that fails with:
Code:
2023-06-16 11:25:33 ERROR: migration aborted (duration 00:00:00): internal error: cannot check version of invalid string '8.0.0~8' at /usr/share/perl5/PVE/QemuServer/Helpers.pm line 186.

Bugreport: https://bugzilla.proxmox.com/show_bug.cgi?id=4784
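Until a fix lands, it may help to confirm the package versions that the migration code compares (standard commands, run on both source and target node):

Bash:
pveversion -v | grep -E 'qemu-server|pve-qemu-kvm'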
 
I'm installing Debian 12 Bookworm for the first time (graphical setup) and gave it a try on Proxmox 8 beta.

I was out for lunch while the installation screen sat at the partitioning dialog. Coming back, it seems to have redraw issues because the screensaver kicked in: if I move the mouse, the elements reappear, but the screen does not redraw as a whole.

Not sure if this is a QEMU/Proxmox or Debian 12 issue, though.

Did somebody else come across this, or can somebody reproduce it?
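For anyone trying to reproduce: the VM here uses default display settings; a hypothetical variation would be testing whether a different emulated display behaves the same, e.g.:

Bash:
# 100 is a placeholder vmid; no "vga" line in the config means the default std VGA
qm config 100 | grep ^vga
# hypothetical test: switch to a VirtIO display and rerun the installer
qm set 100 --vga virtio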

[Screenshot: Debian 12 installer partitioning dialog with redraw artifacts]
 
But that seems to be the Proxmox installer on older Proxmox/QEMU, not the Debian Bookworm installer.

Sure, but maybe it has nothing to do with the guest OS in general, its version, or the PVE host version?
  • Did you test with other guest OSs and versions on PVE 8?
  • Did you test with Debian 12 on PVE 7?
 
I noticed the Cluster logs tab has two login lines, separated by two seconds, on each login (maybe one is for TOTP?). Also, after 15 minutes, another login line is added to the log. Could it be that a ticket expires in less than 15 minutes and a new one is generated?

Another thing: I have noticed the IPv6 firewall rules have this ipset reference: PVEFW-0-management-v6. Do you add any management IP rules to it like you do for IPv4, or is it empty for the time being?
 
Sure, but maybe it has nothing to do with the guest OS in general, its version, or the PVE host version?
  • Did you test with other guest OSs and versions on PVE 8?
  • Did you test with Debian 12 on PVE 7?
OK, it also happens with PVE 7.4, but not on real hardware.

Can somebody confirm/test? I would open a bug report then.
 
W.r.t. adding/removing, do you mean the Repository GUI or the Ceph-specific installation wizard?
I.e., what are you actually doing? While I was sure QA tested this for the beta release, I just did a completely fresh installation to be sure, and successfully used our Ceph wizard with the no-subscription repository to install Ceph:
Sorry, I could have been more clear.

I did a clean install from the ISO and added the node to my existing cluster. Then I went to install Ceph from the web GUI (without changing/adding repositories).

When clicking through the Ceph web install wizard, it immediately said the enterprise repository wasn't signed, couldn't find files, and exited.

I then tried adding the no-subscription Ceph repository via the web GUI and going to the Ceph install in the web GUI again. It still failed.

Going back to the repository setup screen in the GUI, the only Ceph repository listed was again the enterprise one, which is odd, as I had added both the test and no-subscription ones previously.

That's when I gave up and did it from the CLI.

I am going to install another node from scratch tonight or tomorrow and will see if it does the same thing, or not.



EDIT: Disregard. Works as designed, it was user error.
 
I noticed the Cluster logs tab has two login lines, separated by two seconds, on each login (maybe one is for TOTP?).
Good questions, and yes, exactly: for second factors like TOTP there are two ticket calls, one for the first factor (password), which provides a "half-logged-in" ticket, and one that confirms the second-factor challenge. Currently, both produce a cluster-log entry.
Also, after 15 minutes, another login line is added to the log. Could it be that a ticket expires in less than 15 minutes and a new one is generated?
While a ticket is valid for two hours, the Web UI currently refreshes it every 15 minutes already.

Another thing: I have noticed the IPv6 firewall rules have this ipset reference: PVEFW-0-management-v6. Do you add any management IP rules to it like you do for IPv4, or is it empty for the time being?
Yes, we add the same standard rules as for IPv4:

Code:
ip6tables-save | grep management
-A PVEFW-HOST-IN -p tcp -m set --match-set PVEFW-0-management-v6 src -m tcp --dport 8006 -j RETURN
-A PVEFW-HOST-IN -p tcp -m set --match-set PVEFW-0-management-v6 src -m tcp --dport 5900:5999 -j RETURN
-A PVEFW-HOST-IN -p tcp -m set --match-set PVEFW-0-management-v6 src -m tcp --dport 3128 -j RETURN
-A PVEFW-HOST-IN -p tcp -m set --match-set PVEFW-0-management-v6 src -m tcp --dport 22 -j RETURN
-A PVEFW-HOST-IN -p tcp -m set --match-set PVEFW-0-management-v6 src -m tcp --dport 60000:60050 -j RETURN
 
Good questions, and yes, exactly: for second factors like TOTP there are two ticket calls, one for the first factor (password), which provides a "half-logged-in" ticket, and one that confirms the second-factor challenge. Currently, both produce a cluster-log entry.
It could be good to add more verbosity to that line, e.g. a second phrase stating the type, like "password login", "TOTP login", or "ticket renewal", just for quick auditing in the UI.

Yes, we add the same standard rules as for IPv4:
I saw that, sorry for not being explicit enough. When I run ipset list, the following output is shown for IPv6; it does not contain any members right now. My question is: does it add any members at some point? I want to understand that and make sure I set up correct firewall rules at the router level, since my ISP gives me a dynamic prefix through SLAAC.
Code:
Name: PVEFW-0-management-v6
Type: hash:net
Revision: 7
Header: family inet6 hashsize 64 maxelem 64 bucketsize 12 initval 0x########
Size in memory: 1240
References: 5
Number of entries: 0
Members:
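From the docs, my understanding is that members would come from an ipset named "management" in the cluster firewall config; a sketch, with 2001:db8::/64 as a placeholder admin prefix:

Code:
# /etc/pve/firewall/cluster.fw
[IPSET management]
2001:db8::/64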

EDIT: Another question, regarding the logs in /var/log/daemon.log: has the file been moved? I want to set up fail2ban, and every example I found uses it.
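In case it helps others: Debian 12 no longer ships rsyslog by default, so /var/log/daemon.log does not exist out of the box; fail2ban can read the journal instead. A minimal sketch, assuming a "proxmox" filter is provided separately:

Code:
# /etc/fail2ban/jail.d/proxmox.local (sketch)
[proxmox]
enabled  = true
port     = https,8006
filter   = proxmox
backend  = systemd
maxretry = 3
bantime  = 3600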
 
Hello all. So far my experience with PVE 8 is excellent; I'm running it in several places as well as on my laptop for development purposes.
I am encountering an issue, though, with suspend and Bluetooth on my laptop.

Problems with USB (internal) Bluetooth are quite common, so it took me a while to find where the issue might lie.

On boot, Bluetooth works great.

If I suspend (to RAM) the machine, the Bluetooth adapter disappears, and it seems to be because of some missing firmware. I tried to install firmware-atheros, but this is prohibited due to the conflict with pve-firmware.

The error is:


Code:
2023-06-18T17:47:30.185838+03:00 fattop-pve kernel: [  906.266058] bluetooth hci0: Direct firmware load for qca/rampatch_usb_00000302.bin failed with error -2
2023-06-18T17:47:30.185839+03:00 fattop-pve kernel: [  906.266064] Bluetooth: hci0: failed to request rampatch file: qca/rampatch_usb_00000302.bin (-2)

So I always have to reboot after suspending my laptop, which is less than optimal.
Perhaps this firmware was somehow excluded?

I have one kernel update pending; I will try it and post an update if something changes.
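An untested workaround idea, assuming the adapter merely needs a driver reload after resume: a system-sleep hook along these lines.

Bash:
#!/bin/sh
# /usr/lib/systemd/system-sleep/99-btusb (hypothetical; make it executable)
# reload the USB Bluetooth driver after resume so the adapter re-probes
case "$1" in
  post) modprobe -r btusb && modprobe btusb ;;
esac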
 
Hello all. So far my experience with PVE 8 is excellent; I'm running it in several places as well as on my laptop for development purposes.
I am encountering an issue, though, with suspend and Bluetooth.

Problems with USB (internal) Bluetooth are quite common, so it took me a while to find where the issue might lie.

On boot, Bluetooth works great.

If I suspend (to RAM) the machine, Bluetooth disappears, and it seems to be because of some missing firmware. I tried to install firmware-atheros, but this is prohibited due to the conflict with pve-firmware.

The error is:


Code:
2023-06-18T17:47:30.185838+03:00 fattop-pve kernel: [  906.266058] bluetooth hci0: Direct firmware load for qca/rampatch_usb_00000302.bin failed with error -2
2023-06-18T17:47:30.185839+03:00 fattop-pve kernel: [  906.266064] Bluetooth: hci0: failed to request rampatch file: qca/rampatch_usb_00000302.bin (-2)

So I always have to reboot after suspending my laptop, which is less than optimal.
Perhaps this firmware was somehow excluded?

I have one kernel update pending; I will try it and post an update if something changes.
Here is a solution for the lid issue:

Bash:
nano /etc/systemd/logind.conf

Set these two options in the [Login] section (everything else can stay commented out at its defaults):

Code:
[Login]
HandleLidSwitch=ignore
HandleLidSwitchDocked=ignore

Then restart the service:

Bash:
systemctl restart systemd-logind.service
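To double-check the effective settings after the restart, a current systemd can print the merged configuration:

Bash:
# shows logind.conf plus any drop-ins actually in effect
systemd-analyze cat-config systemd/logind.conf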
 