Proxmox VE 8.0 released!

[...] Where did you find info about "net.naming-scheme="? [...]
Looking for changelogs/news regarding the systemd naming-scheme changes, I found this on GitHub: news.
PVE 7.4 had naming scheme v247.something before I updated, so 247 made sense.
PVE 8.0 has naming scheme v252.
Someone in a systemd IRC channel pointed me to the history here: https://www.freedesktop.org/software/systemd/man/systemd.net-naming-scheme.html#History
v251 brought back the 'slot' stuff.
So I went looking around and found the file above... yes, that option is pretty well hidden.
Later edit: reading that History section again, it doesn't seem so hidden anymore... it looks like I just didn't read it carefully the last time:
The following "naming schemes" have been defined (which may be chosen at system boot-up time via the net.naming-scheme= kernel command line switch, see above):
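For anyone who wants to keep the old interface names across the upgrade, the scheme can be pinned on the kernel command line; a sketch of the usual way on a GRUB-booted install (systemd-boot installs use /etc/kernel/cmdline plus proxmox-boot-tool refresh instead):
Code:
# /etc/default/grub -- pin the naming scheme PVE 7.x used, so NIC names don't change
GRUB_CMDLINE_LINUX_DEFAULT="quiet net.naming-scheme=v247"

# apply and reboot afterwards
update-grub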
 
Installed this today, initially on top of a clean Debian 12 install (since in the past, with PVE 6/7, I never succeeded with the PVE installer ISO).

This time, however, I found that Debian 12 was enumerating the HDDs in an arbitrary order, a real PITA when you come to set up Ceph OSDs, so I decided to give the PVE 8 installer ISO a go.

Wow!
Worked like a dream: all HDDs enumerated in the order they appear in the chassis. Not sure how that works, but absolutely first-class job, guys!
Saved me a ton of time too.

Thank you!
 
Just wanted to say that in the meantime I upgraded 3 more servers and everything went perfectly fine, without ANY issue!
Again: MANY thanks to you, the proxmox team! Keep up your BRILLIANT work! :)
 
Hey Proxmox Guys/Girls,

I just checked dmesg for errors, and well:
[226676.229736] cgroup: Setting release_agent not allowed

I've seen someone mention it somewhere on the forum,
but I don't remember where :-(

However, I have no clue what triggered it (surely an LXC container), but I have no issues and everything is working perfectly/fast/etc...
I've never had issues with Proxmox anyway, and tbh never found any bugs :-)
However, maybe that's known already, maybe not, and well yeah, I'm not much help without knowing what triggered it :-(
But if anyone has read something about "release_agent" somewhere, please point me to that thread (I tried to search but wasn't successful either).
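In case it helps with hunting down the culprit, a rough approach (just a sketch, assuming the default dmesg/journald setup and that one of the containers really is the trigger) would be to turn the dmesg timestamp into wall-clock time and then look at what was logged around it:
Code:
# show the message with a human-readable timestamp instead of seconds-since-boot
dmesg -T | grep release_agent

# then check what the host logged around that time (fill in the window from above)
journalctl --since "<shortly before>" --until "<shortly after>" | grep -iE 'lxc|cgroup|release_agent'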

Or just ignore this post, as it isn't very helpful and I have no issues anyway.
Cheers
 
Hi All,

I have 2 clusters running PVE 7.3, and we are about to upgrade the hardware to new HP ProLiant servers.

I've done some reading and can see at least some benefits and improvements from installing PVE 8; however, these PVE clusters all run local storage with InfiniBand cards and don't use any special features (i.e., they are simple hypervisors).

My question is: do I deploy the new hardware and install PVE 7.4 (and wait until PVE 8 matures a little more), or do I go ahead and install PVE 8?

All my current 7.3 hosts have no issues and run perfectly fine.

Happy to hear thoughts and suggestions.

Thanks.
 
[...] My question is: do I deploy the new hardware and install PVE 7.4 (and wait until PVE 8 matures a little more), or do I go ahead and install PVE 8? [...]
If this is production, I would wait until 8 matures. All .0 releases, regardless of software vendor, are "buggy". You've still got another year of support for PVE 7 anyhow.
 
Neobin,
I can get myself into trouble with Linux but not always out of it.
How can I fix the problem I'm in? Would it be better to scrap this idea and stick with 7? I've only done 1 of 4 nodes, and I'm assuming this issue will happen with the others. If I can get this fixed and running on the first one, then I can apply the fix to the others after upgrading.

Thanks!
 
If this is production, I would wait until 8 matures. All .0 releases, regardless of software vendor, are "buggy". You've still got another year of support for PVE 7 anyhow.
Yes, this is production. Thanks for the input, much appreciated.
 
Awesome, the upgrade seems to have gone smoothly! Can you comment a bit more on the decision to use a 6.2-based kernel? I know in the past you'd referenced Fedora (which has already moved on, as it tracks upstream) and Ubuntu Lunar (which will only be maintained for about a year) as inspiration for skipping the 6.1 stable series, but it seems this will leave Proxmox stuck maintaining its own kernel on its own within a year?
 
Can you comment a bit more on the decision to use a 6.2-based kernel? I know in the past you'd referenced Fedora (which has already moved on as it tracks upstream)
We never referenced Fedora.
Ubuntu lunar
Exactly. We currently use Ubuntu Lunar's kernel as the base for our own adaptations, will switch over to the 6.5-based kernel from Ubuntu 23.10 Mantic in Q4, and next year we will switch to the kernel of the Ubuntu 24.04 LTS release for the rest of the Proxmox VE 8.x lifetime, just like in previous releases.

This ensures we have a modern kernel with good support for newer hardware, but without overly frequent version jumps.
 
Awesome, the upgrade seems to have gone smoothly! Can you comment a bit more on the decision to use a 6.2-based kernel? I know in the past you'd referenced Fedora (which has already moved on, as it tracks upstream) and Ubuntu Lunar (which will only be maintained for about a year) as inspiration for skipping the 6.1 stable series, but it seems this will leave Proxmox stuck maintaining its own kernel on its own within a year?
We tend to base our kernel on the one provided by the most current Ubuntu release until the next Ubuntu LTS is released, at which point we usually track that release's kernel as upstream. This is not set in stone though; we always evaluate whether it's a good fit before switching, or whether there is a need to provide newer versions as opt-in before switching the default.
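For reference, opting in to such a newer kernel has just meant installing the corresponding meta-package (a rough sketch; the exact package name depends on what is offered at the time, e.g. pve-kernel-6.2 was the opt-in kernel on PVE 7.x):
Code:
# example: opting in to the 6.2 kernel on a PVE 7.x node
apt update
apt install pve-kernel-6.2
reboot

# confirm the running kernel afterwards
uname -r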
 
Hello Proxmox, love the work you guys have going, it's awesome.

Proxmox Upgrade & New install on 2 NUCs, a NUC7i7DNK & NUC8BEH
  • - The NUC7i7DNK took the upgrade like a champ, no issues whatsoever!
    • - Has Integrated GPU - 00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 620 (rev 07)
  • - The NUC8BEH didn't take it as well on the new kernel.
    • - Has Integrated GPU - 00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-U GT3e [Iris Plus Graphics 655] (rev 01)
Backstory
Both NUCs managed to upgrade, but for some odd reason the NUC8BEH stopped responding after a minute or two (or five). So an investigation ensued: as it was running headless, I took it to my bench and plugged in a monitor. It booted up, showed no issue and worked as intended, so I thought it was a fluke and took it right back.

Plugged it back in headless... it came up for a few minutes and then acted up again, no network no game :) aka it was not reachable.
Back to the bench it went, no issue again... until I unplugged the monitor; a minute or two later, dead fish.

Investigation ensued, part 2
It seems the new kernel has some issues on the NUC8BEH when running headless, a kernel graphics driver problem, even though no display is plugged in?

So I started to scour the forum here and went through a lot of the issues matching the errors I could see in dmesg, a particular one being
about power states and a failed change from D3hot or D3cold to D0 because of "(config space inaccessible)" when unplugging the monitor.

So I started to look into GRUB settings and tried the parameters below, in the order I found them:
nosgx initcall_blacklist=sysfb_init video=efifb:off video=simplefb:off video=vesafb:off iommu=pt pcie_aspm=off
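(For anyone retracing this: a sketch of how such parameters are usually applied and verified on PVE; the exact steps depend on the bootloader:)
Code:
# GRUB-booted installs: add the parameters to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
update-grub

# systemd-boot installs (ZFS-on-root UEFI): add them to /etc/kernel/cmdline instead, then:
proxmox-boot-tool refresh

# after rebooting, confirm what the kernel actually booted with:
cat /proc/cmdline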

But as of now I have not found a solution other than ordering a small dummy HDMI plug (aka the backup plan) :rolleyes:.

Anyway, whilst doing the reboots I reverted to `5.15.108-1-pve #1 SMP PVE 5.15.108-1 (2023-06-17T09:41Z) x86_64 GNU/Linux` and the issue is gone again.

Going back up to `6.2.16-3-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-3 (2023-06-17T05:58Z) x86_64 GNU/Linux` and the issue is back.

On 5.15.108-1-pve I do still see the D3cold-to-D0 error when unplugging the monitor, but the box never ever freezes... or loses network.
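To keep booting the older kernel without picking it manually each time, pinning it should work; just a sketch, assuming a current pve-kernel-helper/proxmox-boot-tool (on setups where that isn't used, setting GRUB_DEFAULT to the older entry does the same):
Code:
# list installed kernels, then pin the one that behaves
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 5.15.108-1-pve

# later, to return to the default (newest) kernel
proxmox-boot-tool kernel unpin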

This is the dmesg dump on 5.15.108-1-pve; the full dmesg is attached.
Code:
[  515.297967] pcieport 0000:00:1c.4: pciehp: Slot(8): Link Down
[  515.297968] pcieport 0000:02:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[  515.297970] pcieport 0000:00:1c.4: pciehp: Slot(8): Card not present
[  515.298044] pcieport 0000:03:02.0: can't change power state from D3cold to D0 (config space inaccessible)
[  515.298061] xhci_hcd 0000:3a:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[  515.298064] xhci_hcd 0000:3a:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[  515.298084] xhci_hcd 0000:3a:00.0: Controller not ready at resume -19
[  515.298086] xhci_hcd 0000:3a:00.0: PCI post-resume error -19!
[  515.298087] xhci_hcd 0000:3a:00.0: HC died; cleaning up
[  515.298097] xhci_hcd 0000:3a:00.0: remove, state 4
[  515.298099] usb usb4: USB disconnect, device number 1
[  515.298316] xhci_hcd 0000:3a:00.0: USB bus 4 deregistered
[  515.298322] xhci_hcd 0000:3a:00.0: remove, state 4
[  515.298324] usb usb3: USB disconnect, device number 1
[  515.298506] xhci_hcd 0000:3a:00.0: Host halt failed, -19
[  515.298511] xhci_hcd 0000:3a:00.0: Host not accessible, reset failed.
[  515.298590] xhci_hcd 0000:3a:00.0: USB bus 3 deregistered
[  515.298743] pcieport 0000:03:01.0: can't change power state from D3cold to D0 (config space inaccessible)
[  515.298898] pcieport 0000:03:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[  515.298988] pci_bus 0000:04: busn_res: [bus 04] is released
[  515.299056] pci 0000:03:00.0: Removing from iommu group 14
[  515.299070] pci_bus 0000:05: busn_res: [bus 05-39] is released
[  515.299113] pci 0000:03:01.0: Removing from iommu group 15
[  515.464563] pci 0000:3a:00.0: Removing from iommu group 16
[  515.464577] pci_bus 0000:3a: busn_res: [bus 3a] is released
[  515.464666] pci 0000:03:02.0: Removing from iommu group 16
[  515.464805] pci_bus 0000:03: busn_res: [bus 03-3a] is released
[  515.464919] pci 0000:02:00.0: Removing from iommu group 13

To capture the dmesg errors on 6.2.16-3-pve I had to resort to some shenanigans (`dmesg -w > somefile &`), since the messages kept coming well past the moment networking dropped, and I had to unplug the monitor to trigger the issue on 6.2.16-3-pve.
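An alternative to redirecting dmesg to a file would be making the journal persistent, so the kernel log of the crashed boot can be read after a hard reset (a sketch assuming the default journald config; with a hard freeze the very last messages may still not make it to disk):
Code:
# keep the journal across reboots (Storage=auto uses /var/log/journal once it exists)
mkdir -p /var/log/journal
systemctl restart systemd-journald

# after the hang and a hard reset, read the kernel messages of the previous boot
journalctl -k -b -1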


Code:
[  174.736314] i915 0000:00:02.0: [drm] *ERROR* AUX B/DDI B/PHY B: not done (status 0x00000000)
[  174.736323] i915 0000:00:02.0: [drm] *ERROR* AUX B/DDI B/PHY B: not done (status 0x00000000)
[  174.736333] i915 0000:00:02.0: [drm] *ERROR* AUX B/DDI B/PHY B: not done (status 0x00000000)
[  174.736343] i915 0000:00:02.0: [drm] *ERROR* AUX B/DDI B/PHY B: not done (status 0x00000000)
[  174.736346] i915 0000:00:02.0: [drm] *ERROR* Error reading LSPCON mode
[  174.736347] i915 0000:00:02.0: [drm] *ERROR* LSPCON resume failed
[  174.736356] i915 0000:00:02.0: [drm] *ERROR* AUX B/DDI B/PHY B: not done (status 0x00000000)
[  174.737340] i915 0000:00:02.0: [drm] *ERROR* AUX B/DDI B/PHY B: not done (status 0x00000000)
... 10000 rows of same message ...
[  174.738278] i915 0000:00:02.0: [drm] *ERROR* AUX B/DDI B/PHY B: not done (status 0x00000000)
...
...
...
[  174.766220] ------------[ cut here ]------------
[  174.766220] RPM raw-wakeref not held
[  174.766245] WARNING: CPU: 1 PID: 24 at drivers/gpu/drm/i915/intel_runtime_pm.h:127 release_async_put_domains+0x115/0x120 [i915]
[  174.766386] Modules linked in: cmac nls_utf8 cifs cifs_arc4 rdma_cm iw_cm ib_cm ib_core cifs_md4 fscache netfs ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw ipt_REJECT nf_reject_ipv4 xt_set xt_physdev xt_addrtype xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment xt_mark iptable_filter bpfilter ip_set_hash_net ip_set sctp ip6_udp_tunnel udp_tunnel nf_tables bonding tls softdog sunrpc nfnetlink_log binfmt_misc nfnetlink snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_sof_pci_intel_cnl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus snd_soc_core snd_compress intel_rapl_msr ac97_bus intel_rapl_common intel_tcc_cooling x86_pkg_temp_thermal snd_pcm_dmaengine
[  174.766433]  intel_powerclamp coretemp i915 iwlmvm snd_hda_intel btusb drm_buddy snd_intel_dspcfg kvm_intel btrtl btbcm ttm mac80211 btintel drm_display_helper snd_intel_sdw_acpi libarc4 btmtk mei_pxp mei_hdcp cec snd_hda_codec rc_core kvm irqbypass crct10dif_pclmul iwlwifi polyval_clmulni polyval_generic ghash_clmulni_intel snd_hda_core sha512_ssse3 snd_hwdep aesni_intel crypto_simd snd_pcm cryptd drm_kms_helper rapl i2c_algo_bit bluetooth syscopyarea snd_timer intel_wmi_thunderbolt sysfillrect intel_cstate pcspkr snd mei_me ee1004 soundcore cfg80211 ecdh_generic mei sysimgblt ecc wmi_bmof intel_pch_thermal joydev input_leds acpi_pad acpi_tad mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq simplefb hid_logitech_hidpp hid_logitech_dj dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c hid_generic usbmouse usbkbd
[  174.766498]  usbhid hid rtsx_pci_sdmmc nvme i2c_i801 crc32_pclmul xhci_pci e1000e xhci_pci_renesas i2c_smbus nvme_core rtsx_pci nvme_common ahci libahci xhci_hcd video wmi pinctrl_cannonlake
[  174.766511] CPU: 1 PID: 24 Comm: kworker/1:0 Tainted: P        W  O       6.2.16-4-pve #1
[  174.766513] Hardware name: Intel(R) Client Systems NUC8i3BEH/NUC8BEB, BIOS BECFL357.86A.0094.2023.0612.1527 06/12/2023
[  174.766515] Workqueue: events output_poll_execute [drm_kms_helper]
[  174.766535] RIP: 0010:release_async_put_domains+0x115/0x120 [i915]
[  174.766648] Code: 1d a3 f0 1d 00 80 fb 01 0f 87 2c 58 0e 00 83 e3 01 0f 85 50 ff ff ff 48 c7 c7 f4 a7 fc c1 c6 05 83 f0 1d 00 01 e8 db 17 88 c0 <0f> 0b e9 36 ff ff ff 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90
[  174.766650] RSP: 0018:ffffaf9c00143c98 EFLAGS: 00010246
[  174.766652] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  174.766653] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  174.766654] RBP: ffffaf9c00143cd0 R08: 0000000000000000 R09: 0000000000000000
[  174.766655] R10: 0000000000000000 R11: 0000000000000000 R12: ffffaf9c00143ce0
[  174.766657] R13: ffff988b899f8978 R14: ffff988b899f8000 R15: 0000000000000002
[  174.766658] FS:  0000000000000000(0000) GS:ffff9892e0e80000(0000) knlGS:0000000000000000
[  174.766659] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  174.766661] CR2: 0000561ff52a2000 CR3: 00000006a3e10005 CR4: 00000000003706e0
[  174.766662] Call Trace:
[  174.766663]  <TASK>
[  174.766667]  intel_display_power_flush_work+0xc1/0xf0 [i915]
[  174.766777]  intel_dp_detect+0x3a8/0x730 [i915]
[  174.766888]  ? ww_mutex_lock+0x19/0xa0
[  174.766893]  drm_helper_probe_detect_ctx+0x57/0x120 [drm_kms_helper]
[  174.766911]  output_poll_execute+0x192/0x250 [drm_kms_helper]
[  174.766926]  process_one_work+0x222/0x430
[  174.766930]  worker_thread+0x50/0x3e0
[  174.766932]  ? __pfx_worker_thread+0x10/0x10
[  174.766934]  kthread+0xe6/0x110
[  174.766937]  ? __pfx_kthread+0x10/0x10
[  174.766940]  ret_from_fork+0x29/0x50
[  174.766944]  </TASK>
[  174.766945] ---[ end trace 0000000000000000 ]---
[  174.766947] ------------[ cut here ]------------
...
...
...
[  197.802354] i915 0000:00:02.0: Use count on power well PW_2 is already zero
[  197.802378] WARNING: CPU: 1 PID: 171 at drivers/gpu/drm/i915/display/intel_display_power_well.c:127 intel_power_well_put+0xa1/0xb0 [i915]
[  197.802520] Modules linked in: cmac nls_utf8 cifs cifs_arc4 rdma_cm iw_cm ib_cm ib_core cifs_md4 fscache netfs ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw ipt_REJECT nf_reject_ipv4 xt_set xt_physdev xt_addrtype xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_comment xt_mark iptable_filter bpfilter ip_set_hash_net ip_set sctp ip6_udp_tunnel udp_tunnel nf_tables bonding tls softdog sunrpc nfnetlink_log binfmt_misc nfnetlink snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_sof_pci_intel_cnl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus snd_soc_core snd_compress intel_rapl_msr ac97_bus intel_rapl_common intel_tcc_cooling x86_pkg_temp_thermal snd_pcm_dmaengine
[  197.802567]  intel_powerclamp coretemp i915 iwlmvm snd_hda_intel btusb drm_buddy snd_intel_dspcfg kvm_intel btrtl btbcm ttm mac80211 btintel drm_display_helper snd_intel_sdw_acpi libarc4 btmtk mei_pxp mei_hdcp cec snd_hda_codec rc_core kvm irqbypass crct10dif_pclmul iwlwifi polyval_clmulni polyval_generic ghash_clmulni_intel snd_hda_core sha512_ssse3 snd_hwdep aesni_intel crypto_simd snd_pcm cryptd drm_kms_helper rapl i2c_algo_bit bluetooth syscopyarea snd_timer intel_wmi_thunderbolt sysfillrect intel_cstate pcspkr snd mei_me ee1004 soundcore cfg80211 ecdh_generic mei sysimgblt ecc wmi_bmof intel_pch_thermal joydev input_leds acpi_pad acpi_tad mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq simplefb hid_logitech_hidpp hid_logitech_dj dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c hid_generic usbmouse usbkbd
[  197.802631]  usbhid hid rtsx_pci_sdmmc nvme i2c_i801 crc32_pclmul xhci_pci e1000e xhci_pci_renesas i2c_smbus nvme_core rtsx_pci nvme_common ahci libahci xhci_hcd video wmi pinctrl_cannonlake
[  197.802645] CPU: 1 PID: 171 Comm: kworker/1:4 Tainted: P        W  O       6.2.16-4-pve #1
[  197.802647] Hardware name: Intel(R) Client Systems NUC8i3BEH/NUC8BEB, BIOS BECFL357.86A.0094.2023.0612.1527 06/12/2023
[  197.802649] Workqueue: events output_poll_execute [drm_kms_helper]
[  197.802668] RIP: 0010:intel_power_well_put+0xa1/0xb0 [i915]
[  197.802782] Code: 40 48 8d 04 c1 4c 8b 30 4d 85 ed 75 03 4c 8b 2f e8 54 0f 24 c1 4c 89 f1 4c 89 ea 48 c7 c7 f8 47 fa c1 48 89 c6 e8 df b0 87 c0 <0f> 0b 8b 43 18 e9 72 ff ff ff 0f 1f 44 00 00 90 90 90 90 90 90 90
[  197.802784] RSP: 0018:ffffaf9c006efc10 EFLAGS: 00010246
[  197.802786] RAX: 0000000000000000 RBX: ffff988b8139e080 RCX: 0000000000000000
[  197.802788] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  197.802789] RBP: ffffaf9c006efc30 R08: 0000000000000000 R09: 0000000000000000
[  197.802790] R10: 0000000000000000 R11: 0000000000000000 R12: ffff988b899f8000
[  197.802791] R13: ffff988b85dfb770 R14: ffffffffc1fcacf5 R15: 0000000000000002
[  197.802793] FS:  0000000000000000(0000) GS:ffff9892e0e80000(0000) knlGS:0000000000000000
[  197.802794] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  197.802796] CR2: 00007f413cd0f298 CR3: 00000006a3e10004 CR4: 00000000003706e0
[  197.802797] Call Trace:
[  197.802799]  <TASK>
[  197.802802]  __intel_display_power_put_domain+0xed/0x1e0 [i915]
[  197.802912]  ? __intel_runtime_pm_get+0x32/0xa0 [i915]
[  197.802988]  release_async_put_domains+0x88/0x120 [i915]
[  197.803097]  intel_display_power_flush_work+0xc1/0xf0 [i915]
[  197.803206]  intel_dp_detect+0x3a8/0x730 [i915]
[  197.803318]  ? ww_mutex_lock+0x19/0xa0
[  197.803324]  drm_helper_probe_detect_ctx+0x57/0x120 [drm_kms_helper]
[  197.803342]  output_poll_execute+0x192/0x250 [drm_kms_helper]
[  197.803358]  process_one_work+0x222/0x430
[  197.803362]  worker_thread+0x50/0x3e0
[  197.803365]  ? __pfx_worker_thread+0x10/0x10
[  197.803367]  kthread+0xe6/0x110
[  197.803370]  ? __pfx_kthread+0x10/0x10
[  197.803373]  ret_from_fork+0x29/0x50
[  197.803377]  </TASK>
[  197.803378] ---[ end trace 0000000000000000 ]---
[  197.803380] ------------[ cut here ]------------


So I'm writing up this lengthy post to see if any of you know what I can try to resolve the issue with the kernel drivers, or a secret `GRUB` flag that fixes this not-so-pleasant freeze/hang glitch, or whether I should just go with the backup plan, as this might be one of those odd ones out.
Or just to bring the issue to light, in case anyone else hits the same occurrence.

I did a full reinstall as well, and the issue was also present on an empty Proxmox with the new kernel.

Best Regards!
LK

Last but not least, you guys are making an awesome product! Keep it up and keep on kicking!


PS. For now I have reinstalled Proxmox 7 on it and it's rock solid again, but I do have version 8 on another SSD in case I want to venture into some kernel debugging again.
:cool:
 

Finally upgraded my 4 PVE nodes from PVE 7.4 to PVE 8. Everything looks to work fine so far, but when shutting down my TrueNAS Core VMs, both of them complain about "an unscheduled system reboot" once I start the VM again. Did anyone see the same problem? Not sure if that is something bad or not. I wanted to check the TrueNAS logs to see if at least the ZFS pools got unmounted properly when shutting down the VM, but syslog is shut down when the VM shuts down, and I also can't see the messages on the virtual console as it goes black once the VM has finished shutting down. So I can't see what's actually going on there. Maybe a problem with TrueNAS Core and the new QEMU version (Q35 is set to "latest")?
 
Finally upgraded my 4 PVE nodes from PVE 7.4 to PVE 8. Everything looks to work fine so far, but when shutting down my TrueNAS Core VMs, both of them complain about "an unscheduled system reboot" once I start the VM again. Did anyone see the same problem? Not sure if that is something bad or not. I wanted to check the TrueNAS logs to see if at least the ZFS pools got unmounted properly when shutting down the VM, but syslog is shut down when the VM shuts down, and I also can't see the messages on the virtual console as it goes black once the VM has finished shutting down. So I can't see what's actually going on there. Maybe a problem with TrueNAS Core and the new QEMU version (Q35 is set to "latest")?
13.0-U5.2?
Anything special? I'm gonna test it if you want.
A VM config would help to replicate your setup.

I haven't updated my 4 nodes here; I simply reinstalled them all cleanly.
 
Yes, both are "TrueNAS-13.0-U5.2". One with 1x SAS2008, one with 2x SAS2008 HBA passthrough.

VMs:
Code:
bios: ovmf
boot: order=ide2;scsi0
cores: 8
cpu: host,flags=+pcid;+spec-ctrl;+ssbd;+aes
efidisk0: VMpool2_VLT_VM8K:vm-144-disk-1,efitype=4m,size=1M
hostpci0: 0000:03:00,pcie=1
hostpci1: 0000:04:00,pcie=1
ide2: none,media=cdrom
machine: q35
memory: 32768
meta: creation-qemu=7.1.0,ctime=1674579022
name: MainNAS
net0: virtio=CE:7C:EB:4F:FA:71,bridge=vmbr43,firewall=1,mtu=1500,queues=8
net1: virtio=12:55:A6:9C:DA:66,bridge=vmbr41,firewall=1,mtu=1500,queues=8
net2: virtio=8E:69:D7:D4:E9:51,bridge=vmbr42,firewall=1,mtu=1500,queues=8
net3: virtio=46:95:F5:04:63:49,bridge=vmbr45,firewall=1,mtu=9000,queues=8
net4: virtio=FA:49:FD:99:31:E7,bridge=vmbr46,firewall=1,mtu=1500,queues=8
net5: virtio=4A:DF:50:D1:A9:8C,bridge=vmbr47,firewall=1,mtu=1500,queues=8
net6: virtio=9A:51:1B:D6:A2:63,bridge=vmbr48,firewall=1,mtu=9000,queues=8
net7: virtio=EE:71:BB:9B:BC:0E,bridge=vmbr49,firewall=1,mtu=9000,queues=8
net8: virtio=A2:94:E1:AA:2A:1F,bridge=vmbr51,firewall=1,mtu=9000,queues=8
net9: virtio=5A:70:3B:4B:D6:97,bridge=vmbr201,firewall=1,mtu=9000,queues=8
numa: 0
onboot: 1
ostype: other
scsi0: VMpool2_VLT_VM8K:vm-144-disk-3,discard=on,iothread=1,size=80G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=5cd992a5-7526-4ce7-9106-ae6bd1d367d8
sockets: 1
startup: order=300,up=300
tags: host_enterprise
tpmstate0: VMpool2_VLT_VM8K:vm-144-disk-2,size=4M,version=v2.0
vmgenid: 2fb24a9a-9c0f-4223-b0c7-a30faf58eccd

Code:
bios: ovmf
boot: order=scsi0;ide2
cores: 6
cpu: host,flags=+pcid;+spec-ctrl;+ssbd;+aes
efidisk0: dpool_vlt_VM8K:vm-148-disk-0,efitype=4m,size=1M
hostpci0: 0000:02:00,pcie=1
ide2: none,media=cdrom
machine: q35
memory: 20480
meta: creation-qemu=7.2.0,ctime=1684101783
name: BackupNAS
net0: virtio=26:41:E6:2A:7D:B5,bridge=vmbr43,firewall=1,queues=6
net1: virtio=A6:56:02:2E:4F:72,bridge=vmbr41,firewall=1,queues=6
net2: virtio=8E:F7:0E:1A:BB:DF,bridge=vmbr42,firewall=1,queues=6
net3: virtio=9A:6D:83:4F:E3:F4,bridge=vmbr45,firewall=1,mtu=9000,queues=6
net4: virtio=1E:F6:F2:25:0B:CC,bridge=vmbr46,firewall=1,queues=6
net5: virtio=6A:E1:DB:1E:9D:47,bridge=vmbr47,firewall=1,queues=6
net6: virtio=A6:1D:29:D6:14:34,bridge=vmbr48,firewall=1,mtu=9000,queues=6
net7: virtio=C2:31:2F:73:0B:3F,bridge=vmbr49,firewall=1,mtu=9000,queues=6
net8: virtio=06:FF:A9:8E:82:4A,bridge=vmbr51,firewall=1,mtu=9000,queues=6
net9: virtio=3A:2C:8B:ED:7B:5E,bridge=vmbr200,firewall=1,mtu=9000,queues=6
numa: 0
onboot: 1
ostype: other
scsi0: dpool_vlt_VM8K:vm-148-disk-1,discard=on,iothread=1,size=80G
scsihw: virtio-scsi-single
smbios1: uuid=961db8b4-c0ad-4db3-89c9-9d463344977f
sockets: 1
startup: order=100,up=300
tags: host_nostromo
tpmstate0: dpool_vlt_VM8K:vm-148-disk-2,size=4M,version=v2.0
vmgenid: 81304c4e-6a76-494e-84cc-7edeaafaf755

TrueNAS complains after each start:
Code:
WARNING
MainNAS.<redacted> had an unscheduled system reboot. The operating system successfully came back online at Thu Jul 20 14:54:26 2023.

Will try to switch them back to QEMU 7.2 to see if QEMU 8.0 is problematic.
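(Since the machine type is set to "latest", pinning it to the previous version should be enough for that test; a sketch, using VM 144 as the example:)
Code:
# pin the virtual machine type to the QEMU 7.2-era q35 machine
qm set 144 --machine pc-q35-7.2

# to go back to tracking the newest version later
qm set 144 --machine q35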
 
Found something. The PVE shutdown task log shows this:
Code:
VM quit/powerdown failed - terminating now with SIGTERM
TASK OK

The shutdown task took 15 seconds to finish with an OK. So PVE isn't gracefully shutting down the VM, but doing a hard stop without waiting for the timeout (which should only kick in after some minutes)?
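(One way to narrow it down would be forcing a long, explicit timeout on the shutdown and seeing whether the early SIGTERM still happens; a sketch, again with VM 144:)
Code:
# request a guest shutdown and wait up to 10 minutes before giving up
qm shutdown 144 --timeout 600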
 
Yes, both are "TrueNAS-13.0-U5.2". One with 1x SAS2008, one with 2x SAS2008 HBA passthrough.
[... VM configs and TrueNAS warning quoted above ...]
Will try to switch them back to QEMU 7.2 to see if QEMU 8.0 is problematic.
Hey @Dunuin, I'm sorry but I can't replicate it.

I tried everything to trigger an "unscheduled system reboot", but it doesn't appear here.
Sadly I can't pass through an HBA here, but I made 6x 32GB SCSI disks and configured them into a RAID-Z2 in TrueNAS.

What I tried:
- Reboot TrueNAS
- Shutdown/Start TrueNAS
- Installed qemu-guest-agent:
-- Reboot TrueNAS through the TrueNAS GUI
-- Reboot TrueNAS through Proxmox (QEMU)
-- Shutdown + Start TrueNAS through Proxmox (QEMU)

The only thing I didn't try was the ACPI way to reboot/shutdown/start through Proxmox.

Checked all logs via dmesg for errors, and after every test did a "dmesg | grep unscheduled"...
I had no errors... :-(
And no errors in the GUI either.

Code:
cat /etc/pve/qemu-server/102.conf
agent: 1
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 4
cpu: host,flags=+pcid;+spec-ctrl;+ssbd;+aes
efidisk0: NVME_ZFS_R10:vm-102-disk-0,efitype=4m,size=1M
ide2: none,media=cdrom
machine: q35
memory: 8192
meta: creation-qemu=8.0.2,ctime=1689860377
name: truenas
net0: virtio=D2:FD:1A:D8:DB:65,bridge=vmbr0,queues=8
numa: 0
ostype: other
scsi0: NVME_ZFS_R10:vm-102-disk-1,discard=on,iothread=1,size=32G,ssd=1
scsi1: HDD_Z2:vm-102-disk-0,iothread=1,size=32G
scsi2: HDD_Z2:vm-102-disk-1,iothread=1,size=32G
scsi3: HDD_Z2:vm-102-disk-2,iothread=1,size=32G
scsi4: HDD_Z2:vm-102-disk-3,iothread=1,size=32G
scsi5: HDD_Z2:vm-102-disk-4,iothread=1,size=32G
scsi6: HDD_Z2:vm-102-disk-5,iothread=1,size=32G
scsihw: virtio-scsi-single
smbios1: uuid=2c5af4af-6b5c-414e-8ff6-c48d5d2e8ad9
sockets: 1
tpmstate0: NVME_ZFS_R10:vm-102-disk-2,size=4M,version=v2.0
vmgenid: cad7a581-167c-4f60-80f1-8bf18a411cb7

I tried to get as close to your VM config as possible, I just don't have any HBA I can pass through.
I think someone who can actually pass through an HBA should test this.

If I can try something else, tell me.
Cheers
 
Found something. The PVE shutdown task log shows this:
Code:
VM quit/powerdown failed - terminating now with SIGTERM
TASK OK

The shutdown task took 15 seconds to finish with an OK. So PVE isn't gracefully shutting down the VM, but doing a hard stop without waiting for the timeout (which should only kick in after some minutes)?
Awesome, that makes absolute sense!

Well, can you change to ACPI or install the qemu-guest-agent?

BTW, 15 seconds seems a bit short to me anyway.
I'm wondering, since I didn't modify the default behaviour of that 2-minute timeout.

EDIT2:
I used the qemu-guest-agent from here:
https://www.truenas.com/community/resources/qemu-guest-agent-for-truenas-core-13.191/
-> there is a download button at the top

and the guide from here:
https://www.truenas.com/community/resources/qemu-guest-agent.167/

Worked perfectly here!
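(One thing worth adding: besides installing the agent inside TrueNAS, the agent option also has to be enabled on the PVE side of the VM, like the "agent: 1" line in my config above; a sketch, e.g. for VM 144:)
Code:
# enable the QEMU guest agent option for the VM (takes effect on the next full VM start)
qm set 144 --agent enabled=1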
 
