After update to 8... unreachable after a short time

Hi folks, I experimented a bit and found that my problem is caused by the 6.x kernel. I went back to 5.15 and, lo and behold, it runs stably with 8.0. Maybe this helps someone else with a similar problem. Regards
Would it be possible to get a journalctl -b from when the problem occurs? If the problem happened during the last boot, journalctl -b -1 works as well.
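For capturing those logs to a file, something like this (a sketch; the file names are just examples):

```shell
# Dump the journal for the current boot (run on the PVE host):
journalctl -b > journal_current.txt

# If the host was rebooted after the hang, the previous boot's journal
# usually contains the relevant messages:
journalctl -b -1 > journal_previous.txt
```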
 
My system runs stably with 6.2.16-2-pve.
All later kernels currently available render the system unreachable after a few hours.
 
Hello everyone,
under 7.4 the 6.x kernel "6.2.11-2-pve" worked stably for me.

Now, under 8.0.x, with the kernel versions 6.2.16-1-pve, 6.2.16-2-pve, 6.2.16-3-pve and 6.2.16-4-pve, I run into the same situation after about 12 hours, sometimes only after up to 72 hours: the server can no longer be reached over the network. Unfortunately, I don't have a monitor anywhere near the server ...

Since 6.2.11-2-pve was the last kernel I used stably on 7.4, I have now pinned that kernel. With it, 8.0.3 also runs very stably.

After the last crash I find the following entries in the syslog:

Jul 17 06:05:45 proxmox systemd[1]: Starting apt-daily-upgrade.service - Daily apt upgrade and clean activities...
Jul 17 06:05:45 proxmox systemd[1]: apt-daily-upgrade.service: Deactivated successfully.
Jul 17 06:05:45 proxmox systemd[1]: Finished apt-daily-upgrade.service - Daily apt upgrade and clean activities.
Jul 17 06:17:01 proxmox CRON[1123192]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jul 17 06:17:01 proxmox CRON[1123193]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jul 17 06:17:01 proxmox CRON[1123192]: pam_unix(cron:session): session closed for user root
Jul 17 06:25:01 proxmox CRON[1125679]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jul 17 06:25:01 proxmox CRON[1125680]: (root) CMD (test -x /usr/sbin/anacron || { cd / && run-parts --report /etc/cron.daily; })
Jul 17 06:25:01 proxmox CRON[1125679]: pam_unix(cron:session): session closed for user root
Jul 17 06:29:44 proxmox kernel: ------------[ cut here ]------------
Jul 17 06:29:44 proxmox kernel: NETDEV WATCHDOG: eno1 (r8169): transmit queue 0 timed out
Jul 17 06:29:44 proxmox kernel: WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x23a/0x250
Jul 17 06:29:44 proxmox kernel: Modules linked in: tcp_diag inet_diag nf_conntrack_netlink xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat overlay veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter scsi_transport_iscsi nf_tables sunrpc bonding tls softdog nfnetlink_log binfmt_misc nfnetlink snd_hda_codec_hdmi snd_sof_pci_intel_icl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_hda_codec_realtek snd_sof_xtensa_dsp snd_hda_codec_generic snd_sof ledtrig_audio snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus x86_pkg_temp_thermal snd_soc_core intel_powerclamp snd_compress ac97_bus snd_pcm_dmaengine kvm_intel snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi kvm snd_hda_codec irqbypass snd_hda_core crct10dif_pclmul
Jul 17 06:29:44 proxmox kernel: polyval_generic ghash_clmulni_intel sha512_ssse3 aesni_intel iwlmvm snd_hwdep cmdlinepart crypto_simd mei_hdcp mei_pxp intel_rapl_msr spi_nor btusb cryptd processor_thermal_device_pci_legacy mac80211 btrtl libarc4 snd_pcm processor_thermal_device btbcm i915 processor_thermal_rfim btintel btmtk snd_timer processor_thermal_mbox mtd drm_buddy ttm drm_display_helper processor_thermal_rapl cec rc_core drm_kms_helper i2c_algo_bit syscopyarea intel_cstate pcspkr iwlwifi intel_rapl_common mei_me bluetooth snd ecdh_generic wmi_bmof sysfillrect int340x_thermal_zone ee1004 soundcore 8250_dw cfg80211 ecc mei intel_soc_dts_iosf sysimgblt mac_hid acpi_tad acpi_pad zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap coretemp drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq simplefb uas usb_storage dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c nvme spi_pxa2xx_platform
Jul 17 06:29:44 proxmox kernel: dw_dmac dw_dmac_core nvme_core i2c_i801 sdhci_pci cqhci r8169 spi_intel_pci xhci_pci ahci xhci_pci_renesas crc32_pclmul spi_intel nvme_common i2c_smbus realtek intel_lpss_pci sdhci intel_lpss libahci xhci_hcd idma64 video wmi pinctrl_jasperlake
Jul 17 06:29:44 proxmox kernel: CPU: 2 PID: 0 Comm: swapper/2 Tainted: P O 6.2.16-4-pve #1
Jul 17 06:29:44 proxmox kernel: Hardware name: Intel(R) Client Systems NUC11ATKC4/NUC11ATBC4, BIOS ATJSLCPX.0037.2022.0715.1547 07/15/2022
Jul 17 06:29:44 proxmox kernel: RIP: 0010:dev_watchdog+0x23a/0x250
Jul 17 06:29:44 proxmox kernel: Code: 00 e9 2b ff ff ff 48 89 df c6 05 4a 6c 7d 01 01 e8 6b 08 f8 ff 44 89 f1 48 89 de 48 c7 c7 98 64 e0 b0 48 89 c2 e8 86 a6 30 ff <0f> 0b e9 1c ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
Jul 17 06:29:44 proxmox kernel: RSP: 0018:ffffb7e9c01b0e38 EFLAGS: 00010246
Jul 17 06:29:44 proxmox kernel: RAX: 0000000000000000 RBX: ffff9fac522d0000 RCX: 0000000000000000
Jul 17 06:29:44 proxmox kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Jul 17 06:29:44 proxmox kernel: RBP: ffffb7e9c01b0e68 R08: 0000000000000000 R09: 0000000000000000
Jul 17 06:29:44 proxmox kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff9fac522d04c8
Jul 17 06:29:44 proxmox kernel: R13: ffff9fac522d041c R14: 0000000000000000 R15: 0000000000000000
Jul 17 06:29:44 proxmox kernel: FS: 0000000000000000(0000) GS:ffff9fafaff00000(0000) knlGS:0000000000000000
Jul 17 06:29:44 proxmox kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 17 06:29:44 proxmox kernel: CR2: 00007f87f1126068 CR3: 0000000200c10000 CR4: 0000000000350ee0
Jul 17 06:29:44 proxmox kernel: Call Trace:
Jul 17 06:29:44 proxmox kernel: <IRQ>
Jul 17 06:29:44 proxmox kernel: ? __pfx_dev_watchdog+0x10/0x10
Jul 17 06:29:44 proxmox kernel: call_timer_fn+0x29/0x160
Jul 17 06:29:44 proxmox kernel: ? __pfx_dev_watchdog+0x10/0x10
Jul 17 06:29:44 proxmox kernel: __run_timers+0x259/0x310
Jul 17 06:29:44 proxmox kernel: run_timer_softirq+0x1d/0x40
Jul 17 06:29:44 proxmox kernel: __do_softirq+0xd6/0x346
Jul 17 06:29:44 proxmox kernel: ? hrtimer_interrupt+0x11f/0x250
Jul 17 06:29:44 proxmox kernel: __irq_exit_rcu+0xa2/0xd0
Jul 17 06:29:44 proxmox kernel: irq_exit_rcu+0xe/0x20
Jul 17 06:29:44 proxmox kernel: sysvec_apic_timer_interrupt+0x92/0xd0
Jul 17 06:29:44 proxmox kernel: </IRQ>
Jul 17 06:29:44 proxmox kernel: <TASK>
Jul 17 06:29:44 proxmox kernel: asm_sysvec_apic_timer_interrupt+0x1b/0x20
Jul 17 06:29:44 proxmox kernel: RIP: 0010:cpuidle_enter_state+0xde/0x6f0
Jul 17 06:29:44 proxmox kernel: Code: 27 f7 4f e8 d4 79 4a ff 8b 53 04 49 89 c7 0f 1f 44 00 00 31 ff e8 02 82 49 ff 80 7d d0 00 0f 85 eb 00 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 12 02 00 00 4d 63 ee 49 83 fd 09 0f 87 c7 04 00 00
Jul 17 06:29:44 proxmox kernel: RSP: 0018:ffffb7e9c0137e38 EFLAGS: 00000246
Jul 17 06:29:44 proxmox kernel: RAX: 0000000000000000 RBX: ffff9fafaff3da00 RCX: 0000000000000000
Jul 17 06:29:44 proxmox kernel: RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000000
Jul 17 06:29:44 proxmox kernel: RBP: ffffb7e9c0137e88 R08: 0000000000000000 R09: 0000000000000000
Jul 17 06:29:44 proxmox kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffb18c33a0
Jul 17 06:29:44 proxmox kernel: R13: 0000000000000002 R14: 0000000000000002 R15: 0000c1e530206a8b
Jul 17 06:29:44 proxmox kernel: ? cpuidle_enter_state+0xce/0x6f0
Jul 17 06:29:44 proxmox kernel: cpuidle_enter+0x2e/0x50
Jul 17 06:29:44 proxmox kernel: do_idle+0x216/0x2a0
Jul 17 06:29:44 proxmox kernel: cpu_startup_entry+0x1d/0x20
Jul 17 06:29:44 proxmox kernel: start_secondary+0x122/0x160
Jul 17 06:29:44 proxmox kernel: secondary_startup_64_no_verify+0xe5/0xeb
Jul 17 06:29:44 proxmox kernel: </TASK>
Jul 17 06:29:44 proxmox kernel: ---[ end trace 0000000000000000 ]---
Jul 17 06:29:44 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
Jul 17 06:29:44 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 17 06:29:44 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 17 06:29:44 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 17 06:29:44 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 17 06:29:44 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 17 06:29:44 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 17 06:29:44 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Jul 17 06:29:44 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Jul 17 06:29:44 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Jul 17 06:30:54 proxmox kernel: net_ratelimit: 9 callbacks suppressed
Jul 17 06:30:54 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
Jul 17 06:30:54 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 17 06:30:54 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 17 06:30:54 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 17 06:30:54 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 17 06:30:54 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 17 06:30:54 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Jul 17 06:30:54 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Jul 17 06:30:54 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Jul 17 06:30:54 proxmox kernel: r8169 0000:02:00.0 eno1: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Jul 17 06:31:01 proxmox pvedaemon[1057636]: worker exit
Jul 17 06:31:01 proxmox pvedaemon[951]: worker 1057636 finished
Jul 17 06:31:01 proxmox pvedaemon[951]: starting 1 worker(s)
Jul 17 06:31:01 proxmox pvedaemon[951]: worker 1127599 started
Jul 17 06:32:03 proxmox kernel: net_ratelimit: 9 callbacks suppressed

journalctl -b -1: see attached file
 

Attachments

  • journalctl_dump.txt (188 KB)
Then pin the 6.2.11 kernel for now with proxmox-boot-tool.
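For reference, pinning with proxmox-boot-tool might look like this (a sketch; the kernel image must still be installed, and the version string is the one from this thread):

```shell
# Pin the known-good kernel so it is booted by default (run as root):
KVER="6.2.11-2-pve"

if [ -e "/boot/vmlinuz-${KVER}" ]; then
    proxmox-boot-tool kernel pin "${KVER}"
else
    echo "vmlinuz-${KVER} not found in /boot - kernel not installed?" >&2
fi

# Verify which kernels are known and which one is pinned:
proxmox-boot-tool kernel list
```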
 
Hello,
I'm currently on 8.0.3 with the 6.2.16-6-pve kernel. It crashes after a short time. Back on 5.15.108-1-pve it runs fine.
How can I install the 6.2.11-2 kernel?
Code:
root@pve:~# proxmox-boot-tool kernel add 6.2.11-2-pve
E: no kernel image found in /boot for '6.2.11-2-pve', not adding.
root@pve:~# apt install pve-kernel-6.2.11-2-pve
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package pve-kernel-6.2.11-2-pve
E: Couldn't find any package by glob 'pve-kernel-6.2.11-2-pve'
 
The 6.2.11-2 kernel lives in the 7.4 repository (bullseye). I had been running that kernel under 7.4 for a while; after the upgrade to 8.0.x it was still present on my server.

The 8.0.x repository (bookworm) does not contain that kernel.
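To see which of the configured repositories (if any) carries a given kernel package, apt can be queried directly, e.g.:

```shell
# Show candidate versions and the repositories they come from:
apt policy pve-kernel-6.2.11-2-pve

# List all pve-kernel packages apt knows about, installed or not:
apt list 'pve-kernel-*-pve' 2>/dev/null
```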

apt list pve-kernel*-pve
Listing... Done
pve-kernel-5.15.102-1-pve/now 5.15.102-1 amd64 [installed,local]
pve-kernel-5.15.107-2-pve/now 5.15.107-2 amd64 [residual-config]
pve-kernel-5.15.108-1-pve/now 5.15.108-1 amd64 [installed,local]
pve-kernel-6.1.10-1-pve/stable 6.1.10-1 amd64
pve-kernel-6.2.11-2-pve/now 6.2.11-2 amd64 [installed,local]
pve-kernel-6.2.16-1-pve/stable 6.2.16-1 amd64
pve-kernel-6.2.16-2-pve/stable 6.2.16-2 amd64
pve-kernel-6.2.16-3-pve/stable,now 6.2.16-3 amd64 [installed,automatic]
pve-kernel-6.2.16-4-pve/stable,now 6.2.16-5 amd64 [installed,automatic]
 
Apologies, German is not my first language. Two weeks ago I upgraded an Intel NUC from 7 to 8 and saw problems there as well. Also headless; with a monitor attached there was no problem.
What solved it for me:

GRUB_CMDLINE_LINUX_DEFAULT="quiet nomodeset"

in /etc/default/grub, followed by update-grub.

Maybe that helps here too.
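The change above, spelled out (a sketch; this assumes the host boots via GRUB — on hosts using proxmox-boot-tool with systemd-boot, the command line is set in /etc/kernel/cmdline instead):

```shell
# Back up, then set the default kernel command line (run as root):
cp /etc/default/grub /etc/default/grub.bak
sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT=.*/GRUB_CMDLINE_LINUX_DEFAULT="quiet nomodeset"/' /etc/default/grub

# Regenerate the GRUB config; a reboot is needed for the change to apply:
update-grub
```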
 
Brief input from my side: after updating to version 8 I also had the spontaneous freezes on my Intel NUC system. No reachability from the outside anymore.
I'm now back on kernel 5.15.108-1; no problems for 5 days now.
 
With 5.15.108-1 I am also running version 8 without any problems.
Is it really worth installing 6.2.11-2 "at all costs" (i.e. going back to V 7.4 and installing 6.2.11 there)?
 
Brief status update from my N100 system.
My uptime is now an incredible three days without a freeze. That's a new record for Proxmox 8 on this machine; before, it never ran more than 1-2 days until the next freeze.
I tried a few things without success (intel-microcode, disabling C-states in the BIOS), but the latest changes actually seem to bring an improvement.

As described here (https://forum.proxmox.com/threads/vms-freeze-with-100-cpu.127459/post-575892), I set mitigations=off AND disabled KSM. mitigations=off alone brought no improvement; whether disabling KSM alone helps, I haven't tested yet.

As an aside, I have the latest kernel installed...
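Disabling KSM as described might look like this (a sketch; ksmtuned is the service PVE uses for KSM tuning, and writing 2 to the sysfs knob also unmerges already-shared pages):

```shell
# Stop and disable the KSM tuning daemon (run as root):
systemctl disable --now ksmtuned

# Turn KSM off and unmerge pages that were already shared:
echo 2 > /sys/kernel/mm/ksm/run
```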
 
I performed the upgrade to version 8 yesterday and am now affected by the problem as well.
To me it looks as if, after an indeterminate time, a bug causes the node to lose its network connection and thereby become unreachable. Everything else in the log below is just follow-on damage.

Here is the log:

Code:
Aug 08 09:17:19 pve corosync[1060]:   [KNET  ] link: host: 2 link: 0 is down
Aug 08 09:17:19 pve corosync[1060]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 08 09:17:19 pve corosync[1060]:   [KNET  ] host: host: 2 has no active links
Aug 08 09:17:20 pve corosync[1060]:   [TOTEM ] Token has not been received in 2250 ms
Aug 08 09:17:21 pve corosync[1060]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3000ms), waiting 3600ms for consensus.
Aug 08 09:17:25 pve corosync[1060]:   [QUORUM] Sync members[1]: 1
Aug 08 09:17:25 pve corosync[1060]:   [QUORUM] Sync left[1]: 2
Aug 08 09:17:25 pve corosync[1060]:   [TOTEM ] A new membership (1.fb) was formed. Members left: 2
Aug 08 09:17:25 pve corosync[1060]:   [TOTEM ] Failed to receive the leave message. failed: 2
Aug 08 09:17:25 pve pmxcfs[961]: [dcdb] notice: members: 1/961
Aug 08 09:17:25 pve pmxcfs[961]: [status] notice: members: 1/961
Aug 08 09:17:25 pve corosync[1060]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
Aug 08 09:17:25 pve corosync[1060]:   [QUORUM] Members[1]: 1
Aug 08 09:17:25 pve corosync[1060]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug 08 09:17:25 pve pmxcfs[961]: [status] notice: node lost quorum
Aug 08 09:17:25 pve pmxcfs[961]: [dcdb] crit: received write while not quorate - trigger resync
Aug 08 09:17:25 pve pmxcfs[961]: [dcdb] crit: leaving CPG group
Aug 08 09:17:25 pve pve-ha-lrm[1127]: unable to write lrm status file - unable to open file '/etc/pve/nodes/pve/lrm_status.tmp.1127' - Permission denied
Aug 08 09:17:25 pve pmxcfs[961]: [dcdb] notice: start cluster connection
Aug 08 09:17:25 pve pmxcfs[961]: [dcdb] crit: cpg_join failed: 14
Aug 08 09:17:25 pve pmxcfs[961]: [dcdb] crit: can't initialize service
Aug 08 09:17:31 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (Connection timed out)
Aug 08 09:17:31 pve pvestatd[1079]: status update time (7.437 seconds)
Aug 08 09:17:31 pve pmxcfs[961]: [dcdb] notice: members: 1/961
Aug 08 09:17:31 pve pmxcfs[961]: [dcdb] notice: all data is up to date
Aug 08 09:17:41 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (Connection timed out)
Aug 08 09:17:42 pve pvestatd[1079]: status update time (7.461 seconds)
Aug 08 09:17:48 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:17:57 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:18:07 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:18:09 pve pvescheduler[364317]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 08 09:18:09 pve pvescheduler[364316]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 08 09:18:17 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:18:27 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:18:37 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:18:47 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:18:57 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:19:07 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:19:09 pve pvescheduler[364551]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 08 09:19:09 pve pvescheduler[364550]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 08 09:19:17 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:19:27 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:19:37 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:19:47 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:19:53 pve kernel: ------------[ cut here ]------------
Aug 08 09:19:53 pve kernel: NETDEV WATCHDOG: enp1s0 (r8169): transmit queue 0 timed out
Aug 08 09:19:53 pve kernel: WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x23a/0x250
Aug 08 09:19:53 pve kernel: Modules linked in: tcp_diag inet_diag dm_snapshot cfg80211 veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter sctp ip6_udp_tunnel udp_tunne>
Aug 08 09:19:53 pve kernel:  snd_hda_intel polyval_generic ghash_clmulni_intel snd_intel_dspcfg sha512_ssse3 drm_display_helper snd_intel_sdw_acpi aesni_intel cec snd_hda_codec crypto_simd dell_wmi cryptd snd_hda_core rc_core le>
Aug 08 09:19:53 pve kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: P           O       6.2.16-6-pve #1
Aug 08 09:19:53 pve kernel: Hardware name: Dell Inc. Wyse 5070 Thin Client/0TKM9Y, BIOS 1.21.0 11/17/2022
Aug 08 09:19:53 pve kernel: RIP: 0010:dev_watchdog+0x23a/0x250
 
I performed the upgrade to version 8 yesterday, and now I am affected by the problem as well.
To me it looks like, after an indeterminate amount of time, a bug causes the node to lose its network connection and thereby become unreachable. Everything else in the following log is just consequential errors.

Here is the log:

Code:
Aug 08 09:17:19 pve corosync[1060]:   [KNET  ] link: host: 2 link: 0 is down
Aug 08 09:17:19 pve corosync[1060]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
Aug 08 09:17:19 pve corosync[1060]:   [KNET  ] host: host: 2 has no active links
Aug 08 09:17:20 pve corosync[1060]:   [TOTEM ] Token has not been received in 2250 ms
Aug 08 09:17:21 pve corosync[1060]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3000ms), waiting 3600ms for consensus.
Aug 08 09:17:25 pve corosync[1060]:   [QUORUM] Sync members[1]: 1
Aug 08 09:17:25 pve corosync[1060]:   [QUORUM] Sync left[1]: 2
Aug 08 09:17:25 pve corosync[1060]:   [TOTEM ] A new membership (1.fb) was formed. Members left: 2
Aug 08 09:17:25 pve corosync[1060]:   [TOTEM ] Failed to receive the leave message. failed: 2
Aug 08 09:17:25 pve pmxcfs[961]: [dcdb] notice: members: 1/961
Aug 08 09:17:25 pve pmxcfs[961]: [status] notice: members: 1/961
Aug 08 09:17:25 pve corosync[1060]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
Aug 08 09:17:25 pve corosync[1060]:   [QUORUM] Members[1]: 1
Aug 08 09:17:25 pve corosync[1060]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug 08 09:17:25 pve pmxcfs[961]: [status] notice: node lost quorum
Aug 08 09:17:25 pve pmxcfs[961]: [dcdb] crit: received write while not quorate - trigger resync
Aug 08 09:17:25 pve pmxcfs[961]: [dcdb] crit: leaving CPG group
Aug 08 09:17:25 pve pve-ha-lrm[1127]: unable to write lrm status file - unable to open file '/etc/pve/nodes/pve/lrm_status.tmp.1127' - Permission denied
Aug 08 09:17:25 pve pmxcfs[961]: [dcdb] notice: start cluster connection
Aug 08 09:17:25 pve pmxcfs[961]: [dcdb] crit: cpg_join failed: 14
Aug 08 09:17:25 pve pmxcfs[961]: [dcdb] crit: can't initialize service
Aug 08 09:17:31 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (Connection timed out)
Aug 08 09:17:31 pve pvestatd[1079]: status update time (7.437 seconds)
Aug 08 09:17:31 pve pmxcfs[961]: [dcdb] notice: members: 1/961
Aug 08 09:17:31 pve pmxcfs[961]: [dcdb] notice: all data is up to date
Aug 08 09:17:41 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (Connection timed out)
Aug 08 09:17:42 pve pvestatd[1079]: status update time (7.461 seconds)
Aug 08 09:17:48 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:17:57 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:18:07 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:18:09 pve pvescheduler[364317]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 08 09:18:09 pve pvescheduler[364316]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 08 09:18:17 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:18:27 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:18:37 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:18:47 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:18:57 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:19:07 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:19:09 pve pvescheduler[364551]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 08 09:19:09 pve pvescheduler[364550]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 08 09:19:17 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:19:27 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:19:37 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:19:47 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:19:53 pve kernel: ------------[ cut here ]------------
Aug 08 09:19:53 pve kernel: NETDEV WATCHDOG: enp1s0 (r8169): transmit queue 0 timed out
Aug 08 09:19:53 pve kernel: WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x23a/0x250
Aug 08 09:19:53 pve kernel: Modules linked in: tcp_diag inet_diag dm_snapshot cfg80211 veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter sctp ip6_udp_tunnel udp_tunne>
Aug 08 09:19:53 pve kernel:  snd_hda_intel polyval_generic ghash_clmulni_intel snd_intel_dspcfg sha512_ssse3 drm_display_helper snd_intel_sdw_acpi aesni_intel cec snd_hda_codec crypto_simd dell_wmi cryptd snd_hda_core rc_core le>
Aug 08 09:19:53 pve kernel: CPU: 1 PID: 0 Comm: swapper/1 Tainted: P           O       6.2.16-6-pve #1
Aug 08 09:19:53 pve kernel: Hardware name: Dell Inc. Wyse 5070 Thin Client/0TKM9Y, BIOS 1.21.0 11/17/2022
Aug 08 09:19:53 pve kernel: RIP: 0010:dev_watchdog+0x23a/0x250
Aug 08 09:19:53 pve kernel: Code: 00 e9 2b ff ff ff 48 89 df c6 05 8b 6c 7d 01 01 e8 6b 08 f8 ff 44 89 f1 48 89 de 48 c7 c7 98 65 40 ae 48 89 c2 e8 86 a6 30 ff <0f> 0b e9 1c ff ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00
Aug 08 09:19:53 pve kernel: RSP: 0018:ffffad1400118e38 EFLAGS: 00010246
Aug 08 09:19:53 pve kernel: RAX: 0000000000000000 RBX: ffff8b4fc0dec000 RCX: 0000000000000000
Aug 08 09:19:53 pve kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Aug 08 09:19:53 pve kernel: RBP: ffffad1400118e68 R08: 0000000000000000 R09: 0000000000000000
Aug 08 09:19:53 pve kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b4fc0dec4c8
Aug 08 09:19:53 pve kernel: R13: ffff8b4fc0dec41c R14: 0000000000000000 R15: 0000000000000000
Aug 08 09:19:53 pve kernel: FS:  0000000000000000(0000) GS:ffff8b532fc80000(0000) knlGS:0000000000000000
Aug 08 09:19:53 pve kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 08 09:19:53 pve kernel: CR2: 00007f0d76dc10a0 CR3: 0000000123a10000 CR4: 0000000000350ee0
Aug 08 09:19:53 pve kernel: Call Trace:
Aug 08 09:19:53 pve kernel:  <IRQ>
Aug 08 09:19:53 pve kernel:  ? __pfx_dev_watchdog+0x10/0x10
Aug 08 09:19:53 pve kernel:  call_timer_fn+0x29/0x160
Aug 08 09:19:53 pve kernel:  ? __pfx_dev_watchdog+0x10/0x10
Aug 08 09:19:53 pve kernel:  __run_timers+0x259/0x310
Aug 08 09:19:53 pve kernel:  run_timer_softirq+0x1d/0x40
Aug 08 09:19:53 pve kernel:  __do_softirq+0xd6/0x346
Aug 08 09:19:53 pve kernel:  ? hrtimer_interrupt+0x11f/0x250
Aug 08 09:19:53 pve kernel:  __irq_exit_rcu+0xa2/0xd0
Aug 08 09:19:53 pve kernel:  irq_exit_rcu+0xe/0x20
Aug 08 09:19:53 pve kernel:  sysvec_apic_timer_interrupt+0x92/0xd0
Aug 08 09:19:53 pve kernel:  </IRQ>
Aug 08 09:19:53 pve kernel:  <TASK>
Aug 08 09:19:53 pve kernel:  asm_sysvec_apic_timer_interrupt+0x1b/0x20
Aug 08 09:19:53 pve kernel: RIP: 0010:cpuidle_enter_state+0xde/0x6f0
Aug 08 09:19:53 pve kernel: Code: 27 97 52 e8 d4 79 4a ff 8b 53 04 49 89 c7 0f 1f 44 00 00 31 ff e8 02 82 49 ff 80 7d d0 00 0f 85 eb 00 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 12 02 00 00 4d 63 ee 49 83 fd 09 0f 87 c7 04 00 00
Aug 08 09:19:53 pve kernel: RSP: 0018:ffffad14000cbe38 EFLAGS: 00000246
Aug 08 09:19:53 pve kernel: RAX: 0000000000000000 RBX: ffff8b532fcbd900 RCX: 0000000000000000
Aug 08 09:19:53 pve kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
Aug 08 09:19:53 pve kernel: RBP: ffffad14000cbe88 R08: 0000000000000000 R09: 0000000000000000
Aug 08 09:19:53 pve kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffaeec33e0
Aug 08 09:19:53 pve kernel: R13: 0000000000000007 R14: 0000000000000007 R15: 0000403e891e370d
Aug 08 09:19:53 pve kernel:  ? cpuidle_enter_state+0xce/0x6f0
Aug 08 09:19:53 pve kernel:  cpuidle_enter+0x2e/0x50
Aug 08 09:19:53 pve kernel:  do_idle+0x216/0x2a0
Aug 08 09:19:53 pve kernel:  cpu_startup_entry+0x1d/0x20
Aug 08 09:19:53 pve kernel:  start_secondary+0x122/0x160
Aug 08 09:19:53 pve kernel:  secondary_startup_64_no_verify+0xe5/0xeb
Aug 08 09:19:53 pve kernel:  </TASK>
Aug 08 09:19:53 pve kernel: ---[ end trace 0000000000000000 ]---
Aug 08 09:19:53 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
Aug 08 09:19:53 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Aug 08 09:19:53 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Aug 08 09:19:53 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Aug 08 09:19:53 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Aug 08 09:19:53 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Aug 08 09:19:53 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Aug 08 09:19:53 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Aug 08 09:19:54 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Aug 08 09:19:54 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Aug 08 09:19:57 pve pvestatd[1079]: BackupToPVE2: error fetching datastores - 500 Can't connect to 192.168.0.60:8007 (No route to host)
Aug 08 09:20:03 pve kernel: net_ratelimit: 9 callbacks suppressed
Aug 08 09:20:03 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_chipcmd_cond == 1 (loop: 100, delay: 100).
Aug 08 09:20:03 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Aug 08 09:20:03 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Aug 08 09:20:03 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Aug 08 09:20:03 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Aug 08 09:20:03 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Aug 08 09:20:03 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_ephyar_cond == 1 (loop: 100, delay: 10).
Aug 08 09:20:03 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Aug 08 09:20:03 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).
Aug 08 09:20:04 pve kernel: r8169 0000:01:00.0 enp1s0: rtl_eriar_cond == 1 (loop: 100, delay: 100).

Nothing was changed in the network/cluster configuration, but here it is anyway for completeness:
Code:
root@pve:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface enp1s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.0.27/24
        gateway 192.168.0.1
        bridge-ports enp1s0
        bridge-stp off
        bridge-fd 0
Code:
root@pve:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.0.27
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.0.9
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Proxmox-Farm
  config_version: 4
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}

Kernel currently in use: 6.2.16-6-pve

For now I will also try running an older kernel.
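Pinning an older kernel can be done with proxmox-boot-tool on a recent PVE install; a rough sketch (the exact version string to pin is whatever the list subcommand shows on your own system, `6.2.11-2-pve` below is just the example from this thread):

```shell
# List the kernels installed and registered with the bootloader
proxmox-boot-tool kernel list

# Pin the known-good kernel so it is booted by default
# (replace the version string with one from the list output)
proxmox-boot-tool kernel pin 6.2.11-2-pve

# Sync the bootloader config, then reboot into the pinned kernel
proxmox-boot-tool refresh
reboot
```

On older setups without the pin subcommand you would instead set `GRUB_DEFAULT` in /etc/default/grub by hand.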
Hi, I don't see any quorum on your side. Have you tested the corosync network? In your case it doesn't look like a kernel problem if corosync is what triggers it.
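Testing the corosync network could look roughly like this with the standard tooling (run on both nodes and compare; the IPs are the ring0 addresses from the corosync.conf above):

```shell
# Show the state of the knet links as corosync sees them
corosync-cfgtool -s

# Cluster membership and quorum state from the PVE side
pvecm status

# Sustained packet test between the nodes on the corosync network
# (omping must be installed and started on both nodes at the same time)
omping -c 600 -i 1 -q 192.168.0.27 192.168.0.9
```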
 
Hi, I don't see any quorum on your side. Have you tested the corosync network? In your case it doesn't look like a kernel problem if corosync is what triggers it.

At the moment only two nodes are in a cluster here, hence no quorum. But isn't the corosync trouble just a consequence of the connection loss?
Here is the log from the perspective of the second node, which remains reachable throughout:
Code:
Aug 08 09:17:19 pve2 corosync[923]:   [KNET  ] link: host: 1 link: 0 is down
Aug 08 09:17:19 pve2 corosync[923]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 08 09:17:19 pve2 corosync[923]:   [KNET  ] host: host: 1 has no active links
Aug 08 09:17:20 pve2 corosync[923]:   [TOTEM ] Token has not been received in 2250 ms
Aug 08 09:17:21 pve2 corosync[923]:   [TOTEM ] A processor failed, forming new configuration: token timed out (3000ms), waiting 3600ms for consensus.
Aug 08 09:17:25 pve2 corosync[923]:   [QUORUM] Sync members[1]: 2
Aug 08 09:17:25 pve2 corosync[923]:   [QUORUM] Sync left[1]: 1
Aug 08 09:17:25 pve2 corosync[923]:   [TOTEM ] A new membership (2.fb) was formed. Members left: 1
Aug 08 09:17:25 pve2 corosync[923]:   [TOTEM ] Failed to receive the leave message. failed: 1
Aug 08 09:17:25 pve2 pmxcfs[821]: [dcdb] notice: members: 2/821
Aug 08 09:17:25 pve2 pmxcfs[821]: [status] notice: members: 2/821
Aug 08 09:17:25 pve2 corosync[923]:   [QUORUM] This node is within the non-primary component and will NOT provide any services.
Aug 08 09:17:25 pve2 corosync[923]:   [QUORUM] Members[1]: 2
Aug 08 09:17:25 pve2 corosync[923]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug 08 09:17:25 pve2 pmxcfs[821]: [status] notice: node lost quorum
Aug 08 09:17:25 pve2 pmxcfs[821]: [dcdb] crit: received write while not quorate - trigger resync
Aug 08 09:17:25 pve2 pmxcfs[821]: [dcdb] crit: leaving CPG group
Aug 08 09:17:25 pve2 pve-ha-lrm[1137]: unable to write lrm status file - unable to open file '/etc/pve/nodes/pve2/lrm_status.tmp.1137' - Permission denied
Aug 08 09:17:25 pve2 pmxcfs[821]: [dcdb] notice: start cluster connection
Aug 08 09:17:25 pve2 pmxcfs[821]: [dcdb] crit: cpg_join failed: 14
Aug 08 09:17:25 pve2 pmxcfs[821]: [dcdb] crit: can't initialize service
Aug 08 09:17:31 pve2 pmxcfs[821]: [dcdb] notice: members: 2/821
Aug 08 09:17:31 pve2 pmxcfs[821]: [dcdb] notice: all data is up to date
Aug 08 09:18:12 pve2 pvescheduler[418950]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 08 09:18:12 pve2 pvescheduler[418949]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 08 09:19:12 pve2 pvescheduler[419213]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 08 09:19:12 pve2 pvescheduler[419212]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 08 09:20:12 pve2 pvescheduler[419477]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 08 09:20:12 pve2 pvescheduler[419476]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 08 09:21:12 pve2 pvescheduler[419746]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 08 09:21:12 pve2 pvescheduler[419745]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 08 09:22:12 pve2 pvescheduler[420010]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 08 09:22:12 pve2 pvescheduler[420009]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 08 09:23:12 pve2 pvescheduler[420282]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 08 09:23:12 pve2 pvescheduler[420281]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 08 09:24:12 pve2 pvescheduler[420551]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 08 09:24:12 pve2 pvescheduler[420550]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 08 09:25:12 pve2 pvescheduler[420815]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 08 09:25:12 pve2 pvescheduler[420814]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 08 09:26:12 pve2 pvescheduler[421081]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 08 09:26:12 pve2 pvescheduler[421080]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 08 09:27:12 pve2 pvescheduler[421345]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 08 09:27:12 pve2 pvescheduler[421344]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 08 09:28:12 pve2 pvescheduler[421609]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 08 09:28:12 pve2 pvescheduler[421608]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 08 09:29:12 pve2 pvescheduler[421872]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 08 09:29:12 pve2 pvescheduler[421871]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 08 09:29:38 pve2 pvedaemon[973]: <root@pam> successful auth for user 'root@pam'
Aug 08 09:30:12 pve2 pvescheduler[422135]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 08 09:30:12 pve2 pvescheduler[422134]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 08 09:31:13 pve2 pvescheduler[422404]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 08 09:31:13 pve2 pvescheduler[422403]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 08 09:32:12 pve2 pvescheduler[422669]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Aug 08 09:32:12 pve2 pvescheduler[422668]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Aug 08 09:32:48 pve2 corosync[923]:   [KNET  ] rx: host: 1 link: 0 is up
Aug 08 09:32:48 pve2 corosync[923]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
Aug 08 09:32:48 pve2 corosync[923]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Aug 08 09:32:48 pve2 corosync[923]:   [KNET  ] pmtud: Global data MTU changed to: 1397
Aug 08 09:32:48 pve2 corosync[923]:   [QUORUM] Sync members[2]: 1 2
Aug 08 09:32:48 pve2 corosync[923]:   [QUORUM] Sync joined[1]: 1
Aug 08 09:32:48 pve2 corosync[923]:   [TOTEM ] A new membership (1.104) was formed. Members joined: 1
Aug 08 09:32:48 pve2 corosync[923]:   [QUORUM] This node is within the primary component and will provide service.
Aug 08 09:32:48 pve2 corosync[923]:   [QUORUM] Members[2]: 1 2
Aug 08 09:32:48 pve2 corosync[923]:   [MAIN  ] Completed service synchronization, ready to provide service.
Aug 08 09:32:48 pve2 pmxcfs[821]: [status] notice: node has quorum
Aug 08 09:32:51 pve2 pmxcfs[821]: [dcdb] notice: members: 1/959, 2/821
Aug 08 09:32:51 pve2 pmxcfs[821]: [dcdb] notice: starting data syncronisation
Aug 08 09:32:51 pve2 pmxcfs[821]: [dcdb] notice: received sync request (epoch 1/959/00000001)
Aug 08 09:32:51 pve2 pmxcfs[821]: [status] notice: members: 1/959, 2/821
Aug 08 09:32:51 pve2 pmxcfs[821]: [status] notice: starting data syncronisation
Aug 08 09:32:51 pve2 pmxcfs[821]: [status] notice: received sync request (epoch 1/959/00000001)
Aug 08 09:32:51 pve2 pmxcfs[821]: [dcdb] notice: received all states
Aug 08 09:32:51 pve2 pmxcfs[821]: [dcdb] notice: leader is 2/821
Aug 08 09:32:51 pve2 pmxcfs[821]: [dcdb] notice: synced members: 2/821
Aug 08 09:32:51 pve2 pmxcfs[821]: [dcdb] notice: start sending inode updates
Aug 08 09:32:51 pve2 pmxcfs[821]: [dcdb] notice: sent all (2) updates
Aug 08 09:32:51 pve2 pmxcfs[821]: [dcdb] notice: all data is up to date
Aug 08 09:32:51 pve2 corosync[923]:   [TOTEM ] Retransmit List: 15
Aug 08 09:32:51 pve2 corosync[923]:   [TOTEM ] Retransmit List: 15
Aug 08 09:32:51 pve2 corosync[923]:   [TOTEM ] Retransmit List: 15
 
That looks as if the network connection is dropping. Can the nodes ping each other cleanly?
 
Hey,

no, node 1 (pve) simply is no longer pingable or reachable after an indeterminate amount of time.
Node 2 (pve2), which is based on the identical hardware model (Dell Wyse 5070), has not been affected by this problem so far.
That may only be a matter of time, though.

By now I suspect that I am affected by this bug:
r8169 Realtek NIC driver causing system hang on PVE8 (kernel 6.2)

As a workaround, using the "r8168-dkms" NIC kernel module works for many people, since the kernel apparently has a bug in the built-in r8169 NIC module. This is described here, for example. The workaround does not work for me, however, because the r8168 module cannot be loaded on my machine (blacklisting r8169 is not the problem, it is something with r8168 itself), and I have no desire to spend hours fixing a workaround.

So for now I will stick with the workaround of running an older kernel and hopefully sit the problem out.
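For reference, the r8168-dkms workaround described above usually amounts to something like the following (package names from the standard Debian/PVE repos; a sketch only, and note that it did not work on this particular box):

```shell
# Install kernel headers and the out-of-tree Realtek driver via DKMS
apt install pve-headers-$(uname -r) r8168-dkms

# Prevent the in-kernel r8169 driver from binding to the NIC
echo "blacklist r8169" > /etc/modprobe.d/blacklist-r8169.conf

# Rebuild the initramfs so the blacklist takes effect at boot
update-initramfs -u -k all
reboot
```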
 
Does anyone have any news here? I tried again with the 6.2.16-12-pve kernel the other day. The hanging-network problem reappears after a few hours of operation. Now I have pinned the 5.15 kernel again. Stable operation.

I really would like to move to a 6.2 kernel, though. But there doesn't seem to be a solution?
 
Does anyone have any news here? I tried again with the 6.2.16-12-pve kernel the other day. The hanging-network problem reappears after a few hours of operation. Now I have pinned the 5.15 kernel again. Stable operation.

I really would like to move to a 6.2 kernel, though. But there doesn't seem to be a solution?

Same picture here as before (Intel NUC6CAYH): Linux 5.15.108-1-pve stable, unstable from 6.x on.

Nothing new in this thread either:
https://forum.proxmox.com/threads/system-hanging-after-upgrade-nic-driver.129366/page-3
 