Sudden reboot after creating a cluster

rothkraut

Nov 25, 2025
Hey everyone,

I just ran into a rather peculiar error.

At a remote site I have a Proxmox host running a few VMs. I finally have the hardware for the second host on site and wanted to create a cluster.

On the already configured node I set out to create a cluster via the web GUI. I set up the links (I use two Corosync networks). Beforehand, I had also verified with a simple ping to the other node that the two NICs can communicate over these networks.

The task ran through cleanly with "OK". All VMs briefly showed a question mark, but then went green again. When I then tried to fetch the join information, the web GUI stopped responding.

The server was suddenly rebooting.

The server came back up after a short time. The cluster was still there, the VMs started up again, etc.


In the journal I only find a few of these error messages:
kvm_intel: kvm [554381]: vcpu1, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1

Corosync journal:

Code:
May 04 16:15:23 Server-01 systemd[1]: corosync.service - Corosync Cluster Engine was skipped because of an unmet condition check (ConditionPathExists=/etc/corosync/corosync.conf).
Jun 10 18:43:28 Server-01 systemd[1]: corosync.service - Corosync Cluster Engine was skipped because of an unmet condition check (ConditionPathExists=/etc/corosync/corosync.conf).
Jun 10 18:43:39 Server-01 systemd[1]: Starting corosync.service - Corosync Cluster Engine...
Jun 10 18:43:39 Server-01 (corosync)[1026481]: corosync.service: Referenced but unset environment variable evaluates to an empty string: COROSYNC_OPTIONS
Jun 10 18:43:39 Server-01 corosync[1026481]:   [MAIN  ] Corosync Cluster Engine  starting up
Jun 10 18:43:39 Server-01 corosync[1026481]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf vqsim nozzle snmp pie relro bindnow
Jun 10 18:43:39 Server-01 corosync[1026481]:   [TOTEM ] Initializing transport (Kronosnet).
Jun 10 18:43:39 Server-01 corosync[1026481]:   [TOTEM ] totemknet initialized
Jun 10 18:43:39 Server-01 corosync[1026481]:   [KNET  ] pmtud: MTU manually set to: 0
Jun 10 18:43:39 Server-01 corosync[1026481]:   [KNET  ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync configuration map access [0]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QB    ] server name: cmap
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync configuration service [1]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QB    ] server name: cfg
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QB    ] server name: cpg
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync profile loading service [4]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync resource monitoring service [6]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [WD    ] Watchdog not enabled by configuration
Jun 10 18:43:40 Server-01 corosync[1026481]:   [WD    ] resource load_15min missing a recovery key.
Jun 10 18:43:40 Server-01 corosync[1026481]:   [WD    ] resource memory_used missing a recovery key.
Jun 10 18:43:40 Server-01 corosync[1026481]:   [WD    ] no resources configured.
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync watchdog service [7]



Since the reboot the host has been running without problems. Before that, it had also been running without issues since May.

I suspect it will happen again as soon as I try to add the second node to the cluster.

The PVE version is 9.1.9.
 
Hello @rothkraut,

thanks a lot for your post!

So far, nothing in the Corosync log points to anything that would trigger a reboot of the server.
Could you please provide the complete, unfiltered journal for the period around the reboot?

Best regards,
Jonas
 
The WRMSR message (Unhandled WRMSR(0x1d9)) is a red herring: that's just an MSR (IA32_DEBUGCTL) that the guest wants to write but the host doesn't pass through. It happens all the time and is harmless.
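If these lines clutter the journal, they can be filtered out before reading it. This is a minimal sketch, assuming a POSIX shell and grep. The function name strip_wrmsr_noise is made up for illustration, and it only matches the exact MSR message and the rate-limit notice seen in this thread:

```shell
# Hypothetical helper: read journal text on stdin and drop the harmless
# "Unhandled WRMSR(0x1d9)" lines plus the matching
# "kvm_pr_unimpl_wrmsr: N callbacks suppressed" rate-limit notices.
strip_wrmsr_noise() {
    grep -v \
        -e 'Unhandled WRMSR(0x1d9)' \
        -e 'kvm_pr_unimpl_wrmsr: .* callbacks suppressed'
}

# Usage on the host (requires journal access):
#   journalctl -b -1 --no-pager | strip_wrmsr_noise
```

This shows the previous boot without the MSR noise. Note that grep -v exits non-zero if every line was filtered out.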

As for the reboot: the Corosync log looks normal, that's simply the startup after the reboot. What's more interesting is what's in the journal BEFORE the reboot. In addition to what @j.theisen already asked, check whether last -x reboot shutdown shows you anything, and whether there's a kernel dump under /var/crash/ or a core dump listed by coredumpctl list. If the host was rebooted cleanly (no hard reset), there should still be something in the journal, e.g. the OOM killer or a kernel oops. With a hard lockup/panic there's unfortunately often nothing in the journal, because the system never got around to writing it.
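The dump checks can be bundled into a small helper. This is a sketch under assumptions. check_crash_artifacts is a hypothetical name, and /var/crash is typically only populated when a kdump tool is set up. coredumpctl only lists user-space core dumps collected by systemd-coredump, so it won't contain a kernel crash dump either way:

```shell
# Hypothetical helper: report whether a crash-dump directory exists and is
# non-empty (default /var/crash), and whether coredumpctl is installed.
check_crash_artifacts() {
    dir="${1:-/var/crash}"
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "dumps: present in $dir"
    elif [ -d "$dir" ]; then
        echo "dumps: $dir is empty"
    else
        echo "dumps: $dir does not exist"
    fi
    # coredumpctl covers user-space core dumps only, not kernel panics.
    if command -v coredumpctl >/dev/null 2>&1; then
        echo "coredumpctl: installed"
    else
        echo "coredumpctl: not installed"
    fi
}

# Usage on the host:
#   check_crash_artifacts
```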

Which kernel are you running? The output of uname -r would be good to know.
 
There is no "crash" directory under /var/.

Code:
uname -r
7.0.0-3-pve


Code:
last -x reboot shutdown

reboot   system boot  7.0.0-3-pve      Wed Jun 10 18:46 - still running
reboot   system boot  7.0.0-3-pve      Mon May  4 16:15 - crash
reboot   system boot  6.17.2-1-pve     Mon May  4 15:56 - 16:13  (00:17)
shutdown system down  6.17.2-1-pve     Mon May  4 16:13 - 16:15  (00:01)

So the last crash would have been on May 4. I can't remember anything happening then, though. I think that's when I set up the server.



Code:
Jun 10 18:43:28 Server-01 pvedaemon[1012156]: <root@pam> starting task UPID:Server-01:000FA998:131B7FCC:6A293FD0:clustercreate:C-PVE-PNR-01:root@pam:
Jun 10 18:43:28 Server-01 systemd[1]: corosync.service - Corosync Cluster Engine was skipped because of an unmet condition check (ConditionPathExists=/etc/corosync/corosync.conf).
Jun 10 18:43:28 Server-01 systemd[1]: Stopping pve-cluster.service - The Proxmox VE cluster filesystem...
Jun 10 18:43:28 Server-01 pmxcfs[1278]: [main] notice: teardown filesystem
Jun 10 18:43:28 Server-01 systemd[1]: etc-pve.mount: Deactivated successfully.

Jun 10 18:43:38 Server-01 systemd[1]: pve-cluster.service: State 'stop-sigterm' timed out. Killing.
Jun 10 18:43:38 Server-01 systemd[1]: pve-cluster.service: Killing process 1278 (pmxcfs) with signal SIGKILL.
Jun 10 18:43:38 Server-01 systemd[1]: pve-cluster.service: Main process exited, code=killed, status=9/KILL
Jun 10 18:43:38 Server-01 systemd[1]: pve-cluster.service: Failed with result 'timeout'.
Jun 10 18:43:38 Server-01 systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Jun 10 18:43:38 Server-01 systemd[1]: pve-cluster.service: Consumed 2h 40min 25.585s CPU time, 145.3M memory peak.
Jun 10 18:43:38 Server-01 systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
Jun 10 18:43:38 Server-01 pmxcfs[1026471]: [main] notice: resolved node name 'Server-01' to '10.155.101.11' for default node IP address
Jun 10 18:43:38 Server-01 pmxcfs[1026471]: [main] notice: resolved node name 'Server-01' to '10.155.101.11' for default node IP address
Jun 10 18:43:38 Server-01 pmxcfs[1026471]: [dcdb] notice: wrote new corosync config '/etc/corosync/corosync.conf' (version = 1)
Jun 10 18:43:38 Server-01 pmxcfs[1026471]: [dcdb] notice: wrote new corosync config '/etc/corosync/corosync.conf' (version = 1)
Jun 10 18:43:38 Server-01 pmxcfs[1026473]: [quorum] crit: quorum_initialize failed: CS_ERR_LIBRARY (failed to connect to corosync)
Jun 10 18:43:38 Server-01 pmxcfs[1026473]: [quorum] crit: can't initialize service
Jun 10 18:43:38 Server-01 pmxcfs[1026473]: [confdb] crit: cmap_initialize failed: CS_ERR_LIBRARY (failed to connect to corosync)
Jun 10 18:43:38 Server-01 pmxcfs[1026473]: [confdb] crit: can't initialize service
Jun 10 18:43:38 Server-01 pmxcfs[1026473]: [dcdb] crit: cpg_initialize failed: CS_ERR_LIBRARY (failed to connect to corosync)
Jun 10 18:43:38 Server-01 pmxcfs[1026473]: [dcdb] crit: can't initialize service
Jun 10 18:43:38 Server-01 pmxcfs[1026473]: [status] crit: cpg_initialize failed: CS_ERR_LIBRARY (failed to connect to corosync)
Jun 10 18:43:38 Server-01 pmxcfs[1026473]: [status] crit: can't initialize service
Jun 10 18:43:38 Server-01 pveproxy[1023520]: ipcc_send_rec[1] failed: Permission denied
Jun 10 18:43:38 Server-01 pveproxy[1022400]: ipcc_send_rec[1] failed: Permission denied
Jun 10 18:43:38 Server-01 pveproxy[1023658]: ipcc_send_rec[1] failed: Permission denied
Jun 10 18:43:39 Server-01 systemd[1]: Started pve-cluster.service - The Proxmox VE cluster filesystem.
Jun 10 18:43:39 Server-01 systemd[1]: Starting corosync.service - Corosync Cluster Engine...
Jun 10 18:43:39 Server-01 pvedaemon[1012156]: <root@pam> end task UPID:Server-01:000FA998:131B7FCC:6A293FD0:clustercreate:C-PVE-PNR-01:root@pam: OK
Jun 10 18:43:39 Server-01 (corosync)[1026481]: corosync.service: Referenced but unset environment variable evaluates to an empty string: COROSYNC_OPTIONS
Jun 10 18:43:39 Server-01 corosync[1026481]:   [MAIN  ] Corosync Cluster Engine  starting up
Jun 10 18:43:39 Server-01 corosync[1026481]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf vqsim nozzle snmp pie relro bindnow
Jun 10 18:43:39 Server-01 corosync[1026481]:   [TOTEM ] Initializing transport (Kronosnet).
Jun 10 18:43:39 Server-01 pve-ha-lrm[1516]: lost lock 'ha_agent_Server-01_lock - cfs lock update failed - Permission denied
Jun 10 18:43:39 Server-01 pve-ha-crm[1501]: lost lock 'ha_manager_lock - cfs lock update failed - Permission denied
Jun 10 18:43:39 Server-01 kernel: sctp: Hash tables configured (bind 8192/8192)
Jun 10 18:43:39 Server-01 corosync[1026481]:   [TOTEM ] totemknet initialized
Jun 10 18:43:39 Server-01 corosync[1026481]:   [KNET  ] pmtud: MTU manually set to: 0
Jun 10 18:43:39 Server-01 corosync[1026481]:   [KNET  ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync configuration map access [0]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QB    ] server name: cmap
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync configuration service [1]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QB    ] server name: cfg
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QB    ] server name: cpg
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync profile loading service [4]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync resource monitoring service [6]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [WD    ] Watchdog not enabled by configuration
Jun 10 18:43:40 Server-01 corosync[1026481]:   [WD    ] resource load_15min missing a recovery key.
Jun 10 18:43:40 Server-01 corosync[1026481]:   [WD    ] resource memory_used missing a recovery key.
Jun 10 18:43:40 Server-01 corosync[1026481]:   [WD    ] no resources configured.
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync watchdog service [7]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QUORUM] Using quorum provider corosync_votequorum
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QUORUM] This node is within the primary component and will provide service.
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QUORUM] Members[0]:
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QB    ] server name: votequorum
Jun 10 18:43:40 Server-01 systemd[1]: Started corosync.service - Corosync Cluster Engine.
Jun 10 18:43:44 Server-01 pmxcfs[1026473]: [status] notice: update cluster info (cluster name  C-PVE-PNR-01, version = 1)
Jun 10 18:43:44 Server-01 pmxcfs[1026473]: [status] notice: node has quorum
Jun 10 18:43:44 Server-01 pmxcfs[1026473]: [dcdb] notice: members: 1/1026473
Jun 10 18:43:44 Server-01 pmxcfs[1026473]: [dcdb] notice: all data is up to date
Jun 10 18:43:44 Server-01 pmxcfs[1026473]: [status] notice: members: 1/1026473
Jun 10 18:43:44 Server-01 pmxcfs[1026473]: [status] notice: all data is up to date
Jun 10 18:43:44 Server-01 pve-ha-lrm[1516]: status change active => lost_agent_lock
Jun 10 18:43:44 Server-01 pve-ha-crm[1501]: status change master => lost_manager_lock
Jun 10 18:43:44 Server-01 pve-ha-crm[1501]: watchdog closed (disabled)
Jun 10 18:43:44 Server-01 watchdog-mux[1111]: client (PID 1501) has disconnected cleanly. Removing it gracefully from the watch
Jun 10 18:43:44 Server-01 pve-ha-crm[1501]: status change lost_manager_lock => wait_for_quorum
Jun 10 18:43:54 Server-01 pve-ha-crm[1501]: status change wait_for_quorum => slave
Jun 10 18:44:11 Server-01 watchdog-mux[1111]: client (PID 1516) watchdog is about to expire
Jun 10 18:44:11 Server-01 systemd-journald[638]: Received client request to sync journal.

Jun 10 18:44:21 Server-01 watchdog-mux[1111]: client (PID 1516) watchdog expired - disable watchdog updates
Jun 10 18:44:22 Server-01 watchdog-mux[1111]: exit watchdog-mux with active connections
Jun 10 18:44:22 Server-01 systemd-journald[638]: Received client request to sync journal.
Jun 10 18:44:22 Server-01 kernel: watchdog: watchdog0: watchdog did not stop!
Jun 10 18:44:19 Server-01 kernel: kvm_intel: kvm [554381]: vcpu1, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:19 Server-01 kernel: kvm_intel: kvm [554381]: vcpu1, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:19 Server-01 kernel: kvm_intel: kvm [554381]: vcpu1, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:19 Server-01 kernel: kvm_intel: kvm [554381]: vcpu1, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:22 Server-01 systemd[1]: watchdog-mux.service: Deactivated successfully.
Jun 10 18:44:22 Server-01 systemd[1]: watchdog-mux.service: Consumed 1min 13.854s CPU time, 2.2M memory Peak
 
OK, the last -x output is revealing. It means the session that started on May 4 ended with a crash, i.e. not with a clean shutdown. That crash was your reboot on June 10. So this confirms it: a hard crash, not an orderly reboot.
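For reference, last shows "crash" as the end of a session when the next reboot record follows without a shutdown record in between. Here is a minimal sketch for picking such sessions out of a longer history. It assumes the column layout shown above, and crashed_boots is a hypothetical helper name:

```shell
# Hypothetical helper: read `last -x reboot shutdown` output on stdin and
# print only the reboot sessions whose last column is "crash", i.e. boots
# that ended without a recorded clean shutdown.
crashed_boots() {
    awk '$1 == "reboot" && $NF == "crash"'
}

# Usage on the host:
#   last -x reboot shutdown | crashed_boots
```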

The journal you posted only goes up to 18:43:40 (Corosync start), though. According to last, the reboot wasn't until 18:46, so the 3 minutes in between are still missing. Can you post the complete journal for the period from 18:43 to 18:46? Ideally with:
Code:
journalctl --since "2026-06-10 18:43:00" --until "2026-06-10 18:47:00" --no-pager
And you haven't posted coredumpctl list yet; that would help too. If there's nothing in it, it was probably a kernel panic that could no longer be logged, but let's see.
 
Code:
Jun 10 18:43:00 Server-01 pvedaemon[1012156]: <root@pam> successful auth for user 'root@pam'
Jun 10 18:43:28 Server-01 pvedaemon[1012156]: <root@pam> starting task UPID:Server-01:000FA998:131B7FCC:6A293FD0:clustercreate:C-PVE-PNR-01:root@pam:
Jun 10 18:43:28 Server-01 systemd[1]: corosync.service - Corosync Cluster Engine was skipped because of an unmet condition check (ConditionPathExists=/etc/corosync/corosync.conf).
Jun 10 18:43:28 Server-01 systemd[1]: Stopping pve-cluster.service - The Proxmox VE cluster filesystem...
Jun 10 18:43:28 Server-01 pmxcfs[1278]: [main] notice: teardown filesystem
Jun 10 18:43:28 Server-01 systemd[1]: etc-pve.mount: Deactivated successfully.
Jun 10 18:43:38 Server-01 systemd[1]: pve-cluster.service: State 'stop-sigterm' timed out. Killing.
Jun 10 18:43:38 Server-01 systemd[1]: pve-cluster.service: Killing process 1278 (pmxcfs) with signal SIGKILL.
Jun 10 18:43:38 Server-01 systemd[1]: pve-cluster.service: Main process exited, code=killed, status=9/KILL
Jun 10 18:43:38 Server-01 systemd[1]: pve-cluster.service: Failed with result 'timeout'.
Jun 10 18:43:38 Server-01 systemd[1]: Stopped pve-cluster.service - The Proxmox VE cluster filesystem.
Jun 10 18:43:38 Server-01 systemd[1]: pve-cluster.service: Consumed 2h 40min 25.585s CPU time, 145.3M memory peak.
Jun 10 18:43:38 Server-01 systemd[1]: Starting pve-cluster.service - The Proxmox VE cluster filesystem...
Jun 10 18:43:38 Server-01 pmxcfs[1026471]: [main] notice: resolved node name 'Server-01' to '10.155.101.11' for default node IP address
Jun 10 18:43:38 Server-01 pmxcfs[1026471]: [main] notice: resolved node name 'Server-01' to '10.155.101.11' for default node IP address
Jun 10 18:43:38 Server-01 pmxcfs[1026471]: [dcdb] notice: wrote new corosync config '/etc/corosync/corosync.conf' (version = 1)
Jun 10 18:43:38 Server-01 pmxcfs[1026471]: [dcdb] notice: wrote new corosync config '/etc/corosync/corosync.conf' (version = 1)
Jun 10 18:43:38 Server-01 pmxcfs[1026473]: [quorum] crit: quorum_initialize failed: CS_ERR_LIBRARY (failed to connect to corosync)
Jun 10 18:43:38 Server-01 pmxcfs[1026473]: [quorum] crit: can't initialize service
Jun 10 18:43:38 Server-01 pmxcfs[1026473]: [confdb] crit: cmap_initialize failed: CS_ERR_LIBRARY (failed to connect to corosync)
Jun 10 18:43:38 Server-01 pmxcfs[1026473]: [confdb] crit: can't initialize service
Jun 10 18:43:38 Server-01 pmxcfs[1026473]: [dcdb] crit: cpg_initialize failed: CS_ERR_LIBRARY (failed to connect to corosync)
Jun 10 18:43:38 Server-01 pmxcfs[1026473]: [dcdb] crit: can't initialize service
Jun 10 18:43:38 Server-01 pmxcfs[1026473]: [status] crit: cpg_initialize failed: CS_ERR_LIBRARY (failed to connect to corosync)
Jun 10 18:43:38 Server-01 pmxcfs[1026473]: [status] crit: can't initialize service
Jun 10 18:43:38 Server-01 pveproxy[1023520]: ipcc_send_rec[1] failed: Permission denied
Jun 10 18:43:38 Server-01 pveproxy[1022400]: ipcc_send_rec[1] failed: Permission denied
Jun 10 18:43:38 Server-01 pveproxy[1023658]: ipcc_send_rec[1] failed: Permission denied
Jun 10 18:43:39 Server-01 systemd[1]: Started pve-cluster.service - The Proxmox VE cluster filesystem.
Jun 10 18:43:39 Server-01 systemd[1]: Starting corosync.service - Corosync Cluster Engine...
Jun 10 18:43:39 Server-01 pvedaemon[1012156]: <root@pam> end task UPID:Server-01:000FA998:131B7FCC:6A293FD0:clustercreate:C-PVE-PNR-01:root@pam: OK
Jun 10 18:43:39 Server-01 (corosync)[1026481]: corosync.service: Referenced but unset environment variable evaluates to an empty string: COROSYNC_OPTIONS
Jun 10 18:43:39 Server-01 corosync[1026481]:   [MAIN  ] Corosync Cluster Engine  starting up
Jun 10 18:43:39 Server-01 corosync[1026481]:   [MAIN  ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf vqsim nozzle snmp pie relro bindnow
Jun 10 18:43:39 Server-01 corosync[1026481]:   [TOTEM ] Initializing transport (Kronosnet).
Jun 10 18:43:39 Server-01 pve-ha-lrm[1516]: lost lock 'ha_agent_Server-01_lock - cfs lock update failed - Permission denied
Jun 10 18:43:39 Server-01 pve-ha-crm[1501]: lost lock 'ha_manager_lock - cfs lock update failed - Permission denied
Jun 10 18:43:39 Server-01 kernel: sctp: Hash tables configured (bind 8192/8192)
Jun 10 18:43:39 Server-01 corosync[1026481]:   [TOTEM ] totemknet initialized
Jun 10 18:43:39 Server-01 corosync[1026481]:   [KNET  ] pmtud: MTU manually set to: 0
Jun 10 18:43:39 Server-01 corosync[1026481]:   [KNET  ] common: crypto_nss.so has been loaded from /usr/lib/x86_64-linux-gnu/kronosnet/crypto_nss.so
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync configuration map access [0]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QB    ] server name: cmap
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync configuration service [1]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QB    ] server name: cfg
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QB    ] server name: cpg
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync profile loading service [4]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync resource monitoring service [6]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [WD    ] Watchdog not enabled by configuration
Jun 10 18:43:40 Server-01 corosync[1026481]:   [WD    ] resource load_15min missing a recovery key.
Jun 10 18:43:40 Server-01 corosync[1026481]:   [WD    ] resource memory_used missing a recovery key.
Jun 10 18:43:40 Server-01 corosync[1026481]:   [WD    ] no resources configured.
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync watchdog service [7]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QUORUM] Using quorum provider corosync_votequorum
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QUORUM] This node is within the primary component and will provide service.
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QUORUM] Members[0]:
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync vote quorum service v1.0 [5]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QB    ] server name: votequorum
Jun 10 18:43:40 Server-01 corosync[1026481]:   [SERV  ] Service engine loaded: corosync cluster quorum service v0.1 [3]
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QB    ] server name: quorum
Jun 10 18:43:40 Server-01 corosync[1026481]:   [TOTEM ] Configuring link 0
Jun 10 18:43:40 Server-01 corosync[1026481]:   [TOTEM ] Configured link number 0: local addr: 10.155.110.11, port=5405
Jun 10 18:43:40 Server-01 corosync[1026481]:   [TOTEM ] Configuring link 1
Jun 10 18:43:40 Server-01 corosync[1026481]:   [TOTEM ] Configured link number 1: local addr: 10.155.110.111, port=5406
Jun 10 18:43:40 Server-01 corosync[1026481]:   [KNET  ] link: Resetting MTU for link 0 because host 1 joined
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QUORUM] Sync members[1]: 1
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QUORUM] Sync joined[1]: 1
Jun 10 18:43:40 Server-01 corosync[1026481]:   [TOTEM ] A new membership (1.5) was formed. Members joined: 1
Jun 10 18:43:40 Server-01 corosync[1026481]:   [QUORUM] Members[1]: 1
Jun 10 18:43:40 Server-01 corosync[1026481]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jun 10 18:43:40 Server-01 systemd[1]: Started corosync.service - Corosync Cluster Engine.
Jun 10 18:43:44 Server-01 pmxcfs[1026473]: [status] notice: update cluster info (cluster name  C-PVE-PNR-01, version = 1)
Jun 10 18:43:44 Server-01 pmxcfs[1026473]: [status] notice: node has quorum
Jun 10 18:43:44 Server-01 pmxcfs[1026473]: [dcdb] notice: members: 1/1026473
Jun 10 18:43:44 Server-01 pmxcfs[1026473]: [dcdb] notice: all data is up to date
Jun 10 18:43:44 Server-01 pmxcfs[1026473]: [status] notice: members: 1/1026473
Jun 10 18:43:44 Server-01 pmxcfs[1026473]: [status] notice: all data is up to date
Jun 10 18:43:44 Server-01 pve-ha-lrm[1516]: status change active => lost_agent_lock
Jun 10 18:43:44 Server-01 pve-ha-crm[1501]: status change master => lost_manager_lock
Jun 10 18:43:44 Server-01 pve-ha-crm[1501]: watchdog closed (disabled)
Jun 10 18:43:44 Server-01 watchdog-mux[1111]: client (PID 1501) has disconnected cleanly. Removing it gracefully from the watch
Jun 10 18:43:44 Server-01 pve-ha-crm[1501]: status change lost_manager_lock => wait_for_quorum
Jun 10 18:43:54 Server-01 pve-ha-crm[1501]: status change wait_for_quorum => slave
Jun 10 18:44:11 Server-01 watchdog-mux[1111]: client (PID 1516) watchdog is about to expire
Jun 10 18:44:11 Server-01 systemd-journald[638]: Received client request to sync journal.
Jun 10 18:44:22 Server-01 watchdog-mux[1111]: exit watchdog-mux with active connections
Jun 10 18:44:22 Server-01 systemd-journald[638]: Received client request to sync journal.
Jun 10 18:44:22 Server-01 kernel: watchdog: watchdog0: watchdog did not stop!
Jun 10 18:44:22 Server-01 systemd[1]: watchdog-mux.service: Deactivated successfully.
Jun 10 18:44:22 Server-01 systemd[1]: watchdog-mux.service: Consumed 1min 13.854s CPU time, 2.2M memory peak.
Jun 10 18:44:24 Server-01 kernel: kvm_pr_unimpl_wrmsr: 667 callbacks suppressed
Jun 10 18:44:24 Server-01 kernel: kvm_intel: kvm [554381]: vcpu0, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:24 Server-01 kernel: kvm_intel: kvm [554381]: vcpu0, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:24 Server-01 kernel: kvm_intel: kvm [554381]: vcpu0, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:24 Server-01 kernel: kvm_intel: kvm [554381]: vcpu0, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:24 Server-01 kernel: kvm_intel: kvm [554381]: vcpu0, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:24 Server-01 kernel: kvm_intel: kvm [554381]: vcpu0, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:24 Server-01 kernel: kvm_intel: kvm [554381]: vcpu0, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:24 Server-01 kernel: kvm_intel: kvm [554381]: vcpu0, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:24 Server-01 kernel: kvm_intel: kvm [554381]: vcpu0, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:24 Server-01 kernel: kvm_intel: kvm [554381]: vcpu0, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:29 Server-01 kernel: kvm_pr_unimpl_wrmsr: 666 callbacks suppressed
Jun 10 18:44:29 Server-01 kernel: kvm_intel: kvm [554381]: vcpu1, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:29 Server-01 kernel: kvm_intel: kvm [554381]: vcpu1, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:29 Server-01 kernel: kvm_intel: kvm [554381]: vcpu1, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:29 Server-01 kernel: kvm_intel: kvm [554381]: vcpu1, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:29 Server-01 kernel: kvm_intel: kvm [554381]: vcpu1, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:29 Server-01 kernel: kvm_intel: kvm [554381]: vcpu1, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:29 Server-01 kernel: kvm_intel: kvm [554381]: vcpu1, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:29 Server-01 kernel: kvm_intel: kvm [554381]: vcpu1, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:29 Server-01 kernel: kvm_intel: kvm [554381]: vcpu1, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
Jun 10 18:44:29 Server-01 kernel: kvm_intel: kvm [554381]: vcpu1, guest rIP: 0xfffff8056e7c97d2 Unhandled WRMSR(0x1d9) = 0x1
-- Boot 96e0ac11a8424059baf8b1f13d8787e3 --
Jun 10 18:46:11 Server-01 kernel: Linux version 7.0.0-3-pve (build@proxmox) (gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC PMX 7.0.0-3 (2026-04-21T22:56Z) ()
Jun 10 18:46:11 Server-01 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-7.0.0-3-pve root=/dev/mapper/pve-root ro quiet
Jun 10 18:46:11 Server-01 kernel: KERNEL supported cpus:
Jun 10 18:46:11 Server-01 kernel:   Intel GenuineIntel
Jun 10 18:46:11 Server-01 kernel:   AMD AuthenticAMD
Jun 10 18:46:11 Server-01 kernel:   Hygon HygonGenuine
Jun 10 18:46:11 Server-01 kernel:   Centaur CentaurHauls
Jun 10 18:46:11 Server-01 kernel:   zhaoxin   Shanghai
Jun 10 18:46:11 Server-01 kernel: x86/tme: not enabled by BIOS
Jun 10 18:46:11 Server-01 kernel: x86/CPU: Running old microcode
Jun 10 18:46:11 Server-01 kernel: x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
Jun 10 18:46:11 Server-01 kernel: BIOS-provided physical RAM map:
Jun 10 18:46:11 Server-01 kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff]  System RAM
Jun 10 18:46:11 Server-01 kernel: BIOS-e820: [mem 0x000000000009e000-0x00000000000fffff]  device reserved
Jun 10 18:46:11 Server-01 kernel: BIOS-e820: [mem 0x0000000000100000-0x0000000049ea6fff]  System RAM
Jun 10 18:46:11 Server-01 kernel: BIOS-e820: [mem 0x0000000049ea7000-0x000000004c6a6fff]  ACPI NVS
Jun 10 18:46:11 Server-01 kernel: BIOS-e820: [mem 0x000000004c6a7000-0x00000000641fdfff]  System RAM
Jun 10 18:46:11 Server-01 kernel: BIOS-e820: [mem 0x00000000641fe000-0x0000000074dfefff]  device reserved
Jun 10 18:46:11 Server-01 kernel: BIOS-e820: [mem 0x0000000074dff000-0x00000000771fefff]  ACPI NVS
Jun 10 18:46:11 Server-01 kernel: BIOS-e820: [mem 0x00000000771ff000-0x00000000777fefff]  ACPI data
Jun 10 18:46:11 Server-01 kernel: BIOS-e820: [mem 0x00000000777ff000-0x00000000777fffff]  System RAM
Jun 10 18:46:11 Server-01 kernel: BIOS-e820: [mem 0x0000000077800000-0x000000008fffffff]  device reserved

The log only goes up to 18:44, then the boot already starts.
 
Code:
coredumpctl list
No coredumps found.


However, coredumpctl wasn't even installed until just now.
 
The journal ends at 18:44:29, then silence until the reset at 18:46. No coredump. It was a hard kernel panic.
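To see exactly where a boot went silent, look at the entry right before the "-- Boot ... --" separator. This is a sketch that assumes journal text spanning the reboot arrives on stdin. last_entry_before_boot is a hypothetical helper name:

```shell
# Hypothetical helper: print the last journal line that appears before the
# first "-- Boot <id> --" separator, i.e. the final message of the
# previous boot.
last_entry_before_boot() {
    awk '/^-- Boot / { if (prev != "") print prev; exit } { prev = $0 }'
}

# Usage on the host, with the time window used earlier in the thread:
#   journalctl --since "2026-06-10 18:43:00" --until "2026-06-10 18:47:00" \
#       --no-pager | last_entry_before_boot
```

Alternatively, journalctl -b -1 -n 50 --no-pager shows the last 50 lines of the previous boot directly, without needing a time window.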

Good that @j.theisen was able to reproduce it; that makes it a concrete bug. Please answer his question about HA, that sounds like the trigger.
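For the HA question, the relevant lines are the ones from pve-ha-lrm, pve-ha-crm and watchdog-mux, plus the kernel's watchdog messages. Below is a minimal sketch that extracts just those from journal text. ha_watchdog_lines is a hypothetical name, and the pattern list is an assumption based on the log excerpts above:

```shell
# Hypothetical helper: keep only HA-stack and watchdog-related lines from
# journal text on stdin, to see whether the HA agent lost its lock and the
# watchdog expired before the reset.
ha_watchdog_lines() {
    grep -E 'pve-ha-(lrm|crm)|watchdog-mux|kernel: watchdog:'
}

# Usage on the host:
#   journalctl -b -1 --no-pager | ha_watchdog_lines
```

For the current state on the node, ha-manager status shows the HA manager's view.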
 
@j.theisen @Bu66as thanks a lot for the support! That puts my mind at ease for now. My biggest worry was that there was some kind of misconfiguration.

I haven't added the second node yet. But that shouldn't be a problem now either, since the cluster already exists.
 