Hi,
I set up a 3-node cluster with a few VMs in HA, using both ZFS (boot, backup) and Ceph (VM, data).
Last night I had a weird event: both network switches automatically updated at 4 am, at the same time.
All 3 nodes lost quorum, but node1 has a 24-drive remote SAS setup (card + optical cable), so booting it takes some time. It looks like the node rebooted right after rejoining the cluster?
Any idea why? I
Code:
May 09 04:13:04 Proxmox1 pvestatd[4396]: got timeout
May 09 04:13:05 Proxmox1 ceph-osd[107593]: 2024-05-09T04:13:05.422-0400 7b7d7bec36c0 -1 osd.47 3973 heartbeat_check: no reply from 10.9.9.9:6834 osd.33 since back 2024-05-09T04:13:02.420761-0400 front 2024-05-09T04:12:12.714875-0400 (oldest deadline 2024-05-09T04:12:33.814527-0400)
May 09 04:13:05 Proxmox1 ceph-osd[107593]: 2024-05-09T04:13:05.422-0400 7b7d7bec36c0 -1 osd.47 3973 heartbeat_check: no reply from 10.9.9.9:6854 osd.38 since back 2024-05-09T04:13:02.420790-0400 front 2024-05-09T04:12:12.714837-0400 (oldest deadline 2024-05-09T04:12:33.814527-0400)
May 09 04:13:05 Proxmox1 pve-ha-lrm[5079]: successfully acquired lock 'ha_agent_Proxmox1_lock'
May 09 04:13:05 Proxmox1 pve-ha-lrm[5079]: status change lost_agent_lock => active
May 09 04:13:05 Proxmox1 watchdog-mux[3452]: exit watchdog-mux with active connections
May 09 04:13:05 Proxmox1 systemd-journald[1908]: Received client request to sync journal.
May 09 04:13:05 Proxmox1 kernel: watchdog: watchdog0: watchdog did not stop!
-- Boot be24e83b2fe84f3ab749263ae7e9b45a --
May 09 04:16:02 Proxmox1 kernel: Linux version 6.5.13-5-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.5.13-5 (2024-04-05T11:03Z) ()
May 09 04:16:02 Proxmox1 kernel: Command line: initrd=\EFI\proxmox\6.5.13-5-pve\initrd.img-6.5.13-5-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs
May 09 04:16:02 Proxmox1 kernel: KERNEL supported cpus:
May 09 04:16:02 Proxmox1 kernel: Intel GenuineIntel
May 09 04:16:02 Proxmox1 kernel: AMD AuthenticAMD
May 09 04:16:02 Proxmox1 kernel: Hygon HygonGenuine
May 09 04:16:02 Proxmox1 kernel: Centaur CentaurHauls
May 09 04:16:02 Proxmox1 kernel: zhaoxin Shanghai
May 09 04:16:02 Proxmox1 kernel: BIOS-provided physical RAM map:
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x0000000000100000-0x000000007a088fff] usable
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x000000007a089000-0x000000007af0afff] reserved
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x000000007af0b000-0x000000007b93afff] ACPI NVS
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x000000007b93b000-0x000000007bab2fff] ACPI data
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x000000007bab3000-0x000000007bae8fff] usable
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x000000007bae9000-0x000000007bafefff] ACPI data
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x000000007baff000-0x000000007bafffff] usable
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x0000000080000000-0x000000008fffffff] reserved
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x00000000feda8000-0x00000000fedabfff] reserved
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x00000000ff310000-0x00000000ffffffff] reserved
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x0000000100000000-0x000000807fffffff] usable
May 09 04:16:02 Proxmox1 kernel: NX (Execute Disable) protection: active