Node rebooted on loss of network?

Icefire

Member
Mar 20, 2022
Hi,
I set up a 3-node cluster with a few VMs in HA, using both ZFS (boot, backup) and Ceph (VM, data).

Last night I had a weird event: both network switches automatically updated at 4 am, at the same time.
All 3 nodes lost quorum, but node1 has a 24-drive remote SAS setup (card + optical cable), so booting it takes some time. It looks like the node rebooted after rejoining the cluster?

Any idea why?

Code:
May 09 04:13:04 Proxmox1 pvestatd[4396]: got timeout
May 09 04:13:05 Proxmox1 ceph-osd[107593]: 2024-05-09T04:13:05.422-0400 7b7d7bec36c0 -1 osd.47 3973 heartbeat_check: no reply from 10.9.9.9:6834 osd.33 since back 2024-05-09T04:13:02.420761-0400 front 2024-05-09T04:12:12.714875-0400 (oldest deadline 2024-05-09T04:12:33.814527-0400)
May 09 04:13:05 Proxmox1 ceph-osd[107593]: 2024-05-09T04:13:05.422-0400 7b7d7bec36c0 -1 osd.47 3973 heartbeat_check: no reply from 10.9.9.9:6854 osd.38 since back 2024-05-09T04:13:02.420790-0400 front 2024-05-09T04:12:12.714837-0400 (oldest deadline 2024-05-09T04:12:33.814527-0400)
May 09 04:13:05 Proxmox1 pve-ha-lrm[5079]: successfully acquired lock 'ha_agent_Proxmox1_lock'
May 09 04:13:05 Proxmox1 pve-ha-lrm[5079]: status change lost_agent_lock => active
May 09 04:13:05 Proxmox1 watchdog-mux[3452]: exit watchdog-mux with active connections
May 09 04:13:05 Proxmox1 systemd-journald[1908]: Received client request to sync journal.
May 09 04:13:05 Proxmox1 kernel: watchdog: watchdog0: watchdog did not stop!
-- Boot be24e83b2fe84f3ab749263ae7e9b45a --
May 09 04:16:02 Proxmox1 kernel: Linux version 6.5.13-5-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.5.13-5 (2024-04-05T11:03Z) ()
May 09 04:16:02 Proxmox1 kernel: Command line: initrd=\EFI\proxmox\6.5.13-5-pve\initrd.img-6.5.13-5-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs
May 09 04:16:02 Proxmox1 kernel: KERNEL supported cpus:
May 09 04:16:02 Proxmox1 kernel:   Intel GenuineIntel
May 09 04:16:02 Proxmox1 kernel:   AMD AuthenticAMD
May 09 04:16:02 Proxmox1 kernel:   Hygon HygonGenuine
May 09 04:16:02 Proxmox1 kernel:   Centaur CentaurHauls
May 09 04:16:02 Proxmox1 kernel:   zhaoxin   Shanghai 
May 09 04:16:02 Proxmox1 kernel: BIOS-provided physical RAM map:
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x0000000000100000-0x000000007a088fff] usable
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x000000007a089000-0x000000007af0afff] reserved
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x000000007af0b000-0x000000007b93afff] ACPI NVS
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x000000007b93b000-0x000000007bab2fff] ACPI data
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x000000007bab3000-0x000000007bae8fff] usable
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x000000007bae9000-0x000000007bafefff] ACPI data
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x000000007baff000-0x000000007bafffff] usable
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x0000000080000000-0x000000008fffffff] reserved
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x00000000feda8000-0x00000000fedabfff] reserved
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x00000000ff310000-0x00000000ffffffff] reserved
May 09 04:16:02 Proxmox1 kernel: BIOS-e820: [mem 0x0000000100000000-0x000000807fffffff] usable
May 09 04:16:02 Proxmox1 kernel: NX (Execute Disable) protection: active
 
If you have HA enabled and the node loses network on the corosync interface, it will fence itself. This is expected behaviour; you can read more about it in our documentation [1]. It is recommended to have at least one backup link for the cluster network, if you haven't configured one already.

[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#ha_manager_fencing
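
To illustrate why all three nodes were affected when both switches went down: assuming the default of one vote per node and a strict-majority quorum rule (both are assumptions about this cluster's configuration, and the function names below are made up for illustration), a node that can only see itself holds 1 of 3 votes and is no longer quorate. A minimal sketch of that arithmetic:

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for a strict majority of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(reachable_votes: int, expected_votes: int) -> bool:
    """True if the partition this node sees holds a strict majority."""
    return reachable_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    expected = 3  # assumed: 3 nodes, one vote each
    print("votes needed:", quorum_threshold(expected))
    for reachable in (3, 2, 1):
        print(reachable, "vote(s) reachable -> quorate:",
              is_quorate(reachable, expected))
```

With every cluster link down at once, each node is in the "1 vote reachable" case, so a redundant link only helps if the links do not share a single point of failure.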
 
Hi,

There are 2 links from 2 separate switches (2x10Gb + 2x25Gb), but it seems both switches did an auto-update at the same time and rebooted.
I have disabled auto-update for the whole network.
 
