Hello,
We have two freshly installed Proxmox servers. After two weeks of stable operation we put them into production.
Since 17 Feb 2023 they have been hosting 6 VMs.
Today, at around 17:03, both servers rebooted and took down all of our production VMs, and we are worried it will happen again.
We don't understand why they crashed — and both of them at once!
We are not familiar with the Proxmox logs and would like to ask the experts here to help us analyse them.
Can anyone help us?
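To pull only the suspicious events out of a long journal, a simple grep filter can help. This is just an illustrative sketch: the sample lines below are copied from our log excerpt; on the real host you would pipe `journalctl --since "2023-02-20 17:00"` into the same filter instead of the inlined variable.

```shell
#!/bin/sh
# Illustrative only: a couple of lines from the excerpt below are inlined
# here so the filter can be demonstrated without access to the host.
log='Feb 20 17:03:48 yoda pmxcfs[1485]: [status] notice: node lost quorum
Feb 20 17:04:37 yoda watchdog-mux[1072]: client watchdog expired - disable watchdog updates
Feb 20 17:07:12 yoda kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.85-1-pve root=/dev/mapper/pve-root ro quiet'

# Keep only the events that typically precede an HA self-fence:
# quorum loss, watchdog expiry, split-lock traps, and clock skew.
printf '%s\n' "$log" | grep -E 'lost quorum|watchdog expired|split_lock|clock skew'
```

On the host itself: `journalctl --since "2023-02-20 16:55" | grep -E 'lost quorum|watchdog expired|split_lock|clock skew'`.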
Feb 20 17:02:52 yoda kernel: x86/split lock detection: #AC: CPU 0/KVM/2613595 took a split_lock trap at address: 0x7730dfa3
Feb 20 17:02:52 yoda kernel: x86/split lock detection: #AC: CPU 0/KVM/2613595 took a split_lock trap at address: 0x7730dfa3
Feb 20 17:02:52 yoda kernel: x86/split lock detection: #AC: CPU 0/KVM/2613595 took a split_lock trap at address: 0x7730dfa3
Feb 20 17:02:53 yoda ceph-mgr[1509]: 2023-02-20T17:02:53.026+0100 7ff7e4efa700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-02-20T16:02:53.028010+0100)
Feb 20 17:02:54 yoda ceph-mgr[1509]: 2023-02-20T17:02:54.026+0100 7ff7e4efa700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-02-20T16:02:54.028119+0100)
Feb 20 17:02:55 yoda ceph-mgr[1509]: 2023-02-20T17:02:55.026+0100 7ff7e4efa700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-02-20T16:02:55.028256+0100)
Feb 20 17:02:56 yoda ceph-mgr[1509]: 2023-02-20T17:02:56.026+0100 7ff7e4efa700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-02-20T16:02:56.028434+0100)
Feb 20 17:02:57 yoda ceph-mgr[1509]: 2023-02-20T17:02:57.026+0100 7ff7e4efa700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-02-20T16:02:57.028581+0100)
Feb 20 17:02:58 yoda ceph-mgr[1509]: 2023-02-20T17:02:58.026+0100 7ff7e4efa700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-02-20T16:02:58.028734+0100)
Feb 20 17:02:59 yoda ceph-mgr[1509]: 2023-02-20T17:02:59.026+0100 7ff7e4efa700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-02-20T16:02:59.028902+0100)
Feb 20 17:03:00 yoda ceph-mgr[1509]: 2023-02-20T17:03:00.026+0100 7ff7e4efa700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-02-20T16:03:00.029048+0100)
Feb 20 17:03:01 yoda ceph-mgr[1509]: 2023-02-20T17:03:01.026+0100 7ff7e4efa700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-02-20T16:03:01.029224+0100)
Feb 20 17:03:02 yoda ceph-mgr[1509]: 2023-02-20T17:03:02.026+0100 7ff7e4efa700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-02-20T16:03:02.029367+0100)
Feb 20 17:03:03 yoda ceph-mgr[1509]: 2023-02-20T17:03:03.026+0100 7ff7e4efa700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-02-20T16:03:03.029512+0100)
Feb 20 17:03:04 yoda ceph-mgr[1509]: 2023-02-20T17:03:04.026+0100 7ff7e4efa700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-02-20T16:03:04.029686+0100)
Feb 20 17:03:05 yoda ceph-mgr[1509]: 2023-02-20T17:03:05.026+0100 7ff7e4efa700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-02-20T16:03:05.029859+0100)
Feb 20 17:03:06 yoda ceph-mgr[1509]: 2023-02-20T17:03:06.026+0100 7ff7e4efa700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-02-20T16:03:06.030041+0100)
Feb 20 17:03:07 yoda ceph-mgr[1509]: 2023-02-20T17:03:07.026+0100 7ff7e4efa700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-02-20T16:03:07.030209+0100)
Feb 20 17:03:08 yoda ceph-mgr[1509]: 2023-02-20T17:03:08.026+0100 7ff7e4efa700 -1 monclient: _check_auth_rotating possible clock skew, rotating keys expired way too early (before 2023-02-20T16:03:08.030379+0100)
Feb 20 17:03:41 yoda corosync[1511]: [KNET ] link: host: 2 link: 0 is down
Feb 20 17:03:41 yoda corosync[1511]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Feb 20 17:03:41 yoda corosync[1511]: [KNET ] host: host: 2 has no active links
Feb 20 17:03:42 yoda corosync[1511]: [TOTEM ] Token has not been received in 2250 ms
Feb 20 17:03:43 yoda corosync[1511]: [TOTEM ] A processor failed, forming new configuration: token timed out (3000ms), waiting 3600ms for consensus.
Feb 20 17:03:46 yoda corosync[1511]: [QUORUM] Sync members[1]: 1
Feb 20 17:03:46 yoda corosync[1511]: [QUORUM] Sync left[1]: 2
Feb 20 17:03:46 yoda corosync[1511]: [TOTEM ] A new membership (1.6c) was formed. Members left: 2
Feb 20 17:03:46 yoda corosync[1511]: [TOTEM ] Failed to receive the leave message. failed: 2
Feb 20 17:03:46 yoda pmxcfs[1485]: [dcdb] notice: members: 1/1485
Feb 20 17:03:46 yoda pmxcfs[1485]: [status] notice: members: 1/1485
Feb 20 17:03:46 yoda corosync[1511]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Feb 20 17:03:46 yoda corosync[1511]: [QUORUM] Members[1]: 1
Feb 20 17:03:46 yoda corosync[1511]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 20 17:03:46 yoda pmxcfs[1485]: [status] notice: node lost quorum
Feb 20 17:03:46 yoda pmxcfs[1485]: [dcdb] crit: received write while not quorate - trigger resync
Feb 20 17:03:46 yoda pmxcfs[1485]: [dcdb] crit: leaving CPG group
Feb 20 17:03:46 yoda pve-ha-lrm[1626]: lost lock 'ha_agent_yoda_lock - cfs lock update failed - Operation not permitted
Feb 20 17:03:47 yoda pmxcfs[1485]: [dcdb] notice: start cluster connection
Feb 20 17:03:47 yoda pmxcfs[1485]: [dcdb] crit: cpg_join failed: 14
Feb 20 17:03:47 yoda pmxcfs[1485]: [dcdb] crit: can't initialize service
Feb 20 17:03:48 yoda pve-ha-crm[1615]: lost lock 'ha_manager_lock - cfs lock update failed - Permission denied
Feb 20 17:03:51 yoda pve-ha-lrm[1626]: status change active => lost_agent_lock
Feb 20 17:03:53 yoda pmxcfs[1485]: [dcdb] notice: members: 1/1485
Feb 20 17:03:53 yoda pmxcfs[1485]: [dcdb] notice: all data is up to date
Feb 20 17:03:53 yoda pve-ha-crm[1615]: status change master => lost_manager_lock
Feb 20 17:03:53 yoda pve-ha-crm[1615]: watchdog closed (disabled)
Feb 20 17:03:53 yoda pve-ha-crm[1615]: status change lost_manager_lock => wait_for_quorum
Feb 20 17:04:04 yoda kernel: split_lock_warn: 14 callbacks suppressed
Feb 20 17:04:04 yoda kernel: x86/split lock detection: #AC: CPU 0/KVM/2613595 took a split_lock trap at address: 0xfffff8019af8334c
Feb 20 17:04:09 yoda pvescheduler[2811971]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Feb 20 17:04:09 yoda pvescheduler[2811970]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Feb 20 17:04:37 yoda watchdog-mux[1072]: client watchdog expired - disable watchdog updates
-- Reboot --
Feb 20 17:07:12 yoda kernel: Linux version 5.15.85-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.85-1 (2023-02-01T00:00Z) ()
Feb 20 17:07:12 yoda kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.85-1-pve root=/dev/mapper/pve-root ro quiet
Feb 20 17:07:12 yoda kernel: KERNEL supported cpus:
Feb 20 17:07:12 yoda kernel: Intel GenuineIntel
Feb 20 17:07:12 yoda kernel: AMD AuthenticAMD
Feb 20 17:07:12 yoda kernel: Hygon HygonGenuine
Feb 20 17:07:12 yoda kernel: Centaur CentaurHauls
Feb 20 17:07:12 yoda kernel: zhaoxin Shanghai
Feb 20 17:07:12 yoda kernel: x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
Feb 20 17:07:12 yoda kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Feb 20 17:07:12 yoda kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Feb 20 17:07:12 yoda kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Feb 20 17:07:12 yoda kernel: x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
Feb 20 17:07:12 yoda kernel: x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
Feb 20 17:07:12 yoda kernel: x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
Feb 20 17:07:12 yoda kernel: x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
Feb 20 17:07:12 yoda kernel: x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
Feb 20 17:07:12 yoda kernel: x86/fpu: xstate_offset[5]: 832, xstate_sizes[5]: 64
Feb 20 17:07:12 yoda kernel: x86/fpu: xstate_offset[6]: 896, xstate_sizes[6]: 512
Feb 20 17:07:12 yoda kernel: x86/fpu: xstate_offset[7]: 1408, xstate_sizes[7]: 1024
Feb 20 17:07:12 yoda kernel: x86/fpu: xstate_offset[9]: 2432, xstate_sizes[9]: 8
Feb 20 17:07:12 yoda kernel: x86/fpu: Enabled xstate features 0x2e7, context size is 2440 bytes, using 'compacted' format.
Feb 20 17:07:12 yoda kernel: signal: max sigframe size: 3632
Feb 20 17:07:12 yoda kernel: BIOS-provided physical RAM map:
Feb 20 17:07:12 yoda kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009bfff] usable
Feb 20 17:07:12 yoda kernel: BIOS-e820: [mem 0x000000000009c000-0x000000000009ffff] reserved