One of our nodes suddenly gave up the ghost:
From the affected node:
Dec 10 12:52:21 c01-n03 ceph-osd[3327]: 2021-12-10T12:52:07.932+0100 7face6838700 -1 osd.8 3000 heartbeat_check: no reply from 10.20.1.222:6806 osd.5 since back 2021-12-10T12:52:20.392101+0100 front 2021-12-10T12:52:00.689320+0100 (oldest deadline 2021-12-10T12:52:21.189348+0100)
Dec 10 12:52:22 c01-n03 pmxcfs[3195]: [dcdb] notice: members: 3/3195
Dec 10 12:52:22 c01-n03 pmxcfs[3195]: [status] notice: members: 3/3195
Dec 10 12:52:22 c01-n03 pmxcfs[3195]: [status] notice: node lost quorum
Dec 10 12:52:22 c01-n03 ceph-osd[3327]: 2021-12-10T12:52:07.932+0100 7face6838700 -1 osd.8 3000 heartbeat_check: no reply from 10.20.1.222:6815 osd.4 since back 2021-12-10T12:52:01.189632+0100 front 2021-12-10T12:52:22.092342+0100 (oldest deadline 2021-12-10T12:52:22.289582+0100)
Dec 10 12:52:22 c01-n03 ceph-osd[3327]: 2021-12-10T12:52:07.932+0100 7face6838700 -1 osd.8 3000 heartbeat_check: no reply from 10.20.1.222:6806 osd.5 since back 2021-12-10T12:52:22.092326+0100 front 2021-12-10T12:52:00.689320+0100 (oldest deadline 2021-12-10T12:52:21.189348+0100)
Dec 10 12:52:22 c01-n03 pvestatd[3495]: storage 'EXPORT' is not online
Dec 10 12:52:23 c01-n03 ceph-osd[3331]: 2021-12-10T12:52:07.932+0100 7f4278f2a700 -1 osd.6 3000 heartbeat_check: no reply from 10.20.1.221:6816 osd.0 since back 2021-12-10T12:51:57.082448+0100 front 2021-12-10T12:52:20.184349+0100 (oldest deadline 2021-12-10T12:52:22.982259+0100)
Dec 10 12:52:23 c01-n03 ceph-osd[3331]: 2021-12-10T12:52:07.932+0100 7f4278f2a700 -1 osd.6 3000 heartbeat_check: no reply from 10.20.1.221:6817 osd.2 since back 2021-12-10T12:52:20.184409+0100 front 2021-12-10T12:51:57.082386+0100 (oldest deadline 2021-12-10T12:52:22.982259+0100)
Dec 10 12:52:23 c01-n03 pve-ha-crm[3654]: status change slave => wait_for_quorum
Dec 10 12:52:23 c01-n03 ceph-osd[3327]: 2021-12-10T12:52:07.932+0100 7face6838700 -1 osd.8 3000 heartbeat_check: no reply from 10.20.1.222:6815 osd.4 since back 2021-12-10T12:52:01.189632+0100 front 2021-12-10T12:52:22.092342+0100 (oldest deadline 2021-12-10T12:52:22.289582+0100)
Dec 10 12:52:23 c01-n03 ceph-osd[3327]: 2021-12-10T12:52:07.932+0100 7face6838700 -1 osd.8 3000 heartbeat_check: no reply from 10.20.1.222:6806 osd.5 since back 2021-12-10T12:52:22.092326+0100 front 2021-12-10T12:52:00.689320+0100 (oldest deadline 2021-12-10T12:52:21.189348+0100)
Dec 10 12:52:24 c01-n03 ceph-osd[3331]: 2021-12-10T12:52:07.932+0100 7f4278f2a700 -1 osd.6 3000 heartbeat_check: no reply from 10.20.1.221:6816 osd.0 since back 2021-12-10T12:51:57.082448+0100 front 2021-12-10T12:52:23.684565+0100 (oldest deadline 2021-12-10T12:52:22.982259+0100)
Dec 10 12:52:24 c01-n03 ceph-osd[3331]: 2021-12-10T12:52:07.932+0100 7f4278f2a700 -1 osd.6 3000 heartbeat_check: no reply from 10.20.1.221:6817 osd.2 since back 2021-12-10T12:52:23.684596+0100 front 2021-12-10T12:51:57.082386+0100 (oldest deadline 2021-12-10T12:52:22.982259+0100)
Dec 10 12:52:24 c01-n03 ceph-osd[3335]: 2021-12-10T12:52:07.932+0100 7f5639f4d700 -1 osd.7 3000 heartbeat_check: no reply from 10.20.1.222:6817 osd.3 since back 2021-12-10T12:52:21.669515+0100 front 2021-12-10T12:51:59.567512+0100 (oldest deadline 2021-12-10T12:52:23.667604+0100)
Dec 10 12:52:24 c01-n03 ceph-osd[3335]: 2021-12-10T12:52:07.932+0100 7f5639f4d700 -1 osd.7 3000 heartbeat_check: no reply from 10.20.1.222:6815 osd.4 since back 2021-12-10T12:51:59.567627+0100 front 2021-12-10T12:51:59.567527+0100 (oldest deadline 2021-12-10T12:52:23.667604+0100)
Dec 10 12:52:24 c01-n03 ceph-osd[3335]: 2021-12-10T12:52:07.932+0100 7f5639f4d700 -1 osd.7 3000 heartbeat_check: no reply from 10.20.1.222:6806 osd.5 since back 2021-12-10T12:51:59.567598+0100 front 2021-12-10T12:52:21.669544+0100 (oldest deadline 2021-12-10T12:52:23.667604+0100)
Dec 10 12:52:24 c01-n03 ceph-osd[3327]: 2021-12-10T12:52:07.932+0100 7face6838700 -1 osd.8 3000 heartbeat_check: no reply from 10.20.1.222:6815 osd.4 since back 2021-12-10T12:52:01.189632+0100 front 2021-12-10T12:52:22.092342+0100 (oldest deadline 2021-12-10T12:52:22.289582+0100)
Dec 10 12:52:24 c01-n03 ceph-osd[3327]: 2021-12-10T12:52:07.932+0100 7face6838700 -1 osd.8 3000 heartbeat_check: no reply from 10.20.1.222:6806 osd.5 since back 2021-12-10T12:52:22.092326+0100 front 2021-12-10T12:52:00.689320+0100 (oldest deadline 2021-12-10T12:52:21.189348+0100)
Dec 10 12:52:25 c01-n03 ceph-osd[3331]: 2021-12-10T12:52:07.932+0100 7f4278f2a700 -1 osd.6 3001 heartbeat_check: no reply from 10.20.1.221:6816 osd.0 since back 2021-12-10T12:51:57.082448+0100 front 2021-12-10T12:52:23.684565+0100 (oldest deadline 2021-12-10T12:52:22.982259+0100)
Dec 10 12:52:25 c01-n03 ceph-osd[3331]: 2021-12-10T12:52:07.932+0100 7f4278f2a700 -1 osd.6 3001 heartbeat_check: no reply from 10.20.1.221:6817 osd.2 since back 2021-12-10T12:52:23.684596+0100 front 2021-12-10T12:51:57.082386+0100 (oldest deadline 2021-12-10T12:52:22.982259+0100)
This goes on like that for a while.
After that came:
Dec 10 12:52:48 c01-n03 kernel: ------------[ cut here ]------------
Dec 10 12:52:48 c01-n03 kernel: libceph: osd8 down
Dec 10 12:52:48 c01-n03 kernel: watchdog: BUG: soft lockup - CPU#28 stuck for 22s! [fn_anonymous:3914]
Dec 10 12:52:48 c01-n03 kernel: Modules linked in: rbd libceph veth md4 cmac nls_utf8 cifs libarc4 fscache netfs libdes ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables sctp ip6_udp_tunnel udp_tunnel iptable_filter bpfilter bonding softdog nfnetlink_log nfnetlink ipmi_ssif intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd kvm ast drm_vram_helper irqbypass drm_ttm_helper crct10dif_pclmul ttm ghash_clmulni_intel drm_kms_helper aesni_intel cec crypto_simd wmi_bmof cryptd rc_core efi_pstore rapl rndis_host pcspkr i2c_algo_bit cdc_ether fb_sys_fops syscopyarea usbnet sysfillrect acpi_ipmi joydev input_leds mii sysimgblt ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler mac_hid vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs blake2b_generic xor zstd_compress raid6_pq
Dec 10 12:52:48 c01-n03 kernel: libcrc32c mlx5_ib ib_uverbs ib_core hid_generic usbmouse usbhid hid mlx5_core psample ixgbe mlxfw xhci_pci crc32_pclmul xfrm_algo tls ahci xhci_pci_renesas nvme dca pci_hyperv_intf nvme_core mdio bnxt_en libahci xhci_hcd i2c_piix4 wmi
Dec 10 12:52:48 c01-n03 kernel: CPU: 28 PID: 3914 Comm: fn_anonymous Tainted: P O 5.13.19-1-pve #1
Dec 10 12:52:48 c01-n03 kernel: Hardware name: Supermicro AS -1114S-WN10RT/H12SSW-NTR, BIOS 2.3 09/24/2021
Dec 10 12:52:48 c01-n03 kernel: RIP: 0010:smp_call_function_single+0x97/0x120
Dec 10 12:52:48 c01-n03 kernel: Code: a9 00 01 ff 00 0f 85 9a 00 00 00 85 c9 75 48 48 c7 c6 40 de 02 00 65 48 03 35 65 62 cb 72 8b 46 08 a8 01 74 09 f3 90 8b 46 08 <a8> 01 75 f7 83 4e 08 01 4c 89 46 10 48 89 56 18 e8 84 fe ff ff 41
Dec 10 12:52:48 c01-n03 kernel: RSP: 0018:ffffaa70a6977ba0 EFLAGS: 00000202
Dec 10 12:52:48 c01-n03 kernel: RAX: 0000000000000001 RBX: 000000068e87a717 RCX: 0000000000000000
Dec 10 12:52:48 c01-n03 kernel: RDX: 0000000000000000 RSI: ffff936c4eb2de40 RDI: 0000000000000001
Dec 10 12:52:48 c01-n03 kernel: RBP: ffffaa70a6977be0 R08: ffffffff8d24d3d0 R09: 0000000000000000
Dec 10 12:52:48 c01-n03 kernel: R10: 0000000000000001 R11: 0000000000000007 R12: 0000000000000001
Dec 10 12:52:48 c01-n03 kernel: R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
Dec 10 12:52:48 c01-n03 kernel: FS: 00007face5035700(0000) GS:ffff936c4eb00000(0000) knlGS:0000000000000000
Dec 10 12:52:48 c01-n03 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 10 12:52:48 c01-n03 kernel: CR2: 000055b58ec98000 CR3: 0000000234bf6003 CR4: 0000000000770ee0
Dec 10 12:52:48 c01-n03 kernel: PKRU: 55555554
Dec 10 12:52:48 c01-n03 kernel: Call Trace:
Dec 10 12:52:48 c01-n03 kernel: aperfmperf_snapshot_cpu+0x5b/0x70
Dec 10 12:52:48 c01-n03 kernel: arch_freq_prepare_all+0x77/0xc0
Dec 10 12:52:48 c01-n03 kernel: ? proc_reg_write+0x90/0x90
Dec 10 12:52:48 c01-n03 kernel: cpuinfo_open+0x13/0x30
Dec 10 12:52:48 c01-n03 kernel: proc_reg_open+0x3b/0x150
Dec 10 12:52:48 c01-n03 kernel: ? proc_reg_write+0x90/0x90
Dec 10 12:52:48 c01-n03 kernel: do_dentry_open+0x156/0x370
Dec 10 12:52:48 c01-n03 kernel: vfs_open+0x2d/0x30
Dec 10 12:52:48 c01-n03 kernel: path_openat+0xb6e/0x1150
Dec 10 12:52:48 c01-n03 kernel: ? __mod_memcg_state.part.0+0x2a/0x30
Dec 10 12:52:48 c01-n03 kernel: ? __mod_memcg_lruvec_state+0x27/0xf0
Dec 10 12:52:48 c01-n03 kernel: ? close_pdeo+0xf6/0x110
Dec 10 12:52:48 c01-n03 kernel: do_filp_open+0xa2/0x150
Dec 10 12:52:48 c01-n03 kernel: ? __check_object_size+0x13f/0x150
Dec 10 12:52:48 c01-n03 kernel: do_sys_openat2+0x9b/0x150
Dec 10 12:52:48 c01-n03 kernel: __x64_sys_openat+0x56/0x90
Dec 10 12:52:48 c01-n03 kernel: do_syscall_64+0x61/0xb0
Dec 10 12:52:48 c01-n03 kernel: ? do_syscall_64+0x6e/0xb0
Dec 10 12:52:48 c01-n03 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Dec 10 12:52:48 c01-n03 kernel: RIP: 0033:0x7facecd02c64
Dec 10 12:52:48 c01-n03 kernel: Code: 84 00 00 00 00 00 44 89 54 24 0c e8 36 61 f9 ff 44 8b 54 24 0c 44 89 e2 48 89 ee 41 89 c0 bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 34 44 89 c7 89 44 24 0c e8 68 61 f9 ff 8b 44
Dec 10 12:52:48 c01-n03 kernel: RSP: 002b:00007face5031bb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000101
Dec 10 12:52:48 c01-n03 kernel: RAX: ffffffffffffffda RBX: 000055b58ecb03c0 RCX: 00007facecd02c64
Dec 10 12:52:48 c01-n03 kernel: RDX: 0000000000000000 RSI: 000055b562b01632 RDI: 00000000ffffff9c
Dec 10 12:52:48 c01-n03 kernel: RBP: 000055b562b01632 R08: 0000000000000000 R09: 0000000000000001
Dec 10 12:52:48 c01-n03 kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
Dec 10 12:52:48 c01-n03 kernel: R13: 000055b58ecb03c0 R14: 0000000000000001 R15: 000055b56d91d460
pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-1-pve)
pve-manager: 7.1-7 (running version: 7.1-7/df5740ad)
pve-kernel-5.13: 7.1-4
pve-kernel-helper: 7.1-4
pve-kernel-5.11: 7.0-10
pve-kernel-5.13.19-1-pve: 5.13.19-3
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph: 16.2.6-pve2
ceph-fuse: 16.2.6-pve2
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
root@c01-n03:~#
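
To make the flood of heartbeat_check messages easier to read, here is a rough Python sketch that tallies which OSD on the affected node complained about which peer. The regex is an assumption derived only from the log lines quoted in this post, and the function name `summarize_heartbeat_failures` is made up for illustration; it is not part of any Ceph or Proxmox tooling.

```python
import re
from collections import Counter

# Assumption: this pattern is modelled only on the heartbeat_check lines
# quoted in this post; other Ceph versions may format the message differently.
HEARTBEAT_RE = re.compile(
    r"(?P<reporter>osd\.\d+) \d+ heartbeat_check: no reply from "
    r"(?P<addr>\d+\.\d+\.\d+\.\d+):\d+ (?P<peer>osd\.\d+)"
)


def summarize_heartbeat_failures(lines):
    """Count 'no reply' messages per (reporting OSD, silent peer, peer IP)."""
    counts = Counter()
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            key = (match["reporter"], match["peer"], match["addr"])
            counts[key] += 1
    return counts
```

It accepts any iterable of log lines, for example an open file holding a saved syslog excerpt. In the excerpt above, the result would contain only peers on 10.20.1.221 and 10.20.1.222, reported by osd.6, osd.7 and osd.8 on c01-n03.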