We are configuring a Proxmox cluster with shared storage on a directory that is mounted through multipath (SAN - multipathd - block device (sdb) - gfs2 - /mnt/storage).
When we open the VM Disks of the storage, gfs2 crashes and the server freezes.
We also use DLM with gfs2.
Software versions:
gfs2-utils 3.3.0-2
dlm-controld 4.1.0-1
multipath-tools 0.8.5-2+deb11u1
proxmox-ve 7.2-1
Why does this happen, and how can we solve it?
Thanks to all.
Kernel log from the crash:
Jun 16 10:40:43 srv09 kernel: [2465185.770466] gfs2: fsid=Prodata-cls01:ssd2.2: G: s:SH n:2/e84a0b8d f:Iqob t:SH d:EX/0 a:0 v:0 r:4 m:50 p:3
Jun 16 10:40:43 srv09 kernel: [2465185.771337] gfs2: fsid=Prodata-cls01:ssd2.2: H: s:SH f:HD e:0 p:1908636 [qemu-img] gfs2_file_read_iter+0x105/0x400 [gfs2]
Jun 16 10:40:43 srv09 kernel: [2465185.772146] gfs2: fsid=Prodata-cls01:ssd2.2: I: n:12/3897166733 t:8 f:0x00 d:0x00000000 s:11144855552 p:21
Jun 16 10:40:43 srv09 kernel: [2465185.772903] ------------[ cut here ]------------
Jun 16 10:40:43 srv09 kernel: [2465185.772905] kernel BUG at fs/gfs2/glock.c:329!
Jun 16 10:40:43 srv09 kernel: [2465185.773658] invalid opcode: 0000 [#1] SMP PTI
Jun 16 10:40:43 srv09 kernel: [2465185.774382] CPU: 0 PID: 121290 Comm: kworker/0:1H Tainted: P O 5.15.85-1-pve #1
Jun 16 10:40:43 srv09 kernel: [2465185.775105] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 10/16/2020
Jun 16 10:40:43 srv09 kernel: [2465185.775924] Workqueue: dlm_callback dlm_callback_work [dlm]
Jun 16 10:40:43 srv09 kernel: [2465185.776687] RIP: 0010:demote_incompat_holders+0xe2/0x100 [gfs2]
Jun 16 10:40:43 srv09 kernel: [2465185.777467] Code: af 66 83 7b 22 01 75 a8 41 f6 45 20 20 74 a1 f6 43 20 20 0f 85 54 ff ff ff eb 95 ba 01 00 00 00 4c 89 f6 31 ff e8 de e5 ff ff <0f> 0b 83 ea 02 66 83 fa 01 0f 86 34 ff ff ff e9 72 ff ff ff 66 2e
Jun 16 10:40:43 srv09 kernel: [2465185.778992] RSP: 0018:ffffb22d33407cd0 EFLAGS: 00010246
Jun 16 10:40:43 srv09 kernel: [2465185.779772] RAX: 0000000000000000 RBX: ffffb22d59613da0 RCX: ffff934dbf620588
Jun 16 10:40:43 srv09 kernel: [2465185.780667] RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff934dbf620580
Jun 16 10:40:43 srv09 kernel: [2465185.781539] RBP: ffffb22d33407cf0 R08: 0000000000000003 R09: 0000000000000001
Jun 16 10:40:43 srv09 kernel: [2465185.782307] R10: ffff92efddd008c0 R11: ffffffffc09b80c0 R12: ffff934f6b49c498
Jun 16 10:40:43 srv09 kernel: [2465185.783089] R13: ffffb22d33407d08 R14: ffff934f6b49c458 R15: 0000000000000032
Jun 16 10:40:43 srv09 kernel: [2465185.783919] FS: 0000000000000000(0000) GS:ffff934dbf600000(0000) knlGS:0000000000000000
Jun 16 10:40:43 srv09 kernel: [2465185.784768] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 16 10:40:43 srv09 kernel: [2465185.785571] CR2: 000055e1878b15d8 CR3: 0000007d3f210001 CR4: 00000000003726f0
Jun 16 10:40:43 srv09 kernel: [2465185.786325] Call Trace:
Jun 16 10:40:43 srv09 kernel: [2465185.787090] <TASK>
Jun 16 10:40:43 srv09 kernel: [2465185.787808] gfs2_glock_cb+0xcc/0x170 [gfs2]
Jun 16 10:40:43 srv09 kernel: [2465185.788566] ? sync_wait_cb+0x20/0x20 [gfs2]
Jun 16 10:40:43 srv09 kernel: [2465185.789320] gdlm_bast+0x1a/0x50 [gfs2]
Jun 16 10:40:43 srv09 kernel: [2465185.790034] dlm_callback_work+0x138/0x2c0 [dlm]
Jun 16 10:40:43 srv09 kernel: [2465185.790753] ? gdlm_put_lock+0x160/0x160 [gfs2]
Jun 16 10:40:43 srv09 kernel: [2465185.791481] process_one_work+0x22b/0x3d0
Jun 16 10:40:43 srv09 kernel: [2465185.792137] worker_thread+0x53/0x420
Jun 16 10:40:43 srv09 kernel: [2465185.792808] ? process_one_work+0x3d0/0x3d0
Jun 16 10:40:43 srv09 kernel: [2465185.793525] kthread+0x12a/0x150
Jun 16 10:40:43 srv09 kernel: [2465185.794155] ? set_kthread_struct+0x50/0x50
Jun 16 10:40:43 srv09 kernel: [2465185.794790] ret_from_fork+0x22/0x30
Jun 16 10:40:43 srv09 kernel: [2465185.795400] </TASK>
Jun 16 10:40:43 srv09 kernel: [2465185.796033] Modules linked in: udp_diag tcp_diag inet_diag ebt_arp xt_mac act_police cls_basic sch_ingress sch_htb gfs2 ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw ipt_REJECT nf_reject_ipv4 xt_mark xt_set xt_physdev xt_addrtype xt_comment xt_tcpudp xt_multiport xt_conntrack iptable_filter bpfilter ip_set_hash_net ip_set dlm sctp ip6_udp_tunnel udp_tunnel nf_tables nfnetlink_cttimeout bonding tls openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c softdog nfnetlink_log nfnetlink dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_ssif intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul ghash_clmulni_intelaesni_intel crypto_simd cryptd mgag200 rapl drm_kms_helper cec rc_core i2c_algo_bit intel_cstate serio_raw pcspkr fb_sys_fops syscopyarea sysfillrect sysimgblt joydev input_leds zfs(PO) hpilo
Jun 16 10:40:43 srv09 kernel: [2465185.796093] ioatdma dca zunicode(PO) zzstd(O) zlua(O) acpi_ipmi zavl(PO) ipmi_si ipmi_devintf icp(PO) ipmi_msghandler acpi_tad mac_hid acpi_power_meter zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 simplefb uas usb_storage hid_generic usbkbd usbmouse usbhid hid crc32_pclmul psmouse lpfc nvmet_fc nvmet xhci_pci nvme_fc i2c_i801 xhci_pci_renesas nvme_fabrics uhci_hcd ehci_pci lpc_ich i2c_smbus tg3 xhci_hcd nvme_core ehci_hcd scsi_transport_fc hpsa scsi_transport_sas wmi
Jun 16 10:40:43 srv09 kernel: [2465185.805598] ---[ end trace 64d53356f2819600 ]---
Jun 16 10:40:43 srv09 kernel: [2465185.816566] RIP: 0010:demote_incompat_holders+0xe2/0x100 [gfs2]
Jun 16 10:40:43 srv09 kernel: [2465185.817473] Code: af 66 83 7b 22 01 75 a8 41 f6 45 20 20 74 a1 f6 43 20 20 0f 85 54 ff ff ff eb 95 ba 01 00 00 00 4c 89 f6 31 ff e8 de e5 ff ff <0f> 0b 83 ea 02 66 83 fa 01 0f 86 34 ff ff ff e9 72 ff ff ff 66 2e
Jun 16 10:40:43 srv09 kernel: [2465185.819174] RSP: 0018:ffffb22d33407cd0 EFLAGS: 00010246
Jun 16 10:40:43 srv09 kernel: [2465185.820054] RAX: 0000000000000000 RBX: ffffb22d59613da0 RCX: ffff934dbf620588
Jun 16 10:40:43 srv09 kernel: [2465185.820834] RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff934dbf620580
Jun 16 10:40:43 srv09 kernel: [2465185.821630] RBP: ffffb22d33407cf0 R08: 0000000000000003 R09: 0000000000000001
Jun 16 10:40:43 srv09 kernel: [2465185.822463] R10: ffff92efddd008c0 R11: ffffffffc09b80c0 R12: ffff934f6b49c498
Jun 16 10:40:43 srv09 kernel: [2465185.823295] R13: ffffb22d33407d08 R14: ffff934f6b49c458 R15: 0000000000000032
Jun 16 10:40:43 srv09 kernel: [2465185.824154] FS: 0000000000000000(0000) GS:ffff934dbf600000(0000) knlGS:0000000000000000
Jun 16 10:40:43 srv09 kernel: [2465185.825015] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 16 10:40:43 srv09 kernel: [2465185.825768] CR2: 000055e1878b15d8 CR3: 0000007d3f210001 CR4: 00000000003726f0
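To make the three gfs2 dump lines at the top of the log (G:, H:, I:) easier to read, here is a minimal Python sketch that splits them into fields and cross-checks them. The helper name `parse_dump_line` is hypothetical, and the field meanings in the comments are assumptions based on my reading of the kernel's gfs2 glock documentation, so treat them as a starting point rather than a definitive interpretation.

```python
# The three gfs2 dump lines from the log above, with the syslog prefix removed.
DUMP = [
    "G: s:SH n:2/e84a0b8d f:Iqob t:SH d:EX/0 a:0 v:0 r:4 m:50 p:3",
    "H: s:SH f:HD e:0 p:1908636 [qemu-img] gfs2_file_read_iter+0x105/0x400 [gfs2]",
    "I: n:12/3897166733 t:8 f:0x00 d:0x00000000 s:11144855552 p:21",
]


def parse_dump_line(line):
    """Split one dump line into its record letter and a dict of key:value fields."""
    record, _, rest = line.partition(": ")
    fields = {}
    for token in rest.split():
        key, sep, value = token.partition(":")
        # Keep only single-letter keys such as "s:SH"; this skips tokens like
        # "[qemu-img]" and the function name, which carry no key.
        if sep and len(key) == 1:
            fields[key] = value
    return record, fields


records = dict(parse_dump_line(line) for line in DUMP)

# Assumption: "n:" on the G line is <glock type>/<glock number in hex>.
glock_type, glock_num_hex = records["G"]["n"].split("/")
# Assumption: the second part of "n:" on the I line is the inode's disk address.
inode_addr = int(records["I"]["n"].split("/")[1])

same_inode = int(glock_num_hex, 16) == inode_addr
holder_pid = int(records["H"]["p"])              # assumption: p: on the H line is a PID
size_gib = int(records["I"]["s"]) / 2**30        # assumption: s: is the file size in bytes

print("glock state:", records["G"]["s"], "type:", glock_type)
print("glock number matches inode address:", same_inode)
print("holder pid:", holder_pid)
print("file size (GiB): %.2f" % size_gib)
```

Read this way, the glock number in the G line is the hex form of the inode address in the I line, and the holder belongs to the qemu-img process named in the H line, which would fit the crash being triggered while the disk images on the storage are being read.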